Sunday, May 28, 2006

Pygame 1.8 release gets closer.

The last pygame release was August 16th 2005. So there are a lot of bug fixes and a few features in development that need to get out there into a release. With the new SDL being released last week it seems like a good time to get a new pygame out.

Here is the release plan from the last release; we will follow the same process again for this new release.

At the moment we are in the apply-patches, fix-bugs, get-in-any-last-minute-features phase.

This weekend I made pygame.image.save able to save .png and .jpg files. Since pygame already optionally links against libpng and libjpeg to load .png and .jpg images, it shouldn't be much hassle for people, and it only takes a tiny bit of extra code. So if pygame is linked against libpng and libjpeg, it will be able to write .png and .jpg images too.

Saving as jpg/png is an often-requested feature for people doing screenshots, caches, and level editors, amongst other uses. Hopefully these save functions will eventually make it into SDL.

That's the last feature I want to put into pygame for this release. Microphone and line-in support will have to wait for SDL 1.3, since a sound API change is needed to work them in nicely. See bug #10, 'audio input API', in the SDL bugzilla.

I've been slowly going through the doc comments on the pygame.org website's documentation, where anyone browsing the documentation can leave a comment on each function, class or module. There were around 100 or so comments from people, ranging from 'hey there' to useful tips, typo corrections, documentation clarity ideas, and bug reports.

Since misleading or unhelpful documentation is worse than a code bug, it is very important that these comments get worked into the main pygame documentation.

However, since some comments are bug reports, it might take quite a bit longer than I thought to work through them all.

Saturday, May 13, 2006

The best and easiest approach for CPython speed-ups: adding processor-specific C modules (MMX, SSE, 3DNow!, etc.) to distutils.

Pygame uses SDL, and some parts of SDL are written in assembler. These parts detect which CPU they are running on and use the MMX/SSE/3DNow!-optimized assembly routines if those processor features are available.

However, much of pygame and SDL is still plain C, not assembler, so compiling with processor-specific features gives a very nice speed-up for the C parts.

For some parts a 33% speed-up or more can be gained just by changing compilation flags. I think this is the best and easiest approach to speeding up parts of CPython. Below are my notes and thoughts on the topic, mostly based on my experimentation with pygame compilation. This does not take into consideration compiling python itself, or any of its modules; however, a similar methodology could be applied to speed up python's modules too. Note that recompiling python itself to match your processor can also give a large speed-up.


So, I have started experimenting with changing distutils to compile modules multiple times with different flags for processor-specific things, e.g. amodule_mmx.so, amodule_sse.so, etc. This is just for pygame at the moment, not for python modules in general.
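The idea of building one extension per CPU variant can be sketched in pure Python. This is just an illustration, not pygame's actual build script: the module name, source file, and flag sets below are made-up examples.

```python
# Hypothetical per-CPU build variants: each entry maps a module-name
# suffix to the GCC flags that variant would be compiled with.
CPU_VARIANTS = {
    "mmx":   ["-mmmx"],
    "sse":   ["-msse", "-mfpmath=sse"],
    "3dnow": ["-m3dnow"],
}

def variant_extensions(base_name, sources):
    """Return (module_name, sources, extra_compile_args) tuples:
    a plain fallback build first, then one build per CPU variant."""
    exts = [(base_name, sources, [])]  # default build, no special flags
    for suffix, flags in CPU_VARIANTS.items():
        exts.append(("%s_%s" % (base_name, suffix), sources, flags))
    return exts

for name, srcs, flags in variant_extensions("surface", ["src/surface.c"]):
    print(name, flags)
```

In a real setup script each tuple would become a distutils `Extension` with those `extra_compile_args`, so the same C source gets compiled several times under different names.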

I am thinking of doing three to six sets of modules, to keep the disk usage down: an Athlon 3DNow! version, a P3 SSE version, a Pentium MMX version, etc. All of the pygame C modules together are around 430KB; if I only build variants of the ones that are generally CPU intensive, that goes down to 230KB per set. So that's an extra 690KB to 1380KB uncompressed, or about 348KB to 696KB compressed.

So adding that to the windows installer, or to the .deb packages, isn't too bad. For people who compile it for their own machine, I'm thinking of compiling only the variant that matches their machine, which means I'll need to put CPU detection code into the compile phase.

People who distribute their own py2exed version will have the option of including extra CPU-specific modules, or keeping their download a little smaller.

However, I still need to finish the CPU detection code. I'm going to base it on SDL's code, since pygame requires SDL anyway and that code is widely tested. I'm thinking of having a python wrapper function which detects the CPU features, then tries to import the relevant processor-specific .so modules, best first, until one imports successfully. If no processor-specific module is found, it falls back to the default one.
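The best-first import fallback could look something like this. A minimal sketch, using today's `importlib` for brevity; the candidate module names in the comment are hypothetical.

```python
import importlib

def import_best(candidates):
    """Try each module name in order (best CPU variant first) and
    return the first one that imports. The last entry in the list
    should be the plain fallback build with no CPU-specific code."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # this variant isn't installed; try the next one
    raise ImportError("no usable module among %r" % (candidates,))

# In real use, the candidate list would be ordered by the detected
# CPU features, e.g. ["surface_sse", "surface_mmx", "surface"].
```

Because missing variants just raise ImportError, a stripped-down install that ships only the default .so still works unchanged.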

Another issue to think about is whether specific CPU instruction sets give enough of a performance boost. If Pentium MMX code is almost as fast as, or faster than, P4 SSE2 code, then I might as well not include the P4 SSE2 version of the module. This will require profiling to figure out.

So more profiling of pygame is needed to get better results: for example, a script which loads lots of jpg and png files, something which blits lots of stuff, etc. Each of these profiling tests should output timing data in a standard way, so that they can be run automatically and then submit their data. I think I'll set up a web page to collect this data, so people who are able to help can choose to submit results from their machine.
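A tiny harness along these lines could give every profiling script the same output shape. This is only a sketch of the idea; the record field names are invented here, not an actual pygame format.

```python
import timeit

def run_benchmark(name, func, repeat=3, number=1000):
    """Time func with the stdlib timeit module and return a result
    record in one standard shape, so different benchmark scripts
    can all be collected and compared automatically."""
    timings = timeit.repeat(func, repeat=repeat, number=number)
    return {
        "benchmark": name,          # which test produced this number
        "best_seconds": min(timings),  # best of `repeat` runs
        "number": number,           # iterations per run
    }

result = run_benchmark("int-sum", lambda: sum(range(100)))
print(result["benchmark"], result["best_seconds"])
```

Taking the minimum of several runs is the usual timeit convention, since it is the least disturbed by other processes on the machine.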

Also needed is better automated test coverage of pygame, so that I can check that the recompiled executables run correctly. This is especially needed since python uses experimental compilation flags on some platforms, i.e. gcc -O3. -O3 includes some optimizations that are known to be potentially buggy, and I have come across situations where compiling python extension modules breaks because of this. Also, some of the processor-specific optimizations are not as widely used or tested.

Another downside is that distutils changes between python versions, so any 'monkey patches' to it need to be tested with new python versions. This needs to be done for windows pygame anyway, since it patches distutils for the MinGW/MSYS compilation environment. Eventually, once this technique is perfected, patches will hopefully make it back into python's distutils, meaning less work for each python release. However, since pygame needs to be recompiled and tested for new python versions anyway, this isn't a major issue.

The funny thing is, I'm not even sure the python distribution on windows works with older CPUs, since it is compiled against the newer C library. This means it won't run on some older computers unless they get the C runtime; that's with py2exe versions of the programs, anyway.

This is just going to be for x86 win/*nix/*bsd machines using GCC for now. Not for Macs, because I don't have one to test with, and the binary situation there is already weird enough. I also won't use the Intel, Microsoft, or VectorC compilers to start with, even though they are better at some optimizations than GCC. That's an exercise I'll leave until later.


Hopefully this will let pygame users worry less about optimization. Or at least people will be able to put more sprites on the screen before worrying ;) It will also make many existing games run faster or use fewer resources while running.

Thursday, May 04, 2006

Pygame: blending blits, and Mac and Windows clipboard copy/paste working.

An update of what has been happening recently with pygame.

Thanks to a kind person on the mailing list, the Mac OS X version of the clipboard code was tested. It was a long process, but it got there eventually: a one-week debugging cycle, because I don't have a mac. HAHA. I also got around to testing the windows version of the code on my win XP box. It worked fine there.

Some blending functions have gone into CVS, so you can now blit with the modes ADD, SUB, MIN, MAX, and MULT. This allows you to do lighting-type effects, and is very useful for particle systems. These are missing features in SDL which are often asked for. The code was based on Phil Hassey's code, which he wrote for 32-bit software surfaces. He's also done some other code for lines etc., so I'll want to incorporate those changes too, so you can draw lines that blend.
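The per-channel arithmetic behind these modes is simple. Here is a pure-Python sketch of what each mode does to a single 8-bit colour channel; the real implementation is C operating on whole surfaces, and this MULT formula (divide by 256) is the common fast variant.

```python
def blend_channel(dst, src, mode):
    """Combine one 8-bit colour channel of a destination and source
    pixel, clamped to the 0..255 range where needed."""
    if mode == "ADD":
        return min(dst + src, 255)   # saturating add: light accumulates
    if mode == "SUB":
        return max(dst - src, 0)     # saturating subtract: darkens
    if mode == "MIN":
        return min(dst, src)
    if mode == "MAX":
        return max(dst, src)
    if mode == "MULT":
        return (dst * src) // 256    # modulate: white leaves dst alone-ish
    raise ValueError("unknown blend mode: %r" % mode)

# Additive light: two bright reds saturate at full brightness.
print(blend_channel(200, 100, "ADD"))   # 255
print(blend_channel(128, 128, "MULT"))  # 64
```

ADD is why overlapping particles look like glowing light: the channels sum until they clamp at white.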

I also need to add those blending modes to the surface.fill() method. That'll be very useful for fade-to-black/fade-to-white, or for fading text in and out.
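To see why a SUB-mode fill gives a fade to black: subtracting a constant from every channel each frame walks any colour down to (0, 0, 0). A pure-Python sketch of the effect on a single colour, not pygame API:

```python
def fade_to_black(color, step):
    """Yield the successive colours produced by repeatedly
    subtracting `step` from each channel, the way a SUB-mode
    fill would darken the whole screen once per frame."""
    r, g, b = color
    while (r, g, b) != (0, 0, 0):
        r = max(r - step, 0)
        g = max(g - step, 0)
        b = max(b - step, 0)
        yield (r, g, b)

frames = list(fade_to_black((255, 128, 0), 64))
print(frames[-1])  # (0, 0, 0)
```

A fade to white is the mirror image: an ADD-mode fill with the same constant saturates every channel up to 255 instead.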

There have also been a few patches submitted, which I have written unittests for. Pygame doesn't have too many unittests, but now I am creating one each time a change is made, to check that the bug is actually fixed or that the added feature works correctly.
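A regression test for a fixed bug can be this small. The function and the bug below are invented purely for illustration; only the one-test-per-fix habit is the point.

```python
import unittest

def clamp_alpha(value):
    """Hypothetical fixed function: imagine the bug was that values
    outside 0..255 used to pass straight through."""
    return max(0, min(255, value))

class ClampAlphaTest(unittest.TestCase):
    # One small TestCase per fix, so the bug can never quietly return.
    def test_in_range_unchanged(self):
        self.assertEqual(clamp_alpha(128), 128)

    def test_out_of_range_clamped(self):
        self.assertEqual(clamp_alpha(300), 255)
        self.assertEqual(clamp_alpha(-5), 0)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClampAlphaTest)
unittest.TextTestRunner(verbosity=0).run(suite)
```

Run alone it takes milliseconds, and a whole directory of these can be collected and run automatically before each release.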

Oh, one interesting point of note: I used TinyCC for my pygame development. It can interpret C code! How cool is that? It is also heaps faster at compiling than gcc, which is really nice when doing C development. It's even faster at interpreting code than python is. So no more slow compiles for me!!! TinyCC is apt-gettable too.

TinyCC also has a library which can be used for code generation. I reckon weave could be very nice with this :)

However, gcc is better for finished code, because it generates faster code and is more complete. But yay for rapid prototyping!

Favourite student slave labour tasks for python.

These are my favourite python ideas from the google summer of code.

Ones that I think will benefit the most people and effect the most change for the better. Ok, just ones I reckon would be cool.

  • Implement ctypes support for GCC ARM platforms. The underlying issue is the lack of closure API support for ARM in libffi. A patch is available at http://handhelds.org/~pb/arm-libffi.dpatch, which should hopefully be a good starting point. ctypes CVS has a libffi_arm_wince directory, which also seems to support the closure API.
  • Make a python+pygame plugin for IE and Netscape. CodingProjectIdeas/PythonWebPlugin

  • Psyco for MacOSX. PPC, and universal binary versions. (also psyco for ARM would be cool!!)

  • Research how to get python support into all the cheap webhosts.

  • Security audit of python. Using as many automated processes as possible.
  • Python speed-ups. Reduce memory usage and speed up startup time, the two main speed regressions across the 2.0, 2.1, 2.2, 2.3 and 2.4 releases: 438, 453, 499, 771 and 880 syscalls at startup vs 106 for the latest perl, and 0m0.031s, 0m0.029s, 0m0.037s, 0m0.059s and 0m0.057s real time to start vs 0m0.007s for the latest perl.
  • Code coverage. (there are a few suggestions for code coverage).
  • Improve thread performance. (reducing memory usage will probably help thread performance the most)
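The startup-time numbers quoted above are easy to reproduce. A rough sketch of how to measure them; a serious comparison would pin interpreter versions and average far more runs.

```python
import subprocess
import sys
import time

def interpreter_startup_seconds(runs=3):
    """Measure how long `python -c pass` takes to start and exit,
    keeping the best (lowest) of several runs to reduce noise."""
    best = None
    for _ in range(runs):
        start = time.time()
        subprocess.call([sys.executable, "-c", "pass"])
        elapsed = time.time() - start
        if best is None or elapsed < best:
            best = elapsed
    return best

print("best startup: %.3fs" % interpreter_startup_seconds())
```

The same loop with `["perl", "-e", "1"]` (if perl is installed) gives the other side of the comparison; counting syscalls needs an external tool like strace.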

The memory usage and start-up times would be great to improve, as I think these are the things which make people avoid using processes. Processes give us protection, and are a great way of keeping parts separate; however, if processes are so expensive to use in python, then other means will be used instead.

A security audit would be wonderful. I'm not sure if one has ever been done on python, and I'm sure it would lead to a bunch of bug fixes. A security audit would also fit well with code coverage of the tests. I guess lint and PyChecker would be other good things to add into the python build process.