Monday, June 25, 2007

php 5.x is four years old. php 4 is still used. Pygame with py3k?

A cautionary tale for py3k planners from the php situation with php 4, and php 5.

php4 is still so popular and widely used, even after four years.

py3k should really study the php situation in order to avoid repeating those mistakes.

I don't think python has that much legacy code out there compared to php, but I still think there is enough that python2.4, 2.5, and 2.6 will be around with us for ages.

This page lists the reasons for php4 still hanging around. php is often installed by the web hosts, not by individuals. This is the main benefit of php versus python: php is already installed. However, as it is hard to run both php4 and php5 at the same time, guess which one gets installed? Php 4 gets installed, because of all the legacy applications which require php 4. Since there are 20-1000 websites on a single host, if only 5% require php4 then that's what will be installed. Some major webhosts have only just upgraded to php5 this year!!! (Media temple) Many open source projects still support php4 since that is what most people are running. Most of them have finally been upgraded to support php 5, but still work best with php4.

Being able to run both php4 and php5 at the same time would have helped this situation a lot. This can be done with fcgi, but most hosts use mod_php. mod_php, like mod_python, can only run one version of its language at a time.

php 5.x has cleaned up a *lot* of things in php 4.x so I think the pain was worth it in the end, for some. For others it has created an extra four years of work, with still more work coming. I think py3k will have a bunch of things cleaned up to make it worthwhile too.

Many games written with pygame have little to no unittests. Even the best programmers who certainly are not 'dumb asses' do not write unittests for their games. It will be annoying to break pretty much all of the games on py3k, but py2.x will still be around to run those. Unlike with webhosts, most python game people can choose which python they use.

Pygame has a limited test suite. However we do have many programs to test against, with a large set of people who report bugs. The unittests for pygame are also slowly improving, so hopefully by the time of py3k, there will be plenty of unittests to help with the transition.

Will it be possible to support 2.2, 2.3, 2.4, 2.5, 2.6 and py3k all at the same time? At the moment we support >= 2.2, with python2.6 not even in alpha yet, and py3k a long way away. We could probably drop support for python 2.2 and python 2.3 if needed. I don't think debian will do another stable release before py3k comes out, so python 2.4 will still have to be supported then.

Does anyone know how the C API will be for py3k? Is it going to be massively different? Every python release so far has broken the C api compatibility in some small way, so that's to be expected with py3k. I'm guessing it won't be all that much different though.

Or maybe py3k is the time to clean up the C API too? I haven't read anything about the C api, in the py3k plans, so I would like to know.

Friday, June 22, 2007

qhtml, qurl, qsql, qjs

These are some easily remembered functions for quoting used in website programming.

Using the idea of consistency allows you to remember how to quote things. Just use the function with a q in front of it.

url quoting, html quoting, sql quoting, and javascript quoting are things that web developers do almost every day in some frameworks.

Please consider using these short cut functions in your web frame work. Maybe if quoting was easier to use then people would use it more often.

It's probably the next best step compared to quoting by default, and not needing to quote at all.

They should be top level functions as well so they are easy to use.

* Note, I also use qhtm as a shortcut for qhtml. Just like how htm and html are used for webservers. qjavascript would probably be another good alias - but not one I use. I don't use a qxml, since when writing xml I use more verbose forms for constructing xml - but maybe it would make sense too?
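
As a rough sketch of what these four functions could look like in python, here is one version built only on the standard library. The function bodies are my assumptions, not taken from any particular framework: qsql here returns a complete string literal with the surrounding quotes, and qjs also escapes '<' so that a quoted string can not close an inline script block. For sql, parameter binding in your database driver is still the safer choice when it is available.

```python
import html
import json
from urllib.parse import quote


def qhtml(text):
    """Quote text for use in HTML content or attribute values."""
    return html.escape(str(text), quote=True)


def qurl(text):
    """Quote text for use as a single URL component."""
    return quote(str(text), safe="")


def qsql(text):
    """Quote text as a SQL string literal by doubling single quotes.

    Illustration only: assumes a database using standard '' escaping.
    """
    return "'" + str(text).replace("'", "''") + "'"


def qjs(text):
    """Quote text as a double-quoted JavaScript string literal."""
    # Escaping "<" keeps "</script>" from ending an inline script block.
    return json.dumps(str(text)).replace("<", "\\u003c")


qhtm = qhtml  # alias, like .htm and .html
```

With these at the top level, a template line becomes something like '<a href="/search?q=' + qurl(term) + '">' + qhtml(term) + '</a>'.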

Wednesday, June 20, 2007

Abstract Base Class - a poor name. Role is better.

For someone new to programming, or maybe from a non-maths background 'Abstract Base Class' sounds foreign, and weird. Whereas Role rolls off the tongue - giving me language elegance goose bumps.

How many people can guess what an Abstract Base Class is from the name? Even given its context of 'relating to objects'. I think more people could guess from the name Role.

Reading about Abstract Base Classes over time always gave me troubles. Even reading the PEP now I find it hard to answer the question 'What is an Abstract Base Class'. A simple question don't you think?

Maybe the name is too abstract.

When discussing objects with people I can talk about classes, then about instances, inheritance, and interfaces. When I begin to talk about Abstract Base Classes, blank looks appear. Maybe it's that the name is so long. Abbreviating it to the acronym ABC makes even less sense, as that takes an existing term and changes its meaning. Not that the world needs another acronym.

Is Role a good name for what Abstract Base Classes are? Or is there a better name for them?

I think this question needs to be answered:
'An Abstract Base Class is [...]'.

This is the part of PEP 3119 which describes what Abstract Base Classes are.

"This PEP proposes a particular strategy for organizing these tests known as Abstract Base Classes, or ABC. ABCs are simply Python classes that are added into an object's inheritance tree to signal certain features of that object to an external inspector. Tests are done using isinstance(), and the presence of a particular ABC means that the test has passed.

In addition, the ABCs define a minimal set of methods that establish the characteristic behavior of the type. Code that discriminates objects based on their ABC type can trust that those methods will always be present. Each of these methods are accompanied by an generalized abstract semantic definition that is described in the documentation for the ABC. These standard semantic definitions are not enforced, but are strongly recommended."
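
To make that description concrete, here is a minimal sketch using the abc module from the python standard library. The names Drawable, Sprite, Rock and render are made up for illustration and are not from the PEP. The ABC sits in the inheritance tree, the 'external inspector' tests with isinstance(), and then trusts that the draw method is present.

```python
from abc import ABC, abstractmethod


class Drawable(ABC):
    """Hypothetical ABC: anything that can draw itself."""

    @abstractmethod
    def draw(self):
        """Return a description of what was drawn."""


class Sprite(Drawable):
    """Has Drawable in its inheritance tree, so it must define draw()."""

    def draw(self):
        return "sprite drawn"


class Rock:
    """Not in the Drawable inheritance tree."""


def render(obj):
    # The external inspector: test with isinstance(), then trust draw().
    if isinstance(obj, Drawable):
        return obj.draw()
    return "skipped"
```

Read this way, Sprite 'plays the role' of a Drawable and Rock does not, which is the intuition behind the name Role.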

From a dictionary, the meaning of Role:
1. a part or character played by an actor or actress.
2. proper or customary function: the teacher's role in society.
3. Sociology. the rights, obligations, and expected behavior patterns associated with a particular social status.

I think the word Role should be used instead of Abstract Base Classes. It's shorter, and makes much more sense.

If not Role, then anything but Abstract Base Classes.

PEP 3133, another PEP which uses the term Role, has been rejected.

Tuesday, June 19, 2007

Pygame weekly mini sprint 2007/06/20

I spent some time looking at the FastRenderGroups code by DR0ID.

I sent a first review of FastRenderGroup to the mailing list, so we can discuss some of the things we need to do to it before it gets into pygame. It has a number of features missing from the current pygame sprite code. These include adapting to whichever is quicker, updating the whole screen or just parts of it, as well as layers and support for the new pygame blending modes (like additive blending).

ideasman_42 came along and noted a few reference counting bugs in pygame. He also plans to send in a patch with a few speed ups - by avoiding PyArg_ParseTuple when not needed, and using METH_NOARGS when appropriate.

I started on a mask.from_surface() function, but got caught up finding a bug when printing a surface. I think that was caused by the last set of changes to surface. I wrote a unittest for it, and submitted the bug to the mailing list.

Richard Goedeken submitted a patch for a smooth scaling function for pygame.
You can read about it here. It comes with an mmx function. However pygame doesn't have the infrastructure set up yet to support mmx or other asm optimizations. So we plan to use the cpu detection code that SDL uses.

Monday, June 18, 2007

Webserver DOS, with linux move file - and broken file move semantics with webservers.

When a file is moved or removed on linux any processes with that file open still see the old file. So this means if you move a new 2 gig file over the top of an old 2 gig file, and some processes still have that file open there will be about 4 gigs of space used up - until the old file is closed.
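
A small sketch of that behaviour, assuming linux or another posix system (the file names are made up). A handle opened before the move keeps reading the old file, while anything opened afterwards gets the new one. The old file's disk space is only released once the last handle is closed.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "mirror.iso")        # hypothetical file names
    incoming = os.path.join(tmp, "mirror.iso.new")

    with open(target, "wb") as f:
        f.write(b"old contents")
    with open(incoming, "wb") as f:
        f.write(b"new contents")

    reader = open(target, "rb")    # a slow client is still downloading
    os.replace(incoming, target)   # move the new file over the top of the old one

    held_open = reader.read()      # the already-open handle still sees the old file
    with open(target, "rb") as f:
        fresh_open = f.read()      # a new open sees the new file
    reader.close()                 # only now can the old file's space be freed
```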

Some webservers keep a file open for as long as the client is downloading it. Apache is one web server that does this. Some other webservers do not do this - like lighttpd.

The problem with reopening a file for smaller parts of a file as it is served to a web client - is that it breaks unix move semantics. The webclient will get a combination of both files, not one file or the other. This can be a problem in many cases. Consider a client downloading a html file that changes mid file. html tags won't balance up, and the client will download a syntactically invalid file.
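
Here is a sketch of how that mixing happens, again assuming a posix system. The serve_by_reopening function is a made-up stand-in for a server loop that reopens the file for every chunk. It is not how any particular webserver is implemented. The callback swaps in the new file after the first chunk has been sent.

```python
import os
import tempfile


def serve_by_reopening(path, chunk_size, between_chunks=None):
    """Read a file chunk by chunk, reopening it for every chunk."""
    sent = b""
    offset = 0
    while True:
        with open(path, "rb") as f:
            f.seek(offset)
            chunk = f.read(chunk_size)
        if not chunk:
            return sent
        sent += chunk
        offset += len(chunk)
        if between_chunks is not None:
            between_chunks(offset)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "page.html")         # hypothetical file names
    incoming = os.path.join(tmp, "page.html.new")
    with open(target, "wb") as f:
        f.write(b"<b>old page</b>")
    with open(incoming, "wb") as f:
        f.write(b"<i>new page</i>")

    def swap_in_new_file(offset):
        # The update lands while the client is part way through the download.
        if os.path.exists(incoming):
            os.replace(incoming, target)

    mixed = serve_by_reopening(target, 7, swap_in_new_file)
    # mixed starts with bytes from the old file and ends with bytes from the new one
```

The client ends up with an opening tag from the old page and a closing tag from the new one.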

So here is how a DOS can happen...

Say you have a big file mirror or something with lots of files that change fairly regularly. Perhaps a debian mirror, or a shareware mirror. If a DOS client wants to fill up a drive, and possibly cause corruption or an incomplete mirror, all they need to do is start slowly downloading many files close to the time when the files are supposed to be updated. This requires very few resources on the client side to cause massive resource use on the server side. One client could take up less than 5 MiB of memory to make 2000+ connections and cause eg 2000 * 2 gig of extra disk use, leading to disk full situations.

Although constantly reopening a file will help stop this type of DOS attack, it might corrupt downloaded files.

The changed file move semantics are something to watch out for with some webservers. Servers like lighttpd stop this form of DOS attack, but break file move semantics in the process. So you can not rely on unix move semantics in your applications when serving files through them.