Sunday, January 17, 2010

good bye Vivienne

will miss my little sister. wish we had more time together. so sad. shit

Friday, January 15, 2010

jQuery 1.4

Another quality jQuery release... jQuery 1.4.

Some very handy changes, my favourite probably being the ability to set values with functions. In short, some of the changes are:

  • setting values with functions
  • html5 form support
  • better JSON/JSONP and ajax functionality
  • easier element construction (just pass a dict of attributes)
  • reverse indexing like in python, where -1 is the last item.
  • better animation for multiple attributes at once
  • performance improvements, code cleanups, and bug fixing

Full details in the jQuery 1.4 release notes.

    Wednesday, January 13, 2010

    worker pool optimization - batching apis, for some tasks.

    What are worker pools anyway?

Worker pools are an optimization to the problem of processing many things in parallel. Rather than have a worker for every item you want to process, you spread the load between the available workers. As an example, creating 1,000,000 processes to add 1,000,000 numbers together is a bit heavyweight. You probably want to divide it up between 8 processes for your 8 core machine. This is the optimization worker pools do.

    With a usual worker pool there is a queue of jobs/data for the workers to work on. The workers are usually threads, processes or separate machines which do the work passed to them.

    So for 1000 items on that queue, there are around 1000 accesses to that queue. Usually the queue has to be thread safe or process safe, so that pieces of data are not sent to many workers at once.

    This can be an efficient method to use for some types of data. For example, if each job can take different amounts of time, like IO tasks over the internet... this is not optimal, but pretty good.
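The queue-based pool described above can be sketched in a few lines of Python with threads. This is a minimal illustration, not pygame's implementation - the names `pool_map` and `worker` are made up:

```python
import threading

try:
    import queue            # python 3 name
except ImportError:
    import Queue as queue   # python 2 name

def pool_map(f, items, num_workers=8):
    """Apply f to every item, spreading the work over num_workers threads
    which all pull jobs off one shared, thread-safe queue."""
    items = list(items)
    jobs = queue.Queue()
    for i, item in enumerate(items):
        jobs.put((i, item))

    results = [None] * len(items)

    def worker():
        # Each worker keeps pulling single jobs off the shared queue
        # until it is empty - one queue access per item.
        while True:
            try:
                i, item = jobs.get_nowait()
            except queue.Empty:
                return
            results[i] = f(item)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Note the one-queue-access-per-item pattern - that is exactly the cost the batching optimization below tries to avoid.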

    Problem work loads for typical worker pools.

Let's assume the average (or median, if you like) time of each task is fairly easy to measure. So either they are not IO tasks, or they are tasks of fairly equal length. Then the central queue idea starts to fall down, for the following reasons.

What if the cost of starting a new job is quite high? Like if starting each job happened over a network with 200ms latency (say using an HTTP call to the other side of the planet). Or if a new process needs to be spawned for each task (say with exec or fork).

    Or if the cost of accessing the queue is quite high? Like if you have a lock on the queue (eg a GIL) and lots of workers. Then the contention on that lock will be quite high.

What if there are a lot more items than 1000? Like if there are 10,000,000 items? With so many items, it is worth trying to reduce or avoid the cost of accessing the queue altogether.

    How to optimize the worker pool for these problems?

The obvious solution is to divide the items up into chunks first, and then feed those big chunks of work to each worker. Luckily the obvious solution works quite well! It's trivial to divide a whole list of things into roughly equal-sized chunks quite quickly (a python one liner *1).

    An example of how to improve your worker pool.

    Here is some example code to transform a pygame.threads.tmap command that uses a worker pool to do its work off a central worker queue, into one that first divides the work into roughly equal parts. Mentally replace pygame.threads.tmap with your own worker pool map function to get the same effect.

    import operator

    # Here's our one liner divider, as two lines.
    # Note: the second argument is the chunk size, not the number of parts.
    def divide_it(l, chunk_size):
        return [l[i:i + chunk_size] for i in xrange(0, len(l), chunk_size)]

    # Here is our_map which transforms a map into
    # one which takes bigger pieces.
    def our_map(old_map, f, work_to_do, num_workers):
        bigger_pieces = divide_it(work_to_do, len(work_to_do)//num_workers + 1)
        parts = old_map(lambda parts: map(f, parts), bigger_pieces)
        return reduce(operator.add, parts)

    # now an example of how it can speed things up.
    if __name__ == "__main__":
        import pygame, pygame.threads, time

        # use 8 worker threads for our worker queue.
        num_workers = 8
        # Use the pygame threaded map function as our
        # normal worker queue.
        old_map = pygame.threads.tmap

        # make up a big list of work to do.
        work_to_do = list(range(100000))

        # a minimal function to run on all of the items of data.
        f = lambda x: x + 1

        # We time our normal worker queue method.
        t3 = time.time()
        r = pygame.threads.tmap(f, work_to_do)
        t4 = time.time()

        # We use our new map function to divide the data up first.
        t1 = time.time()
        r = our_map(old_map, f, work_to_do, num_workers)
        t2 = time.time()
        del r

        print "dividing the work up time:%s:" % (t2-t1)
        print "normal threaded worker queue map time:%s:" % (t4-t3)

    $ python
    dividing the work up time:0.0565769672394:
    normal threaded worker queue map time:6.26608109474:

For our contrived example we have 100,000 pieces of data to work through. If you created a thread for each piece of data it would surely take forever. Which is why people often use a worker queue. However a normal worker queue can still be improved upon.

    Results for this contrived example made to make this technique look good?

    We get a 100x speedup by dividing the work up in this way. This won't work for all types of data and functions... but for certain cases as mentioned above, it is a great improvement. Not bad for something that could be written in one line of python!*1

    It's an interesting case of how massaging your data to use Batching API design techniques gives good results. It also shows how writing parallel code can be sped up with knowledge of the data you are processing.

    *1 - Well it could be done in one line if we were functional ninjas... but for sane reading it is split up into 12 lines.

    Tuesday, January 12, 2010

mini languages that non-programmers can understand

There are hopefully a number of mini text-based languages that non-programmers can understand. But what are they?

One that I've used in the past is something like this:

name:Bob gender:male

Which would parse into a python/javascript data structure like this:

{name: 'Bob',
 gender: 'male',
 ...}

It's surprisingly common in things like search engines. Grandmas who occasionally check their email might not get it (but many do I'm sure!)... but I think a lot of others do. For things like search it is ok, if people know the magic 'terms'. If they do not know the terms, then they can just enter text to search normally. The mini language is used by advanced users.

This is quite good for single line free-form data entry, since people only need to know the concept that you have 'key:value'. It's slightly easier than using urls, since separators can be different things.
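A parser for this kind of 'key:value' mini language only takes a few lines. Here is a sketch - the whitespace separator and the handling of free text are assumptions, not a particular search engine's rules:

```python
def parse_mini(text):
    """Parse 'name:Bob gender:male' style text into a dict.
    Words without a ':' are collected as plain search text."""
    data = {}
    free_text = []
    for token in text.split():
        if ':' in token:
            # split on the first ':' only, so values may contain colons.
            key, _, value = token.partition(':')
            data[key] = value
        else:
            free_text.append(token)
    data['text'] = ' '.join(free_text)
    return data
```

So `parse_mini('name:Bob gender:male hello world')` gives both the magic terms and the leftover free text for a normal search.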

    csv files

Next up are comma separated files - csv.
For example:

name,gender
Bob,male
Alice,female

These are like spread sheets. Many people seem to be able to edit these quite fine. Especially if they have a spread sheet program to do the editing.
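Python's csv module reads these directly. A sketch, with made-up column names:

```python
import csv
import io

# A small csv document, as a non-programmer might edit it in a spreadsheet.
csv_text = "name,gender\nBob,male\nAlice,female\n"

# DictReader uses the first row as the column names.
rows = list(csv.DictReader(io.StringIO(csv_text)))
```

Each row comes back as a mapping from column name to value, which is usually what you want to hand to the rest of your program.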


urls

URLs are a mini language. With things like # anchors, query strings, and even paths being used by people all the time.


    ini files

    Common as configuration files.
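Python reads these with the standard library config parser. A sketch - the section and key names are made up, and `read_string` is the python 3 spelling (python 2 used `readfp`):

```python
# configparser is the python 3 name; python 2 calls it ConfigParser.
try:
    import configparser
except ImportError:
    import ConfigParser as configparser

ini_text = """
[server]
host = localhost
port = 8080
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# Values come back as strings unless you ask for a typed getter.
host = config.get('server', 'host')
port = config.getint('server', 'port')
```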

    subsitution templates

    Common for web sites, and email systems.

    Hi ${name},

    ${age} ${gender}
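Python's string.Template uses exactly this ${} style, so a template like the one above can be filled in directly:

```python
from string import Template

# The same template as above, as a python string.
template = Template("Hi ${name},\n\n${age} ${gender}")

# substitute() raises KeyError if a placeholder is missing a value;
# safe_substitute() would leave it in place instead.
message = template.substitute(name="Bob", age="52", gender="male")
```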

    basic html and other mark up languages

Quite a lot of people know bits of html. 'How do I do a new line? oh, that's right: <br>.'

    However, I think that html is on the edge of too complicated. Modern html is especially complicated.

Things like markdown, bbcode, and wiki languages all fall into this category. The languages can sometimes have only 5-10 elements - which makes the basics easy to learn.

    Older Wiki language text could look just like text someone would write in notepad. However modern ones - like html - now have all sorts of ${}[]==++ characters with special meanings.

    Are there any other mini languages which are easy to understand for non-programmers?

    Sunday, January 10, 2010

    pypy svn jit cheat sheet.

    A quick cheat sheet for trying out pypy jit. This is for a ubuntu/debian 32bit machine translating the jit version of pypy-c.
    # install dependencies for debian/ubuntu.
    sudo apt-get install python-dev libz-dev libbz2-dev libncurses-dev libexpat1-dev libssl-dev libgc-dev libffi-dev

    # download from svn.
    svn co pypy-trunk

    # Translate/compile the jit. This part can take a while.
# Don't worry. Relax, have a home brew.
    cd pypy-trunk/pypy/translator/goal
    ./ -Ojit

    # Or for the low memory pypy-c use this translate instead...
    #./ --gcremovetypeptr targetpypystandalone --objspace-std-withsharingdict
    The pypy getting started documentation has more about it if you're interested.

    pypy has most standard modules up to python2.5... and some from newer versions.

    I didn't need any external compiled extension modules... just sqlite, and cherrypy. So for this project I could use pypy! *happy dance*

    Didn't take long to port my code to it. I only had a couple of issues, which people in the #pypy channel on freenode helped me with.

There is an alpha version of sqlite3, which uses ctypes, included with pypy as the 'pysqlite2' module. Seemed to work well enough for me, and passed all my tests.
    import sqlite3
    # for pypy, since the sqlite3 package is empty... but they have a pysqlite2.
    if not hasattr(sqlite3, "register_converter"):
        from pysqlite2 import dbapi2 as sqlite3
Another issue I had was that I was using 'x is not y' to compare two integers in one function. In cpython, small integers (from -5 up through 256) are cached, so they share the same identity. Since the numbers were always less than 100, this code was working fine in cpython. However, pypy doesn't have that problem/feature - so I just used != instead.
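The pitfall can be seen in a few lines. A sketch - the cached range is a cpython implementation detail, not something to rely on anywhere:

```python
# CPython caches small integers, so two separately computed small numbers
# are often the very same object - which makes 'is' look like it works.
# (int("...") is used here to stop constant folding hiding the effect.)
a = int("100")
b = int("100")
small_ids_match = a is b   # True on cpython; not guaranteed on pypy.

# For larger numbers there is no such cache, so identity is unreliable
# on every implementation. Always compare numbers with == and !=.
big_a = int("2000")
big_b = int("2000")
values_equal = big_a == big_b   # True on any implementation.
```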

    I think those were the only two things I had to change. All my tests were passing, and it was the end of the day, so I went down the pub to see a friend visiting from Tokyo who was having a birthday.

    Friday, January 08, 2010

    Unladen swallow review. Still needs work.

Tried out unladen swallow on two work loads today, after the announcement that they are going to try and bring it into core. So I finally got around to trying it (last time I tried, the build failed). An hour or so later the build finally finished, and I could try it out. The C++ llvm takes ages to compile, which is what took most of the extra time. What follows is a review of unladen swallow - as it stands today.

The good part? Extensions work (mostly)! w00t. I could compile the C extension pygame, and run things with it.

    Now to run code I care about, my work loads - to see if their numbers hold true for me.
    cherrypy webserver benchmark: crash
    pygame tests: some crashes, mostly work.
    pygame.examples.testsprite : random pauses in the animation.

    The crashes I've found so far seem to be thread related I guess. The cherrypy one, and some of the pygame ones both use threads, so I'm guessing that's it.

Random pauses for applications are a big FAIL. Animations fail to work, and user interactions pause or stutter. Web requests can take longer for unknown reasons, etc. I'm not sure what causes the pauses, but they be there (arrrr, pirate noise).

LLVM is a big, fast moving dependency written in another language, with a whole other runtime (C++). Unladen swallow uses a bundled version of it, since they often need the latest... and they need the latest fixes to it. This might make it difficult for OS packagers. Or LLVM might stabilise soon, and it could be a non-issue. Depending on C++ is a big issue for some people, since some applications and environments can not use C++.

    The speed of unladen swallow? Slower than normal python for *my* benchmarks. Well, I couldn't benchmark some things because they crash with unladen... so eh. I might be able to track these problems down, but I just can't see the benefit so far. My programs I've tried do not go faster, so I'm not going to bother.

    Python 3 seems to be 80% of the speed for IO type programs like web servers (cherrypy) (see benchmarks in my previous post). However unladen-swallow only seems to be 10-20% slower for pygame games, but the random pauses make it unusable.

    Python2.x + psyco are way faster still on both these work loads. 20%-100% faster than python2.6 alone. Psyco, and stackless are both still being developed, and both seem to be giving better results than unladen swallow. Using selective optimisation with tools like shedskin, tinypyC++, rpython, cython will give you 20x speedups. So for many, writing code in a subset of python to get the speedups is worth it. Other people will be happy to write the 1% of their program that needs the speed in C. This is the good thing about unladen swallow... you should be able to keep using any C/C++/fortran extensions.

Unladen-swallow has a google reality distortion bubble around it. They only benchmark programs they care about, and others are ignored. There are other people's reports of their programs going slower, or not faster. However the response seems to be 'that is not our goal'. This is fine for them, as they are doing the work, and they want their own work loads to go faster. However, I'm not sure if ignoring the rest of the python community's work loads is a good idea if they are considering moving it into trunk.

It's too early to declare unladen-swallow done, and good, imho. I also think better research needs to go into it before declaring it an overall win at all. Outside review should be done to see if it actually makes things quicker/better for people. For my workloads, and for other people's workloads, it is worse. It also adds dependencies on C++ libraries - which is a no-no for some python uses. Extra dependencies also increase the startup time. Startup time with unladen swallow is 33% slower compared to python for me (time python -c "import time").

Let's look at one of their benchmarks - html5lib. See the issue 'html5lib no quicker or slower than CPython'. They arranged the benchmark so unladen-swallow is run 10 times, to allow unladen swallow to warm up - since CPython is faster the first time through.

[Chart: blue - unladen-swallow, red - cpython 2.6. Time (y) for 10 runs (x).]

Notice how jumpy the performance of unladen is on the other runs? This might be related to the random pauses unladen swallow has. I don't like this style of benchmark, which does not account for the first run. Many times you only want to run code on a set of data once.

    When looking at their benchmark numbers, consider how they structure their benchmarks. It's always good to try benchmarking on your own workloads, rather than believing benchmarks from vendors.

Memory usage is higher with unladen swallow. It takes around two times as much memory just to start the interpreter. The extra C++ memory management libraries, the extra set of byte code, and then the extra machine code for everything all take their toll. Memory usage is very important for servers, and for embedded systems. It is also important for most other types of programs. The main bottleneck is not the cpu, but memory, disk, and other IO. So they are trading better cpu speed (theoretically) for worse memory use. However since memory is often the bottleneck - and not the cpu - the runtimes will often be slower for lots of work loads.

It seems python2.6 will still be faster than unladen swallow for many people's work loads. If they do not get other people's programs and workloads working faster, or working at all, it will not be a carrot. As people's programs work, and go faster, with python2.6/2.7 it will be a stick*.

Unladen swallow has not (yet) got to its 5x faster goal, and for many work loads it is still slower or the same speed. For these reasons, I think it's too early to think about incorporating unladen swallow into python.

    * (ps... ok, that made no sense, sorry. Sticks and carrots?!?... donkeys like carrots, but so do ponies. I don't think we should hit people with sticks. Also people don't like carrots as much as perhaps chocolate or beer. Perhaps all this time hitting people with sticks and trying to get them to do things with carrots is the problem. Python 3 has heaps of cool things in it already... but more cool things always helps! Beer and chocolate would probably work best.)

    Thursday, January 07, 2010

    Using a html form as the model.

I've mentioned this technique before, but I think it is worth repeating more clearly, without any other code clouding the main message: 'All you need is html'.

    A HTML form can describe a model of your data quite well. This lets you go from HTML form design to a working Create Read Update Delete (CRUD) system. A CRUD can be thought of as a nice admin interface to edit your data (or a CMS if you prefer that acronym).

    For example:
<form action='savepage' method='post'>
    title:<input type='text' name='title'>
    <textarea name='content'></textarea>
    <input type='submit' name='submit'>
</form>
    That's all you need to create a CRUD. Things like validation can be defined in the html easily enough then implemented in your code to be checked server side, and client side. Parse the form to get the model, and go from there.
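Parsing the form can be done with just the standard library html parser. A sketch - the class name and the (name, type) fields format are made up, not a particular framework's API:

```python
# html.parser is the python 3 name; python 2 called it HTMLParser.
try:
    from html.parser import HTMLParser
except ImportError:
    from HTMLParser import HTMLParser

class FormModelParser(HTMLParser):
    """Collect (name, type) pairs from input/textarea tags in a form."""
    def __init__(self):
        HTMLParser.__init__(self)
        self.fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'input' and attrs.get('type') != 'submit':
            self.fields.append((attrs.get('name'), attrs.get('type', 'text')))
        elif tag == 'textarea':
            self.fields.append((attrs.get('name'), 'textarea'))

form = """<form action='savepage' method='post'>
title:<input type='text' name='title'>
<textarea name='content'></textarea>
<input type='submit' name='submit'></form>"""

parser = FormModelParser()
parser.feed(form)
# parser.fields is now the model: which fields exist, and their types.
```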

    The benefits are simplicity, and that designers can make a form, and then pretty quickly go to a working system. No need to edit sql, java, php, python, etc - just normal html forms.

Another benefit is more Rapid Application Development (RAD). From the html design you can quickly move to a working app. Especially in a work flow where the designers and clients mock up various forms - this is quicker. It also stops blockages in the production pipeline; blockages that happen when waiting for a python/php/java programmer to implement the model.

    Multiple file uploads with html 5, cherrypy 3.2 and firefox 3.6beta5.

    Here's an example for uploading multiple files with HTML 5 and newer browsers, like the firefox 3.6 beta5. You can shift/ctrl select multiple files, and even drag and drop them in. Makes for a much nicer user experience uploading files - rather than having to select one at a time, or having to load some java/flash thing.
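The markup side is just the html5 `multiple` attribute on a file input. A sketch - the action and field names here are made up:

```html
<form action='upload' method='post' enctype='multipart/form-data'>
    <input type='file' name='files' multiple>
    <input type='submit'>
</form>
```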

It uses the unreleased cherrypy 3.2, with its new request entity parsing tool hooks. See for details about the new control allowed over the whole process. It's a lot easier to make custom request entity parsing behaviour now, and in a much less hacky way than before.

    With the tool in there, files come in as a list of files instead.

    Wednesday, January 06, 2010

    Oldest python file in your home directory?

    Feeling just a little nostalgic this time of year.

Just made a little script to find the oldest python files on your hard drive.
    Update: Lennart mentions a unixy way in the comments of finding oldest files with this:
    find . -name '*.py' -printf "%T+ %p \n" | sort | more
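The same idea as a python script. A sketch using os.walk - the function name and the result format are made up:

```python
import os

def oldest_python_files(top, limit=10):
    """Walk a directory tree and return the oldest .py files,
    as (modification_time, path) pairs, oldest first."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.endswith('.py'):
                path = os.path.join(dirpath, name)
                try:
                    found.append((os.path.getmtime(path), path))
                except OSError:
                    pass  # file vanished or unreadable; skip it.
    found.sort()
    return found[:limit]
```

Like the find one liner, it sorts by timestamp, so the first entry is your oldest python file.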

With that I found some really old python files of mine... woh! the oldest ones dated in 1998. There are older C, C++, haskell, javascript, java, pascal, prolog, asm, sql, perl, etc, etc and heaps of other old junk files, but those are the first ones I could find written in python.

    I guess that means I've been programming python for around 12 years now. Python was at version 1.4 or so, and 1.5 was released not long after. New style objects did not exist, and it was not all too uncommon to be able to segfault the interpreter (ping could easily crash linux & windows in those days, so python was doing pretty good).

    So what did some of the older python files do?

A cut-up writing tool was the oldest one I found. Cut-up writing is a technique that was used a lot in the 90s (before then, and to this day as well). The idea is that you cut up various pieces of writing, and move the pieces around to get ideas.

    After that came a script to randomly ping different web hosts (by http) every few minutes. In 1998 it was common for ISPs to disconnect you from your modem if you were idle for a while. So this script seems to ping one of a random selection of hosts and then go back to sleep. I remember this being the first thing I wrote in python. It took me less than an hour to learn enough python do this little script. That one hour probably saved me $300/year or so in telephone bills.

    After that there is a script to convert all file names to lower case. Useful for bringing the contents of FAT drives onto a linux box.

    Then there was some thread testing code. Threads these days are way better, with lots better tools, better implementations, and more known about how to use them appropriately. Using threads in those days for IO was pretty crazy... but apache used processes! Python had a wicked set of async IO servers in its toolbox. Which were pretty darn cool in the day.

    Finally there were some mp3 making tools - to convert my massive collection of CDs. This was when some machines could barely play mp3s without crackling. Seems my tool would use various linux tools to make my job easier. Rip cds, get their names, convert them to mp3s. Using some old scsi drives I could go about my business without my machine slowing down completely and becoming unusable.

    What are your oldest python files?

    Tuesday, January 05, 2010

    Fifth London code dojo 6.30pm Thursday 7th of Jan.

    Here's the london dojo google calendar for much of 2010.

    More details, and signup here: The idea is to practice, learn and teach python skills.