Friday, August 10, 2007

timing and unittests - graphing speed regressions/improvements

Are there any python tools which allow you to run timing tests inside of unittests, and then see a useful report?

I'd like to see a report of how long each timing test took, and also see differences between runs. When comparing two runs I'd like to see visually which is faster.

Timing is a very important thing to measure for gui applications like games, and for websites. Since machines differ so much, it's useful to be able to time things as they run on different machines. A game or website can quite easily run 10-30 times slower even on machines with the same CPU, because of many other factors: OS, hard drive speed, available memory, installed drivers, directx/opengl/X/frame buffer, different browser speed, and different installed versions of libraries or plugins like flash. Testing all these things manually is almost impossible; testing them manually every time something changes is definitely impossible.

So I want to use this tool with the pygame unittests specifically, and also for websites. If there's another testing framework which can do timing very well, then we could switch testing frameworks away from unittest.

So I'd like to be able to save the timing data to a file or a database, and select which runs I want compared against each other.
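A minimal sketch of what I mean, using a plain JSON file as the store (the names `record_timing` and `compare_runs`, and the file layout, are just my own invention here):

```python
import json
import os

RESULTS_FILE = "timings.json"  # hypothetical store: {run_id: {test_name: seconds}}

def record_timing(run_id, test_name, seconds):
    """Append one timing result under a run id."""
    data = {}
    if os.path.exists(RESULTS_FILE):
        with open(RESULTS_FILE) as f:
            data = json.load(f)
    data.setdefault(run_id, {})[test_name] = seconds
    with open(RESULTS_FILE, "w") as f:
        json.dump(data, f, indent=2)

def compare_runs(run_a, run_b):
    """Return {test: (a_seconds, b_seconds, b/a)} so you can see what got faster."""
    with open(RESULTS_FILE) as f:
        data = json.load(f)
    out = {}
    for name in set(data[run_a]) & set(data[run_b]):
        a, b = data[run_a][name], data[run_b][name]
        out[name] = (a, b, b / a)
    return out
```

A ratio below 1.0 in the third slot means the second run was faster for that test.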

Extra points for being able to do that automatically for different revisions of subversion. So I could tell it to check out different revisions/branches, run the tests, then show me a report. But that's something I could script later I guess.

It would be nice if the tests could be separate processes, so I could do something like:

import os

def some_tests():
    flags = ['1', '2', '3', '4', '5', '6', '7']
    for f in flags:
        # default arg binds f now; a plain lambda would see only the last flag
        my_f = lambda f=f: os.system("python -doit=%s" % f)
        do_test(my_f, name="a test for:%s" % f)

So it would just time how long each one takes to run as many times as it needs.

It'd be nice if I could read what the process prints, and provide a parsing function which can extract timing information from the output. So then when something prints "33FPS" or "blit_blend_ADD 10202094.0093" I could tell the timing framework what those mean.
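A sketch of such a parsing hook, with regex patterns I've made up to match the two example outputs above (the pattern list and `parse_timings` name are my own assumptions, not an existing API):

```python
import re

# Hypothetical parsers: each regex maps a line of output to a (name, value) pair.
PATTERNS = [
    # e.g. "33FPS" -> ("fps", 33.0)
    (re.compile(r"(\d+(?:\.\d+)?)\s*FPS"), lambda m: ("fps", float(m.group(1)))),
    # e.g. "blit_blend_ADD 10202094.0093" -> ("blit_blend_ADD", 10202094.0093)
    (re.compile(r"(\w+)\s+(\d+(?:\.\d+)?)"), lambda m: (m.group(1), float(m.group(2)))),
]

def parse_timings(output):
    """Pull {name: value} timing pairs out of a process's printed output."""
    results = {}
    for line in output.splitlines():
        for pattern, extract in PATTERNS:
            m = pattern.search(line)
            if m:
                name, value = extract(m)
                results[name] = value
                break
    return results
```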

Each script would be timed automatically, and I could see results for tests like "a test for 1", "a test for 2" etc.

That way I could more easily reuse existing timing/benchmark scripts, as well as time code which is not python - like any executable.
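Timing an arbitrary executable is simple enough to sketch with the subprocess module (the `time_command` helper and its best-of-N policy are just one possible design):

```python
import subprocess
import time

def time_command(cmd, repeats=3):
    """Time any executable by running it in a subprocess.

    Returns (best_elapsed_seconds, last_stdout) so the output can be fed
    to a parsing function afterwards.
    """
    best = None
    proc = None
    for _ in range(repeats):
        start = time.time()
        proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        elapsed = time.time() - start
        if best is None or elapsed < best:
            best = elapsed
    return best, proc.stdout
```

Taking the best of several runs helps filter out noise from other processes on the machine.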

It would be nice to be able to keep the timing tests with the other tests, but to disable the timing tests when needed, because some timing tests will likely run for a while, and do things like open many windows (which takes time).
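One way to get that with plain unittest (skip decorators landed in Python 2.7, so this assumes a reasonably modern unittest; the `TIMING_TESTS` environment variable is my own convention):

```python
import os
import time
import unittest

# Timing tests only run when TIMING_TESTS=1 is set in the environment.
TIMING = os.environ.get("TIMING_TESTS") == "1"

class TimingTests(unittest.TestCase):
    @unittest.skipUnless(TIMING, "timing tests disabled; set TIMING_TESTS=1")
    def test_loop_speed(self):
        # stand-in for a real timing test
        start = time.time()
        sum(range(100000))
        self.assertLess(time.time() - start, 60.0)
```

The timing tests stay in the same files as the normal tests, but a default test run skips them.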

Website collection would get bonus points: it could collect the timing information from any machine, so I can compare them all in one place. I want to run these timing tests on different machines, and allow random people on the internet to submit their timing results, since I don't have 1000's of videocard/OS/cpu/hard disk combinations. An http collection method would also be useful for storing timing information from other languages, like flash, javascript, or html.
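The submission side of that could be as simple as POSTing JSON to a collection URL (the endpoint and payload shape below are entirely hypothetical):

```python
import json
import urllib.request

COLLECT_URL = "http://example.com/submit-timings"  # hypothetical endpoint

def build_submission(machine_info, timings):
    """Bundle timing results plus machine details into a JSON payload."""
    return json.dumps({"machine": machine_info, "timings": timings}).encode("utf-8")

def submit(payload, url=COLLECT_URL):
    """POST the payload to the collection server."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)
```

Since it's just http and JSON, a flash or javascript client could submit to the same endpoint.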

Looking at the median, fastest run, and slowest run would be good too, rather than just the average.
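Those summary statistics are cheap to compute from a list of timings (the `frame_stats` name is mine):

```python
def frame_stats(times):
    """Summarise a list of timings (seconds): fastest, slowest, average, median."""
    times = sorted(times)
    n = len(times)
    return {
        "fastest": times[0],
        "slowest": times[-1],
        "average": sum(times) / n,
        "median": times[n // 2],
    }
```

The median is often more honest than the average, since one pathological run can drag the average way up.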

Merging multiple runs together so you can see the slowest results from each run, or the fastest from each run, would be good. Then you could do things like combine results for 1000 different machines, and find out the machine configuration for the slowest runs. If there are common slowdowns, then you can see if there are any similarities between those system configurations, allowing you to direct your optimization work better.
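A sketch of that merge step, keeping track of which run each extreme came from (again, `merge_runs` and the `{run_id: {test: seconds}}` shape are assumptions of mine):

```python
def merge_runs(runs, pick=max):
    """Combine many runs: for each test, keep the slowest (pick=max) or
    fastest (pick=min) result, remembering which run it came from.

    runs: {run_id: {test_name: seconds}}
    returns: {test_name: (seconds, run_id)}
    """
    merged = {}
    for run_id, results in runs.items():
        for name, seconds in results.items():
            if name not in merged or pick(merged[name][0], seconds) == seconds:
                merged[name] = (seconds, run_id)
    return merged
```

With `pick=max` the run ids in the result point straight at the machines worth investigating.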

Being able to run tests with a profiler active, then store the profiling data, would be nice. This would work with different profilers, like gprof, or a python profiler. Then you could see a profile report from a slow run, and find out which functions you might need to optimize at an even lower level.
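For the python side, cProfile can already dump its stats to a file that can be shipped off and inspected later (the `run_with_profile` wrapper is my own sketch):

```python
import cProfile
import pstats

def run_with_profile(func, stats_path):
    """Run func under cProfile and dump the stats to a file for later digging."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    profiler.dump_stats(stats_path)
    return pstats.Stats(stats_path)

def slow_thing():
    # stand-in for a real timing test
    sum(i * i for i in range(100000))
```

The dumped file could be attached to a submitted timing run, so a slow result from a stranger's machine comes with its profile.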

Allowing different system configuration detection programs to run would be good too. So you can see what HD the slow machine was using, how much free ram it had, the load on the machine at the time of running, etc.
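The standard library's platform module covers the basics of that already; anything deeper (disk model, free ram, load) would need platform-specific detection programs. A minimal fingerprint might look like:

```python
import platform

def machine_fingerprint():
    """Collect basic system details to attach to each submitted timing run."""
    return {
        "os": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "processor": platform.processor(),
        "python": platform.python_version(),
    }
```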

All this tied in with respect for peoples privacy, so they know what data they are supplying, and how long it will be archived for.

Support for series of times would be nice too, like giving it a list of how long each frame took to render. Then I could see a graph of that, and compare it over multiple runs. You could also look at data from many machines and compare how they are performing, as well as compare different algorithms on different machines.

Hopefully something already exists which can do some of these things?

What I've written above is more a wish list, but even a portion of that functionality would allow people to more easily test for performance regressions.


Grig Gheorghiu said...

You may want to take a look at nose and its plugins. Titus wrote a stopwatch plugin as part of his pinocchio nose extensions (ha ha):


philhassey said...

The part about automagically submitting the results to a website sounds particularly nice. Submitting bug reports to people is hard work, but having it done instantly is convenient. I've run the tests included with libraries before, but when they crash I always feel lazy and don't want to have to go through that whole:

- sign up on the dev mailing list
- explain what I was doing
- cut-n-paste the test-case that failed
- tell more about my system, my setup, blah blah blah ..

It would also be nice if the user could optionally enter their e-mail address at the beginning in case the dev. wanted to contact them to ask them for more details about their system, or whatever.

Marius said...

I expected to see "and a pony" by the end of the post :-)