I'd like to see a report of how long each timing test took, and also see differences between runs. When comparing two runs I'd like to see visually which is faster.
Timing is very important for GUI applications like games, and for websites. Since machines vary so much, it's useful to be able to time things as they run on different machines. A game or website can quite easily run 10-30 times slower, even on machines with the same CPU, because of many other factors: the OS, hard drive speed, available memory, installed drivers, the graphics stack (DirectX/OpenGL/X/framebuffer), browser speed, or different installed versions of libraries and plugins like Flash. Testing all these things manually is almost impossible; testing them manually every time something changes is definitely impossible.
So I want to use this tool with pygame unittests specifically, and also for websites. If there's another testing framework which can do timing very well, then we could switch testing frameworks away from unittest.
So I'd like to be able to save the timing data to a file or a database, and select which runs I want compared against each other.
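A minimal sketch of what that storage could look like, using sqlite3 from the standard library (the schema and function names here are just illustrative assumptions, not part of any existing framework):

```python
import sqlite3

def open_db(path):
    """Create (if needed) and return a timings database connection."""
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS timings (run_id TEXT, test TEXT, seconds REAL)")
    return con

def save_run(con, run_id, timings):
    """Store a {test_name: seconds} dict for one run."""
    con.executemany(
        "INSERT INTO timings VALUES (?, ?, ?)",
        [(run_id, test, secs) for test, secs in timings.items()])
    con.commit()

def compare_runs(con, run_a, run_b):
    """Return {test: (seconds_in_a, seconds_in_b)} for tests in both runs."""
    rows = con.execute(
        "SELECT a.test, a.seconds, b.seconds "
        "FROM timings a JOIN timings b ON a.test = b.test "
        "WHERE a.run_id = ? AND b.run_id = ?", (run_a, run_b))
    return {test: (sa, sb) for test, sa, sb in rows}
```

From the compared pairs it would be easy to flag visually which run was faster per test.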
Extra points for being able to do that automatically for different Subversion revisions. So I could tell it to check out different revisions/branches, run setup.py, run the tests, then show me a report. But that's something I could script later, I guess.
It would be nice if the tests could be separate processes, so I could do something like:
flags = ['1', '2', '3', '4', '5', '6', '7']
for f in flags:
    # bind f now; a plain lambda would only see the last value of f
    my_f = lambda f=f: os.system("python myscript.py -doit=%s" % f)
    do_test("a test for:%s" % f, my_f)
So it would just time how long each one takes to run as many times as it needs.
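A minimal sketch of what that do_test helper could look like (the name and signature come from my example above; a real framework would pick the repeat count itself and persist the results):

```python
import time

def do_test(name, func, repeats=3):
    """Time func() over several runs; return (name, list of elapsed seconds)."""
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed.append(time.perf_counter() - start)
    return name, elapsed
```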
It'd be nice if I could read what the process prints, and provide a parsing function which can extract timing information from the output. So then when something prints "33FPS" or "blit_blend_ADD 10202094.0093" I could tell the timing framework what those mean.
Each script would be timed automatically, and I could see results for tests like "a test for 1", "a test for 2", etc.
That way I could more easily reuse existing timing/benchmark scripts, as well as time code which is not python - like any executable.
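For example, such a parsing function might be sketched like this (the pattern table and metric names are hypothetical; the real hook would be supplied per test):

```python
import re

# Hypothetical parsers: each regex maps a line to a (metric, value) pair.
PATTERNS = [
    (re.compile(r"(\d+(?:\.\d+)?)FPS"),
     lambda m: ("fps", float(m.group(1)))),
    (re.compile(r"(\w+)\s+(\d+(?:\.\d+)?)"),
     lambda m: (m.group(1), float(m.group(2)))),
]

def parse_timing_output(text):
    """Extract a {metric: value} dict from a benchmark's printed output."""
    results = {}
    for line in text.splitlines():
        for pattern, extract in PATTERNS:
            match = pattern.search(line)
            if match:
                key, value = extract(match)
                results[key] = value
                break
    return results
```

Since it only reads stdout, this works just as well for a C executable as for a Python script.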
It would be nice to be able to keep the timing tests with the other tests, but be able to disable the timing tests when needed, because it's likely some timing tests will run for a while, and do things like open many windows (which takes time).
Looking at the mean, median, fastest run, and slowest run would be good too, rather than just the average.
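Those summary statistics are easy to compute from a list of run times; a small sketch:

```python
def summarize(times):
    """Mean, median, fastest and slowest from a list of run times (seconds)."""
    ordered = sorted(times)
    n = len(ordered)
    mid = n // 2
    # median: middle value, or average of the two middle values
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2.0
    return {
        "mean": sum(ordered) / n,
        "median": median,
        "fastest": ordered[0],
        "slowest": ordered[-1],
    }
```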
Merging multiple runs together, so you can see the slowest results from each run, or the fastest from each run, would be good too. Then you could do things like combine results for 1000 different machines, and find out the machine configurations behind the slowest runs. If there are common slowdowns, you can check whether those system configurations share anything, allowing you to direct your optimization work better.
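The merge itself could be a simple fold over per-run result dicts (a sketch; the real thing would also carry along the machine metadata for each result):

```python
def merge_runs(runs, pick=min):
    """Merge several {test: seconds} result dicts into one, keeping the
    fastest (pick=min) or slowest (pick=max) time seen for each test."""
    merged = {}
    for run in runs:
        for test, seconds in run.items():
            merged[test] = pick(merged.get(test, seconds), seconds)
    return merged
```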
Being able to run tests with a profiler active, then store the profiling data, would be nice too. This would be for different profilers, like gprof, or a Python profiler. Then you could see a profile report from a slow run, and find out which functions you might need to optimize at an even lower level.
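For the Python case, this could be sketched with cProfile and pstats from the standard library (the helper name is mine; dumping the stats file is what would let a framework archive the profile alongside the timing data):

```python
import cProfile
import io
import pstats

def profile_test(func, stats_path=None):
    """Run func under cProfile; optionally dump raw stats for archiving.

    Returns a text report of the ten most expensive calls by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    if stats_path:
        # reload later with pstats.Stats(stats_path)
        profiler.dump_stats(stats_path)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
    return out.getvalue()
```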
Allowing different system configuration detection programs to run would be good too. So you can see what hard drive the slow machine was using, how much free RAM it had, the load on the machine at the time of the run, etc.
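Some of that detection is already possible portably from the standard library; a sketch of the kind of snapshot that could be attached to each run (disk and free-RAM details would need platform-specific tools on top of this):

```python
import os
import platform

def system_snapshot():
    """Collect basic machine configuration to attach to a timing run."""
    info = {
        "os": platform.system(),
        "os_version": platform.version(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }
    # load average is only available on Unix-like systems
    if hasattr(os, "getloadavg"):
        info["loadavg_1min"] = os.getloadavg()[0]
    return info
```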
All this tied in with respect for people's privacy, so they know what data they are supplying, and how long it will be archived for.
Support for time series would be nice too, like giving it a list of how long each frame took to render. Then I could see a graph of that, and compare it over multiple runs. You could also look at data from many machines and compare how they are performing, as well as compare different algorithms on different machines.
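Comparing two such series could start as simply as per-frame deltas (a sketch; units are whatever the frame times were recorded in, e.g. milliseconds):

```python
def frame_deltas(run_a, run_b):
    """Per-frame differences between two runs' frame times.

    A positive delta means run B was slower on that frame; series are
    compared up to the length of the shorter run.
    """
    n = min(len(run_a), len(run_b))
    return [run_b[i] - run_a[i] for i in range(n)]
```

Plotting the deltas makes frame spikes that only happen on one machine or one algorithm stand out immediately.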
Hopefully something already exists which can do some of these things?
What I've written above is more of a wish list, but even a portion of that functionality would allow people to more easily test for performance regressions.