I spent a little bit of time recently profiling and optimizing some code.
This is not a usual thing to do for lots of programming tasks, because these days code is often fast enough for the job. But fast enough on which computer? One of the machines I run is an old Duron 850MHz from around 1999. Some programs are starting to get a little slow on it, because developers only test that their programs run on the latest hardware. Lots of people I know still have computers even older than this! So I want things to run really nicely on my slow machine at least, in the hope that they will also run nicely on other people's slow computers. Luckily it often doesn't take too much effort to speed things up.
Most things I do run ok even on my old slow machine. However, two demanding applications I am working on had a need for profiling and optimization.
Here are some of the things I found...
Python game optimization.
My game holepit was leaking a little bit of memory between levels. It turned out that a destructor was not being called for the level, so a display list was not being freed. There may have been an error inside the destructor (Python silently swallows exceptions raised there), or Python was waiting to garbage collect it. Whatever the cause, I just put an explicit cleanup call there.
Also, the display list was being compiled and executed every frame, rather than reusing the previously compiled display list. Fixing that gave quite a big speed up too. The moral of the story: memory leaks are bad, and so is relying on garbage collection in performance-intensive situations that require OS-level cleanup (OpenGL resources and file resources).
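A minimal sketch of both fixes together (the names here are made up, and a stand-in function replaces the real OpenGL calls so the sketch runs on its own): compile the display list once instead of every frame, and free it with an explicit cleanup() call rather than trusting the destructor.

```python
# Stand-in for glDeleteLists, so the sketch is runnable without OpenGL.
freed_lists = []

def delete_display_list(list_id):
    freed_lists.append(list_id)

class Level:
    def __init__(self):
        self.display_list = None

    def draw(self):
        if self.display_list is None:
            # Compile once (glGenLists/glNewList/glEndList in real code).
            self.display_list = 1
        # Every later frame just executes the cached list (glCallList).

    def cleanup(self):
        # Called explicitly when the level ends. Never rely on __del__
        # for this: Python may delay it, and swallows its exceptions.
        if self.display_list is not None:
            delete_display_list(self.display_list)
            self.display_list = None

level = Level()
level.draw()
level.draw()      # reuses the compiled list instead of recompiling
level.cleanup()   # frees the display list deterministically
```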
For mouse motion events, sometimes three or four could be delivered per frame, so my 'turn the avatar around' code was being called up to four times a frame. This turned out to be a major cause of slowdown in the intro sequence, because time is sped up there. The intro uses recorded mouse events to control a character which writes the word 'holepit' on the screen, and it was getting up to eight motion events a frame. Once I filtered these events so that only the last mouse motion event each frame triggers an update, the intro sequence ran a lot better, and didn't skip between sections of the word. Much nicer looking :)
It had less of an impact in the main game though.
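The filtering idea can be sketched like this (the event objects are hypothetical stand-ins; in a pygame-style loop they would come off the event queue each frame):

```python
from collections import namedtuple

# Simplified stand-ins for real event objects.
Event = namedtuple("Event", "type pos")
MOUSEMOTION, KEYDOWN = "mousemotion", "keydown"

def filter_motion(events):
    """Keep every non-motion event, but only the LAST mouse motion,
    so the 'turn the avatar around' code runs at most once per frame."""
    motions = [e for e in events if e.type == MOUSEMOTION]
    others = [e for e in events if e.type != MOUSEMOTION]
    return others + motions[-1:]

frame_events = [Event(MOUSEMOTION, (1, 1)), Event(KEYDOWN, None),
                Event(MOUSEMOTION, (2, 2)), Event(MOUSEMOTION, (3, 3))]
filtered = filter_motion(frame_events)
# Only the final motion, (3, 3), survives alongside the key press.
```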
One thing that did come up was a place where I was using the dreaded 'exec'. This was in some code which interfaces with C++ code: it found a class name encoded in a string, and constructed an instance of that class. I removed the exec call by storing the classes in a dict keyed by their names, then constructing the classes from that instead.
This made a noticeable performance impact.
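The trick looks roughly like this (the class names are invented for the sketch): instead of building an instance with something like exec("obj = %s()" % name), register the classes in a dict up front and do a plain lookup.

```python
# Hypothetical classes standing in for the ones named by the C++ side.
class Hole: pass
class Pit: pass

# Keyed by the same names that arrive encoded in the string.
CLASSES = {cls.__name__: cls for cls in (Hole, Pit)}

def construct(name):
    # A dict lookup plus a call -- no exec, no compiling source at runtime.
    return CLASSES[name]()

obj = construct("Pit")
```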
There were a number of other pieces of low-hanging fruit in holepit for increasing performance. However, the game was now running quite a bit faster than before, faster than vsync on my slow computer (over 75 fps). So now it is 'fast enough'.
Now the tactic I used to optimize www.pretendpaper.com was quite a bit different.
As of this writing the website still needs a bunch of work to make it faster, but it is getting close to usable now. It takes 4-5 seconds to load on my slower computer, less on my faster one.
Even without profiling, I already knew some of the slow bits. There is lots I can do to pretendpaper.com to speed it up, but I knew of a couple of sure things which would give massive speedups. I also knew that this one change would take very little time and not require much reworking of code.
I just combined eight requests into one request. Those eight requests always happened anyway, so I might as well do them all at once every time. The overall time to request one big thing is less because 1) there is no latency between the requests, and 2) there is better compression, because some of the data in the requests is the same.
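Point 2 is easy to see with a rough illustration (the payloads below are made up, but shaped like eight similar responses): compressing them together lets the shared structure be encoded once, while compressing each separately pays the overhead eight times.

```python
import zlib

# Eight hypothetical responses that share most of their structure.
payloads = [('{"user": "alice", "section": "news", "item": %d}' % i).encode()
            for i in range(8)]

# Compress each response on its own, as eight separate requests would.
separate = sum(len(zlib.compress(p)) for p in payloads)

# Compress them together, as one combined request does.
combined = len(zlib.compress(b"".join(payloads)))
# combined comes out much smaller than separate -- and on top of that,
# the single request saves seven round trips of latency.
```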
While changing the requests over to the combined one, I found quite a bit of duplicated code, as well as some things which could easily be done (and cached) on the server side, saving the client quite a bit of work.
More waffling on about the differences.
So even without a profiler, you can sometimes just know which bits are slow. However, a profiler can definitely help you out, especially with a large amount of code, or code you did not write in the first place.
Optimizing towards 'fast enough' is a good goal, as long as you optimize for fast enough on the computers your program will actually run on, not your brand new really fast machine. Stopping at 'fast enough' matters when there are still lots of features to add and bugs to fix.