Friday, March 23, 2018

Investigating pypy frame drop

pypy has spikes in time it takes occasionally when using cpyext, otherwise known as pauses.
This is because it's deallocating lots of CPython objects all in one frame, rather than incrementally. This is likely to be addressed in a future release of pypy.

Mainly it's a problem when creating and deleting lots of CPython objects (like pygame.Rect). Not pure python objects.

To work around it for now,
  • the src/rect.c has been changed to have a free list of python objects, so it doesn't alloc and dealloc them. Instead it maintains it's own list of rect PyObject pointers, and reuses them. This is done in PR #431
  • Use PYPY_GC_NURSERY=1m incminimark environment variables when running pypy.
Below are timings of the default pygame, and default pypy (at time of writing). Then a benchmark showing the improvement with the rect freelist implemented in rect.c inside pygame. Then we also improve things by setting PYPY_GC_NURSERY=1m environment variable to change the behavior of the GC. I'm not aware of an API to do this within pygame itself unfortunately. Finally there are some cpython 3.6 timings for comparison.

hacked up benchmark.

The hacked up benchmark is in the branch pypy-hack-frame-bench of the pygame repo. It's based off examples.testsprite.

Because pypy currently doesn't compile matplot lib you need to run it in two steps. First to do the run (let it run for 20+ seconds). Then to load the data pickle, and show the graph.

pypy examples/ -plot -noupdate_rects -width 320 -height 200 100
python examples/ -plotpickle 

To run with the GC tweak...

PYPY_GC_NURSERY=1m pypy examples/ -plot -noupdate_rects -width 320 -height 200 100

graph colors

  • Blue is the first 1200 frames
  • orange is the last 1200 frames of the run.
This is useful to see if there are any jit warm up affects happening, or different behaviour over time. (doesn't appear so).

Time per frame on pypy 5.10.0.

For this example we don't see any 'jit warm up' behavior.
screen shot 2018-03-23 at 07 21 05

Time per frame on pypy 5.10.0, rect freelist, standard GC.

screen shot 2018-03-23 at 12 05 42

Time per frame on pypy 5.10.0, rect freelist, PYPY_GC_NURSERY=1m.

screen shot 2018-03-23 at 12 04 42
This final result for pypy 'ok' because we want under 0.0166 seconds used per frame. Improvements inside pypy itself in the future should hope to remove many of these spikes.

Time per frame on python3.6

Here we see both the average time per frame and maximums are smaller on python3.6.
screen shot 2018-03-23 at 12 22 02

Time per frame on python3.6, rect freelist

It seems the freelist for rects on python3.6 has a minimal affect.
screen shot 2018-03-23 at 12 20 12

Time per frame on python3.6, gc.disable()

Here we can see using gc.disable() seems to have no affect in this program (on python 3.6). I also tried this on pypy, and it had no affect either (on this program).
screen shot 2018-03-23 at 12 25 14
Other things I tried was to do a gc.collect(0) just before display.flip(), however this didn't appear to help in this case. This is because we can know if we have 5-10ms free waiting for the display VSYNC to flip (when running at 60Hz). Perhaps this can be attempted again as the cpyext GC is improved.

No comments: