Monday, February 20, 2017

Is Type Tracing for Python useful? Some experiments.

Type Tracing - as a program runs you trace it and record the types of variables coming in and out of functions, and being assigned to variables.
Is Type Tracing useful for providing quality benefits, documentation benefits, porting benefits, and also speed benefits to real python programs?

Python is now a gradually typed language, meaning that you can gradually apply types and along with type inference statically check your code is correct. Once you have added types to everything, you can catch quite a lot of errors. For several years I've been using the new type checking tools that have been popping up in the python ecosystem. I've given talks to user groups about them, and also trained people to use them. I think a lot of people are using these tools without even realizing it. They see in their IDE warnings about type issues, and methods are automatically completed for them.

But I've always had some thoughts in the back of my head about recording types at runtime of a program in order to help the type inference out (and to avoid having to annotate them manually yourself).

Note, that this technique is a different, but related thing to what is done in a tracing jit compiler.
Some days ago I decided to try Type Tracing out... and I was quite surprised by the results.

I asked myself these questions.

  • Can I store the types coming in and out of python functions, and the types assigned to variables in order to be useful for other things based on tracing the running of a program? (Yes)
  • Can I "Type Trace" a complex program? (Yes, a flask+sqlalchemy app test suite runs)
  • Is porting python 2 code quicker by Type Tracing combined with static type checking, documentation generation, and test generation? (Yes, refactoring is safer with a type checker and no manually written tests)
  • Can I generate better documentation automatically with Type Tracing? (Yes, return and parameter types and example values helps understanding greatly)
  • Can I use the types for automatic property testing? (Yes, hypothesis does useful testing just knowing some types and a few examples... which we recorded with the tracer)
  • Can I use example capture for tests and docs, as well as the types? (Yes)
  • Can I generate faster compiled code automatically just using the recorded types and Cython (Yes).

Benefits from Type Tracing.

Below I try to show that the following benefits can be obtained by combining Type Tracing with other existing python tools.
  • Automate documentation generation, by providing types to the documentation tool, and by collecting some example inputs and outputs.
  • Automate some type annotation.
  • Automatically find bugs static type checking can not. Without full type inference, existing python static type checkers can not find many issues until the types are fully annotated. Type Tracing can provide those types.
  • Speed up Python2 porting process, by finding issues other tools can't. It can also speed things up by showing people types and example inputs. This can greatly help people understand large programs when documentation is limited.
  • Use for Ahead Of Time (AOT) compilation with Cython.
  • Help property testing tools to find simple bugs without manually setting properties.

Tools used to hack something together.

  • coverage (extended the coverage checker to record types as it goes) 
  • mypy (static type checker for python)
  • Hypothesis (property testing... automated test generator)
  • Cython (a compiler for python code, and code with type annotations)
  • jedi (another python static type checker)
  • Sphinx (automatic documentation generator).
  • Cpython (the original C implementation of python)
More details below on the experiments.

Type Tracing using 'coverage'.

Originally I hacked up a set_trace script... and started going. But there really are so many corner cases. Also, I already run the "coverage" tool over the code base I'm working on.

I started with coverage.pytracer.PyTracer, since it's python. Coverage also comes with a faster tracer written in C. So far I'm just using the python one.

The plan later would be to perhaps use CoverageData. Which uses JSON, which means storing the type will be hard sometimes (eg, when they are dynamically generated). However, I think I'm happy to start with easy types. To start simple, I'll just record object types as strings with something like `repr(type(o)) if type(o) is not type else repr(o)`. Well, I'm not sure. So far, I'm happy with hacking everything into my fork of coverage, but to move it into production there is more work to be done. Things like multiprocess, multithreading all need to be handled.

Porting python 2 code with type tracing.

I first started porting code to python 3 in the betas... around 2007. Including some C API modules. I think I worked on one of the first single code base packages. Since then the tooling has gotten a lot better. Compatibility libraries exist (six), lots of people have figured out the dangerous points and documented them. Forward compatibility features were added into the python2.6 and 2.7, and 3.5 releases to make porting easier. However, it can still be hard.

Especially when Python 2 code bases often don't have many tests. Often zero tests. Also, there may be very little documentation, and the original developers have moved on.

But the code works, and it's been in production for a long time, and gets updates occasionally. Maybe it's not updated as often as it's needed because people are afraid of breaking things.

Steps to port to python 3 are usually these:

  1. Understand the code.
  2. Run the code in production (or on a copy of production data).
  3. With a debugger, look at what is coming in and out of functions.
  4. Write tests for everything.
  5. Write documentation.
  6. Run 2to3.
  7. Do lots of manual QA.
  8. Start refactoring.
  9. Repeat. Repeat manually writing tests, docs, and testing manually. Many times.
Remember that writing tests is usually harder than writing the code in the first place.

With type tracing helping to generate docs, types for the type checker, examples for human reading plus for the hypothesis property checker we get a lot more tools to help ensure quality.

A new way to port python2 code could be something like...
  1. Run program under Type Tracing, line/branch coverage, and example capture.
  2. Look at generated types, example inputs and outputs.
  3. Look at generated documentation.
  4. Gradually add type checking info with help of Type Tracing recorded types.
  5. Generate tests automatically with Type Tracing types, examples, and hypothesis automated property testing. Generate empty test stubs for things you still need to test.
  6. Once each module is fully typed, you can statically type check it.
  7. You can cross validate your type checked python code against your original code. Under the Type Tracer.
  8. Refactoring is easier with better docs, static type checks, tests, types for arguments and return values, and example inputs and outputs.
  9. Everything should be ported to work with the new forwards compatibility functionality in python2.7.
  10. Now with your various quality checks in place, you can start porting to python3. Note, you might not have needed to change any of the original code - only add types.
I would suggest the effort is about 1/5th of the normal time it takes to port things. Especially if you want to make sure the chance of introducing errors is very low.

Below are a couple of issues where Type Tracing can help over existing tools.

Integer divide issue.

Here I will show that the 2to3 conversion tool makes a bug with. Also, mypy does not detect a problem with the code.

def int_problem(x):
    return x / 4

$ python2
$ python3

$ mypy --py2
$ mypy
$ 2to3
RefactoringTool: Skipping optional fixer: buffer
RefactoringTool: Skipping optional fixer: idioms
RefactoringTool: Skipping optional fixer: set_literal
RefactoringTool: Skipping optional fixer: ws_comma
RefactoringTool: Refactored
---    (original)
+++    (refactored)
@@ -3,4 +3,4 @@
 def int_problem(x):
     return x / 4

RefactoringTool: Files that need to be modified:

See how when run under python3 it gives a different result?

Can we fix it when Type Tracing adds types?  (Yes)

So, how about if we run the program under type tracing, and record the input types coming in and out? See how it adds a python3 compatible comment about taking an int, and returning an int. This is so that mypy (and other type checkers) can see what it is supposed to take in.
def int_problem(x):
    # type: (int) -> int
    return x / 4
$ mypy error: Incompatible return value type (got "float", expected "int")
I'm happy that Yes, Type Tracing combined with mypy can detect this issue whereas mypy can not by itself.

Binary or Text file issue?

Another porting issue not caught by existing tools is trying to do the right thing when a python file is in binary mode or in text mode. If in binary, read() will return bytes, otherwise it might return text.

In theory this could be made to work, however at the time of writing, there is an open issue with "dependent types" or "Factory Pattern" functions in mypy. More information on this, and also a work around I wrote see this issue:

In there I show that you can create your own replacement that always returns one type. eg, open_rw(fname) instead of open(fname, 'rw').

Once you know that .read() will return bytes, then you also know that it can't call .format() in python 3. The solution is to use % string formatting on bytes, which is supported from python3.5 upwards.

x = # type: bytes

So the answer here is that mypy could likely solve this issue by itself in the future (once things are fully type annotated). But for now, it's good to see combining type tracing with mypy could help detect binary and text encoding issues much faster.

Generating Cython code with recorded types.

I wanted to see if this was possible. So I took the simple example from the cython documentation.

I used my type tracer to transform this python:
def f(x):
    return x**2-x

def integrate_f(a, b, N):
    s = 0
    dx = (b-a)/N
    for i in range(N):
        s += f(a+i*dx)
    return s * dx

Before you look below... take a guess what parameters a, b, and N are? Note, how there are no comments. Note how the variable names are single letter. Note, how there are no tests. There are no examples.

In [2]: %timeit integrate_f(10.4, 2.3, 17)
100000 loops, best of 3: 5.12 ┬Ás per loop

Into this Cython code with annotated types after running it through Type Tracing:
In [1]: %load_ext Cython

In [2]: %%cython
   ...: cdef double f(double x):
   ...:     return x**2-x
   ...: def integrate_f_c(double a, double b, int N):
   ...:     """
   ...:     :Example:
   ...:     >>> integrate_f_c(10.4, 2.3, 17)
   ...:     -342.34804152249137
   ...:     """
   ...:     cdef int i
   ...:     cdef double s, dx
   ...:     s = 0
   ...:     dx = (b-a)/N
   ...:     for i in range(N):
   ...:         s += f(a+i*dx)
   ...:     return s * dx 

In [3]: %timeit integrate_f_c(10.4, 2.3, 17)

10000000 loops, best of 3: 117 ns per loop
Normal python was 5200 nanoseconds. The cython compiled version is 117 nanoseconds.  The result is 44x faster code, and we have all the types annotated, with an example. This helps you understand it a little better than before too.

This was a great result for me. It shows that yes combining Type Tracing with Cython can give improvements over Cython just by itself. Note, that Cython is not only for speeding up simple numeric code. It's also been used to speed up string based code, database access, network access, and game code.

So far I've made a simple mapping of python types to cython types. To make the code more useful would require quite a bit more effort. However, if you use it as a tool to help you write cython code yourself, then it's very useful to speed up that process.

The best cases so far are when it knows all of the types, all of the types have direct cython mappings, and it avoids calling python functions inside the function. In other words, 'pure' functions.

Cross validation for Cython and python versions?

In a video processing project I worked on there were implementations in C, and other assembly implementations of the same functions. A very simple way of testing is to run all the implementations and compare the results. If the C implementation gives the same results as the assembly implementations, then there's a pretty good chance they are correct.

In [1]:  assert integrate_f_c(10.4, 2.3, 17) == integrate_f(10.4, 2.3, 17)

If we have a test runner, we can check if the inputs and outputs are the same between the compiled code and the non compiled code. That is, cross validate implementations against each other for correctness.

Property testing.

The most popular property testing framework Quickcheck from the Haskell world. However, python also has an implementation - Hypothesis. Rather than supply examples, as is usual with unit testing you tell it about properties which hold true.

Can we generate a hypothesis test automatically using just types collected with Type Tracing?

Below we can see some unit tests (example based testing), as well as some Hypothesis tests (property testing). They are for a function "always_add_something(x)", which always adds something to the number given in. As a property, we would say that "always_add_something(x) > x".  That property will hold to be true for every value of x given x is an int.

Note, that the program is fully typed, and passes type checking with mypy. Also note that there is 100% test coverage if I remove the divide by zero error I inserted.

from hypothesis import given
import hypothesis.strategies

from bad_logic_issue import always_add_something, always_add_something_good

def test_always_add_something():# type: () -> None
    #type: () -> None
    assert always_add_something(5) >= 5
    assert always_add_something(200) >= 200

def test_always_add_something_good():
    #type: () -> None
    assert always_add_something_good(5) >= 5
    assert always_add_something_good(200) >= 200

def test_always_add_something(x):
    assert always_add_something(x) > x

# Here we test the good one.
def test_always_add_something(x):
    assert always_add_something_good(x) > x
Here are two implementations of the function. The first one is a contrived example in order to show two types of logic errors that are quite common. Even 30 year old code used by billions of people has been shown to have these errors. They're sort of hard to find with normal testing methods.

def always_add_something(x):
    # type: (int) -> int
    '''Silly function that is supposed to always add something to x.

    But it doesn't always... even though we have
     - 'complete' test coverage.
     - fully typed
    r = x #type: int
    if x > 0 and x < 10:
        r += 20
    elif x > 15 and x < 30:
        r //= 0
    elif x > 100:
        r += 30

    return r

def always_add_something_good(x):
    # type: (int) -> int
    '''This one always does add something.
    return x + 1

Now, hypothesis can find the errors when you write the property that the return value needs to be greater than the input. What about if we just use the types we record with Type Tracing to give hypothesis a chance to test? Hypothesis comes with a number of test strategies which generate many variations of a type. Eg, there is an "integers" strategy.

# Will it find an error just telling hypothesis that it takes an int as input?
def test_always_add_something(x):

It finds the divide by zero issue (when x is 16). However it does not find the other issue, because it still does not know that there is a problem. We haven't told it anything about the result always needing to be greater than the input. ZeroDivisionError
-------------------------------------------------------- Hypothesis --------------------------------------------------------
Falsifying example: test_always_add_something(x=16)
The result is that yes, it could find one issue automatically, without having to write any extra test code, just from Trace Typing.

For pure functions, it would be also useful to record some examples for unit test generation.

In conclusion.

I'm happy with the experiment overall. I think it shows it can be a fairly useful technique for making python programs more understandable, faster, and more correct. It can also help speed up porting old python2 code dramatically (especially when that code has limited documentation and tests).

I think the experiment also shows that combining existing python tools (coverage, mypy, Cython, and hypothesis) can give some interesting extra abilities without not too much extra effort. eg. I didn't need to write a robust tracing module, I didn't need to write a static type checker, or a python compiler. However, it would take some effort to turn these into robust general purpose tools. Currently what I have is a collection of fragile hacks, without support for many corner cases :)

For now I don't plan to work on this any more in the short term. (Unless of course someone wants to hire me to port some python2 code. Then I'll work on these tools again since it speeds things up quite a lot).

Any corrections or suggestions? Please leave a comment, or see you on twitter @renedudfield


NotSqrt said...

Looks awesome !
Please tell us that you have some code visible somewhere, I'm so interested by this type tracing approach !

Pankaj Pandey said...

I am also very interested in such an approach. If you have something working please post it. IIRC Pycharm (the Python IDE from IntelliJ) can optionally do type tracing when debugging within the IDE to improve its static checking and suggestions. It also writes out stub files with the type information comments

Gavin Bisesi said...

I agree, please post your code somewhere! This would be a huge help to the community