Thursday, May 08, 2014

Statically checking python

With Python, type checking is usually done at runtime: Python reports type problems only when you run the code. For example, Python will complain about the program 1 + 'a' at runtime, since an int and a string cannot be added together with the + operator. This is why Python is usually considered a dynamically typed language rather than a statically typed one.
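
As a minimal sketch of that runtime behaviour (written in Python 3 syntax; the function name add_mixed is made up for illustration), defining the broken expression is silent, and the error only appears when the line actually executes:

```python
def add_mixed():
    # Defining this function raises nothing; the bad expression is only
    # evaluated when the function is called.
    return 1 + 'a'

try:
    add_mixed()
    outcome = "no error"
except TypeError as exc:
    # int + str is rejected by the + operator at runtime.
    outcome = "TypeError: {}".format(exc)

print(outcome)
```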

However, there are now tools that type check your Python programs before you run them. Unfortunately, this is not widely known!

But what exactly are the benefits of static type checking anyway?  If you have 100% unit test coverage of your code, shouldn't that catch all the same bugs? Yes, it probably would. However, not all teams unit test all of their code.  Also, sometimes it might be dangerous to run the code to check for errors, or maybe the code takes a long time to run before it encounters errors.
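
To illustrate the point about incomplete test coverage, here is a small sketch (Python 3 syntax; the report function and its verbose flag are hypothetical, not part of the example repository). A test that only exercises the default path passes, while the type error sits in a branch that rarely runs:

```python
def report(quacks, verbose=False):
    if verbose:
        # Bug: str + int. Only reached when verbose=True.
        return "The duck quacked this many times:" + quacks
    return "The duck quacked."

# A test that only exercises the default path passes.
assert report(2) == "The duck quacked."

# The bug surfaces only when the rarely used branch finally runs.
try:
    report(2, verbose=True)
    hidden_bug_found = False
except TypeError:
    hidden_bug_found = True

print(hidden_bug_found)
```

A static checker that understands the types involved could flag the verbose branch without ever calling report().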

You can check that:
  • interface methods are implemented
  • compatible operators are used on variables, eg "asdf" + 2
  • the correct number of arguments are passed into function calls
  • there are no unused variables or modules
  • variables aren't assigned to and then never used
Type inference also allows you to:
  • lookup the type(s) of a variable
  • find usages of functions
  • refactor, and type check the changes
There are hundreds more things you can check for with tools like pylint and some of the IDEs. There are also field list markers in docstrings, where you can specify argument and return types. These can be used by IDEs (e.g. PyCharm) and other static checkers. Full-program type inference is now doable for large parts of Python. See shedskin and RPython, two implementations that work on restricted subsets of Python.
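
To give a feel for what "checking without running" means, here is a toy sketch using the standard library ast module (Python 3.8 or later). It is not how pylint or PyCharm work internally, and the helper name find_str_plus_number is made up for illustration. It only spots the simplest case, a string literal added directly to a number literal:

```python
import ast

SOURCE = '''
def doduck():
    return "The duck quacked this many times:" + 2
'''

def find_str_plus_number(source):
    """Return line numbers where a string literal is added to a number literal."""
    problems = []
    # The source is parsed into a syntax tree; it is never executed.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            operands = (node.left, node.right)
            if all(isinstance(side, ast.Constant) for side in operands):
                kinds = {type(side.value) for side in operands}
                if str in kinds and (int in kinds or float in kinds):
                    problems.append(node.lineno)
    return problems

found = find_str_plus_number(SOURCE)
print(found)  # prints [3]
```

Real tools go much further: they follow variables, function calls, and imports, which is where the differences between the tools below show up.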

Now I will go over some problems you can check before runtime. I will show which tools work, and which tools do not work for these problems.

Want to download the code? You can follow along at this github link: https://github.com/illume/static_checking_python

The main tools I cover are pylint, PyCharm, and pysonar2.

Implements an interface

An interface specifies the public methods that its implementations should provide.

Can we check whether an implementation of an interface in Python has all of the required methods? Below we have a Birds interface, along with the Duck and Pidgeon implementations of it.

implementsinterface1.py


import abc

class Birds(object):
    """docstring for Birds"""

    __metaclass__ = abc.ABCMeta

    def __init__(self, arg):
        self.arg = arg

    @abc.abstractmethod
    def noise(self):
        """docstring for noise"""
        pass

    @abc.abstractmethod
    def move(self):
        """docstring for move"""
        pass

class Duck(Birds):
    """docstring for Duck"""
    __implements__ = (Birds, )
    def __init__(self, arg):
        super(Duck, self).__init__(arg)

    def noises(self):
        """docstring for noise"""
        print self.arg

    def moves(self):
        """docstring for move"""
        print self.arg

class Pidgeon(Birds):
    """docstring for Pidgeon"""
    __implements__ = (Birds, )
    def __init__(self, arg):
        super(Pidgeon, self).__init__(arg)

    def noises(self):
        """docstring for noise"""
        print self.arg


    def moves(self):
        """docstring for move"""
        print self.arg 
 
 
 
 
(anenv)shit-robot:staticchecking rene$ pylint implementsinterface1.py 
No config file found, using default configuration
************* Module implementsinterface1
C:  1, 0: Missing module docstring (missing-docstring)
W: 23, 0: Method 'move' is abstract in class 'Birds' but is not overridden (abstract-method)
W: 23, 0: Method 'noise' is abstract in class 'Birds' but is not overridden (abstract-method)
W: 38, 0: Method 'move' is abstract in class 'Birds' but is not overridden (abstract-method)
W: 38, 0: Method 'noise' is abstract in class 'Birds' but is not overridden (abstract-method)

Here we see that pylint detects that we have not implemented the noise and move methods, because we made a typo: Duck and Pidgeon define noises and moves instead.
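
For comparison, the abc module enforces the same contract by itself, but only at runtime and only when the class is instantiated. The sketch below is a trimmed-down Python 3 version of the listing above (it uses abc.ABC rather than the Python 2 __metaclass__ attribute, and the return values are placeholders):

```python
import abc

class Birds(abc.ABC):
    @abc.abstractmethod
    def noise(self):
        """Make a noise."""

    @abc.abstractmethod
    def move(self):
        """Move somewhere."""

class Duck(Birds):
    # Same typo as above: 'noises' and 'moves' do not override anything.
    def noises(self):
        return "quack"

    def moves(self):
        return "waddle"

# The abc machinery records which abstract methods are still missing.
missing = sorted(Duck.__abstractmethods__)
print(missing)  # prints ['move', 'noise']

# Instantiating the incomplete class fails, but only at runtime.
try:
    Duck()
    instantiated = True
except TypeError:
    instantiated = False
print(instantiated)  # prints False
```

The advantage of pylint here is that it reports the same problem without the class ever being instantiated.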

implementsinterface2.py

In this file I correct the typo, and rerun pylint.

Here pylint can find that we are correctly implementing the required methods.

"""docstring for the module
"""
import abc

class Birds(object):
    """docstring for Birds"""

    __metaclass__ = abc.ABCMeta

    def __init__(self, arg):
        self.arg = arg

    @abc.abstractmethod
    def noise(self):
        """docstring for noise"""
        raise NotImplementedError

    @abc.abstractmethod
    def move(self):
        """docstring for move"""
        raise NotImplementedError

class Duck(Birds):
    """docstring for Duck"""
    __implements__ = (Birds, )
    def __init__(self, arg):
        super(Duck, self).__init__(arg)

    def noise(self):
        """docstring for noise"""
        # This will give a TypeError: cannot concatenate 'str' and 'int' objects
        # Pylint does not find this.
        print "a duck quacks this many times:" + 2

    def move(self):
        """docstring for move"""
        print self.arg

class Pidgeon(Birds):
    """docstring for Pidgeon"""
    __implements__ = (Birds, )
    def __init__(self, arg):
        super(Pidgeon, self).__init__(arg)

    def noise(self):
        """docstring for noise"""
        print self.arg


    def move(self):
        """docstring for move"""
        print self.arg
 

Detecting TypeErrors

Can you check for type errors? Let us test it out.

typeerror1.py


def doduck():
    # This will give a TypeError: cannot concatenate 'str' and 'int' objects
    # Pylint does not find this. However, PyCharm does find it.
    return "The duck quacked this many times:" + 2

doduck()

First we try something simple... adding a string to a number.

This will give a TypeError: cannot concatenate 'str' and 'int' objects
Pylint does not find this error.

However, PyCharm shows the error. It says "Expected type 'str | unicode', got 'int' instead."
Note that although pylint does not find this kind of error, it does report other problems. Here is its output for typeerror2.py, which is listed next:

(anenv)shit-robot:staticchecking rene$ pylint typeerror2.py 
No config file found, using default configuration
************* Module typeerror2
C:  7, 0: Final newline missing (missing-final-newline)
C:  1, 0: Missing module docstring (missing-docstring)
C:  1, 0: Invalid constant name "a" (invalid-name)
C:  3, 0: Missing function docstring (missing-docstring)
 

typeerror2.py

a = 2

def doduck():
    return "The duck quacked this many times:" + a

doduck()
 
We make it a bit more complicated for PyCharm, by putting the number into a variable.

Luckily, PyCharm still finds the error.

pylint still can not find this error.

typeerror3.py

import typeerrorsupport

def doduck():
    # This will give a TypeError: cannot concatenate 'str' and 'int' objects
    # Neither PyCharm nor pylint finds this.
    return "The duck quacked this many times:" + typeerrorsupport.a

doduck()

Now we move the variable into a separate module.

This is where PyCharm no longer finds the error.

However, pysonar2 can work out that the return type of the doduck() function is either a string or an int.
pysonar2 is not actually a tool for checking types. It is a library that only does type inference, and it is meant for integrating into IDEs and similar tools. It does advanced type inference for Python.

This is the command I used to generate HTML output from all the files, after installing and compiling pysonar2:
  java -jar pysonar2/target/pysonar-2.0-SNAPSHOT.jar . ./html

Follow the install instructions at the pysonar2 github page: https://github.com/yinwang0/pysonar2

typeerror4.py

import typeerrorsupport

def doduck():
    # This works: format() puts the int into the string for us,
    # so no TypeError is raised here.
    return "The duck quacked this many times:{}".format(typeerrorsupport.a)

doduck()

This is correctly using the format method of string to put the int into the string.

Here pysonar2 correctly sees that a string is returned by the doduck() function.
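
The difference between formatting and concatenation can be sketched as follows (Python 3 syntax; the local variable quacks is a stand-in for typeerrorsupport.a, so the snippet runs without the support module):

```python
quacks = 2  # stand-in for typeerrorsupport.a

# format() converts the int for us, so the result is a plain string.
message = "The duck quacked this many times:{}".format(quacks)
print(message)

# Concatenation needs an explicit conversion to avoid the TypeError.
same_message = "The duck quacked this many times:" + str(quacks)
print(same_message == message)
```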

typeerror5.py

import typeerrorsupport

def doduck():
    # This will give a TypeError: cannot concatenate 'str' and 'int' objects
    # PyCharm finds this via the rtype field in the docstring; pylint does not.
    return "The duck quacked this many times:{}" + typeerrorsupport.number_of_quacks()

doduck()

Once we add the return type to the docstring of the typeerrorsupport.number_of_quacks() function, PyCharm can detect the TypeError.

If you follow the reST field list conventions, PyCharm (and other tools) can use that type information.

typeerrorsupport.py

a = 2

def number_of_quacks():
    """
    :rtype : int
    """
    return 2

The docstring says that the function returns an int, and PyCharm uses this information across module boundaries.

Note that pylint does not currently detect the TypeError even though we have told it the return type is int via the docstring field list.
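
As a rough sketch of how a tool could pick up the declared type from such a field list (this is an illustration only, not how PyCharm actually reads docstrings; the helper name declared_return_type and the regular expression are my own assumptions):

```python
import re

def number_of_quacks():
    """
    :rtype: int
    """
    return 2

def declared_return_type(func):
    """Return the type named in an rtype docstring field, or None."""
    # Allow optional whitespace around the second colon, so both
    # ':rtype: int' and ':rtype : int' are accepted.
    match = re.search(r":rtype\s*:\s*(\S+)", func.__doc__ or "")
    return match.group(1) if match else None

declared = declared_return_type(number_of_quacks)
print(declared)  # prints int
```

A checker that knows number_of_quacks() is declared to return int can then flag any attempt to add that result to a string.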

Conclusion

Much static type checking can be done in a dynamically typed language like python with modern tools.

Whilst none of the tools are perfect by themselves, we can see that by using a combination of the tools we can detect all of the problems we set out to detect.

You can use either the abstract base classes in the abc module, which comes with Python, or zope.interface to define interfaces, and then have tools check that those interfaces are implemented. I haven't gone into much detail here; I just showed some basic interface checking with pylint.

PyCharm combined with appropriately written and typed docstrings can detect many problems. PyCharm also has many refactoring tools built in, as well as things like code completion for a dynamically typed language. Note that this is not the only IDE or tool for Python that can do this; it is just the one I'm using for illustration.

pysonar2 shows that types can be inferred without even specifying the types in docstrings. This has already been shown by tools like shedskin and RPython (from PyPy), which statically infer types for a subset of Python. This style of type inference could be used for ahead-of-time (AOT) compilation, and within IDEs for better type inference. Better type inference allows better type checking, refactoring, and type inspection.