Saturday, February 04, 2017

python packaging zero - part one

In part zero of this series,
I pontificated on,
"What would python packaging zero look like?"
A zero'd package contains just code (and data). Nothing else.
Code readability is important. Code changeability is important.

These two things have always been a core part of what makes python good. However, the current python packaging world fails on both counts. It's actually pretty damn good overall though (binary wheels for most platforms, a resilient CDN, cached packages, dependency management, enhancement peps are being written, there's now a pypa organisation on github where people collaborate on code together... all good stuff).

However, having 10-40 config files in your repo is not readable, nor easily changeable. Which are the files that matter?

Generating files from a template is not changeable.

Django, rails, cookiecutter, sampleproject - they all generate dozens of files for your project from a template. But what happens when you want to change these files?

Being able to change the name of your package or app is important. Especially for creative coding, where you don't even know what you're making! (I would suggest all the best types of coding are creative and have this factor). If we are going to try and get the game and arts communities to use packages, then no way in hell should we point them at the huge complexity of the current packaging system.
i_can_not_think_of_a_name.py
So you start right away on the thing that matters. (hint, that's not packaging, it's your code). Then you eventually figure out you're writing an app to help save the whales from idiots.
savethewhales.py
You just rename the file. That's all you need in the repo. In the normal python packaging way you also have to update the setup.py, and all sorts of other places, quite possibly causing hours of debugging when something doesn't work later.
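As a rough sketch of that idea, the package name can be read straight off the file name. The function name derive_package_name and the rule of ignoring test_*.py and setup.py are assumptions made for this example, not part of any existing tool.

```python
import tempfile
from pathlib import Path


def derive_package_name(repo_dir):
    """Return the name of the single top-level module in repo_dir.

    Assumed convention for this sketch: test files and setup.py are ignored.
    """
    candidates = [
        path.stem
        for path in sorted(Path(repo_dir).glob("*.py"))
        if not path.name.startswith("test_") and path.name != "setup.py"
    ]
    if len(candidates) != 1:
        raise ValueError(f"expected exactly one module, found {candidates}")
    return candidates[0]


# Renaming the file is the only change needed for the name to follow along.
with tempfile.TemporaryDirectory() as repo:
    module = Path(repo, "i_can_not_think_of_a_name.py")
    module.write_text("print('hello')\n")
    name_before = derive_package_name(repo)

    module.rename(Path(repo, "savethewhales.py"))
    name_after = derive_package_name(repo)

print(name_before, "->", name_after)
```

Nothing else in the repo has to know the name, so there is nothing else to keep in sync.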

What other packaging information can be derived?

Previously I mentioned how other pieces of packaging metadata can be derived. Things like the package name, and the author_email. What else do we need, and where do we get it from?

  • If there is a "data/" folder then package that up. This is a convention.
  • If there is a test_savethewhales.py then perhaps run the tests before release. Again, a convention.
  • What does the package depend on? Does it use pygame or click? Add them to the "requirements" inside of setup.py. These can be obtained by "pip freeze", or by parsing the package imports.
  • Are there command line scripts in the app? Does savethewhales.py have an if __name__ == '__main__' and a main() function? Then let's make a "console_scripts" entry point in the setup.py, which generates the "savethewhales.exe" on windows, and the "savethewhales" script on unix.
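Here is a minimal sketch of how those conventions might be checked, using only the standard library. The function names and the returned dictionary layout are made up for illustration. It takes the "parse the imports" route, so it yields unpinned names. One loud assumption: the import name is treated as the PyPI name, which is not always true (for example, "import yaml" comes from the PyYAML distribution).

```python
import ast
import sys
import tempfile
from pathlib import Path


def imported_top_level_names(source):
    """Return the top-level names of all absolute imports in source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names.add(node.module.split(".")[0])
    return names


def has_main_function(source):
    """True if the module defines a top-level function called main()."""
    return any(
        isinstance(node, ast.FunctionDef) and node.name == "main"
        for node in ast.parse(source).body
    )


def derive_metadata(repo_dir, name):
    """Derive packaging metadata from files that follow the conventions."""
    repo = Path(repo_dir)
    source = (repo / f"{name}.py").read_text()
    # sys.stdlib_module_names needs Python 3.10+; on older versions
    # nothing is filtered out and stdlib imports stay in the list.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    third_party = imported_top_level_names(source) - set(stdlib) - {name}
    return {
        "name": name,
        "include_data": (repo / "data").is_dir(),
        "run_tests": (repo / f"test_{name}.py").is_file(),
        # Assumption: import name == distribution name. No versions pinned.
        "install_requires": sorted(third_party),
        "console_scripts": (
            [f"{name} = {name}:main"] if has_main_function(source) else []
        ),
    }


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "data").mkdir()
    Path(repo, "test_savethewhales.py").write_text("def test_ok():\n    assert True\n")
    Path(repo, "savethewhales.py").write_text(
        "import os\n"
        "import pygame\n"
        "import click\n"
        "\n"
        "def main():\n"
        "    pass\n"
        "\n"
        "if __name__ == '__main__':\n"
        "    main()\n"
    )
    metadata = derive_metadata(repo, "savethewhales")

print(metadata)
```

The source is only parsed, never imported, so pygame and click do not need to be installed for the detection to work.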

Can folders be used as well as single file packages?

Yep. Now a folder with a .py file in it is considered a package by python (3.3+). So if we run our tool on a repo that contains a folder with files in it, then that is a package. This is easy to detect.

    punchnazis/trumps.py
    punchnazis/humans.py
    punchnazis/main.py
    data/
Say you have a game where you punch nazis (e.g. wolfenstein altright edition), then the packaging tool can see these files, and upload a 'punchnazis' package for you. So then you can run 'python3 -m punchnazis' (which needs a __main__.py in the package, something the tool could generate from main.py), or call the script: 'punchnazis'.
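Detecting such a folder could look something like the sketch below. The helper names are hypothetical, and the rule used ("a top-level folder holding at least one .py file is a package") is just the convention described above, not how setuptools discovers packages.

```python
import tempfile
from pathlib import Path


def find_packages_by_convention(repo_dir):
    """Return names of top-level folders that contain at least one .py file."""
    return sorted(
        folder.name
        for folder in Path(repo_dir).iterdir()
        if folder.is_dir()
        and not folder.name.startswith(".")
        and any(folder.glob("*.py"))
    )


def can_run_with_dash_m(repo_dir, package):
    """'python3 -m package' only works if the package has a __main__.py."""
    return Path(repo_dir, package, "__main__.py").is_file()


with tempfile.TemporaryDirectory() as repo:
    package_dir = Path(repo, "punchnazis")
    package_dir.mkdir()
    for filename in ("trumps.py", "humans.py", "main.py"):
        (package_dir / filename).write_text("")
    Path(repo, "data").mkdir()  # no .py files inside, so not a package

    packages = find_packages_by_convention(repo)
    runnable = can_run_with_dash_m(repo, "punchnazis")

print(packages, runnable)
```

The data/ folder is skipped because it holds no .py files, which is what lets the same repo carry both code and data.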


Complexity when we need it.

Optimize for simple cases, allow handling the complex cases.
What if this doesn't work for my package? Not all packages will be able to work with this (I would suggest many can however). In these cases, you are free to add all the extra special cases in your own setup code. At that point start adding all the 20 config files you need. (Have you seen some of the setup.py files in the wild? There's all sorts of special case handling for things that matter for those packages).

We should optimize for these simple use cases, because most of our code should be simple. We use convention, we make good (but opinionated) choices, we derive the information.

A temporary folder where setup.py files are generated for release.

When doing a release, we tag the version in git, and increment.
We generate a temporary folder with all the setup.py files and the MANIFEST.in, and copy the code and data in. tox can run the tests, twine can upload to PyPI, docs can be uploaded to Read the Docs, all that good business.
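A minimal sketch of that release folder, for the single-file case. The names build_release_dir and SETUP_TEMPLATE are invented for this example, and the version is simply passed in rather than read from a git tag. The generated setup.py uses real setuptools arguments, but getting the data folder installed (not just shipped in the sdist) would need more than is shown here.

```python
import shutil
import tempfile
from pathlib import Path

# Template for the generated setup.py; doubled braces are literal braces.
SETUP_TEMPLATE = '''\
from setuptools import setup

setup(
    name={name!r},
    version={version!r},
    py_modules=[{name!r}],
    install_requires={install_requires!r},
    entry_points={{"console_scripts": {console_scripts!r}}},
)
'''


def build_release_dir(repo_dir, name, version,
                      install_requires=(), console_scripts=()):
    """Copy code and data into a fresh temp folder and write generated files."""
    repo = Path(repo_dir)
    release = Path(tempfile.mkdtemp(prefix=f"{name}-{version}-"))
    shutil.copy2(repo / f"{name}.py", release / f"{name}.py")

    manifest_lines = []
    if (repo / "data").is_dir():
        shutil.copytree(repo / "data", release / "data")
        manifest_lines.append("recursive-include data *")
    (release / "MANIFEST.in").write_text("\n".join(manifest_lines) + "\n")

    (release / "setup.py").write_text(
        SETUP_TEMPLATE.format(
            name=name,
            version=version,
            install_requires=list(install_requires),
            console_scripts=list(console_scripts),
        )
    )
    return release


with tempfile.TemporaryDirectory() as repo:
    Path(repo, "savethewhales.py").write_text(
        "def main():\n    print('saving whales')\n"
    )
    Path(repo, "data").mkdir()
    Path(repo, "data", "whale.txt").write_text("splash\n")

    release = build_release_dir(
        repo, "savethewhales", "0.1.0",
        install_requires=["pygame"],
        console_scripts=["savethewhales = savethewhales:main"],
    )
    generated = sorted(path.name for path in release.iterdir())
    setup_text = (release / "setup.py").read_text()
    shutil.rmtree(release)  # a real tool would run tox and twine here first

print(generated)
```

The repo itself never gains a setup.py or MANIFEST.in; they only exist in the throwaway folder for the length of the release.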


(If you want to join the discussion on packaging games, please join us on the pygame mailing list).



[ED: I was pointed at these two tools which also show a dislike of lots of packaging boilerplate code...
]

4 comments:

Chris Arndt said...

> What does the package depend on? Does it use pygame or click? Add them to the "requirements" inside of setup.py. These can be obtained by "pip freeze", or by parsing the package imports.

This is IMHO a bit over-simplified and misleading. (Apart from very specific cases) You should never pin the versions of your dependencies in the install_requires list in setup.py. The output of "pip freeze" pins the versions you have installed. It is not meant for direct inclusion in setup.py, but for a requirements.txt of an application. Setting a range of acceptable versions (e.g. "foo>=1.0,<2.0") can be ok, but should be used with consideration.

Pinning the versions of your dependencies in setup.py leads to dependency conflicts if an application, which depends on your package, also depends on one of your dependencies, but in another version (e.g. in its requirements.txt).

Rene Dudfield said...

Good point. Also to consider are lock files/shrinkwrap and such, where you pin dependencies. I agree the default should be to not specify versions.

graham-cracker said...

I like this line of thought. I grew tired of writing a setup.py for simple packages and threw this together a while ago: https://github.com/braingram/simple_setup It doesn't handle the hard cases (cython, etc.) and hasn't been tested with python 3 but seems relevant.

Rene Dudfield said...

Aha! Graham, yes it is very relevant. Thanks.

Seems it does quite a lot of what I was looking for. Finding all the information needed to do releases. The only main difference is that it lives in a setup.py rather than in a separate library. This has the advantage that no other libraries need to be installed first.

Thanks.