Thursday, February 02, 2017

Where the code & data things are. Part 3.

This is part three of a series of articles about packaging python games. Part one, Part two. More discussion is happening on the pygame mailing list.

TL;DR: I think we should follow the python conventions and fix the problems they cause.

src, gamelib, and mygamepackage

1) One way the skellington layout differs from the sampleproject one is that skellington always uses the same generic name for the folder where the code goes.

Skellington layout:
gamelib/
data/

Why is this good? Because you can start writing your code without first having to decide on a name. It's a small thing, but in the context of game competitions it matters more. I'm not sure it's really worth keeping that idea, though.

Sampleproject layout, after running "skellington create mygamepackage":
mygamepackage/
data/

(Where skellington is the name of our tool. It could be pygame create... or whatever)

The benefit of this is that you can go into the repo and do:
import mygamepackage

And it works, because mygamepackage is just a normal package.

You can also do:
python mygamepackage/run.py

Whilst naming is important, the name of the package doesn't need to be the name of the game. I've worked on projects where the company name and app name changed at least three times whilst the package name stayed the same. So I don't think people need to worry about it so much. Also, it's not too hard to change the package name later on.
 
My vote would be to follow the python sampleproject way of doing things.


Data folder, and "get_data".

2) The other aspect I'm not sure about is having a data folder outside of the source folder. Having it inside the source folder means you can more easily include it inside a zip file, for example. It also makes packaging slightly easier, since MANIFEST.in can then just recursively include the whole "mygamepackage" folder.

Having data/ separate is the choice of both the sampleproject and skellington.
I haven't really seen a modern justification for keeping data out of the package folder. I have vague recollections of the reason being 'because Debian does it'. My recollection is that Debian traditionally did it to keep code updates smaller: if you only change 1KB of source code, there's no point in shipping a 20MB update every time.

A bonus of keeping data separate is that it forces you to look up file locations relative to where the code lives, rather than hardcoding "data/myfile.png" in all your paths. Do we recommend the python way of finding the data folder? That means the package_data and data_files setup attributes: https://github.com/pypa/sampleproject/blob/master/setup.py
They are a giant pain. One, because they require you to list every file (or a glob pattern per directory), rather than just the whole folder. Two, because they require you to update configuration in both MANIFEST.in and setup.py. Also, different files get included depending on which python packaging option you use.
See the packaging.python.org documentation on including data files: https://packaging.python.org/distributing/?highlight=data#data-files
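
Whichever mechanism ends up installing the files, the game still has to find them without hardcoding "data/myfile.png" relative to whatever directory it was started from. Here is a minimal sketch, assuming the sampleproject layout in a source checkout (mygamepackage/ and data/ side by side in the repo root) and a module sitting directly in the package folder; data_dir and data_path are hypothetical helper names, not part of any library:

```python
import os


def data_dir(module_file):
    """Return the repo-level data/ folder, given the __file__ of a module in the package.

    Hypothetical layout (sampleproject style, running from a source checkout):
        repo/
            mygamepackage/run.py
            data/
    """
    package_dir = os.path.dirname(os.path.abspath(module_file))
    repo_root = os.path.dirname(package_dir)
    return os.path.join(repo_root, "data")


def data_path(module_file, *parts):
    """Join path parts onto the data folder instead of hardcoding "data/..."."""
    return os.path.join(data_dir(module_file), *parts)
```

From mygamepackage/run.py one would call data_path(__file__, "images", "player.png"), and it works no matter which directory the game was launched from. This only covers running from the repo; after installing with data_files the folder ends up elsewhere (under sys.prefix), which is part of the pain described above.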

Another issue is that, with the python way, pkg_resources from setuptools needs to be used at runtime. pkg_resources gives you access to resources in an abstract way, which means you need setuptools at runtime, not just at install time. There is already work going on to split it out into its own package: https://github.com/pypa/pkg_resources So this may not be a problem in the coming months.
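
One way to keep setuptools optional at runtime is to use pkg_resources when it can be imported and fall back to a plain file path otherwise. A rough sketch, assuming the resource lives inside the package (i.e. it was installed via package_data); read_resource is a hypothetical helper, while pkg_resources.resource_string(package, resource_name) is the real setuptools call:

```python
import importlib
import os


def read_resource(package, resource):
    """Return the bytes of a resource file bundled inside an importable package.

    Prefers pkg_resources (part of setuptools), which can also read from
    zipped eggs. Falls back to a plain file next to the package's __init__.py,
    so setuptools is not strictly needed at runtime.
    """
    try:
        import pkg_resources
    except ImportError:
        pkg_resources = None

    if pkg_resources is not None:
        return pkg_resources.resource_string(package, resource)

    module = importlib.import_module(package)
    package_dir = os.path.dirname(os.path.abspath(module.__file__))
    # Resource names use "/" separators, the same convention pkg_resources uses.
    with open(os.path.join(package_dir, *resource.split("/")), "rb") as f:
        return f.read()
```

Usage would be something like read_resource("mygamepackage", "data/player.png"). The fallback only handles plain directories; reading from inside a zipped .egg still needs pkg_resources.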

I haven't confirmed whether pkg_resources works with the various .exe-making tools. I've always just used file paths. Thomas, does it work with Pynsist?

Having game resources inside a .zip (or .egg) makes loading a lot faster on many machines. pkg_resources supports files inside .egg files, so I think we should consider this possibility.
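
As a rough illustration of loading resources straight out of a zip with only the standard library, here is a sketch. build_archive and load_from_archive are hypothetical names, and the in-memory archive stands in for a data.zip file that would ship next to the game:

```python
import io
import zipfile


def build_archive(files):
    """Pack {name: bytes} into an in-memory zip (a stand-in for a shipped data.zip)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def load_from_archive(archive_bytes, name):
    """Read one resource out of the zip without extracting anything to disk."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)
```

zipfile.ZipFile also accepts a path, so a real game would open data.zip once instead of using bytes. The returned bytes could then be wrapped in io.BytesIO and passed to pygame.image.load, which is documented as accepting a file-like object plus a name hint such as "player.png".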

A single-file .exe on Windows used to be possible with pygame, including all of the data. It worked by appending a .zip file to the end of the .exe and then decompressing it before running. It actually made startup slower, but the benefit was that distribution was pretty easy. However, simply shipping everything in a .zip file was just as good.

Perhaps we could work on adding a find_data_files()-type function to setuptools, which would recursively add data files from data/. We could include it in our 'skellington' package until setuptools ships something like it.
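
A find_data_files() helper might look roughly like the following. It is a sketch, not an existing setuptools API: the name and the install_dir parameter are hypothetical, and it only builds the [(target_directory, [source_files])] list that the data_files setup attribute expects:

```python
import os


def find_data_files(source_dir="data", install_dir=None):
    """Build a data_files list covering every file under source_dir.

    Each directory that contains files becomes one (target_dir, [files]) entry,
    so new images or sounds don't have to be listed by hand in setup.py.
    """
    if install_dir is None:
        install_dir = source_dir
    result = []
    for dirpath, dirnames, filenames in os.walk(source_dir):
        dirnames.sort()  # sorting in place makes os.walk descend in a stable order
        if not filenames:
            continue
        relative = os.path.relpath(dirpath, source_dir)
        if relative == os.curdir:
            target = install_dir
        else:
            target = os.path.join(install_dir, relative)
        files = [os.path.join(dirpath, name) for name in sorted(filenames)]
        result.append((target, files))
    return result
```

In setup.py that would be something like data_files=find_data_files("data", "share/mygamepackage"), so adding a new file to data/ doesn't require touching setup.py.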

Despite all the issues of having a separate data/ folder, it is the convention so far. So my vote is to follow that convention and try fixing the issues in setuptools/pkg_resources.


Too many files, too complex for newbies.

3) Modern python packages often have 20-30 files in the root folder. I have heard the complaint many times that this makes it difficult to figure out where to put things, and makes the whole project feel complex. This was the strong feedback I got in one pyweek, where lots of people decided to use the older skellington instead.
We can help people asking "where does my game code live?" and "where do my image files go?" by answering those questions right at the top of the readme.

We can also help by using dotfiles (".file") so they are hidden, and by using .gitignore and such. We can also try to keep as many packaging-related files as possible in a 'dist' folder. Even better would be to move things into our 'skellington' package, into setuptools, or upstream wherever possible.

It used to be the convention to have a 'dist' folder containing various distribution and packaging scripts (it's where distutils puts things too). I'm not sure putting scripts in there is a good idea.

Another reason I think a package-based layout will work now is that, compared to 3 years ago, the python packaging system has improved a lot. Also, we don't need to support older pythons with more broken things. And if a few people iterate on the skellington, it should become clearer and less buggy than what I presented to people 3 years ago.

The other problem with having a million config files is that the question "where do I change the app description?" becomes harder to answer. With cookiecutter, we can make a template that fills in all the metadata. However, you often want to change that after you've started. Maybe there's no real solution for all this right now. It is definitely a concern we need to address in at least some way.
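
One partial mitigation, sketched here as an assumption rather than as part of the proposal, is to keep the metadata in one small Python module (say a hypothetical mygamepackage/metadata.py defining NAME, VERSION and DESCRIPTION) and have setup.py read it with ast instead of importing the game. read_metadata is a hypothetical helper:

```python
import ast


def read_metadata(path):
    """Return {NAME: value} for top-level UPPERCASE = <literal> assignments in path.

    The file is parsed, never executed or imported, so setup.py can read it
    without pulling in the game's own dependencies (such as pygame).
    """
    with open(path, encoding="utf-8") as f:
        tree = ast.parse(f.read(), filename=path)

    metadata = {}
    for node in tree.body:
        if not (isinstance(node, ast.Assign) and len(node.targets) == 1):
            continue
        target = node.targets[0]
        if not (isinstance(target, ast.Name) and target.id.isupper()):
            continue
        try:
            metadata[target.id] = ast.literal_eval(node.value)
        except ValueError:
            # Skip values that are not plain literals, e.g. function calls.
            pass
    return metadata
```

setup.py could then do meta = read_metadata("mygamepackage/metadata.py") and pass meta["DESCRIPTION"] to setup(), while the game imports the same module for its window title, so the description lives in one place.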

I think it's important that we test the structure with people and get their feedback early on. To do this, I'd like to ask someone who has made a game, but who hasn't made a python package before, to package it up using our structure and tools.

My vote would be to add simple instructions to the top of the readme, to work on fixing things upstream as much as possible, and to be very mindful about adding extra config files or scripts, moving as much config out of the repo as possible.
