Thursday, July 26, 2007

Git problems... when moving.

There is a good explanation of the problems people have because of git's lack of explicit move support. It's Linus explaining a problem he has with git.

The problem is that git gets confused when you move code from one file to another and change it at the same time. Since it uses code similarity to find moved code, changing the code at the same time that you move it confuses its move detection.

So the solution is a convention of making almost no code changes when moving a file. Delete the old file, create the new file with its contents unchanged, and commit that on its own - then make your code changes in the new file.

See the problem with git and move for more details.

Of course, systems which don't use move detection by code similarity fail completely when people delete a file and create a new one without telling the system (e.g. bzr).

So both git and bzr fail when you forget to follow conventions. An ideal system would do both: detect code moves automatically, and also allow you to do explicit moves.

Update: Jay Parlar comments that you can do explicit moves with git, by using the git-mv tool.

Saturday, July 21, 2007

Why urlencoding is a good 'format' for ajax.

urlencoding works in most languages - e.g. javascript, flash, python, php. So you can use it, in a limited sense, to encode stuff for transport.

e.g. a script could return this:
a=3&r=hello+there&end=1

Streaming is the cool thing you can do with it that you can't really do with json or xml. Well, you can, but it's a tad harder. Decoding and encoding are also really quick for urlencoding, and can be slightly faster than json or xml.

This is an old trick that's been used in the flash world forever, but you can do it with js too (though no one seems to). Since not many people seem to be doing it in js, I thought I'd share the technique. Json is probably a better encoding to use most of the time, but this method has its advantages.

With the partial data you have downloaded, you can try to urldecode it. If you put markers in the data, then you can check everything up to that point.

eg.

when data == "a=3&r=hello+th"
you can tell that a=3 is complete, but not what r equals. You also know you aren't at the end, because the end=1 part has not been reached.

So your code can do this:

data = get_any_data_available()
decoded = urldecode(data)
while decoded.get('end') != '1':
    # do stuff with the fields decoded so far.
    data = get_any_data_available()
    decoded = urldecode(data)
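The urldecode used in a loop like the one above has to cope with partial data. Here's a minimal python sketch of such a function - decode_complete_pairs is my name for it, not anything standard. It only trusts key=value pairs that are followed by a '&', since the trailing fragment may still be growing:

```python
from urllib.parse import unquote_plus

def decode_complete_pairs(partial):
    """Decode only the key=value pairs that are definitely complete.
    A pair is only complete once a '&' follows it; the trailing
    fragment may still be downloading, so it is ignored."""
    complete, _, _growing_tail = partial.rpartition('&')
    result = {}
    for pair in complete.split('&'):
        if '=' in pair:
            key, _, value = pair.partition('=')
            result[unquote_plus(key)] = unquote_plus(value)
    return result

print(decode_complete_pairs('a=3&r=hello+th'))  # {'a': '3'}
print(decode_complete_pairs('a=3&r=hello+there&end=1&'))
```

Note the trailing '&' after end=1 in the second call: with this scheme the server needs to terminate the final pair too, otherwise it still looks incomplete to the client.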



There are other tricks, like continuously reading from an open connection. Your server side script can spit out new stuff as it wants, and your client side can just keep trying to decode the new data, looking for marker points like the 'end=1' in the example above. An example is a chat script, which just prints out changes from the db as they happen: when a trigger fires it prints out the changes, then goes back to sleep.

Filling up tables with data is another use for streaming. Say you have 200KB of data to make a table with: creating the table as the data comes in lets you show the data quicker, because you don't need to wait for the entire 200KB to download before you start constructing the table in html.

In a similar way to the table filling example, building other parts of the html is quicker with streaming, because you don't need to wait for it all to download.

You can encode whatever you want in the urlencoded variables - including json, or xml, or just plain strings.

For doing a sequence of things I usually use numbered variables, e.g.:
a0=2&b0=3&a1=22&b1=33

Which could be translated into: data = {'a':[2,22], 'b':[3,33]}
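That translation is easy to sketch in python - decode_numbered is my own name for it, and it assumes every key is a name followed by a sequence number, with integer values:

```python
import re
from urllib.parse import parse_qsl

def decode_numbered(encoded):
    """Group numbered variables like a0, a1, ... into lists keyed
    by name, ordered by their sequence numbers."""
    grouped = {}
    for key, value in parse_qsl(encoded):
        match = re.match(r'([a-zA-Z]+)(\d+)$', key)
        if match:
            name, index = match.group(1), int(match.group(2))
            grouped.setdefault(name, []).append((index, int(value)))
    # order each list by its sequence number, then drop the numbers
    return {name: [v for _, v in sorted(pairs)]
            for name, pairs in grouped.items()}

print(decode_numbered('a0=2&b0=3&a1=22&b1=33'))  # {'a': [2, 22], 'b': [3, 33]}
```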

As you can see, urlencoding can also encode data with less wasted space than json or xml.

Yes, it's hacky, wacky and weird - but it works quite well for some things.

Revision tracking of functions is more important than file names.

Two differences between git and bazaar are that one tracks code moves fairly well, and the other tracks file moves well. One was originally written mostly by someone managing a code base, and the other by a company making an operating system. So you can see their priorities: one has lots of code it cares more about, and the other has lots of files it cares more about. (There are quite a few other priorities and features that each has.)

I guess tracking the code itself is much more important than tracking file renames - for me. Much more useful anyway.

However *both* are important parts of revision control.

Many times during development I might cut a function/class from one file and move it into another. I can't get reports of this information in a meaningful way with bazaar (or can I?).

Also, I might accidentally move a file with the command line tools, change it, then add it in again to the revision control system - because I forget that I need to use the revision control system's own move command.

A system that can automatically track *code* moving around can then also see that a file has been moved or renamed. It shows you that the code has moved, which is often more important than showing you that file names have moved.

However, the system that can track file renames *when you tell it* cannot automatically see that I moved a function from one file to another. Or see that I split a big file of code into separate pieces - that is, moved one file into five.

Well, it could, and I think that's important. I often care about what has changed at a module level, but more often I care about what has changed at a class level, or at the function level. I even care about change sets at the module, package, and application levels.

Full text search could help dramatically here, however having the tools built into the revision control system is much handier - and probably quicker.

So I think you should be able to see that the 'bazaar is lossless' argument is not true... or is it? I think bazaar is still lossless, it just doesn't let you get at all that information yet - just the data.

In the future it should be possible, given a bazaar tree, to find where a function has lived, no matter which file it has lived in.

Caring about what is in the files is more important than caring about the files themselves. So these tools that work with revision control systems should be able to do things like

'svn log my_superfunction'

and see a log of changes to my_superfunction no matter where it lives. Perhaps you'd give it a hint of which file to look in, or even provide the full text of the current function.

However would this mean that the revision control system would need to know more about the content it is storing? I think a lot of it could be generalised, like 'this is a block of code' could be defined as a C++ class, a C function, or a python function. These tools could probably even be built outside of revision control systems. There's probably some tool that can be used for this already?
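As a rough illustration of the building block such a tool would need for python, here is a sketch - find_function is my own invention, not an existing tool. It locates every file that defines a given function; a revision-control version would run this over each revision of the tree to build the function's history:

```python
import ast
import pathlib

def find_function(name, root='.'):
    """Return every .py file under root that defines a function
    called name - a crude building block for something like
    'svn log my_superfunction'."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob('*.py')):
        try:
            tree = ast.parse(path.read_text())
        except SyntaxError:
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef) and node.name == name:
                hits.append(str(path))
                break
    return hits
```

The 'block of code' generalisation would mean swapping the ast parser for per-language ones that recognise C functions, C++ classes, and so on.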

What about the case where code moves from one repository to another? This is a common case in a lot of projects. It'd be nice to be able to tell your RCS that you copied this function from project X with the repository XX, and then have your usual commands for looking at history automatically work. I think bazaar - with launchpad - is interested in doing this type of thing, and probably git too. I think with bazaar you can already do some of this: tell it you're merging from a separate svn repository, and it'll copy all of the history. Well, for files anyway - not sure about separate functions. Mostly when you move code from one project to another, the file names will not be the same.

It's all very interesting reading about these different systems, and the thought that goes into them. Mostly though, I'll probably still just use add, del, update and commit - with hopefully a bit more push and merge. I'm using bazaar more and more now, along with svn.

Friday, July 20, 2007

My issues with python ORMs

If a python ORM you know of addresses these issues, please let me know.

Python ORMs break with multiple processes.


Multiple processes are not assumed. Python ORMs all seem to use heavy local caching, which fails when the database is modified by another process. This is unfortunate, as I like to use different tools for different jobs. Or there might be different people I work with who write tools as separate processes. Or even the common case where each web request runs in a different process - or on a different machine.

It is not commonly known that a python app can break just because something outside of it changes the database. Most applications that use databases do not break if the database changes from outside the application.

Using memcached, or something like it, seems to be a solution to some of this problem. So is optionally allowing the ORM to not cache certain queries - or even to stop all caching. Caching with python dicts is bad anyway, because of python's memory wastage.
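The staleness problem is easy to demonstrate. In this sketch two sqlite connections to a shared in-memory database stand in for two processes, and a plain dict stands in for the ORM's cache - no particular ORM is being shown here:

```python
import sqlite3

# two connections stand in for two processes sharing one database
uri = 'file:sharedmem?mode=memory&cache=shared'
a = sqlite3.connect(uri, uri=True)
b = sqlite3.connect(uri, uri=True)
a.execute('CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)')
a.execute("INSERT INTO person VALUES (1, 'george')")
a.commit()

# "process" b reads the table into an ORM-style local cache
cache = dict(b.execute('SELECT id, name FROM person'))

# "process" a then changes the row behind b's back
a.execute("UPDATE person SET name = 'mary' WHERE id = 1")
a.commit()

print(cache[1])  # the cache still says 'george'
print(b.execute('SELECT name FROM person WHERE id = 1').fetchone()[0])  # 'mary'
```

Any answer served from the cache after the other process commits is simply wrong, which is why per-query cache control matters.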

Constructors are limited to creation.


Insert seems to be assumed to be the most common operation, so python ORMs seem to only let you do inserts with the constructors.

Having constructors which can get a row by primary key would be nice, since that is one of the most common things I do.
e.g. 'p = Person(1)' would get the person with primary key 1.

A row is not the only use of mapping.


Even being able to do no select, insert, or update in the constructor is useful.
e.g. 'p = Person()' would create a person instance which is so far empty. I use this to shortcut things I want to do with that table - not with a row of that table.
So I can do things like this:
e.g. 'rows = p.get_all("active=3")'. It's just shorter for me to type this stuff, easier to remember how to do things, and easier to read (I think).
e.g.2. p.save({'id':1, "name":"george"})

By using methods on a class which says 'I am acting on this table', you can shorten a lot of code. The two examples above show how much shorter and simpler select and save become. The methods of the instance don't necessarily act on the row that instance might represent, but on the table that instance represents.

Having to pass each attribute by name.


I like to do code like this:
table.save(vars)

Not this:
t = Table(a=2, b = "asdfsadf", g="123123")

Where vars is a dict that has all of the things I want to save. If vars is a dict of POST variables from a webpage, then you should be able to see the time savings. Also, if you add fields to the table, you don't need to update the code here.

That way it can do either an insert or an update, depending on whether the row already exists (often determined by a primary key).

The niceness of this is that I don't need to recode the save part depending on which variables the table takes. Also, if there are extra variables passed in, they are ignored. I don't need to write out each individual attribute for each table.

A workaround would be to override the constructors to do this. But this behaviour is not built in by default - and most python ORMs use the python constructor to pass in arguments.

With python constructors you get a TypeError if you pass in unexpected keyword arguments. For me that is a major problem with using the python constructor for this, instead of having separate save, insert, or update methods. The python class creation semantics are not exactly what you want at all times.
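To make the kind of API I mean concrete, here is a minimal sketch against sqlite. The class and method names (Table, get_all, save) are my own, not any existing ORM's: an instance represents a table, save takes a dict, ignores unknown keys, and does insert-or-update on the primary key.

```python
import sqlite3

class Table:
    """A minimal sketch of a table-level mapper - illustrative only."""

    def __init__(self, conn, name):
        self.conn, self.name = conn, name
        # column names come from the database, not hand-written code
        info = conn.execute('PRAGMA table_info(%s)' % name).fetchall()
        self.columns = [row[1] for row in info]

    def get_all(self, where='1=1'):
        """Select rows matching a raw where clause, as dicts."""
        cur = self.conn.execute(
            'SELECT * FROM %s WHERE %s' % (self.name, where))
        return [dict(zip(self.columns, row)) for row in cur]

    def save(self, vars):
        """Insert-or-update from a dict, ignoring unknown keys."""
        row = {k: v for k, v in vars.items() if k in self.columns}
        sql = 'INSERT OR REPLACE INTO %s (%s) VALUES (%s)' % (
            self.name, ', '.join(row), ', '.join('?' * len(row)))
        self.conn.execute(sql, list(row.values()))

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)')
person = Table(conn, 'person')
person.save({'id': 1, 'name': 'george', 'extra': 'ignored'})  # insert
person.save({'id': 1, 'name': 'George'})                      # same key: update
print(person.get_all("name = 'George'"))  # [{'id': 1, 'name': 'George'}]
```

A real version would quote identifiers and guard against SQL injection in the where clause; this only shows the shape of the API.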

Issues can be worked around.


These last issues arise because I want a python relational mapper, not really an object relational mapper. Python doesn't force you to use objects for everything - neither should a relational mapper for python. Well, maybe it should, if the authors want that.

I work around these issues myself, so I can build my favourite API on top of existing python ORMs. But I'm wondering if I've missed some solutions to these issues that are already in the python ORMs? Or why don't people see these as problems?

Wednesday, July 11, 2007

europython 2007 - batching apis as applied to webpages

hellos,

http://rene.f0o.com/~rene/stuff/europython2007/website_batching/

Here's my paper and code for combining multiple images - and other things.

Monday, July 09, 2007

europython2007 - Taking advantage of multiple CPUs for games - simply

Here's my Multiple CPU paper I am presenting tomorrow morning at europython. The pdf on the europython website is a bit old.

Hope to see you there! ... if you're not too hung over.

Abstract:

Taking advantage of multiple CPUs for games, simply, is the topic of this paper. It uses a simple interface that many are already familiar with: Python's 'map' function. It talks about how to avoid the Global Interpreter Lock (GIL) limitation in Python's threading libraries, as well as how to choose which parts of a game to thread. Finally it shows how easy it can be, by converting a full pygame game to make use of multiple CPUs, and benchmarking the game.
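The shape of the interface is just the built-in map swapped for a pooled one. This is not the paper's code - the chunking and the work function are made up for illustration - but a process pool from the modern standard library shows the idea, and sidesteps the GIL because each worker is its own process:

```python
from multiprocessing import Pool

def update_world_chunk(chunk):
    """Stand-in for CPU-heavy per-chunk game work (physics, AI, ...)."""
    return sum(x * x for x in chunk)

if __name__ == '__main__':
    chunks = [range(0, 1000), range(1000, 2000)]
    serial = list(map(update_world_chunk, chunks))    # the familiar interface
    with Pool() as pool:
        parallel = pool.map(update_world_chunk, chunks)  # same call shape
    assert serial == parallel
```

Because pool.map has the same call shape as map, a game loop written around map can be parallelised by changing one line.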

snakes on a phone - europython 2007

This talk was about Python on Nokia s60 phones.

These are my notes...

He discussed implementation details, like what they had to change to get it working nicely. He also discussed what you can do with the phone - which turns out to be most things, like accessing the camera, and sending/receiving sms.

It's based on python 2.2, but has had some things back ported, like pymalloc. He said at some point they might update the python to a newer version. The schedule is 'The future - a closer future than before.' Then of course before that he said 'the future is now' - probably those two statements aren't related, and were taken out of context.

The changes included not using writable static data, and modifying the interpreter for the security model on the phone. The security model includes using capabilities, and signing executables. There are different levels of signing certificates: the basic ones are free, but others require contracts and money.

Because python can run arbitrary scripts, it only gets the default capabilities. So the capabilities for python are limited. You can sign it yourself if you want to, though.

Even then it won't let you do some things to your own phone. You'll need a manufacturer certificate to do those.

It's interesting to compare the pocketpc python, which you can run on windows mobile phones, with the nokia one. The pocketpc python seems to be a newer version, but the nokia python is supported by nokia employees. I'm not really sure which is integrated better with the phone.


He demonstrated using the phone via bluetooth, so he could have the interpreter running on his laptop while running the code on the phone.

import camera
img=camera.take_photo()

Guess what that code does? That's pretty easy!

import messaging
messaging.sms_send('+3112342134234', u'a message')

import telephone
telephone.dial('+3109832405830495')
telephone.hang_up()
telephone.say(u'hello europython')

It all seems pretty cool, and makes me want to pick up a second hand s60 phone. I guess the next step will be for someone to make portable code so you can run the same python code on different mobile phones (pocketpc, s60, linux, etc).

He talked about trying to get their changes into the main python, including things like reducing memory usage - which would also be great for servers: the less memory you use, the better. So hopefully some of those changes can go back into the main python.

Friday, July 06, 2007

europython 2007 - hello from Vilnius

Hellos,

I'm sitting at my hostel in Vilnius preparing for my presentations.

I've managed to get wireless internet working from windows here, but for some reason my linux doesn't like this particular access point.

After a marathon 38ish hour journey I arrived at almost midnight, then fell asleep. I spent half the day today wandering around the old town looking at things.

I'm looking forward to when the conference starts. Should be good to meet everyone, and see what everyone is up to.

Here's some pictures from my adventures in old town today...


Sunday, July 01, 2007

Rugs

I have been working on a virtual lounge room for www.rugsonline.com.au - an online rug shop. The owner (David), also has a real shop a few blocks from me, and is a young guy who's got good ideas about websites.

This is a flashy website where you can select a rug, some furniture, and some paintings by a local artist. So you can kind of see how the rug might look in a room of your own. I guess kind of like an Ikea catalogue where you can change the items in the photo.

You can also change the colour of the walls, the shadows of the room, and the type of floor under the rug.


The front part was made with flash - with a lot of action script.

One of the hard parts was doing a 3D transform of the photo of the rug, so it sat in the room with the correct perspective. All of the photos were taken overhead, so the transform was needed to make the rugs look ok in the scene. Since all of the photos were already taken, retaking them from a different angle wasn't practical.

I also had to code some functions to get rid of white parts around the edges of some images. I first coded them in php, which worked ok after caching the results - even if a little slow. Unfortunately I couldn't get the texture mapping code to use the alpha transparency part of the image, so I had to recode it in actionscript - which was also initially slow. So I had to use a built-in function which was much faster (probably written in C), but with not quite as good quality. You have to make trade-offs sometimes, right?

The bitmap functions in flash 8 are pretty good; before flash 8 they were very limited. You can do many things which are too slow for actionscript by itself, by using the built-in functions. However, sometimes the documentation is lacking. I don't know if adobe actively seeks feedback from people using their flash APIs. The quality of the documentation is quite good - and consistent. However, common problems people have with the APIs are not addressed. It's like a release-and-forget style of documentation. Or maybe they're just busy! Many of their APIs do get better with every release.

A better method for documentation is to look at the problems people are having, and then either fix the API, or improve the documentation until people stop asking those questions. It's a good method we try to use with the pygame mailing list and website documentation. If there are common problems, we try to document them, or fix the api so people don't have those problems or questions. It's better to fix things so people don't have to look at the documentation in the first place - but sometimes that's not possible. There are still some recurring questions/problems people have with pygame, of course! There are also over 100 doc comments which need to be addressed, and folded back into the documentation.

Adobe do have user comments on their website, though. However, not many people use them, since people use the inbuilt flash documentation - which doesn't download the user comments from the website. Also, you can't make user comments from inside flash. So it's nowhere near as helpful as the php documentation, or the pygame doc comments. For the number of people using flash, there should be FAR more doc comments than there are. So I think it's a user interface and work flow issue - and maybe adobe not embracing contributions from the people using their software as well as they could.

Not that pygame or php allow you to make doc comments from within pygame or php - but mostly people look at the documentation on the web anyway. Maybe we could build a doc comment function into pygame itself? People use help(pygame.something) a lot, or see the doc strings in their editor. Maybe a pygame.add_doc_comment(pygame.somefunction) function, which uploads comments to the website ;)

Doc comments on websites are like bug reports - or patches. They are a very valuable source of information when trying to improve something.

Oops... I started ranting about documentation. Back to the virtual lounge room...

The flash front end integrates with a database which is shared with the shop part of the website. So the same rug descriptions, and photos are all used - and controlled from a management interface. David (the owner of rugs online) can change which couches, floors, and paintings are displayed. I made the database for the rest of the shop first, but it wasn't hard getting flash to use the same database. Flash has a number of different ways it can request information from websites - so that part wasn't very hard. Flash has been doing AJAXy style communication with websites since flash 4. I used php to communicate with flash, but it's really easy to integrate with other languages too. You can talk to flash a number of ways, but the simplest methods are to use urlencoding, or xml encoding. Almost every language used for making websites can do either urlencoding or xml encoding - often both. You can also use json, xmlrpc etc - or even sockets.

It would have been possible to do this virtual lounge room with javascript using the Canvas tag using modern browsers, but that's not what was chosen.

Have a play - it’s kinda fun making the images zoom across the screen: virtual lounge room.




Melbourne Web Developer Written by a Melbourne freelance web developer. Available for your projects - php, mysql, e commerce, javascript, CMS, css, flash, actionscript, python, games, postgresql, xml.