Tuesday, September 29, 2009

Spam detection on websites?

Assume you have a user-content site - or you're running software that can somehow get spam links inserted into it.

How do you find out if your website has spam put on it?

It seems a common enough problem these days... people putting spam links on websites. Surely there must be a service or piece of software to detect such a thing?

I can think of a few ways to go about writing one fairly easily (using existing spam detection tools... but applying them to a spider's crawl of your website). It would be much nicer if there's already a tool which does such a thing though.

Saturday, September 26, 2009

Alsa midi, timidity, fluidsynth and jack.

If you don't have a midi output on linux (because your laptop has crappy audio hardware) you can use timidity or fluidsynth to emulate one.
timidity -iA -B2,8 -Os -EFreverb=0

Well, this post has a bunch of incantations for using timidity on linux... and also gives some insight into how to use the alsa midi tools.

Like listing midi ports, and connecting midi ports, with these two commands:
$ pmidi -l
 Port     Client name      Port name
 14:0     Midi Through     Midi Through Port-0
 20:0     USB Axiom 25     USB Axiom 25 MIDI 1
128:0     TiMidity         TiMidity port 0
128:1     TiMidity         TiMidity port 1
128:2     TiMidity         TiMidity port 2
128:3     TiMidity         TiMidity port 3


To connect the midi input from my usb Axiom 25 keyboard to the timidity synth, aconnect is the command to use.
aconnect 20:0 128:0

The AlsaMidiOverview has more information on things.

# remove all connections...
$ aconnect -x

# list all the output ports (without using pmidi)
$ aconnect -o

# a gui for connections
$ aconnectgui



Another synth that can be driven by midi is fluidsynth. qsynth is the graphical interface for fluidsynth, which makes it easier to tweak. You can use it in pretty much the same way as timidity: it opens up a port (which you can list with pmidi -l), and then you connect it to your keyboard with aconnect. fluidsynth is probably a bit nicer than timidity... and you can use soundfonts with it. Heaps of free sound fonts are available from resonance.org and hammersound.net.

This plugin is *very* useful:
http://alsa.opensrc.org/index.php/Jack_(plugin)

It allows all your alsa-using programs to be routed through your jack server. This means you can use all of your normal audio programs with low latency, good mixing and synchronised audio - even over the network (with netjack).

Ubuntu does not include the alsa jack plugin for some brain dead reason... even though debian has had it packaged for a year or so. However, building from source is simple: http://www.alsa-project.org/main/index.php/Download (./configure && make && make install). I've gone back to removing pulseaudio, as this system works very nicely for me.

Here's my old jackd script; I put it in ~/bin/jack_mine and start it with screen at boot.
jackd -R -P 70 -d alsa -p 256 -n 3 -r 44100

I can now use various synths, samplers, and effects racks from my python scripts.
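For instance, here's a minimal sketch of driving one of those synths from python with pygame.midi. It assumes timidity -iA is already running, and that its port shows up in the pygame.midi device list (the device names are whatever your system reports):

import time
import pygame.midi

pygame.midi.init()

# find the first output device whose name mentions TiMidity.
device_id = None
for i in range(pygame.midi.get_count()):
    interf, name, is_input, is_output, opened = pygame.midi.get_device_info(i)
    if is_output and "TiMidity" in name:
        device_id = i
        break

if device_id is not None:
    out = pygame.midi.Output(device_id)
    out.note_on(60, 127)    # middle C, full velocity
    time.sleep(0.5)
    out.note_off(60, 127)
    del out

pygame.midi.quit()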



Jamin is a mastering program with an eq, compressors, etc. I don't think I'll need it for live work, but it might be useful with lots of instruments. I don't see any reason why it couldn't be controlled by another person with a midi controller.

Next up, I need to see if multiple sound cards can work... unfortunately that apparently doesn't work well. You can create a virtual sound card from multiple sound cards, but they have trouble syncing. It's funny that it's currently easier to sync sound cards on multiple machines than on the same machine... from what I know so far.

For my use I don't think sync will matter too much... that is, listening with headphones on one card, and outputting to the sound system with another. My inbuilt sound card already has two output lines, and one line input (which can also be used for output).

Dell Inspiron 1525... the channels are:
1(l) - 2(r) - first headphone plug, or the speakers if no headphone is in plug 0.
5(r) - 6(l) - second headphone plug.
3(l) - 4(r) - third headphone plug.
7(x) - 8(x) - unused... (not soldered on?)

So it looks like 6 usable channels... nice for a crappy cheap laptop :) This is much nicer than what windows vista allows me to do, and also way nicer than the pulseaudio/gnome combination. I've used this setup to output 6 channel audio with various different playback libraries, including pygame.
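As a sketch of what that looks like from pygame - assuming your pygame/SDL build and alsa setup actually expose more than stereo (the surround51 alsa device is one common way to get 6 channels):

import pygame

# ask the mixer for 6 output channels instead of stereo.
pygame.mixer.init(frequency=44100, size=-16, channels=6)
print(pygame.mixer.get_init())   # e.g. (44100, -16, 6) if it worked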

Friday, September 25, 2009

screen for ghetto servers and startup scripts.

GNU screen is a good little tool for server administration, or running things on your own remote machines. It's even good for running things locally.

I hope this is useful for people who want to run scripts every time they log in or reboot... and who need interactive access to those scripts. Or for those people who are already using screen, but would like to make their setup a bit better:
  • scripting sessions, rather than setting them up manually at each login or reboot,
  • finding your screen sessions more easily,
  • restarting scripts at reboot,
  • monitoring,
  • logging,
  • resource control.


Running things as daemons is cool... but if you'd also like interactive control occasionally, running things with screen is useful.

Most servers have screen, watch and crontab (osx lacks watch though) - including most linux distros, *bsd, osx, and windows (with cygwin). Most OSes also have their own style of init scripts (scripts to run things at boot or login). So this screen, watch, crontab combination is ok if you want to use multiple different types of computers but reuse your scripts. If you want something robust and well-engineered, this isn't for you.


Scripting sessions
You can make a shell script to script your screen sessions, so you don't need to set them up manually each time you log in. This can save you a *lot* of time, and can make you less afraid of a reboot :).

-- /home/me/bin/my_screens.sh --
#!/bin/bash

# start up your 'app1' to be restarted 2 seconds after it dies, every time it dies.
screen -d -m -S app1 -t "my web app one descriptive name" sh -c "/opt/local/bin/watch -n 2 /home/me/someapp1/run_app.py > /home/me/logs/app1stdout 2>> /home/me/logs/app1stderr"

# start running an application.
screen -d -m -S proxy -t "proxy server" /home/me/someapp2/run_app.py

# connect to my server, set up with ssh keys.
screen -d -m -S server1 -t "my server" ssh myserver.example.com

# monitoring run time test of my app1 every 67 seconds
screen -d -m -S monitor1 -t "my monitor script" /opt/local/bin/watch -n 67 /home/me/bin/app1_monitor.py



Finding screen sessions easily: you can run these commands:
screen -d -r app1
screen -d -r server1
screen -d -r proxy

This will let you connect to that screen from any shell. It detaches any session that's open. This is good, as you can easily remember things with just the short names you chose, eg 'app1', 'proxy' etc. Normally you have to do 'screen -ls', look at the output, find the session you want to connect to, then finally 'screen -r random_sessionname'.

At Reboot: Then you can add this to your crontab with crontab -e (be careful not to use -r!!! which is right next to e on a qwerty keyboard). Or use the crontab web interface of your hosting account (for example cpanel/whm, webmin, plesk etc). Note that each user has a crontab. So to run your apps as different users, just log in and change each one's crontab (or use sudo, or 'su username -l -c' from root's crontab).
@reboot /home/me/bin/my_screens.sh
# this line below can be used from a root account to run as the user 'me'.
@reboot su me -l -c /home/me/bin/my_screens.sh


Restarting: Of course your app might crash or something... Look at the first one... that one has watch -n 2... which means "run this process, wait for it to finish, then start it again after 2 seconds". It's kind of like a ghetto daemontools. Not as good as something like daemontools... but good enough for some purposes.

Monitoring: You can have a separate tool monitoring your scripts if you like... then if your app has frozen, or is overloaded... just send it a TERM, then a KILL signal... and watch will start another one up when it dies. Consider each script a 'runtime test'. If the 'runtime test' fails, then kill the app and restart it. The app could be an ssh proxy for your vpn - in which case the 'runtime test' would see if you can ping the network, and if not, kill the ssh connection... so it restarts. A webserver runtime test might see if it can do a GET request... if not, kill the server. A ghetto monitoring system for sure... but simple.
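To make that concrete, here's a hypothetical app1_monitor.py in the spirit of the above: if a GET request fails, kill the server process, and the watch wrapping the server starts a fresh one. The url and pidfile path are made up for illustration.

import os
import signal
import urllib2

URL = "http://localhost:8000/"
PIDFILE = "/home/me/someapp1/app1.pid"

try:
    # the runtime test: can the server answer a simple GET?
    urllib2.urlopen(URL, timeout=10).read()
except Exception:
    # test failed... kill the app, and its watch will restart it.
    pid = int(open(PIDFILE).read().strip())
    os.kill(pid, signal.SIGTERM)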

Logging: It's pretty easy to add some stderr and stdout redirection with > /home/me/logs/app1stdout 2>> /home/me/logs/app1stderr (wrapped in sh -c, as in the first example, so the redirection happens inside the screen session). This way you can have ghetto logging too.

Resource control: You can use ulimit in your scripts to limit how many resources your server can use. Then if it uses too much, it will die and be restarted in two seconds. Say you think your python web server should never *ever* take up 500MB of ram: run it from a .sh file, and put ulimit -m 500000 before it. See ulimit -a for a list of things you can limit. Ghetto-quick resource control. Similarly, you can use nice and ionice to make things behave more nicely :).
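The same idea works from inside python with the resource module - a sketch, for a process that sets its own cap before starting its real work:

import resource

LIMIT = 500 * 1024 * 1024   # bytes, roughly 500MB
# cap the address space; allocations beyond this will fail,
# the process dies, and the watch wrapping it starts a fresh one.
resource.setrlimit(resource.RLIMIT_AS, (LIMIT, LIMIT))

# ... then start the server as normal.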

Debugging: screen doesn't give you an error message with -d -m. So you can either look in your logs, or try out your command first with "screen cmd", eg "screen python". You can try out your @reboot command, not just by rebooting, but by setting it to run 10 minutes from now. See the cron help pages for how to do that. You might want to put sh -l -c in front of your command so it runs in a 'login shell'. This will set up your paths and environment variables like your login shell does. Or explicitly set up, for each app/script, the paths and environment variables it needs.



Again this isn't for everyone... but some ideas here might be useful for your own ghetto screen usage for servers and startup scripts.

Wednesday, September 23, 2009

Linux sound is getting better.

No, I'm not talking about the free software song sung by Richard Stallman (very funny, but in a low quality .au format). Or the pronunciation of Linus and linux.

To start on this long-journey-of-a-rambling-diatribe-of-words, there are two good audio patches in the SDL bug tracker for the upcoming SDL 1.2.14 release.

One patch is for the pulseaudio driver, and the other is for the alsa backend. These solve some of the high latency and scratchy sound issues some people have.

That's right, a new SDL release very soon... it's over a year since the last release (1.2.13), and it seems like forever since the SDL 1.3 series began. Most new development has been happening on the SDL 1.3 tree in the last year... so the 1.2 releases have slowed almost to a stop.

There's a good article on a cross-platform atomic operations API for SDL: http://www.thegrumpyprogrammer.com/node/16. That's one of the features that's been evolving over a few years, and is being implemented in svn.

In python terms, SDL 1.3 is like python3000: a refinement, and a promise to break backwards compatibility with the ABI. Note, not so much the API... the API is fairly backwards compatible... but some things must change. Also SDL 1.3 has lots of cool features I'm looking forward to.

Even though the SDL 1.3 tree is improving, and many people are now switching over to it, the SDL 1.2 series has a lot of life left in it.

So the SDL 1.2.14 release is all about fixing bugs, and applying patches. There are a lot of bug reports, and also a lot of patches, in the main SDL bug tracker.

With free software and open source there is the mantra 'release early, release often' (the other mantra is 'release early, then abandon on sourceforge'). A stable version that's used by people gets plenty of bug fixes, and people send in patches. Whereas a development version doesn't get the same kind of attention as released-and-used-by-people software. Many of the fixes done on the stable 1.2 tree will be ported to the 1.3 tree too.

Now enough SDL 1.3 love... what else is improving in linux audio world? now for something completely different.

Well, pulse audio is frantically making releases. Three releases in september so far, and five for 2009. Pulse and jack are also playing nicer together now (though not packaged in ubuntu yet... grrrr, see bugs 198048 and 109659; this is critical for allowing many high end audio programs to work alongside the 'beep' sound your terminal makes. Hopefully they'll get a good desktop architect (sound experience) from their job posting to fix things).

Jack is the low latency, synchronised audio system used by many professional audio programs on linux. Think unix pipes applied to audio, but in a way that works with the audio latency requirements. Both jack, and pulse audio have been ported to lots of other operating systems these days. Which can only be good for them getting more developer support... and making the linux audio world better along with it. You can see in their change logs, and repository commits that developers on different platforms other than linux are contributing quite a lot.

Even trusty old Open Sound System (OSS) has gotten better. OSS was removed from the kernel and replaced with alsa a while ago... but OSS kept going anyway. OSS4 has lots of things fixed compared to the OSSv3 most people remember using a long time ago. Including a fast, transparent, high quality in-kernel mixer (good for crappy cards that only support one program outputting sound at a time). It also has a "record-what-you-hear" feature... for recording what is coming out of your sound card (a feature MS disabled in vista... booo!). The commercial version is now available as open source, with a mercurial repository too! OSS is also quite x-platform.

What about sound applications?

The drum machine Hydrogen got a new release for the first time in three years... and this time it's not linux-only either.



A great DJ program called mixxx is another high quality multiplatform audio program. It's probably my favourite audio program... just because it's so fun. You can even hook up real vinyl decks to it for scratch control (and midi ones). Unfortunately you can't pipe music in from other audio programs or in from a sound card... so you can't use the vinyl decks in that way. You have to use specially encoded records, which the program reads to figure out where the record is moving. The latest version features javascript scripting of midi and other parts of the program.

(go on, download it and become a dj ninja)

Guitarix is an amp emulator... it tries to sound like various vintage guitar amps. Pretty fun to play around with.

(plenty of knobs to play with)

Especially in combination with the hundreds of LADSPA (guitarix is a LADSPA plugin too) and LV2 effect plugins available. Other plugins include vocoders and all sorts of weirdness.

Lash uptake has been good, and now lash talks dbus... letting it mix in nicely with the rest of the linux desktop ecosystem. Lash is a session system for linux audio programs. It lets you open your 12 different linux audio programs(remember audio in linux is like pipes... pipes with audio running through them instead of water... let's call them wires... but digital... maybe fiber optics... but not using light... ok whatever... why am I explaining it this way?... you're not five... too many dots. sorry.) and save your settings for later. The alternative is to each time open your 12 programs, set up the wiring between six of them, start messing around and finally... 2 hours later... realise you were supposed to be setting them up in a certain way rather than making stupid minimal beep noises to a house vocals mixed with a recording of a fart noise - filtered down to retro 8bit samples. Without lash, you couldn't save that brilliant setup and play with it later.

Audacity, the simple(yet advanced) audio editing work horse is moving towards a 2.0 by the end of the year. Audacity has been around for ages, and has been multi platform for ages. The 1.3 series seems to have been going on forever... but they do regular beta releases, and nightly builds. So it's pretty easy to get fresh versions. Do proper releases matter that much when new releases are pushed out every day? I guess so.


LiVES reached 1.0 earlier in the year after a long time in development... (since 2002!). LiVES is a video editor (which includes audio). It's actually quite useful for editing video! The other cool part of it is that it's a VJ tool. So you can do those awesome projections you saw the last time you were in a club rushing around the place. You can control much of LiVES with midi too, which is nice.

(Make home movies of your loved ones. Like grandpa Nelson here.)

In fact lots of audio programs available for linux can be controlled by midi. Which is nice for me, since you can easily do midi with python and pygame.midi.

Speaking of things midi and pythony... the vj program freej now has python wrappers! There are even five tutorials which use pygame. Unfortunately this is not in release form yet... but all this good stuff is happening in the git repos.
(you too can make video art like this with freej... All you need is a crazy mask)

Both LiVES and freej use the frei0r video plugins. Which has nothing to do with linux sound getting better really. So there. Jerk (why did this guy even write this? I wish I didn't waste my time reading it.).

Comments? Important typos I should fix? Interesting linux audio things you're doing? Want to tell me how your tomatoes are growing in your garden? Got a picture of your cat you'd like to share with me?

Tuesday, September 22, 2009

Where did the 'new' module go in python 3?

Anyone know where the 'new' module went in python 3?

2to3 can't seem to find 'new', and I can't find it anywhere with my favourite search engine either... filed a bug at: issue6964.

A complete 2to3 tool should at least know about all the modules that are missing. It doesn't need to know what to do with those modules, but it should at least be able to tell you which ones are missing. I'm not sure how to get a complete top level module list sanely... I guess by scanning the lib directory of python.

Or maybe there is a module to find all python modules?

Each platform would be slightly different of course... and there'd be differences based on configure. Also some modules have probably stopped importing or compiling at all these days.

Then you could just find the intersection and differences with the lovely set type :)
# find the modules that are in the 2.x series but not in the 3.x series.
top_level_modules_not_in_3 = set(top_level_modules_2series) - set(top_level_modules_3series)
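As for finding all the modules, pkgutil might be close enough - a sketch, to be run under both 2.x and 3.x with the two outputs diffed:

import sys
import pkgutil

# top level modules importable from sys.path, plus the builtin ones.
names = set(name for _, name, _ in pkgutil.iter_modules())
names.update(sys.builtin_module_names)
for name in sorted(names):
    print(name)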


Well, maybe the 2to3 tool could work a different way. Instead it could find all the modules it *does* know about, and warn you if it encounters modules it doesn't know about. You can already list fixes with: 2to3-3.1 -l

But what about packages with submodules? It seems hard to pin down exactly what's included with python. Or maybe there is an easy way to find out.

update: it is printed as a warning with python2.6 -3 -c "import new". The types module is the one to use instead. A reminder to always use python2.6 -3 to warn you of things the 2to3 tool can not fix. Running python2.6 -3 over your current code base is a good thing to do in preparation for a python 3 future.

Wednesday, September 16, 2009

py3k(python3) more than one year on - 0.96% of packages supporting py3k.

Python3 was released more than a year ago, and the release candidates and beta releases came much before then.

How successful has the transition been to python3 so far?

One way to measure that is to look at how many python packages have been ported to py3k, and one way to see what packages are released is the python package index (aka the cheeseshop, aka pypi).

Not all packages ported to python3 are listed on pypi, and not all packages are listed on pypi. Also there are many packages which haven't been updated in a long time. However it is still a pretty good way to get a vague idea of how things are going.

73 packages are listed in the python3 section of pypi, and 7568 packages in total. That's 0.96% of python packages having been ported to python3.

Another large project index for python is the pygame.org website, where there are currently over 2000 projects which use pygame. I think there are 2 projects ported to python3 on there (but I can't find them at the moment). This shows a section of the python community using python in projects. Most of the things listed on pypi are packages - not projects. The pygame.org index shows what people are using for their projects - not what their libraries support. In a similar way, it could be good to see how many websites are running on top of python3. I think a lot of the people who have ported to python3 aren't really using it for their projects, but have done the porting work as a good will measure towards moving python forward.

Another way to measure the success of the migration, is to pick a random sampling of some popular packages and see if their python3 support exists.

Pygame(yes), Pyopengl(no), pyglet(no), numpy(no), PIL python imaging library(no), cherrypy(yes), Twisted(no), zope.interface(no), buildout(not sure, I think no), setuptools(no, patches available), django(no), plone(no), psyco(no), cython(yes), swig(yes), sqlalchemy(no).

With some packages being used by 1000s or 10,000s of projects, those popular projects hold back the py3k migration significantly. It would seem that some applied efforts to the right projects would help the py3k migration a lot. Perhaps a py3k porting team could be made to help port important libraries to py3k.

How about other python implementations supporting python3 features? None have full python3 support as of yet. For example jython(no), pypy(no), ironpython(no), tinypy(no), python-on-a-chip(no), unladenswallow(no), shedskin(no). However some implementations support some new python3 features.

How about wsgi? wsgi is the python specification for web gateways... it specifies how different web frameworks, web servers and applications can talk to each other and out to http. The wsgiref module in python3 is somewhat broken, and the amendments for python3 have not made it into a new wsgi spec. However work is being done towards it, with a couple of major wsgi users supporting python3 (cherrypy and mod_wsgi).

Another question to ask is: 'are many projects planning to support py3k soon? Or have they decided not to work on py3k at all yet?'. It seems many projects have decided not to put in the work yet. At this point, for many projects they don't see enough benefit towards moving to py3k. Or their dependencies have not been ported, and they are waiting on those to be ported before beginning to port themselves.

How well have the python developers themselves developed the support material for people upgrading their code? It looks like the cporting guide is still incomplete and hasn't been updated in a while. However the CPython API using projects have taken up the slack... so there are now a number of extensions for people to look at for guidance. It's possible to make CPython extensions which support both 2.x and 3.x APIs.

There is now a 3to2 script being worked on. This allows projects to write their code in python3 code, and have it translated into python2 code. The python developers realised that having a 2to3 script was backwards in a way - requiring developers to stick with their python 2 code. However, many projects seem to not use the translation script, since it hasn't worked for them. Instead they seem to have either made separate branches, or made their code so that it works in both 2.x and 3.x.

Support for python3.0 was dropped, and python3.1 is the new python3. However python3.0 still exists in some distributions (like ubuntu).

So how are the various OS distributions going with their python3 support? The latest version of OSX to be released (snow leopard) ships with python2.6.1. Most unix distributions are using python2.6 as their main python at the moment. However most of them have also packaged python3.x, so it's fairly easy for people to try out python3 alongside their python2.x installation(s). macports currently has py25 (286 ports), py26 (206 ports), py30 (0 ports... since py3.0 isn't supported by python.org), and py31 (4 ports). So macports has 1.9% of py31 packages ported compared to py26. That's a similar percentage to the ratio of ported packages in the pypi index (0.96%).

This is not mirrored by the number of windows downloads from python.org. The python2.6.2 windows installer had 786400 downloads, python2.5.4 had 104291, and python3.0 (241363) plus python3.1.1 (214871) total 456234 for 3.x. That's around 58%, comparing 2.6 to (3.0+3.1). Strangely, about the same number of people are downloading 3.0 as 3.1 - even though python.org states that 3.1 is the new py3k and 3.0 is not supported anymore. These are just windows download counts for August... if you compare with most unix distributions, they almost all come with python2.5 or python2.6.

So, is the python3 migration going along swimmingly? Or has it failed to reach its goals(what were its goals if any)? What can we do to help? Should we even help at this point? comments?

Friday, September 11, 2009

The many JIT projects... Parrot plans to ditch its own JIT and move towards one using LLVM.

It seems LLVM has gained another user, in the parrot multi language VM project. They plan to ditch their current JIT implementation and start using LLVM.

Full details are on their jit wiki planning page. There is more discussion on parrot developer Andrew Whitworth's blog here and here.

Parrot aligns very nicely with the LLVM project which itself is attempting to be used by many language projects.

Along with the unladen swallow project(python using LLVM for JIT), this brings other dynamic languages in contact with LLVM. This can only mean good things for dynamically typed languages on top of LLVM.

MacRuby is another project switching to LLVM - they have been working on it since March.

Rubinius seems to be another ruby implementation, mostly written in ruby with the rest in C++ using LLVM. It even supports C API modules written for the main ruby implementation: 'Rubinius is already faster than MRI on micro-benchmarks but is often slower than MRI running applications'.


Hopefully this will help LLVM become more portable, and faster at creating code... as well as being able to create larger amounts of code(LLVM only supports generating up to 16MB of code currently, but that limit is being worked on).

It's yet to be proven that a major dynamically typed language can be sped up nicely with LLVM, but these projects using it should help it get there.

luajit for lua and psyco v2[1] for python are both successful JIT projects for dynamic languages. However, both are limited in their platform support - only supporting 32bit x86 platforms. Other successfully JIT'd dynamically typed languages include the many javascript and actionscript implementations... including V8 (x86 32bit and arm) and tracemonkey (which uses nanojit, which supports many backends: arm, x86 64/32, sparc, ppc etc).

Luajit is compared to lua llvm here and here. It seems luajit 2 is faster than lua llvm, and the posts explain why. They also point out that LJ2 is faster than C for some things.


pypy also decided not to use LLVM, and has embarked on making its own jit system. At one point there was code in the pypy svn repository to support LLVM, but it was removed a while ago. One comment in the past was that LLVM was too slow at generating code, and that it was a very large dependency. LLVM is C++ code that takes quite a while to compile itself, and the library is quite large.

Despite these downsides, LLVM can generate very efficient code - often comparable to the fastest generated code for the C language. This is one reason why the unladen swallow project has chosen it. The unladen swallow project's goal is to optimize long running server processes... so it doesn't care that much about fast math code, or about taking its time to generate native code. This makes sense, considering that it is a google sponsored project.

Another interesting project for python is the corepy project. It's a runtime assembler for python. One thing corepy is used for is accelerating numpy operations - using SSE and multiple cores - so even numpy written in C can go much faster with the corepy accelerated version. The numcorepy blog lists the results of the project, including a 200,000 particle particle system done with numcorepy on the cpu(s).

In the same vein of accelerating numpy code - the pygpu, and pycuda projects make it possible to use GPU accelerated versions of numpy functions. This allows python code to run way faster than is possible on any available CPU. These projects generate shader code in C-like languages to run on the GPU. So in a way they are also JIT libraries.

liborc is a runtime cross platform assembler which supports many vector operations. Unlike many of the code generators that do not support vector operations - liborc does. It's a replacement for liboil and is used for gstreamer and dirac multimedia libraries.

Inferno is a virtual machine project which includes a JIT for many platforms.


[1] psyco v2 doesn't seem to have a web page yet, just an svn (not the old psyco v1 svn on sourceforge).

updates: from the comments... added note about mac ruby using llvm, the inferno vm, and the rubinius ruby using llvm. Added link to numcorepy project, and a link to pypy. Added some links to a comparison of lua llvm and luajit, and a link to lua llvm.

Thursday, September 10, 2009

Linux 2.6.31 released... the good bits.

The new linux kernel has been released. Here are the human readable changes.

Here's the cool stuff (the links in the original article were broken, so I've fixed the links here):
  1. USB 3 support
  2. CUSE (character devices in userspace) and OSS Proxy
  3. Improve desktop interactivity under memory pressure
  4. ATI Radeon Kernel Mode Setting support
  5. Performance Counters
  6. IEEE 802.15.4 Low-Rate Wireless Personal Area Networks support
  7. Gcov support
  8. Kmemcheck
  9. Kmemleak
  10. Fsnotify
  11. Preliminary NFS 4.1 client support
  12. Context Readahead algorithm and mmap readahead improvements


For me the performance counters will be the most useful thing. Also, being able to use and write user space character devices is cool (especially for audio). USB3 support is awesome, but not useful right now... since there isn't even much hardware out yet!


More info on what that low power wireless support is can be found on wikipedia: IEEE_802.15.4-2006.

Tuesday, September 08, 2009

Dependency analysis, and a digression onto mock ducks.

Dependency analysis allows all sorts of fun things in software.

It can be used to reduce software defects. How? Say you have 10 components, and over time they may bit rot, or change. Reducing dependencies on as many of the components as possible means you have less chance of encountering a bug. It also means you have exponentially less code to update or re-factor. Another reason is that combining multiple components together requires more testing... exponentially more testing (which is why unit-tests are popular).

Performance can be improved with dependency analysis too, and not just by reducing the amount of code run. If code doesn't have dependencies, it can be run in isolation. This is where some object oriented design is missing something: methods which change the state of an object internally create a dependency, and that makes task and data level parallelism harder.

Compare these two calls:
map(o.meth, data)
map(f, data)

If you had a dependency analysis module you could check what dependencies f and o.meth had. Then you could safely distribute them, and not require locking or anything else of that kind. Without that available, you can make sure they use locking, or atomic operations... or you can manually make sure they do not have any dependencies.

Unfortunately method calls often change the state of an object, even if sometimes they don't need to. Say half way through a method, it assigns something to self? Then you've changed the state of the object, and your code is not safe for distribution.
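Here's a sketch of the difference, using multiprocessing.Pool as a stand-in distributor. f depends only on its argument, so it distributes cleanly. A method that assigns to self is another story - on python2 a bound method won't even pickle, and even where it can be shipped to a worker, the assignment happens on the worker's own copy of the object:

from multiprocessing import Pool

def f(x):
    # depends only on its argument... safe to run anywhere.
    return x * x

class Widget:
    def meth(self, x):
        self.last_seen = x   # state change: now there's a dependency.
        return x * x

if __name__ == "__main__":
    pool = Pool(4)
    print(pool.map(f, range(10)))   # [0, 1, 4, ..., 81]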

What language features, or design ideas encourage the reduction of dependencies? Functional programming is one. Unit testing is another. Good packaging, and module systems are a third.

Duck typing is another one that can help reduce dependencies. However, it has problems too. Say you have a class like this:

class Neighbor:
    def use_your_duck(self, duck):
        self.number_of_feathers = duck.number_of_feathers


The issue is that the caller of Neighbor.use_your_duck isn't sure exactly what use_your_duck needs a duck for. Why does your neighbor need a whole duck, just to know how many feathers it has? By giving it the whole duck, you've created a dependency on ducks. Each time your neighbor needs to figure out how many feathers are on the duck, you need to give your neighbor the duck. What if instead, you just count the feathers yourself, and give that number to the neighbor?

What if your duck changes its amount of feathers? If it's important that your neighbor gets an accurate feather count, then they will want access to the duck. Letting your neighbor count the feathers itself means you have less work to do. This is why it's good to be able to give a reference to a duck... rather than just counting the feathers.

However, if your neighbor moves to Alaska, and you live in Buenos Aires - then you might have a problem sending them a duck every time they want to count the feathers. Now your neighbor has to fly over to pick up the duck, take it to Alaska and count the feathers... or just keep the duck in Alaska. Another option is for you to just tell your ex-neighbor how many feathers the duck has over the phone. Your neighbor gives you a call, you go off to count the feathers... and call back.

Or, you could make some mock-duck, and give your neighbor that. You could make it mostly like a duck... well, you could design this mock-duck forever trying to figure out what your neighbor does with your duck every day. So you plant a spy camera in your neighbor's basement... and note that your neighbor only ever counts the feathers on the duck... never plucks your duck or does anything else to it. So it's safe for you to make your mock-duck with a bunch of feathers on it - and be fairly sure your neighbor will not break when borrowing your mock-duck.
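In code, the mock-duck is tiny - a sketch, assuming (after all that spying) the neighbor only ever reads number_of_feathers:

class MockDuck:
    # quacks like a duck, as far as Neighbor is concerned.
    def __init__(self, number_of_feathers=4000):
        self.number_of_feathers = number_of_feathers

neighbor = Neighbor()
neighbor.use_your_duck(MockDuck())
print(neighbor.number_of_feathers)   # 4000, and no real duck required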

Anyway... enough typing about ducks for now.

Wednesday, September 02, 2009

python build bots down... maybe they need a spectacularly adequate build page instead?

Seems the python build bot pages are down. Maybe something simpler is needed instead of build bots? Something that requires less maintenance.


  • update from svn
  • compile
  • run tests if the compile completes
  • upload the results to a simple page (configure (stdout, stderr), build (stdout, stderr), test results (stdout, stderr), binaries).


The beauty of this is that it can be easily de-centralized.

Could even use the pypi infrastructure for this now. Each buildbot gets its own project set up, which it updates each time it builds. Have a special pypi tag (category) for python build bots, so people can easily search for them. To reduce the spam on pypi, just mark the releases as hidden... so they don't show up. The results are added to the pypi listing.


Probably only needs one command added to the python setup script to do the upload (based on the existing pypi code).

No central authority, and very simple. Anyone who wants to run a 'buildbot' can without authority.


Can probably make it all work so that a buildbot runner just adds a special 'python setup.py buildbot' to their cron, assuming they are already registered with pypi. So they don't need to set up a special buildbot, they just use the existing python source code to do it.


work required:
  • write a script to add a buildbot command. To be added to python source code when finished.
    • uploads the results of the build/test into a .zip file to pypi. The zip contains stderr and stdout of each config, build, test stages. Also contains the resulting binaries. eg, .msi .exe .tar.gz etc.
    • Only update binaries if all tests pass.
    • Updates from svn, and adds the svn revision to the python version.
  • LATER - add pypi category for python build bot results.
  • LATER - write code to parse results from any uploads into that category into one main page with all results listed.
Instead of modifying the python source, this command could be made separately, and then added to the python source after it has been proven. Adding it to the python source, however, will mean that anyone can just do a 'python setup.py buildbot'. It turns all of the python developers... and anyone else who feels like helping, into a buildbot.

As a bonus, people could even use pypi to keep track of python itself, and not just python modules. It also turns python development into a series of continuous releases, stabilized whenever the tree passes all the tests.


Something like this pseudo code would work:
 ./configure > config_stdout.txt \
2> config_stderr.txt && \
make > make_stdout.txt 2> make_stderr.txt && \
run_tests.py > test_stdout.txt 2> test_stderr.txt && \
upload_build_results.py my_python_buildbot_pypi_page


Anyone else want to finish the script? To add building, installing into a local directory, and the actual 'upload_build_results.py' upload to pypi, etc.
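As a starting point, here's a rough sketch of the setup.py side: a custom distutils command named 'buildbot'. Everything here is hypothetical - the stage commands are placeholders, and the pypi upload is still the unwritten part:

from distutils.core import Command, setup
import subprocess

class buildbot(Command):
    description = "configure, build, test, and upload the results"
    user_options = []

    def initialize_options(self):
        pass

    def finalize_options(self):
        pass

    def run(self):
        stages = [("configure", ["./configure"]),
                  ("build", ["make"]),
                  ("test", ["./run_tests.py"])]
        for name, cmd in stages:
            stdout = open(name + "_stdout.txt", "w")
            stderr = open(name + "_stderr.txt", "w")
            if subprocess.call(cmd, stdout=stdout, stderr=stderr) != 0:
                break   # don't test a broken build, or upload binaries.
        # TODO: zip the logs (and binaries, if all the tests passed),
        # and upload the zip to this buildbot's pypi project.

setup(name="python-buildbot-results",
      version="r12345",   # the svn revision would go here
      cmdclass={"buildbot": buildbot})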