Tuesday, September 08, 2009

Dependency analysis, and a digression onto mock ducks.

Dependency analysis allows all sorts of fun things in software.

It can be used to reduce software defects. How? Say you have 10 components, and over time they may bit rot, or change. By reducing a dependency on as many of the components as possible, means you have less of a chance of encountering a bug. It also means you have exponentially less code to update or re-factor. Another reason, is that combining multiple components together requires more testing... exponentially more testing(which is why unit-tests are popular).

Performance can be improved with dependency analysis too. Not just by reducing the amount of code run. If code doesn't have dependencies, it can be run in separation. This is where some object oriented design is missing something. When they have methods which change the state of an object internally - then they have a dependency. At this stage it makes task, and data level parallelism harder.

Compare these two calls:
map(o.meth, data)
map(f, data)

If you had a dependency analysis module you could check what dependencies f, and o.meth had. Then you could safely distribute them, and not require locking or anything else of that kind. Failing to have this available to you, you can make sure they use locking, or atomic operations... or you can manually make sure they do not have any dependencies.

Unfortunately method calls often change the state of an object, even if some times they don't need to. Say half way inside a method, it assigns something to self? Then you've changed the state of the object, and your code is not safe for distribution.

What language features, or design ideas encourage the reduction of dependencies? Functional programming is one. Unit testing is another. Good packaging, and module systems are a third.

Duck typing is another one that can help reduce dependencies. However, it has problems too. Say you have a class like this:

class Neighbor:
def use_your_duck(self, duck):
self.number_of_feathers = duck.number_of_feathers

The issue is that the caller of Neighbor.use_your_duck isn't sure exactly what use_your_duck needs a duck for. Why does Neighbor need to use a whole duck, just to know how many feathers it has? By giving it the whole duck, you've created a dependency on ducks. Each time your neighbor needs to figure out how many feathers are on the duck, you need to give your neighbor the duck. What if instead, you just count the feathers your self, and give that number to the neighbor instead.

What if your duck changes the amount of feathers... if it's important that your neighbor gets an accurate feather count, then they will want access to the duck. Letting your neighbor count the feathers itself, means you have less work to do. This is why it's good to be able to give a reference to a duck... rather than just counting the feathers.

However, if your neighbor moves to Alaska, and you live in Buenos Aires - then you might have a problem sending them a duck every time they want to count the feathers. Now your neighbor has to fly over to pick up the duck, take it to Alaska and count the feathers... or just keep the duck in Alaska. Another option is for you to just tell your ex-Neighbor how many feathers the duck has over the phone. Your neighbor gives you a call, you go off to count the feathers... and call-back.

Or, you could make some mock-duck, and give your neighbor this. You could make it mostly like a duck... well, you could design this mock-duck forever to figure out what your neighbor is doing with your duck every day. So you plant a spy camera in your neighbors basement... and note your neighbor only ever counts the feathers on the duck... never plucks your duck or does anything else to the duck. So it's safe for you to make your mock-duck with a bunch of feathers on it - and be fairly sure your neighbor will not break when borrowing your mock-duck.

Anyway... enough typing about ducks for now.

1 comment:

nnis said...

The last part sounds like the difference between Data-structured and Data coupling.

class Neighbor:
....def use_your_duck(self, duck):
........self.number_of_feathers = duck.number_of_feathers


class Neighbor:
....def use_your_duck(self, num_of_feathers):
........self.number_of_feathers = num_of_feathers

Depends on how much about the duck you really need to know. If it is just the numbers of feathers just send that. If you have a lot more questions, it might be easier to just ship the whole duck over or locate the one processing ducks closer to the one producing them.