RFC: Leaky abstractions for HTTP encoding, and a better high level abstraction.
So I was spinning some webs in late 1884, like many others in this time of industry. Everyone is doing it these days (spinning webs). With the lantern hanging over my carbon fibre 100% macrobiotic desk, the soft flickering light cast a long shadow through my webs.
I stared into the webs, and something struck me!! It was a blow to my head brain.
Most people in 1884 have made the wrong abstraction for HTTP
At some point most web code decided to use a dictionary for GET and POST variables.
Like a leaky boat, a leaky abstraction is bearable, but also annoying. I hate leaky boats so mostly get around on horseback or by horse and carriage. I also enjoy a brisk walk through the countryside in the morning, and I love to scuttle through dark alleyways at night in the exciting colourful parts of the township's slums.
[Editor's note*1: my friend wants me to try out one of his chain powered Pull Bicycles. I told him he is crazy and that his Pull bicycles will never catch on. Having said that, they do look like fun, especially the ones that fly.]
Now I like hash as much as the next person... but too much hash is not good.
What abstraction do they use for GET and POST variables?
Sometimes it is a dictionary with unique decoded keys, single decoded values, with no ordering of the keys.
Most web codes for languages including Python, PHP, and other vices get HTTP wrong in this regard.
They often lose these useful properties of the encoding:
ordering. duplication. streaming. raw encoding.
A generator/iterator of tuples of ((key,value), (raw_key, raw_value)) is the nicest high level abstraction I can think of. You don't lose any of those properties.
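Here is a rough sketch of what such a generator might look like in Python. The name iter_pairs is made up for illustration, and using unquote_plus as the decoder is an assumption (it reads '+' as a space, the way HTML forms do). For brevity it takes the whole raw string rather than a stream.

```python
from urllib.parse import unquote_plus


def iter_pairs(raw_query):
    """Yield ((key, value), (raw_key, raw_value)) in the order sent.

    iter_pairs is a hypothetical helper, not an existing library API.
    """
    for field in raw_query.split("&"):
        if not field:
            continue  # skip empty fields, such as the one in "a=1&&b=2"
        raw_key, _, raw_value = field.partition("=")
        decoded = (unquote_plus(raw_key), unquote_plus(raw_value))
        yield decoded, (raw_key, raw_value)


pairs = list(iter_pairs("a=3&b=hello+world&a=4"))
# Both 'a' fields survive, in order, and the raw 'hello+world' is kept
# next to the decoded 'hello world'.
```

Nothing is thrown away here: a caller that wants a dictionary can still build one from the pairs, but a caller that cares about order, duplicates, or the raw bytes does not have to fight the abstraction.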
...
... **.... .*. . . *##. . ..........
...
Sorry for the mess. There was an ink blot in my quill -- so I had to fix it to write more paper words.
Now with the new abstraction you can build more leaky abstractions on top if you like. WHATEVA. You can use your hash if you like too, but maybe there is a better way, one that isn't leaky, than using a dictionary.
Using hash is like a bad DOM. You have to read it all in before you can get anything out.
Say I would like to inspect this POST:
stop_the_war_now_you_bastards=4&blabla=asdf...69MB_of_data_here
Now by putting everything into your dictionary (hash, or array) you have to read over all of the 69MB of data.
But what if I just wanted to get the value for the first 'stop_the_war_now_you_bastards' variable? Using an iterator or a generator I wouldn't need to process all of that 69MB of data, and everyone could just live in peace. No one likes to give 69MB to someone if there is no need to send 69MB... it's just a waste of bandwidth on the intarwebs.
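A minimal sketch of the idea, assuming the POST body arrives as a binary file-like object. The names iter_raw_fields and first_value are hypothetical, the 8192 byte chunk size is arbitrary, and the key is compared in its raw form, so this only works for keys that need no escaping.

```python
import io


def iter_raw_fields(stream, chunk_size=8192):
    """Yield raw b'key=value' fields from a binary stream, one at a time."""
    buffer = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        # Everything before the last '&' is a complete field; the tail
        # may be cut off mid-field, so keep it for the next round.
        *complete, buffer = buffer.split(b"&")
        for field in complete:
            yield field
    if buffer:
        yield buffer  # the final field has no trailing '&'


def first_value(stream, wanted_key):
    """Return the raw value of the first field named wanted_key, or None."""
    for field in iter_raw_fields(stream):
        raw_key, _, raw_value = field.partition(b"=")
        if raw_key == wanted_key:
            return raw_value  # stop here; the rest is never read
    return None


# A stand-in for the big POST (1MB rather than 69MB, to be kind).
body = b"stop_the_war_now_you_bastards=4&blabla=" + b"x" * 1_000_000
stream = io.BytesIO(body)
value = first_value(stream, b"stop_the_war_now_you_bastards")
bytes_read = stream.tell()  # only one chunk was pulled from the stream
```

The repeated split on a growing buffer is wasteful for very large fields, so treat this as an illustration of stopping early rather than a tuned parser.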
Raw encoding is needed sometimes
The problem is that HTTP encoding is not always done correctly. Sometimes you need to look at the raw encoded version. Like how different clients and servers sometimes disagree about '+': form encoding uses it for a space, while plain URL percent-encoding leaves it as a literal plus sign. This can cause subtle bugs in some cases (like when you are comparing things). I won't explain further... you'll just have to believe me... like how you believe the news.
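To make the '+' trouble concrete, here is a small example using Python's standard urllib.parse decoders. The sample string is made up; the point is only that the same raw bytes decode to two different values depending on which rule the code picks.

```python
from urllib.parse import unquote, unquote_plus

raw = "1+1%3D2"

as_form = unquote_plus(raw)  # form rules: '+' is read as a space
as_uri = unquote(raw)        # plain percent-decoding: '+' stays a plus

same = as_form == as_uri     # False, so a comparison on decoded values breaks
```

Once only the decoded value is kept, there is no way to tell which of the two the sender meant. With the raw version at hand you can at least look.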
Unfortunately, keeping the raw encoding around with the decoded version wastes memory. Wasting memory leaves a yucky taste in my mouth. So laziness comes to the rescue again!!! Lazily decode your variables. Hey, no point decoding that 69MB if you don't need it, eh?
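One way lazy decoding might look, as a sketch only. LazyField is a hypothetical class, and decoding with unquote_plus is an assumption. It stores just the raw text and decodes on request, so nothing is decoded (or stored twice) unless someone asks.

```python
from urllib.parse import unquote_plus


class LazyField:
    """Keep only the raw text; decode when .value is asked for."""

    __slots__ = ("raw",)  # no per-instance dict, no cached decoded copy

    def __init__(self, raw):
        self.raw = raw

    @property
    def value(self):
        # Decoded on every access. Caching the result would save time
        # but would bring back the two-copies memory cost.
        return unquote_plus(self.raw)


field = LazyField("hello+world%21")
raw_text = field.raw      # available without decoding anything
decoded = field.value     # decoded only now, on demand
```

Whether to cache the decoded value is a trade: decode once and hold two copies, or hold one copy and decode each time.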
Ordering, and duplication of variables
You are allowed to put the same variable in a request any number of times.
So this means ?a=3&b=2&a=4 is valid.
But what does GET['a'] give you in most web codes? Are you getting 3 or 4? This is why you need to figure into your abstraction both ordering and duplication.
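For the curious, this is what Python's standard urllib.parse does with that query. Stuffing the pairs into a plain dict is shown here as the naive approach, not as what any particular framework does.

```python
from urllib.parse import parse_qs, parse_qsl

query = "a=3&b=2&a=4"

as_pairs = parse_qsl(query)  # list of (key, value), in the order sent
as_dict = dict(as_pairs)     # naive dictionary: later 'a' overwrites earlier
as_multi = parse_qs(query)   # dictionary of lists: keeps both 'a' values

first_a = next(value for key, value in as_pairs if key == "a")
```

The plain dict silently answers 4 and forgets the 3 ever existed. The dictionary of lists keeps both values for 'a', but can no longer tell you that b=2 arrived between them. Only the list of pairs keeps the whole story.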
*1 The editor and I are the same person. Did you read the editor's note words with the same internal voice as the rest of these words? Is the editor's voice a deep manly voice, or perhaps a mousey woman's voice? Note that the first chain powered bicycles came out later in 1885, a year after my friend was talking about starting a company up to make them. I think someone heard about his designs one night at the pub. Oh well.