Tuesday, July 08, 2008

Protocol Buffers - are they really faster than XML?

It seems Google is claiming that protocol buffers are faster than XML... without any proof.

Consider AsmXml, which can process XML at over 200 MB/s on old machines.

Google's protocol buffers also generate wrappers for different languages, among other nice things. But for loading structures into and out of memory, XML can be very fast.

Before claiming things like that, I think proof in the form of benchmarks is needed.

I don't doubt they thought that XML was slower, since many implementations are slow. Maybe XML is slower, but there is no proof yet. Also, I'm sure the other nice features of protocol buffers make them perfectly suited for their task.

URL encoding could have worked nicely too.

6 comments:

shubham said...

I agree with you, illume, that things stated without proof are not to be believed easily. Why don't you conduct some tests and experiments comparing XML and protocol buffers to see which is faster? As for me, I haven't worked with XML, but I found protocol buffers very convenient. It would be pretty enlightening to see the results of such experiments.

illume said...

@shubham
Unfortunately I don't have enough time for such science... I'll have to leave adding science to the protocol buffers claims to their authors.

It definitely would be interesting to compare the speed of AsmXml to protocol buffers.

Ian said...

My first thought was that it was 'lighter' on the network.

David said...

The encoding is a very simple binary data format. See the documentation here: http://code.google.com/apis/protocolbuffers/docs/encoding.html

Basically, each field is encoded as a key that combines the field number (the tag) with the wire type, followed by a length for variable-size fields like strings, followed by the data itself.
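To make that layout concrete, here is a minimal Python sketch of the wire format as the encoding documentation describes it. The helper names (encode_varint, encode_int_field, encode_string_field) are made up for this example and are not part of any protobuf library; only integer and string fields are covered.

```python
WIRE_VARINT = 0  # wire type for variable-length integers
WIRE_LEN = 2     # wire type for length-delimited data such as strings


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    out = bytearray()
    while True:
        low = n & 0x7F
        n >>= 7
        if n:
            out.append(low | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Key (field number + wire type), then the value as a varint."""
    key = (field_number << 3) | WIRE_VARINT
    return encode_varint(key) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Key, then the byte length, then the raw UTF-8 bytes."""
    data = text.encode("utf-8")
    key = (field_number << 3) | WIRE_LEN
    return encode_varint(key) + encode_varint(len(data)) + data


print(encode_int_field(1, 150).hex())           # 089601
print(encode_string_field(2, "testing").hex())  # 120774657374696e67
```

Field 1 holding the integer 150 takes three bytes, and the string "testing" in field 2 takes nine: two bytes of overhead plus the text.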

At runtime the data is represented in your program by a generated object that can parse itself with very simple logic. It just scans through the binary fields until it recognizes a tag number and then it does a very simple copy/decode and stores the value directly in a member field.
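A rough Python sketch of that scanning loop follows. It is an illustration only, not the generated code: the function names (read_varint, parse_fields) are invented for this example, values go into a plain dict rather than member fields, and only the varint and length-delimited wire types are handled.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def parse_fields(buf: bytes) -> dict:
    """Walk the buffer field by field and collect values by field number."""
    fields = {}
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint: decode the integer in place
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: read the length, copy the bytes
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields[field_number] = value
    return fields


message = bytes.fromhex("089601120774657374696e67")
print(parse_fields(message))  # {1: 150, 2: b'testing'}
```

There is no tokenizing and no text-to-number conversion anywhere in the loop; every step is a shift, a mask, or a slice.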

Since this is an extremely simple and easy to serialize and parse data format (it's hard to imagine it getting much simpler!), it would be very difficult for structured XML to compete speed-wise given all the tokenizing and string -> data conversions it has to do. An XML representation of the same data would also be significantly larger, which would slow down communication between machines.

Mike K said...

On the pure parsing algorithm alone, an XML parser needs to look at every character to be sure it's not the start of a new tag. If you know the number of bytes in a chunk, you can omit the test and merely add the offset to get to the next chunk. So whether or not a given implementation is faster, the algorithm is.

Not to mention moving the schema offline and generating native code for you.
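The skipping argument can be sketched in Python like this. The framing here is an assumption chosen for simplicity (a fixed 4-byte little-endian length before each chunk; protocol buffers themselves use variable-length integers), and the function names are hypothetical.

```python
import struct


def pack_chunks(chunks):
    """Prefix each chunk with its length as a 4-byte little-endian integer."""
    return b"".join(struct.pack("<I", len(c)) + c for c in chunks)


def chunk_offsets(buf: bytes):
    """Find where each payload starts by jumping over it.

    The payload bytes are never inspected: the length alone tells us
    where the next chunk begins.
    """
    offsets = []
    pos = 0
    while pos < len(buf):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        offsets.append(pos)
        pos += length  # skip the payload in one step
    return offsets


def tag_offsets(text: str):
    """Delimiter-based scan: every character must be tested for '<'."""
    return [i for i, ch in enumerate(text) if ch == "<"]


buf = pack_chunks([b"abc", b"", b"hello"])
print(chunk_offsets(buf))        # [4, 11, 15]
print(tag_offsets("<a>hi</a>"))  # [0, 5]
```

The work in chunk_offsets grows with the number of chunks, while tag_offsets grows with the number of characters, which is the algorithmic difference in question.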

kunal said...

Ahh, this is indeed a bad joke. Folks at Google should know markup a little better. If speed is their only concern, why not try binary XML?
Maybe their schema is not up to the mark.