Ask HN: Experience with Protocol Buffers

  • This has just been posted on HN

    http://msgpack.sourceforge.net/

    Make sure you benchmark simple JSON. It might be enough.

  • I've used this for some projects, with much success. In one case I used it as a data interchange format between a C++ app and some Python utilities. It worked very well for that, and the generated APIs were easy to work with, albeit a bit bulky.

    In another case, more of an experiment, I used them to serialize game data and send it over enet. This proved very flexible (it was easy to change or add things), and the packets were extremely compact.

    Pros:

    * Read/Write access to data from C++ or Python

    * Generated APIs were easy to work with

    * Very compact representation

    * ASCII-dump version is very useful for debugging

    * More error checking than something like JSON (e.g. it tells you if you leave out a required field)

    Cons:

    * Adds some build steps; can be more of a headache to maintain (compared to JSON or something)

    * The API can't parse the ASCII version, which is bad for config data or anything else that needs to be human-readable (vs. XML or JSON)

    * Generally requires copying your data into the protobuf struct, and then packing, rather than going straight from your "native" format into a packed buffer.

    * Adds a bit more complexity

    * Not as lightweight as JSON

    For what you're doing, I would recommend them.

    They're great for "structure"-style data, a little weird for array-style data. For example, one of the things I was storing was a 4x4 matrix, and I resorted to making a struct with 16 members such as m_00, m_01, etc., which worked fine and stored compactly, but was a little weird. I don't think there's a way to have a float[16] or something like that. I could be wrong; maybe there's a better way to do this.
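    (For what it's worth, proto2 does have `repeated` fields for scalar types, and the `packed` option stores them in a single length-delimited field rather than one tag per element. A sketch of what a flat-array matrix might look like; the message and field names here are made up:

```protobuf
// Illustrative only: a 4x4 matrix as a packed repeated field.
// "Matrix4" and "m" are hypothetical names, not from any real schema.
message Matrix4 {
  repeated float m = 1 [packed = true];  // 16 floats, row-major order
}
```

    Whether this is nicer than 16 named members probably depends on how much you want the schema to enforce the element count, which `repeated` does not.)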

    Generally, these days I use one of three formats. I am very happy to have outgrown XML.

    protobuf -- for hierarchical, nested data, if it needs to be compact and accessed from different languages

    JSON -- for quick and dirty stuff, when the format needs to be flexible (or when I need to use JavaScript)

    GTO -- for large sets of structured data. (www.opengto.org)

  • We were using .NET's WCF messaging system, but wanted a faster/smaller format. http://code.google.com/p/protobuf-net/ let us keep our code the same, while using protobuf for the wire format. Worked quite well.

    Another approach to consider is using a text format (XML, JSON) and then running it through fast compression like QuickLZ. This has the benefit of requiring little more change to the program than a call to compress/decompress.
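    The compress-the-text-format idea looks roughly like this in Python, using stdlib zlib as a stand-in for a fast compressor like QuickLZ:

```python
import json
import zlib

record = {"user": "alice", "scores": [10, 20, 30], "active": True}

# Serialize to JSON text, then compress the bytes. zlib stands in here
# for QuickLZ; either way the program only gains two extra calls.
packed = zlib.compress(json.dumps(record).encode("utf-8"))

# On the receiving side: decompress, then parse as normal JSON.
unpacked = json.loads(zlib.decompress(packed).decode("utf-8"))
assert unpacked == record
```

    You keep the debuggability of JSON (just skip the compress step when inspecting), at the cost of some CPU on both ends.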

  • Protocol Buffers will work fine, and their documentation is very clear. These are my findings, based on my experience:

    * Serializing data is OK, but parsing takes quite a bit of time, especially for large requests. (I am talking milliseconds.)

    * PBs always require a copy from your internal app data into their structures. I couldn't find a way to avoid that.

    * They use variable-length encoding, which might be a good option if your data comprises a large percentage of integers. From our experience, don't use it if you are sending within your corporate network, as packing and parsing take more time than you save in data transfer. They might be a good option if you are sending data across slow networks.
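    The variable-length encoding mentioned above is protobuf's varint scheme: each byte carries 7 payload bits plus a continuation bit, so small integers cost one byte while a full 32-bit value costs five. A minimal sketch of the encoder (not the real library code):

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a protobuf-style varint:
    7 payload bits per byte, high bit set on every byte but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

# Small values stay small; each extra 7 bits of magnitude costs a byte.
assert len(encode_varint(1)) == 1
assert encode_varint(300) == b"\xac\x02"  # the example from the protobuf docs
assert len(encode_varint(2**31)) == 5
```

    This is why the savings depend on your data: lots of small integers shrink nicely, but strings and floats gain nothing, and you still pay the encode/decode CPU cost.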

    Some metrics show that Thrift performs better than PBs. Thrift also provides the option of using different protocols. If performance is the prime criterion, JSON + zipping should be a good option; it also avoids the intermediate step of generating marshaling code.

  • Check out Avro, too. With both Protocol Buffers and Thrift, it's really hard to evolve schemas because you won't be able to read data written with an earlier version of the schema. Avro has the speed of binary while being flexible enough to read older data with later versions of the schema.

  • Protos will work fine. Thrift will also work fine. ________ (insert other binary format) will also work fine.

    As long as there are libraries for the languages you're using it's not a big deal. I'd recommend solving the problem and moving on - in the serialization format wars the real victim is productivity.

  • If your data layout is fairly static, PBs are good.

    What I did for an app was encode a kind of JSON in PBs:

    http://pwpwp.blogspot.com/2009/08/storing-json-as-protocol-b...
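    The general shape of that idea is a self-referential message where each node carries exactly one kind of JSON value. A rough proto2 sketch (field and message names here are illustrative, not taken from the linked post):

```protobuf
// Illustrative only: a JSON-like dynamic value in proto2.
message Value {
  optional double number = 1;
  optional string str = 2;
  optional bool boolean = 3;
  repeated Value list = 4;    // JSON array
  repeated Pair object = 5;   // JSON object as key/value pairs
}

message Pair {
  required string key = 1;
  required Value value = 2;
}
```

    You lose protobuf's static schema checking for the payload, but keep the compact wire format while the data layout stays as flexible as JSON.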