Commanding infinite streaming storage with Apache Kafka and Pyrostore

  • I like it. Personally, one of my biggest problems with Kafka is its operational complexity. I’ve just had one too many instances of Kafka brokers getting stuck while doing an upgrade and things like that.

    Additionally, I would really, really like to be able to use it as an Event Store, easily accessible by anyone in the org with infinite data retention. I know Kafka kind-of sort-of provides this functionality, but it doesn’t work in practice.

    This appears to be a solution to this problem. Will be interesting to see whether it gains traction.

  • Everything Distributed Masonry does is very interesting. Wish I had more excuses to use your stuff at work.

    Storing all data forever in a single source of truth is awesome until regulation like GDPR comes along. Do you have plans to support excision or is your guidance on personal data to avoid putting it into a system like Kafka/Pyrostore?

  • Integration with Azure Managed Disks: Because of the ingestion-heavy nature of Kafka workloads, the disks attached to the cluster's nodes often become the bottleneck. Traditionally, to scale past this bottleneck, more nodes need to be added. Azure Managed Disks are cheaper, scalable disks that cost a fraction of a node. HDInsight Kafka has integrated with these disks to provide up to 16 TB per node instead of the traditional 1 TB, giving substantially higher scale at much lower cost.

    https://azure.microsoft.com/en-us/services/hdinsight/apache-...

    Is this the same approach as Pyrostore?

  • This is what Apache Pulsar (https://pulsar.incubator.apache.org/) already provides: infinite streaming storage, with a simple, flexible messaging/streaming API and Kafka compatibility.

  • Very interesting and reminds me of Pravega (http://pravega.io/). Seems like unbounded streams will be the next big step in streaming technology.

    https://www.youtube.com/watch?v=cMrTRJjwWys

  • These are the guys behind www.onyxplatform.org. That alone tells me this is legit stuff. We will give it a try.

  • > tradeoffs in our operation of Kafka have lossy effects on stream-ability. Balancing costs and operational feasibility, we ask Kafka to forget older data through retention policies.

    What does "lossy effects on stream-ability" mean here? Does the stream slow down, is data lost, or something else?
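
    For context on the quoted tradeoff: Kafka drops old data according to topic-level retention settings, so the "loss" is that consumers can no longer replay the stream from its true beginning. A minimal sketch of such a policy (the topic name `events` is made up, and this assumes a broker reachable at localhost:9092):

    ```shell
    # Keep at most 7 days or ~100 GiB of data per partition on a hypothetical topic.
    # Segments older than that are deleted, so a new consumer can only stream
    # from whatever the retention window still holds.
    kafka-configs.sh --bootstrap-server localhost:9092 \
      --entity-type topics --entity-name events \
      --alter --add-config retention.ms=604800000,retention.bytes=107374182400
    ```

    So "lossy" here means historical data is gone, not that the stream slows down; that's the gap Pyrostore is aiming at.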

  • I wonder if this would ever be integrated into Kafka proper. Shipping out historical chunks onto infinite storage seems like a generally sensible thing.

    This would be even better if it didn't need a modified client.