Why Are People into Event Sourcing?

  • Event sourcing isn't nearly as commonly known among new programmers as the CRUD one-row-per-entity pattern, and it really should be. I liken it to introducing version control for your data: when immutable updates are your canonical source, then no matter how much the system behind them or the business requirements change, and no matter how many teams are deriving different things from them in parallel, everyone can work off the same data and "merge" their efforts together.

    The one downside is that shifting your business logic to read time means you need very efficient ways of accessing and memoizing derived data. For some applications, this can be as simple as having the correct database indices over your WhateverUpdates tables, fetching all updates into memory, and merging them on each request. For others, you'll need a real-time stream processing pipeline to preemptively get your derived data into the right shape in a cache. Those are more moving parts than your typical monolith app has, but the

    One benefit of actually using event sourcing with a stream processing system is that, in many cases, it can be the most effective way to scale both traffic capacity and organizational bandwidth, much in the same way that individually scalable microservices can (and it is fully compatible with that approach!). Martin Kleppmann at Confluent (a LinkedIn spinoff that builds and consults on stream processing systems) writes some great, highly approachable articles about this. Highly recommended reading.

    http://www.confluent.io/blog/making-sense-of-stream-processi...

    http://www.confluent.io/blog/turning-the-database-inside-out...
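    The simplest case described above (fetch all updates for an entity into memory and merge them on each request) can be sketched as a fold over the update log. This assumes a hypothetical Update record with an entity ID, a sequence number and a numeric delta, standing in for a row of a WhateverUpdates table; a real system would push the filtering and ordering into an indexed query.

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical update record; a "WhateverUpdates" table would hold rows
# like these, ideally indexed by (account_id, seq).
@dataclass(frozen=True)
class Update:
    account_id: str
    seq: int
    delta: int


def current_balance(updates: list[Update], account_id: str) -> int:
    """Fetch all updates for one entity, order them, and merge on read."""
    mine = sorted((u for u in updates if u.account_id == account_id),
                  key=lambda u: u.seq)
    return reduce(lambda total, u: total + u.delta, mine, 0)


log = [
    Update("acct-1", 1, 100),
    Update("acct-2", 1, 50),
    Update("acct-1", 2, -30),
]
print(current_balance(log, "acct-1"))  # 70
```

    The log itself is never modified. Any other team can derive a different view from the same rows with its own fold.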

  • Here's the term I wish was unfashionable with the kids: reshaping.

    Did you spot all those command-to-query-to-event-to-log-to-storage data type conversions in those pretty diagrams? That's a whole bunch of needless reshaping of data as it flows through the system.

    For each one of those data transformations to succeed, there has to be accurate communication between people, plus bug-free code for the data conversion and for routing messages through the system. All those moving parts make changing the system extremely painful, with lots of ripple effects; and every time you have to change your events, you have a data migration project for any running event streams.

    Naming things is hard too, and there's a lot more naming of entities needed in a CQRS-ES system.

    I like all the promised benefits of CQRS and ES, but I can't imagine a case where I'd take the risk of attempting it on anything but a toy project. Perhaps if I was on the version 5 rewrite project for an insanely profitable system where the requirements and design are completely understood up front. I would need to grok some canonical example of a large, well-architected, well-implemented representative system before I would ever attempt to implement one.

    Are there any non-toy examples of successful CQRS-ES with open source available to read? Did those projects go over-budget, and by how much? Would the authors of those examples still recommend the architecture now that they've gone through the experience?
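    For context on the migration point: one technique commonly used to limit that work is "upcasting". Old events stay untouched in the log, and a small function converts them to the current shape whenever they are read. The sketch below uses a hypothetical CustomerRegistered event whose v1 "name" field was split into first and last names in v2. Note that this is itself one more reshaping step of exactly the kind criticized above.

```python
from typing import Any

Event = dict[str, Any]


def upcast(event: Event) -> Event:
    """Bring an old event up to the current schema at read time."""
    if event.get("version", 1) == 1:
        # Hypothetical schema change: v1 stored a single "name" field,
        # v2 splits it into first and last name.
        first, _, last = event["name"].partition(" ")
        return {"type": event["type"], "version": 2,
                "first_name": first, "last_name": last}
    return event  # already current; returned unchanged


stored = [
    {"type": "CustomerRegistered", "version": 1, "name": "Ada Lovelace"},
    {"type": "CustomerRegistered", "version": 2,
     "first_name": "Alan", "last_name": "Turing"},
]
current = [upcast(e) for e in stored]
```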

  • As someone who has fallen for the "event sourcing" promise before: the article does a decent job of explaining that promise. I'm not sure whether it will be covered in the next article, but actually delivering on it is where things break. Hard.

    In the vast majority of the code you will ever write, execution is pretty much guaranteed to proceed from one statement to the next. The hard boundaries, where things can fail, are usually reasonably well understood and quite visible in the code.

    Moving everything to events throws this completely out the window. You can take a naive view and pretend that going from one event to the next is always safe. However, once you start building the system to cope when that isn't the case, it quickly becomes complicated, in areas that are decidedly unrelated to your business domain. (Well, for most of us.)

    Maybe some day there will be a system that helps with this. Until then, my main advice is to make sure you have solved your system with a naive solution before you move on.
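    One small example of that coping machinery is an idempotent handler. With at-least-once delivery, the same event can arrive twice, so the handler remembers which event IDs it has already processed. The EmailNotifier name and the event IDs below are hypothetical. A real system must also store the processed ID and perform the side effect atomically; otherwise a crash between the two still produces a duplicate, which is exactly the kind of non-domain complexity described above.

```python
from dataclasses import dataclass, field


@dataclass
class EmailNotifier:
    """Hypothetical side-effecting handler that must tolerate redelivery."""
    processed_ids: set[str] = field(default_factory=set)
    sent: list[str] = field(default_factory=list)

    def handle(self, event_id: str, recipient: str) -> None:
        # Skip events we have already handled (redelivery after a retry).
        if event_id in self.processed_ids:
            return
        self.sent.append(recipient)  # stand-in for the real side effect
        # In this in-memory sketch there is no crash window; a real system
        # would need these two steps to happen atomically.
        self.processed_ids.add(event_id)


notifier = EmailNotifier()
for event_id, to in [("e1", "a@example.com"), ("e1", "a@example.com"),
                     ("e2", "b@example.com")]:
    notifier.handle(event_id, to)
```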

  • Architecting around events has several ramifications.

    For building up a picture of the world, it's pretty good. It's very nice to be able to replay a log of events and recreate a view of the way things are expected to be. If there's a bug in your code, you can fix it and replay again to get back into a good state (with caveats: later actions that created events may have depended on an invalid intermediate state). Mutating updates, by contrast, erase history, perhaps leaving some ad-hoc logging on the side that is more often than not worthless for machine consumption.
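    The fix-and-replay idea can be sketched with a hypothetical account log and two versions of a projection function. The stored events never change; only the code that folds them does. This sketch ignores the caveat about later actions that depended on the buggy intermediate state.

```python
from typing import Callable

Event = tuple[str, int]
Projector = Callable[[int, Event], int]

# The event log itself is never edited.
events: list[Event] = [("deposit", 100), ("withdraw", 30), ("withdraw", 20)]


def buggy_projector(balance: int, event: Event) -> int:
    _, amount = event
    # Bug: treats every event as a deposit.
    return balance + amount


def fixed_projector(balance: int, event: Event) -> int:
    kind, amount = event
    return balance + amount if kind == "deposit" else balance - amount


def replay(projector: Projector, log: list[Event]) -> int:
    """Rebuild the read model from scratch by folding every event."""
    balance = 0
    for event in log:
        balance = projector(balance, event)
    return balance


bad = replay(buggy_projector, events)    # 150: the wrong view
good = replay(fixed_projector, events)   # 50: same log, corrected code
```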

    For decoupled related action, it's not too bad. If you have some subsystem that needs to twiddle some bits or trigger an action when it sees an event go by, it just needs to plug into the event stream, appropriately filtered.
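    Plugging a subsystem into the stream "appropriately filtered" can be as small as a subscription keyed by event type. The in-process EventBus below is a hypothetical stand-in for a real broker, and the order events are illustrative only.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-process pub/sub; a hypothetical stand-in for a real broker."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event: dict) -> None:
        # Only handlers registered for this event type are called.
        for handler in self._subscribers.get(event["type"], []):
            handler(event)


shipped: list[str] = []
bus = EventBus()
bus.subscribe("OrderShipped", lambda e: shipped.append(e["order_id"]))
bus.publish({"type": "OrderPlaced", "order_id": "o-1"})   # ignored by filter
bus.publish({"type": "OrderShipped", "order_id": "o-1"})  # handled
```

    The publisher knows nothing about the subscriber, which is what makes this kind of related action easy to bolt on.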

    For coordinated action OTOH, e.g. a high-level application business-logic algorithm, you need to start thinking in terms of explicit state machines and, in the worst case, COMEFROM-oriented programming[1]. Depending on how the events are represented, published and subscribed to, navigating control flow involves repeated whole-repo text searching.

    It's best if your application logic is not very complicated and inherently suitable to loose coupling, IMO.

    [1] https://en.wikipedia.org/wiki/COMEFROM
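    One way to keep coordinated logic out of COMEFROM territory is to make the state machine explicit and keep all of its transitions in one table. Reading that table then tells you which events drive the workflow. The order workflow, state names, and event names below are hypothetical.

```python
from enum import Enum, auto


class OrderState(Enum):
    AWAITING_PAYMENT = auto()
    AWAITING_SHIPMENT = auto()
    COMPLETED = auto()
    CANCELLED = auto()


# All allowed transitions in one place: (state, event type) -> next state.
TRANSITIONS = {
    (OrderState.AWAITING_PAYMENT, "PaymentReceived"): OrderState.AWAITING_SHIPMENT,
    (OrderState.AWAITING_PAYMENT, "PaymentFailed"): OrderState.CANCELLED,
    (OrderState.AWAITING_SHIPMENT, "OrderShipped"): OrderState.COMPLETED,
}


class OrderProcess:
    """Hypothetical process manager driven only by incoming events."""

    def __init__(self) -> None:
        self.state = OrderState.AWAITING_PAYMENT

    def on_event(self, event_type: str) -> None:
        key = (self.state, event_type)
        if key not in TRANSITIONS:
            raise ValueError(f"{event_type} not expected in {self.state.name}")
        self.state = TRANSITIONS[key]


order = OrderProcess()
order.on_event("PaymentReceived")
order.on_event("OrderShipped")
```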

  • FYI in case the author reads this, since this seems to be intended as an intro for people who aren't already familiar with this stuff: I didn't see "CQRS" defined anywhere in this article or in the two or three links I followed from it; they all begin with an assumption that you know the acronym, and delve straight into details. It might be good to define some terms in the front matter (unless I've misunderstood the target audience).
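    For readers in the same position: CQRS stands for Command Query Responsibility Segregation. It means using one model for writes (commands) and a separate model for reads (queries). Below is a minimal sketch with hypothetical names. The command side only records events, and the query side reads a separately maintained view. In real systems the projection often runs asynchronously, so the read side can lag slightly behind.

```python
from dataclasses import dataclass, field


@dataclass
class InventoryService:
    events: list[tuple[str, str, int]] = field(default_factory=list)  # write side
    stock_view: dict[str, int] = field(default_factory=dict)           # read side

    # Command: validates intent and records what happened; returns nothing.
    def receive_stock(self, sku: str, qty: int) -> None:
        if qty <= 0:
            raise ValueError("quantity must be positive")
        event = ("StockReceived", sku, qty)
        self.events.append(event)
        self._project(event)

    # Projection: keeps the query model in the shape readers need.
    def _project(self, event: tuple[str, str, int]) -> None:
        _, sku, qty = event
        self.stock_view[sku] = self.stock_view.get(sku, 0) + qty

    # Query: reads only the read model, never the event list.
    def stock_level(self, sku: str) -> int:
        return self.stock_view.get(sku, 0)


svc = InventoryService()
svc.receive_stock("widget", 5)
svc.receive_stock("widget", 3)
```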

  • Two must-read documents for those who want to learn more about this method of building reactive applications:

    https://engineering.linkedin.com/distributed-systems/log-wha...

    http://martinfowler.com/eaaDev/EventSourcing.html

    Note that Martin's blog is what inspired the event bus in https://home-assistant.io, an open source home automation project I occasionally contribute to.

  • I've tried working out how to move to an event sourcing system, but I always struggle with locking behavior. Do you just have to invent your own locking mechanisms on top of event sourcing?
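    A common answer is optimistic concurrency rather than locks. Each append states the stream version the writer last read, and the store rejects the append if another writer got there first. The writer then re-reads, re-validates and retries. The in-memory EventStore and ConcurrencyError below are hypothetical. A real store would perform the version check and the append atomically.

```python
class ConcurrencyError(Exception):
    """Raised when the stream changed since the writer last read it."""


class EventStore:
    """In-memory sketch of append-with-expected-version (optimistic locking)."""

    def __init__(self) -> None:
        self._streams: dict[str, list[dict]] = {}

    def read(self, stream_id: str) -> list[dict]:
        return list(self._streams.get(stream_id, []))

    def append(self, stream_id: str, expected_version: int,
               events: list[dict]) -> int:
        stream = self._streams.setdefault(stream_id, [])
        # Reject the write if someone else appended since we read the stream.
        if len(stream) != expected_version:
            raise ConcurrencyError(
                f"expected version {expected_version}, found {len(stream)}")
        stream.extend(events)
        return len(stream)


store = EventStore()
store.append("cart-1", 0, [{"type": "ItemAdded", "sku": "a"}])
# Two writers both read version 1; the first append wins.
store.append("cart-1", 1, [{"type": "ItemAdded", "sku": "b"}])
try:
    store.append("cart-1", 1, [{"type": "ItemRemoved", "sku": "a"}])
except ConcurrencyError as err:
    print("retry needed:", err)
```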

  • Axon Framework http://www.axonframework.org is a great place to start if you're into Java and want to get a feeling for how event sourcing works.

    There's also a great presentation by the developer, Allard Buijze, at https://www.youtube.com/watch?v=s2zH7BsqtAk.

  • There is a lot that could be done to make event sourcing easier to work with...

    Imagine tooling that allowed an event stream to be used to create state for testing modules, CRUD-like helpers that let CRUD-familiar developers think that way at first, and workflows based on snapshots, rewind, etc.

    I think a model that used events that correlated to graph deltas rather than crud deltas would be the cat's ass, and many queries about the near-current state could be handled efficiently using ephemeral subgraphs as indexes located at the network's edges.

    If anyone wants to discuss and possibly build some of this stuff, let me know :)
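    Two of these ideas are cheap to prototype, assuming events are plain dicts. A module's test state can be built purely from a list of events (given/when/then), and "rewind" can be done by replaying only the first n events. The cart events and the apply and state_at helpers are hypothetical.

```python
from typing import Iterable, Optional


def apply(state: dict[str, int], event: dict) -> dict[str, int]:
    """Pure transition: returns a new state and never mutates the old one."""
    new_state = dict(state)
    sku = event["sku"]
    if event["type"] == "ItemAdded":
        new_state[sku] = new_state.get(sku, 0) + 1
    elif event["type"] == "ItemRemoved":
        remaining = new_state.get(sku, 0) - 1
        if remaining > 0:
            new_state[sku] = remaining
        else:
            new_state.pop(sku, None)
    return new_state


def state_at(events: Iterable[dict], version: Optional[int] = None) -> dict[str, int]:
    """Rewind: fold only the first `version` events (all of them if None)."""
    state: dict[str, int] = {}
    for i, event in enumerate(events):
        if version is not None and i >= version:
            break
        state = apply(state, event)
    return state


# Given/when/then: test state built purely from events.
given = [{"type": "ItemAdded", "sku": "a"}, {"type": "ItemAdded", "sku": "b"}]
when = {"type": "ItemRemoved", "sku": "a"}
then = state_at(given + [when])
```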

  • I was looking into event sourcing for a system I built recently, and the tooling just doesn't seem to be that widespread yet. How do you read the entire event stream to figure out the current state? There are tools, but they seem to be .NET focused. There just didn't seem to be a "standard" answer yet.

    We ended up going with microservices that pub/sub events into Kafka, but maintain their own databases. There's another microservice that lets you query past events for statistics.
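    On reading current state without scanning the whole stream every time, a widespread approach is periodic snapshots. You store the folded state together with the number of events it covers, then replay only the events after it. The sketch below uses a hypothetical stream of integer deltas and a hypothetical SNAPSHOT_EVERY interval.

```python
from dataclasses import dataclass
from typing import Optional

SNAPSHOT_EVERY = 3  # hypothetical interval


@dataclass(frozen=True)
class Snapshot:
    version: int  # number of events already folded into `total`
    total: int


def load(events: list[int], snapshot: Optional[Snapshot]) -> int:
    """Start from the latest snapshot (if any) and replay only newer events."""
    start, total = (snapshot.version, snapshot.total) if snapshot else (0, 0)
    for delta in events[start:]:
        total += delta
    return total


def maybe_snapshot(events: list[int], snapshot: Optional[Snapshot]) -> Optional[Snapshot]:
    """Take a new snapshot once enough events have accumulated since the last."""
    last = snapshot.version if snapshot else 0
    if len(events) - last >= SNAPSHOT_EVERY:
        return Snapshot(len(events), load(events, snapshot))
    return snapshot


log = [5, -2, 7]
snap = maybe_snapshot(log, None)  # covers the first 3 events
log += [1, 1]
latest = load(log, snap)          # replays only the 2 newer events
```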

  • For some months now I have been trying to build a small test case for an invoicing app. I want a good sync strategy, and using ES sounds good. However, I haven't figured out how to replicate the functionality of a normal app with it: for example, how to avoid duplicates and, in general, how to do pre-save validation. Also, I need to use RDBMS tables anyway to hold the current data, and RDBMSs don't have a good track record for streaming results back.
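    A common pattern for duplicates and pre-save validation is to check the command against state rebuilt from past events before appending any new event. The InvoiceLedger, DuplicateInvoice and event fields below are hypothetical. This is single-process only: with several concurrent writers, the check also needs an expected-version append or a unique-constraint table to stay race-free.

```python
class DuplicateInvoice(Exception):
    """Raised when an invoice number has already been issued."""


class InvoiceLedger:
    """Hypothetical aggregate: validates commands before emitting events."""

    def __init__(self, history: list[dict]) -> None:
        self.events = list(history)
        # State needed for validation, rebuilt from past events.
        self.used_numbers = {e["number"] for e in history
                             if e["type"] == "InvoiceIssued"}

    def issue_invoice(self, number: str, amount_cents: int) -> dict:
        # Pre-save validation: nothing is appended unless every check passes.
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        if number in self.used_numbers:
            raise DuplicateInvoice(number)
        event = {"type": "InvoiceIssued", "number": number,
                 "amount_cents": amount_cents}
        self.events.append(event)
        self.used_numbers.add(number)
        return event


ledger = InvoiceLedger([{"type": "InvoiceIssued", "number": "INV-1",
                         "amount_cents": 1000}])
ledger.issue_invoice("INV-2", 500)
```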

  • I have been working with these sorts of patterns for a while, but I have yet to find good texts exploring the topic. Does anyone have book or paper recommendations for event sourcing? What I have seen is mostly programmers reporting on something that worked in their particular domain. I am looking for something more rigorous and comprehensive.

  • As an interesting comparison, some people see the Redux/Flux pattern as a front-end parallel to event sourcing [0].

    [0]: https://github.com/reactjs/redux/issues/891#issuecomment-158...

  • How strange: just today I heard the name Event Sourcing and thought I didn't know what it was (it turns out to be an old idea I knew under various other names). And on the same day I hear about Event Sourcing on HN. What's the buzz?

  • Very curious: if you have multiple datastores, how do you ensure they are consistent? If you scale sideways, how do you ensure nothing gets lost if there's a partition? Etc?

  • Having been part of a project to rewrite a monolith e-commerce site into an event-sourced, domain driven, CQRS system, let me tell you in which situation that is not possible: when you already have data. Remember that in a DDD, ES, CQRS system, the event store is the single source of truth. If you already have data in a relational database, then the existing data is the source of truth. You can't have two sources of truth, that completely defeats the purpose. So it's not actually possible to migrate to an event sourced system, you can only create one from scratch, with no existing data.