Event sourcing migrations retrospective
Few weeks ago, I was talking to one of my co-workers about Event Sourcing.
At some point, he was inquiring about retrospectives on migrating old systems to event sourcing, benefits obtained, pitfalls to avoid, and so on and so forth.
I have looked around a bit, ask some friends, nothing relevant came up.
I'll try to summarize my 6+ years of experiences in this log.
Experiences
Distributed mail server
Business context: A FOSS company, providing a G-suit alternative.
Team context: 6 seasoned SWE, already accustomed to software craftmanship, DDD, and event sourcing.
Technical context: it was a 10 years old Java project, mostly relying on Cassandra, when I have joined, they were in the middle of a complete modernization.
Challenges: Solve concurrency issues, especially in the mailboxes area, most of the bugs are in this area (race conditions occurs when a new e-mail were incoming, while the user was working on them).
Event sourcing scope: the mailboxes, very few events were involved.
Pitfalls: the event sourcing implementation was quite complex and quite limited, which means many of the operations were stateful, leading to a global state permanently incoherent (e.g. two instances connected to the same Cassandra cluster will has two distinct states). Add new events involved a lot of boilerplate.
Benefits: Mailboxes concurrency issues were fixed, the implementation was comprehensive enough to have a lot of tools (debugging, rebuilding projections).
Shared dashboard
Business context: A pre-seed startup looking for its product-market-fit.
Team context: 4-people team, mostly juniors, discovering DDD.
Technical context: A brown-field Haskell project I had started few months ago, using AWS DynamoDB as primary data store.
Challenges: The project was stressed by numerous changes. We also had to figure out what the user was doing, especially when discovering new corner-cases (also called business opportunities).
Event sourcing scope: the whole organization, each "group" was in the aggregate, alongside with users management, dashboards, etc. everything.
Pitfalls: Team adoption was a huge pain point, most of them considered it as a "waste", also, intents were missing (Commands should have been logged), migration was done in big-bang mode (7-day effort), which created some tensions. Keeping original projections in the discovery phase added burden without any benefits.
Benefits: Each new workflow or concept took only a few hours to implement in the backend, events and commands where exposed in the API endpoints, new events where pushed asynchronously.
Distributed business process
Business context: A mature B2B company customer-focused with a very comprehensive (yet adapting) business processes.
Team context: 6-people team, mostly seniors, with some DDD/Event sourcing knowledge.
Technical context: 3 years project, microservice based, each of them had a specific data/events store (EventStore, plain files, SQLite) and design discipline.
Challenges: The business process was really complex (40 steps, 60 events), constantly changing.
Event sourcing scope: Only the regular, customer-facing process.
Pitfalls: Each microservice had their own events, trying to integrate each-other events, loosing information along the way, events were also rigid, no event could be added easily, or modified, or the stream rewrote. Most of the computation were done around a single stream, projections being computed at startup (45 minutes). While only few things were outside the system, but it was the missing piece to have a stable stream. Every microservice were involved in new features. A big events re-design was planed, but never implemented.
Benefits: Each microservice had their own events, giving a lot of flexibility, limited changes (such as new projections) were easy.
Lessons learned
- Events shouldn't be set in stone (they aim to evolved, be updated, deprecated, rewritten)
- Big bang rewrites are bad, but incomplete migrations are harmful
- Big aggregates are okay, in the literature, aggregates should have few events, in practice, big aggregates are better than a lot of smaller ones involving synchronization
- Focus on developer experience, focus on tooling and easiness to add events
- Don't focus on projections, they could be added later, if they are really popular and critical