Event sourcing migrations retrospective

8th Oct 2024
4 min read
Tags:
software engineering,
design,
retrospective,
architecture

Few weeks ago, I was talking to one of my co-workers about Event Sourcing.

At some point, he was inquiring about retrospectives on migrating old systems to event sourcing, benefits obtained, pitfalls to avoid, and so on and so forth.

I have looked around a bit, ask some friends, nothing relevant came up.

I'll try to summarize my 6+ years of experiences in this log.

Experiences

Distributed mail server

Business context: A FOSS company, providing a G-suit alternative.

Team context: 6 seasoned SWE, already accustomed to software craftmanship, DDD, and event sourcing.

Technical context: it was a 10 years old Java project, mostly relying on Cassandra, when I have joined, they were in the middle of a complete modernization.

Challenges: Solve concurrency issues, especially in the mailboxes area, most of the bugs are in this area (race conditions occurs when a new e-mail were incoming, while the user was working on them).

Event sourcing scope: the mailboxes, very few events were involved.

Pitfalls: the event sourcing implementation was quite complex and quite limited, which means many of the operations were stateful, leading to a global state permanently incoherent (e.g. two instances connected to the same Cassandra cluster will has two distinct states). Add new events involved a lot of boilerplate.

Benefits: Mailboxes concurrency issues were fixed, the implementation was comprehensive enough to have a lot of tools (debugging, rebuilding projections).

Shared dashboard

Business context: A pre-seed startup looking for its product-market-fit.

Team context: 4-people team, mostly juniors, discovering DDD.

Technical context: A brown-field Haskell project I had started few months ago, using AWS DynamoDB as primary data store.

Challenges: The project was stressed by numerous changes. We also had to figure out what the user was doing, especially when discovering new corner-cases (also called business opportunities).

Event sourcing scope: the whole organization, each "group" was in the aggregate, alongside with users management, dashboards, etc. everything.

Pitfalls: Team adoption was a huge pain point, most of them considered it as a "waste", also, intents were missing (Commands should have been logged), migration was done in big-bang mode (7-day effort), which created some tensions. Keeping original projections in the discovery phase added burden without any benefits.

Benefits: Each new workflow or concept took only a few hours to implement in the backend, events and commands where exposed in the API endpoints, new events where pushed asynchronously.

Distributed business process

Business context: A mature B2B company customer-focused with a very comprehensive (yet adapting) business processes.

Team context: 6-people team, mostly seniors, with some DDD/Event sourcing knowledge.

Technical context: 3 years project, microservice based, each of them had a specific data/events store (EventStore, plain files, SQLite) and design discipline.

Challenges: The business process was really complex (40 steps, 60 events), constantly changing.

Event sourcing scope: Only the regular, customer-facing process.

Pitfalls: Each microservice had their own events, trying to integrate each-other events, loosing information along the way, events were also rigid, no event could be added easily, or modified, or the stream rewrote. Most of the computation were done around a single stream, projections being computed at startup (45 minutes). While only few things were outside the system, but it was the missing piece to have a stable stream. Every microservice were involved in new features. A big events re-design was planed, but never implemented.

Benefits: Each microservice had their own events, giving a lot of flexibility, limited changes (such as new projections) were easy.

Lessons learned

Events shouldn't be set in stone (they aim to evolved, be updated, deprecated, rewritten)
Big bang rewrites are bad, but incomplete migrations are harmful
Big aggregates are okay, in the literature, aggregates should have few events, in practice, big aggregates are better than a lot of smaller ones involving synchronization
Focus on developer experience, focus on tooling and easiness to add events
Don't focus on projections, they could be added later, if they are really popular and critical