Early data modeling mistakes
So far, I have worked on three products built on NoSQL data stores, more specifically Cassandra/AWS DynamoDB and Elasticsearch.
I personally introduced the technology in one of those products; in the others, it was already in place.
Each time, the goal was to reduce complexity and gain scalability.
Each time, the decision was made early in the project, and at some point the team ended up spending a lot of effort coping with it, mainly because the data model did not fit the actual access patterns, and because of the operational complexity and costs.
The major issue with NoSQL data stores is that data modeling must be done with the read patterns in mind.
The problem with making this decision early is that you have to guess the main read patterns when there is no usage yet, only a product vision (which no one should fully rely on to build an architecture).
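To make the constraint concrete, here is a minimal sketch of read-pattern-first modeling in DynamoDB (the table layout and names are hypothetical): the partition and sort keys are chosen for one specific query, and any query you did not anticipate requires a new index or a migration.

```typescript
// Hypothetical single-table DynamoDB layout, designed up front for
// ONE read pattern: "fetch all items of a given basket, in insertion order".
interface BasketItemRecord {
  pk: string; // partition key, e.g. "BASKET#42": fixes how data is looked up
  sk: string; // sort key, e.g. "ITEM#2024-01-31T10:00:00Z": fixes the ordering
  articleId: string;
  quantity: number;
}

const record: BasketItemRecord = {
  pk: "BASKET#42",
  sk: "ITEM#2024-01-31T10:00:00Z",
  articleId: "article-7",
  quantity: 2,
};

// Efficient: Query(pk = "BASKET#42" AND begins_with(sk, "ITEM#")).
// Not supported by this model: "which baskets contain article-7?";
// answering it means adding a global secondary index or migrating the data.
```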
In every case, we never reached the point where NoSQL's scalability would actually have been useful.
There are two ways to get around this.
The first is to stick with a classical Relational Database Management System, which at least will not prevent you from discovering the read patterns.
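For contrast, here is a sketch of why an RDBMS keeps read patterns open (the schema and queries are illustrative): a normalized model does not privilege any particular query, so a read pattern discovered months later usually costs nothing more than a new query and, at worst, an index.

```typescript
// Hypothetical normalized schema:
//   baskets(id, user_id)
//   basket_items(basket_id, article_id, quantity)

// The read pattern we planned for:
const plannedQuery = `
  SELECT article_id, quantity
  FROM basket_items
  WHERE basket_id = $1`;

// A read pattern discovered later: no remodeling, no migration;
// at worst, CREATE INDEX ON basket_items (article_id).
const discoveredQuery = `
  SELECT basket_id
  FROM basket_items
  WHERE article_id = $1`;
```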
The second is to rely on Event Sourcing.
The latter may seem controversial: it is true that coming up with a well-designed set of commands and events is hard, but it can be done iteratively.
I usually start by creating one event per UI action (e.g. "Add an article to the basket", "Change basket item count"), and over time I rewrite the events to reduce duplication.
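As a sketch (the event names are illustrative), the first iteration maps events one-to-one onto UI actions, duplication included:

```typescript
// First pass: one event per UI action, even when they overlap.
type BasketEvent =
  | { type: "ArticleAddedToBasket"; basketId: string; articleId: string }
  | { type: "BasketItemCountChanged"; basketId: string; articleId: string; count: number }
  | { type: "ArticleRemovedFromBasket"; basketId: string; articleId: string };
```

The duplication is deliberate at this stage: it costs little, and it avoids guessing an abstraction before real usage has revealed one.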
It's particularly effective during inception/discovery phases:
- Add as many events as there are UI/UX actions in a single aggregate/stream
- When the events have stabilized (no change in a few weeks/months) but there are too many of them (> 40-200), regroup them (see the sketch after this list)
- If there are still too many, or performance requirements force you to distinguish life-cycles, split the aggregate/stream
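Here is the regrouping step sketched with the hypothetical basket events from above, under the assumption that adding, changing the count, and removing are all quantity changes: three events collapse into one, and replaying the stream still rebuilds the current state.

```typescript
// After regrouping: setting the quantity covers add, change and remove
// (quantity 0 means removal).
type BasketEventV2 =
  | { type: "BasketItemQuantitySet"; basketId: string; articleId: string; quantity: number }
  | { type: "BasketCheckedOut"; basketId: string };

// Projections are rebuilt from the stream, so regrouping (or later
// splitting the aggregate) does not lose any history.
function replayItems(events: BasketEventV2[]): Map<string, number> {
  const items = new Map<string, number>(); // articleId -> quantity
  for (const event of events) {
    if (event.type === "BasketItemQuantitySet") {
      if (event.quantity === 0) items.delete(event.articleId);
      else items.set(event.articleId, event.quantity);
    }
  }
  return items;
}
```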