Early data modeling mistakes
So far, I have worked on three products built on NoSQL data stores, more specifically Cassandra/AWS DynamoDB and Elasticsearch.
I personally introduced the technology in one of those products; in the others, it was already in place.
Each time, the goal was to reduce complexity and gain scalability.
Each time, the decision was made early in the project, and at some point the team ended up spending a lot of effort coping with it, mainly because the data model did not fit the actual access patterns, and because of the operational complexity and costs.
The major issue with NoSQL data stores is that data modeling must be done with the read patterns in mind.
The problem with making this decision early is that you have to guess the main read patterns when there is no usage yet, only a product vision (which no one should fully rely on to build an architecture).
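To make the constraint concrete, here is a minimal sketch of read-pattern-first modeling in DynamoDB (the table layout and names are hypothetical): the partition and sort keys are chosen for one specific query, and any query you did not anticipate requires a new index or a migration.

```typescript
// Hypothetical single-table DynamoDB layout, designed up front for
// ONE read pattern: "fetch all items of a given basket, in insertion order".
interface BasketItemRecord {
  pk: string; // partition key, e.g. "BASKET#42": fixes how data is looked up
  sk: string; // sort key, e.g. "ITEM#2024-01-31T10:00:00Z": fixes the ordering
  articleId: string;
  quantity: number;
}

const record: BasketItemRecord = {
  pk: "BASKET#42",
  sk: "ITEM#2024-01-31T10:00:00Z",
  articleId: "article-7",
  quantity: 2,
};

// Efficient: Query(pk = "BASKET#42" AND begins_with(sk, "ITEM#")).
// Not supported by this model: "which baskets contain article-7?";
// answering it means adding a global secondary index or migrating the data.
```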
In every case, we never reached the point where NoSQL's scalability would actually have been useful.
There are two ways to get around this.
The first is to stick with a classical Relational Database Management System, which at least will not prevent you from discovering the read patterns.
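For contrast, here is a sketch of why an RDBMS keeps read patterns open (the schema and queries are illustrative): a normalized model does not privilege any particular query, so a read pattern discovered months later usually costs nothing more than a new query and, at worst, an index.

```typescript
// Hypothetical normalized schema:
//   baskets(id, user_id)
//   basket_items(basket_id, article_id, quantity)

// The read pattern we planned for:
const plannedQuery = `
  SELECT article_id, quantity
  FROM basket_items
  WHERE basket_id = $1`;

// A read pattern discovered later: no remodeling, no migration;
// at worst, CREATE INDEX ON basket_items (article_id).
const discoveredQuery = `
  SELECT basket_id
  FROM basket_items
  WHERE article_id = $1`;
```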
The second is to rely on Event Sourcing.
The latter may seem controversial: it is true that coming up with a well-designed set of commands and events is hard, but it can be done iteratively.
I usually start by creating one event per UI action (e.g. "Add an article to the basket", "Change basket item count"), and over time I rewrite the events to reduce duplication.
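As a sketch (the event names are illustrative), the first iteration maps events one-to-one onto UI actions, duplication included:

```typescript
// First pass: one event per UI action, even when they overlap.
type BasketEvent =
  | { type: "ArticleAddedToBasket"; basketId: string; articleId: string }
  | { type: "BasketItemCountChanged"; basketId: string; articleId: string; count: number }
  | { type: "ArticleRemovedFromBasket"; basketId: string; articleId: string };
```

The duplication is deliberate at this stage: it costs little, and it avoids guessing an abstraction before real usage has revealed one.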
It's particularly effective during inception/discovery phases:
- Add as many events as there are UI/UX actions in a single aggregate/stream
- When the events have stabilized (no change in a few weeks/months) but there are too many of them (> 40-200), regroup them (see the sketch after this list)
- If there are still too many, or performance requirements force you to distinguish life-cycles, split the aggregate/stream
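Here is the regrouping step sketched with the hypothetical basket events from above, under the assumption that adding, changing the count, and removing are all quantity changes: three events collapse into one, and replaying the stream still rebuilds the current state.

```typescript
// After regrouping: setting the quantity covers add, change and remove
// (quantity 0 means removal).
type BasketEventV2 =
  | { type: "BasketItemQuantitySet"; basketId: string; articleId: string; quantity: number }
  | { type: "BasketCheckedOut"; basketId: string };

// Projections are rebuilt from the stream, so regrouping (or later
// splitting the aggregate) does not lose any history.
function replayItems(events: BasketEventV2[]): Map<string, number> {
  const items = new Map<string, number>(); // articleId -> quantity
  for (const event of events) {
    if (event.type === "BasketItemQuantitySet") {
      if (event.quantity === 0) items.delete(event.articleId);
      else items.set(event.articleId, event.quantity);
    }
  }
  return items;
}
```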