Infrastructure corporations
Eleven years ago, I was completing a double degree:
- A master's degree in Software Engineering (I consider and present myself as a Software Engineer)
- A research master's degree in Networks, Telecommunications, and Services
This was possible because my engineering school partnered with a local university: in my final engineering year, the management syllabus was dropped and replaced by courses focused purely on computer science theory and research.
My final internship was meant to lead to a master's thesis.
A master's thesis is a research paper, and most research papers are structured as follows:
- Abstract (which should be written first)
- Introduction
- Background/context of the issue
- Related works
- Solution (description and, possibly, implementation)
- Lessons learned/retrospective, if it was implemented
- Conclusion
- Acknowledgements
- References
To write my master's thesis, I had to read 8-12 research papers daily (300 pages) for 5 of the 7 months of my internship. Some of them were published in the 1930s.
There is a lot of synthesis work involved, more than the work on the solution itself.
After painfully writing my master's thesis, I stopped reading research papers until a few months ago, when I started looking at "papers every Software Engineer should read":
- Amazon’s DynamoDB.
- Google File System.
- MapReduce.
- Bigtable.
- Hadoop Distributed File System.
- Kafka.
- etc.
Unlike the papers I read for my master's thesis (which was about programming languages and software architecture), I was surprised that most of them were based on internal solutions:
- RocksDB (Meta): its disaggregated storage at Meta was built on the Tectonic File System
- Spanner (Google):
  - Bigtable: Spanner evolved from a Bigtable-like versioned key-value store. Its implementation is layered on top of a Bigtable-based implementation.
  - Google File System (GFS)
  - Colossus (the successor to GFS)
  - Megastore
  - TrueTime
I'm not sure if it's the network effect or an attribution bias, but my intuition is that each big tech company creates its own stack, and research papers are constrained by some kind of tech sovereignty policy.
Note: I do not recommend reading research papers. It's a nice-to-have, but they are really dense and based on issues few of us actually face in day-to-day work.
Something we have to consider is that big tech companies are broader than their main products: they are infrastructure companies, building data centers, power plants, and networks (including subsea cables and low/medium Earth orbit satellites).
A few days ago, AWS had an outage in a subcomponent of DynamoDB. The postmortem (or this beginner-friendly explanation) showed how the interdependency of technologies led to a cascade of failures, disrupting other services (owned by AWS or not).
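To make the idea of a cascade more concrete, here is a minimal sketch of how a single failure can spread through a dependency graph. The service names and the `DEPENDS_ON` map are entirely hypothetical: they are not taken from the postmortem and do not describe AWS's real architecture. The function only answers "who is affected if this one component goes down?".

```python
from collections import deque

# Hypothetical dependency map: each service lists the services it depends on.
# The names are illustrative only and do not describe AWS's real architecture.
DEPENDS_ON = {
    "key_value_store": ["dns_management"],
    "compute_launcher": ["key_value_store"],
    "load_balancer": ["compute_launcher"],
    "third_party_app": ["key_value_store", "load_balancer"],
    "static_site": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or transitively depends on `failed`."""
    # Invert the map: for each service, which services depend on it?
    dependents = {name: [] for name in depends_on}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(service)

    # Breadth-first walk from the failed component through its dependents.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, []):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


print(sorted(impacted_by("dns_management", DEPENDS_ON)))
```

In this toy graph, one low-level component takes down every service above it, including one that is not owned by the provider, while the service with no dependencies stays up.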