As software engineering professionals (irrespective of role), we rarely get enough time or opportunity to design and build a variety of complex distributed systems. Most of our time is spent on coding, fixes, planning, escalations, demos, etc. These activities are undoubtedly important, but they should not stop us from learning software architectures, and the best way to learn is to study existing systems.
“Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.” (The Architecture of Open Source Applications, http://aosabook.org/en/index.html)
In this article, I share my top (and favorite) 3 open source distributed systems (in no particular order), which make great case studies in distributed system design. These 3 systems have their limitations and areas for improvement, but they have also evolved architecturally. In my opinion, studying (at minimum) the strategies they implement for replication, sharding, master node election and data delivery to clients will add value to the case study.
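Master node election is a good example of why these strategies reward study. A common safeguard against "split brain" is to let a group of nodes elect a master only if it holds a strict majority of the master-eligible nodes, so at most one side of a network partition can ever do so. For instance, Elasticsearch's pre-7.0 `discovery.zen.minimum_master_nodes` setting was recommended to be exactly this majority. The sketch below is a generic illustration of the majority-quorum rule, not code from HDFS, Elasticsearch or Kafka. The function and node names are hypothetical.

```python
from collections.abc import Iterable


def quorum_size(eligible_nodes: int) -> int:
    """Smallest strict majority of `eligible_nodes` master-eligible nodes."""
    if eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return eligible_nodes // 2 + 1


def can_elect_master(reachable: Iterable[str], eligible: set[str]) -> bool:
    """A partition may elect a master only if it can reach a strict majority
    of the master-eligible nodes (the nodes it can reach, itself included)."""
    votes = len(set(reachable) & eligible)
    return votes >= quorum_size(len(eligible))


# Hypothetical 3-node cluster split into {a, b} and {c} by a network partition.
eligible = {"node-a", "node-b", "node-c"}
print(can_elect_master(["node-a", "node-b"], eligible))  # True
print(can_elect_master(["node-c"], eligible))            # False
```

The majority rule is also why odd numbers of master-eligible nodes are usually preferred. A 4-node group needs 3 votes, so it tolerates only one failure, the same as a 3-node group.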
(1) HDFS – Based on Google’s famous research paper The Google File System, the Hadoop Distributed File System (HDFS) has been a remarkable creation of Doug Cutting (along with the rest of the Hadoop framework) and continues to be the key component of most big data systems. HDFS builds an abstracted distributed file system on top of native OS file systems running on inexpensive commodity servers. Combined with built-in resiliency and rack awareness, this design truly democratized big data processing.
In Doug’s words, “It (Hadoop) certainly wasn’t transactional or relational in any fundamental way. It tended to encourage people to be more experimental, agile in their approach, to embrace all kinds of wacky data formats and what people like to call unstructured, which I think is kind of a pejorative for what a database doesn’t handle elegantly.”
Optionally, you can include MapReduce as part of studying HDFS architecture.
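Rack awareness is easiest to appreciate through block replica placement. For the common case of a replication factor of 3, the HDFS documentation describes the default policy when the writer runs on a DataNode:

- The first replica goes on the writer's own node.
- The second goes on a node in a different (remote) rack.
- The third goes on a different node in that same remote rack.

The sketch below mimics only that rule. The topology, node names and `place_replicas` helper are hypothetical, and real HDFS also weighs node load, free space and the case where the writer is not a DataNode.

```python
import random

# Hypothetical topology: rack id -> DataNodes on that rack.
TOPOLOGY: dict[str, list[str]] = {
    "/rack1": ["dn1", "dn2", "dn3"],
    "/rack2": ["dn4", "dn5", "dn6"],
    "/rack3": ["dn7", "dn8", "dn9"],
}


def rack_of(node: str, topology: dict[str, list[str]]) -> str:
    """Return the rack that hosts `node`."""
    for rack, nodes in topology.items():
        if node in nodes:
            return rack
    raise KeyError(f"unknown DataNode: {node}")


def place_replicas(writer: str, topology: dict[str, list[str]], rng: random.Random) -> list[str]:
    """Pick 3 DataNodes for a new block, mimicking the default HDFS rule:
    1st replica on the writer's node, 2nd on a node in a remote rack,
    3rd on a different node in that same remote rack."""
    local_rack = rack_of(writer, topology)
    # Only remote racks with at least two nodes can hold both remote replicas.
    remote_racks = [r for r, nodes in topology.items() if r != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two DataNodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)  # two distinct nodes
    return [writer, second, third]


replicas = place_replicas("dn2", TOPOLOGY, random.Random(7))
print([(node, rack_of(node, TOPOLOGY)) for node in replicas])
```

Whatever the random choices, the three replicas span exactly two racks. Losing an entire rack therefore still leaves at least one copy, and only one hop of the write pipeline has to cross between racks.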
(2) Elasticsearch – Created by Shay Banon and based on Apache Lucene, Elasticsearch has become one of the most popular and feature-rich NoSQL document stores for text-based search. It is primarily used for log analytics but has evolved to serve multiple use cases that involve ingesting and analyzing JSON data. Multiple components in its architecture coordinate to provide resiliency and keep the cluster available, which makes Elasticsearch an interesting case study.
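Sharding is one place where Elasticsearch's design shows up in miniature. By default, a document's primary shard is derived from a hash of its routing value (the document `_id` unless custom routing is supplied) modulo the number of primary shards. The sketch below illustrates that idea using Python's `zlib.crc32` as a stand-in hash. Elasticsearch itself uses Murmur3 and, in recent versions, an extra routing-shards factor, so these shard numbers will not match a real cluster. `route_to_shard` is a hypothetical helper.

```python
import zlib


def route_to_shard(routing: str, number_of_primary_shards: int) -> int:
    """Choose a primary shard from a routing value (by default the doc _id).
    CRC32 is only a stand-in for the Murmur3 hash Elasticsearch uses."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Hypothetical document ids routed across an index with 3 primary shards.
doc_ids = ["order-1", "order-2", "order-3", "order-4"]
for doc_id in doc_ids:
    print(doc_id, "-> shard", route_to_shard(doc_id, 3))
```

Because the target shard depends on the primary shard count, that count is fixed when an index is created. Changing it means splitting, shrinking or reindexing the index rather than editing a setting.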
(3) Apache Kafka – From the website: “an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications”. Kafka’s value and popularity have made it the de facto publish/subscribe streaming messaging system.
Including Kafka in this list is also important because it can be seen both as a data store and as a pub/sub message queue. Architecturally, a streaming system differs from the traditional notion of a data store in the delivery guarantees it may provide between producer and consumer:
- At most once—Messages may be lost but are never redelivered.
- At least once—Messages are never lost but may be redelivered.
- Exactly once—this is what people actually want, each message is delivered once and only once.
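Which of the first two guarantees a consumer gets often comes down to when it records its progress (its committed offset) relative to processing a message. The self-contained simulation below is not the Kafka client API. `Log`, `Consumer`, `run` and the crash flag are hypothetical names. The simulation works as follows:

- It models a retained, offset-addressed log, which is the "data store" side of Kafka.
- It commits either before or after processing each record.
- It can inject a crash between those two steps.

The `idempotent` variant approximates exactly-once *effects* by skipping offsets that were already applied. Kafka's own exactly-once support relies on idempotent producers and transactions, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only log: records are kept and addressed by offset, so any
    consumer can read (and re-read) them independently of the others."""
    records: list[str] = field(default_factory=list)

    def append(self, record: str) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset assigned to the new record


@dataclass
class Consumer:
    log: Log
    idempotent: bool = False   # skip offsets whose effects were already applied
    committed: int = 0         # next offset to read; assumed to survive a crash
    processed: list[str] = field(default_factory=list)  # durable side effects
    applied: set[int] = field(default_factory=set)

    def _commit(self, offset: int) -> None:
        self.committed = offset + 1

    def _process(self, offset: int) -> None:
        if self.idempotent and offset in self.applied:
            return  # duplicate delivery: effect already applied
        self.processed.append(self.log.records[offset])
        self.applied.add(offset)

    def poll_once(self, commit_first: bool, crash: bool = False) -> None:
        offset = self.committed
        if commit_first:            # at-most-once ordering
            self._commit(offset)
            if crash:
                return              # crash before processing: record is lost
            self._process(offset)
        else:                       # at-least-once ordering
            self._process(offset)
            if crash:
                return              # crash before commit: record is redelivered
            self._commit(offset)


def run(commit_first: bool, crash_on_poll: int, idempotent: bool = False) -> list[str]:
    """Consume three events, simulating a crash on the given poll number."""
    log = Log()
    for event in ["e0", "e1", "e2"]:
        log.append(event)
    consumer = Consumer(log, idempotent=idempotent)
    poll = 0
    while consumer.committed < len(log.records):
        consumer.poll_once(commit_first, crash=(poll == crash_on_poll))
        poll += 1
    return consumer.processed


print(run(commit_first=True, crash_on_poll=1))                    # ['e0', 'e2']
print(run(commit_first=False, crash_on_poll=1))                   # ['e0', 'e1', 'e1', 'e2']
print(run(commit_first=False, crash_on_poll=1, idempotent=True))  # ['e0', 'e1', 'e2']
```

Committing first loses `e1` when the crash hits, and processing first delivers it twice. Only the deduplicating consumer ends up with each effect applied once.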
You can find a list of use cases implemented using Kafka here. There is also a free e-book available from confluent.io, and a detailed recent architecture improvement plan here. Finally, a must-read for any case study: the Kafka design docs.
Books for further reading:
- Hadoop: The Definitive Guide by Tom White
- Designing Data-Intensive Applications by Martin Kleppmann
- Hadoop Application Architectures by Mark Grover et al.
- Kafka: The Definitive Guide by Neha Narkhede et al.
- Designing Event-Driven Systems by Ben Stopford
- Elasticsearch: The Definitive Guide by Clinton Gormley et al.
- Learning Elastic Stack 7.0 by Sharath Kumar M N et al.
I hope this helps. Let me know your favorite distributed systems for a case study in the feedback.