Premise
Several months back, I stumbled upon a Reddit thread claiming that all distributed systems problems could be solved with either a cache, a queue, or sharding. I had never considered this, and while my distributed systems experience is still expanding, most systems I’ve worked on have leveraged some or all of these techniques. In an industry where engineers are far more expensive than compute, the ability to distill a complex problem space into one of three basic categories seems significant. In an attempt to understand whether queues, caches, and shards really are the answer to all system design problems, I want to explore a few classic examples through this lens.
To begin, I would like to explain what I perceive to be the core uses of each of these technologies:
- Queues – Increase write capacity and smooth traffic at the cost of end-to-end latency
- Caches – Increase read capacity, particularly when reads are expensive to perform (requiring significant filtering or value remapping), at the cost of either consistency or availability
- Shards – Increase total capacity at the cost of globalized or aggregated views
Based on these principles, it would seem as though we can increase both read and write capacity freely without too much headache. Let’s examine how these tools would be applied to classic system design questions.
The URL Shortener
Any engineer who has studied for a systems design interview is familiar with the URL shortener design question. The two basic requirements are as follows:
- Users should be able to submit a long URL and receive a shortened version
- Users should be able to access the original URL by using the shortened URL
Essentially, build a system which maps short URLs to long ones. To make the question more interesting, engineers are often asked to support a user base of 10B and a Requests Per Second (RPS) rate of 100B, a situation which seems unlikely to occur, but does require some more thought.
Most solutions to the URL shortener involve a distributed method for generating unique short URLs, and a geo-located CDN for faster data look-ups. If we blur our eyes slightly, this sounds a lot like a sharded ID-assignment database which scales write capacity, along with a cache which scales read capacity. So for the URL shortener, it would seem sharding and caching are sufficient to attain our goals!
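To make that shape concrete, here is a minimal Python sketch. The class names, the base-62 encoding, and the shard count are my own illustrative assumptions rather than a prescribed solution: each ID-assignment shard hands out codes from its own interleaved counter range, and reads are served through a dictionary standing in for the CDN cache.

```python
import hashlib

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Turn a numeric ID into a short code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

class IdShard:
    """Owns the IDs {shard_index, shard_index + stride, ...}, so no cross-shard coordination is needed."""
    def __init__(self, shard_index: int, num_shards: int):
        self.next_id = shard_index
        self.stride = num_shards

    def assign(self) -> int:
        current = self.next_id
        self.next_id += self.stride
        return current

class UrlShortener:
    def __init__(self, num_shards: int = 4):
        self.shards = [IdShard(i, num_shards) for i in range(num_shards)]
        self.store = {}   # stand-in for the sharded database
        self.cache = {}   # stand-in for the CDN / read cache

    def shorten(self, long_url: str) -> str:
        # A hash of the long URL routes the write to one ID shard.
        shard_index = int(hashlib.md5(long_url.encode()).hexdigest(), 16) % len(self.shards)
        code = encode_base62(self.shards[shard_index].assign())
        self.store[code] = long_url
        return code

    def resolve(self, code: str) -> str:
        # Reads hit the cache first; misses fall through to the store.
        if code not in self.cache:
            self.cache[code] = self.store[code]
        return self.cache[code]
```

In a real system the shards and cache would be separate services, but the division of labor is the same: sharding absorbs the writes, caching absorbs the reads.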
URL Shortener ✅
Collaborative Document Editor
“Build google docs” is another classic system design question. There are generally two main functional requirements in this problem:
- Users can edit the same document in real time
- User writes are ingested and persisted even when each client lacks an up to date view of the document
We can assume that the scale is roughly equal to our hypothetical URL shortener, and get directly into potential solutions. The approach I have most often seen when reading about this question uses a Write-Ahead Log to track user edits while providing a consistent view of the doc regardless of traffic. This involves a queue of write operations which will eventually be persisted to a database, along with a cache of document segments which provides low-latency reads of the document as a whole. In this case, sharding may or may not be used at the document level, but cannot be used within a single document, as it does not allow for effective aggregation of the document’s content as a whole. It would seem as though, yet again, the core of the distributed system is a queue to smooth traffic and scale write capacity, paired with a cache to accelerate reads.
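Here is a minimal sketch of that flow, with in-memory stand-ins for the edit queue, the segment store, and the snapshot cache. The names, the drain step, and the cache-invalidation choice are all illustrative assumptions, not a description of how Google Docs actually works.

```python
from collections import deque

class DocumentService:
    def __init__(self):
        self.wal = deque()          # queue of pending edit operations
        self.segments = {}          # persisted document segments (the "database")
        self.snapshot_cache = None  # cached full-document view for readers

    def submit_edit(self, segment_id: str, text: str) -> None:
        # Writers only enqueue; they never wait on the database.
        self.wal.append((segment_id, text))

    def drain_wal(self) -> None:
        # Background worker: persist queued edits, then invalidate the snapshot.
        while self.wal:
            segment_id, text = self.wal.popleft()
            self.segments[segment_id] = text
        self.snapshot_cache = None

    def read_document(self) -> str:
        # Readers get the cached snapshot; it may lag the queue slightly,
        # which is exactly the latency-for-capacity trade described above.
        if self.snapshot_cache is None:
            self.snapshot_cache = "\n".join(
                self.segments[k] for k in sorted(self.segments)
            )
        return self.snapshot_cache
```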
Google Docs ✅
Top K Most Viewed List
While less common, another classic system design problem involves locating the top K most viewed pages on a web platform. For example, you might want to see the top 5 most viewed posts on a blog, or the top 10 most viewed videos on YouTube. The functional requirements usually consist of:
- Provide a global ranking of all pages across a whole platform
- Allow efficient reading of this global rank data at any time
The key issue with this design problem is its requirement for a global aggregation of page view data across a potentially infinite number of pages. Using our three tools, we can have a sharded queue of events being delivered to temporary storage hosts which track the number of views on each page using a sliding window of some kind. From each of these shards, we can select the top K elements and aggregate them into a global top K list. We can then use a cache to store and provide this information for a set period of time.
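As a rough sketch of what one counting shard might look like (the window length, class name, and deque-based bookkeeping are my assumptions for illustration): it consumes its slice of the view-event queue, keeps a sliding window of counts, and can report its local top K on demand.

```python
import heapq
import time
from collections import defaultdict, deque

class ViewCountShard:
    def __init__(self, window_seconds: int = 3600):
        self.window = window_seconds
        self.events = deque()           # (timestamp, page_id) pairs inside the window
        self.counts = defaultdict(int)  # page_id -> views inside the window

    def ingest(self, page_id: str, ts=None) -> None:
        ts = time.time() if ts is None else ts
        self.events.append((ts, page_id))
        self.counts[page_id] += 1
        self._evict(ts)

    def _evict(self, now: float) -> None:
        # Drop events that have aged out of the sliding window.
        while self.events and now - self.events[0][0] > self.window:
            _, old_page = self.events.popleft()
            self.counts[old_page] -= 1
            if self.counts[old_page] == 0:
                del self.counts[old_page]

    def local_top_k(self, k: int) -> list:
        return heapq.nlargest(k, self.counts.items(), key=lambda item: item[1])
```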
While this appears to solve our problem, I think there is a critical tool being used which requires its own category: Map Reduce. In this case, we have a set of sharded nodes, each storing a subset of page view events, which need to be aggregated across. There is a simple way to do this: each node computes its local top K elements and provides it to a global aggregator. Nevertheless, this is a design approach which feels distinctly different from the three listed in our hypothesis, as sketched below.
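For completeness, here is a sketch of that aggregate-and-cache step, assuming events are sharded by page ID so each page's full count lives on exactly one shard; the TTL and names are again illustrative.

```python
import heapq
import time

def global_top_k(shards, k: int) -> list:
    # "Map": each shard reports its local top K; "reduce": merge and re-rank.
    candidates = []
    for shard in shards:
        candidates.extend(shard.local_top_k(k))
    return heapq.nlargest(k, candidates, key=lambda item: item[1])

class CachedTopK:
    """Caches the merged top K list for a short TTL so reads stay cheap."""
    def __init__(self, shards, k: int, ttl_seconds: int = 60):
        self.shards, self.k, self.ttl = shards, k, ttl_seconds
        self._value, self._computed_at = None, 0.0

    def get(self) -> list:
        if self._value is None or time.time() - self._computed_at > self.ttl:
            self._value = global_top_k(self.shards, self.k)
            self._computed_at = time.time()
        return self._value
```

The merge in `global_top_k` is the part that doesn't map cleanly onto a queue, a cache, or a shard on its own.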
Top 10 List ❌
Conclusions
The hypothesis that any system design problem can be solved with only three tools turned out to be false. However, while writing this blog post I came to appreciate more deeply the flexibility and efficacy of queues, caches, and shards. I never expected all distributed systems problems to be reducible to a set of three solutions, but I also did not anticipate how hard it would be to find a problem which couldn’t be solved using some form of queue, cache, and shard. Obviously each of the solutions described above requires careful consideration of the format of the queue messages, cache invalidation techniques, and shard keys, much of which is non-trivial. At the same time, I found this exercise to be a helpful reminder that when designing large scale systems, it’s worth asking if a simple cache or queue can address the problem before digging deeper into more complex and nuanced solutions.
In the future, perhaps I will add Map Reduce to my list of system design tools and write a followup post. It seems unlikely that any finite set of tools could possibly solve the infinite set of system design situations, but the exercise is enjoyable nonetheless.