At the scale today's applications operate, caching is everywhere, and most of us are already familiar with it. CPUs, browsers, web apps, mobile apps, and databases commonly maintain their own caches. As per Amazon, “1 second of load lag time would cost Amazon $1.6 Billion in sales in a year”, and for Walmart, “When the load time jumps from 1 second to 4 seconds, conversions decline sharply. For every 1 second of improvement, we experience a 2% conversion increase”.
A caching layer is important for keeping responses fast. Caching keeps data on faster storage than its origin: for example, in memory instead of on disk, or, in the case of browsers, on local disk to avoid repeated network round trips. Many caching solutions are available today, but there is no one-size-fits-all solution for every kind of application, so in this article we will discuss the parameters that will help you design a caching solution for your application's needs.
Before implementing a caching layer for your application, it is important to ask yourself these questions:
- Which use cases in your system require high throughput, fast response, and low latency?
- Are you fine with data inconsistency if you use the cache?
- What kind of data do you want to store? Objects, static data, simple key-value pairs, native in-memory data structures, or static files?
- Do you need to maintain a cache for transactional/master data?
- Do you need an in-process cache, a shared cache on a single node, or a distributed cache across n nodes?
- Do you need an open-source, commercial, or framework-provided cache solution?
- If you use distributed cache, what about performance, reliability, scalability, availability, and affordability?
Caching strategies describe the relationship between the data source and the caching system.
Cache Aside: In this strategy, your app has access to both the cache and the storage. Whenever a request for data comes in, the app checks for the key in the cache; on a cache miss, it goes to storage to retrieve the data and inserts it into the cache. This strategy is useful when using an external cache provider like Redis or Memcached.
The main benefit is that we cache only what is needed. There are a few cons as well: it adds implementation complexity, and cache misses are expensive. Data staleness can also arise, because the cache is not notified when the underlying data changes.
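The cache-aside read path can be sketched in a few lines of Python. This is a minimal illustration, not a production client: the `CacheAsideReader` class, the plain dict standing in for Redis or Memcached, and the `load_from_storage` callback are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, Optional


class CacheAsideReader:
    """The application owns both the cache and the storage access."""

    def __init__(self, cache: Dict[str, str],
                 load_from_storage: Callable[[str], Optional[str]]):
        self.cache = cache                          # stand-in for an external cache
        self.load_from_storage = load_from_storage  # hypothetical storage lookup

    def get(self, key: str) -> Optional[str]:
        # 1. Check the cache first.
        if key in self.cache:
            return self.cache[key]
        # 2. Cache miss: the application itself goes to storage.
        value = self.load_from_storage(key)
        # 3. Populate the cache so the next read of this key is a hit.
        if value is not None:
            self.cache[key] = value
        return value
```

Note that nothing here updates the cache when storage changes, which is exactly where the staleness problem comes from.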
Read Through/Lazy Loading: In this strategy, the app does not access storage directly and communicates only with the cache. On a cache miss, the cache itself goes to storage, loads the data, and stores it. This pattern is common in ORM frameworks.
This strategy gives us on-demand caching: we cache only what is needed. If there are multiple cache nodes and one fails, the app keeps working, but latency increases, because the replacement node starts empty and has to be repopulated through cache misses as requests arrive.
The drawback of this strategy is that cache misses are expensive: each miss means checking the cache, retrieving from the database, and then putting the data into the cache. It also suffers from data staleness when the underlying data changes and the cache key has not expired.
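To contrast this with cache aside, here is a minimal read-through sketch in Python. The difference is who talks to storage: the cache holds the loader, and the app only ever calls `get`. The `ReadThroughCache` class, its `loader` parameter, and the `misses` counter are assumptions made for illustration.

```python
from typing import Any, Callable, Dict


class ReadThroughCache:
    """The cache, not the application, fetches from storage on a miss."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader            # hypothetical function that reads storage
        self._entries: Dict[str, Any] = {}
        self.misses = 0                  # counts how often storage was hit

    def get(self, key: str) -> Any:
        if key not in self._entries:
            # Expensive path: check cache, read storage, then fill the cache.
            self.misses += 1
            self._entries[key] = self._loader(key)
        return self._entries[key]
```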
Write Through: In this strategy, the app writes through an API that, for each update, also upserts the data in the cache. The cached data is therefore never stale, but writes become more expensive, and we may end up caching data that no one ever reads. The database write and the cache upsert should be done in a single transaction; otherwise the cached data can become stale.
This approach is suitable for a read-heavy system that cannot tolerate staleness. The drawbacks are the write penalty and possible cache churn: if cached data is never read, the cache holds unnecessary data. This should be handled with an expiry or TTL.
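A write-through cache with a TTL can be sketched as follows. This is an illustration under assumptions: a plain dict plays the role of the database, the `WriteThroughCache` class and its `clock` parameter are hypothetical, and the "single transaction" is only approximated by writing to storage first, so that a failed storage write never leaves a value in the cache.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class WriteThroughCache:
    def __init__(self, storage: Dict[str, Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._storage = storage          # stand-in for the database
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def put(self, key: str, value: Any) -> None:
        # Write to storage first; if this raises, the cache is left untouched.
        self._storage[key] = value
        self._entries[key] = (value, self._clock() + self._ttl)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is not None:
            value, expires_at = entry
            if self._clock() < expires_at:
                return value
            del self._entries[key]       # TTL passed: drop entries nobody refreshes
        value = self._storage.get(key)   # fall back to storage and re-cache
        if value is not None:
            self._entries[key] = (value, self._clock() + self._ttl)
        return value
```

Every `put` pays for two writes, which is the write penalty described above; the TTL keeps never-read entries from staying in the cache forever.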
Write Behind: The application writes to the cache first, and the data is not written to the underlying data source immediately; instead, after a configured interval, the written data is synced asynchronously. Here the cache acts as a buffer and maintains a queue of write operations so that they can be synced in order of insertion.
This solves the write-penalty problem: because reads and writes both happen at the cache, performance improves, and bulk updates reduce the load on storage, making it suitable for high-throughput read/write systems. Another benefit is that the app is insulated from database failures.
The main pitfall is reliability: if the cache crashes before a flush, we lose some updates. There is also a lack of consistency, which gets worse the less often the cache is flushed. And because the database is only eventually consistent with the cache, any operation performed directly on the database might use stale data.
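The buffering behaviour of write behind can be sketched like this. It is a simplified, single-threaded illustration: the `WriteBehindCache` class is hypothetical, a dict stands in for the database, and `flush` is called by hand where a real system would trigger it on a timer.

```python
from collections import deque
from typing import Any, Deque, Dict, Optional, Tuple


class WriteBehindCache:
    def __init__(self, storage: Dict[str, Any]):
        self._storage = storage                      # stand-in for the database
        self._entries: Dict[str, Any] = {}
        self._pending: Deque[Tuple[str, Any]] = deque()  # writes in insertion order

    def put(self, key: str, value: Any) -> None:
        # Only the cache is touched; storage is updated later by flush().
        self._entries[key] = value
        self._pending.append((key, value))

    def get(self, key: str) -> Optional[Any]:
        if key in self._entries:
            return self._entries[key]
        return self._storage.get(key)

    def flush(self) -> int:
        """Apply queued writes to storage in order; returns how many were applied."""
        applied = 0
        while self._pending:
            key, value = self._pending.popleft()
            self._storage[key] = value
            applied += 1
        return applied
```

Anything still sitting in `_pending` when the process dies is lost, and until `flush` runs, a direct read of storage sees old data.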
Refresh Ahead Caching: In this strategy, cached data is refreshed before it expires. Oracle Coherence uses this.
The refresh-ahead time is expressed as a percentage of the entry’s expiration time. For example, assume that the expiration time for entries in the cache is set to 60 seconds and the refresh-ahead factor is set to 0.5. If the cached object is accessed after 60 seconds, Coherence will perform a synchronous read from the cache store to refresh its value. However, if a request is performed for an entry that is more than 30 but less than 60 seconds old, the current value in the cache is returned and Coherence schedules an asynchronous reload from the cache store.
This approach has lower latency than a read-through cache, especially when the same cached key is used by a large number of users, and it mitigates data staleness to an extent. But it adds implementation complexity, as the cache service has to take extra steps to refresh keys.
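The refresh-ahead rule from the example above (serve the cached value, but reload in the background once an entry is older than the refresh-ahead factor times its expiry) can be sketched in Python. This is not how Coherence is implemented; the `RefreshAheadCache` class and its `loader`, `clock`, and `schedule` parameters are hypothetical, and the default `schedule` simply runs the reload in a daemon thread.

```python
import threading
import time
from typing import Any, Callable, Dict, Set, Tuple


def _run_in_thread(task: Callable[[], Any]) -> None:
    threading.Thread(target=task, daemon=True).start()


class RefreshAheadCache:
    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float,
                 refresh_factor: float = 0.5,
                 clock: Callable[[], float] = time.monotonic,
                 schedule: Callable[[Callable[[], Any]], None] = _run_in_thread):
        self._loader = loader
        self._ttl = ttl_seconds
        self._soft_ttl = ttl_seconds * refresh_factor  # e.g. 60 s * 0.5 = 30 s
        self._clock = clock
        self._schedule = schedule
        self._entries: Dict[str, Tuple[Any, float]] = {}  # key -> (value, loaded_at)
        self._refreshing: Set[str] = set()
        self._lock = threading.Lock()

    def _load(self, key: str) -> Any:
        try:
            value = self._loader(key)
            with self._lock:
                self._entries[key] = (value, self._clock())
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def get(self, key: str) -> Any:
        with self._lock:
            entry = self._entries.get(key)
            now = self._clock()
            start_refresh = False
            if entry is not None and now - entry[1] < self._ttl:
                value, loaded_at = entry
                # Past the soft TTL: refresh once, while still serving the old value.
                if now - loaded_at >= self._soft_ttl and key not in self._refreshing:
                    self._refreshing.add(key)
                    start_refresh = True
            else:
                entry = None
        if entry is None:
            return self._load(key)                   # missing or expired: synchronous read
        if start_refresh:
            self._schedule(lambda: self._load(key))  # asynchronous reload
        return value
```

The `_refreshing` set is one way to avoid scheduling the same reload many times when a hot key is read by many users at once.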
Eventually, the cache will become very large (it has too many keys or takes too much memory). To prevent this, a maximum limit is set on the cache size, and once it is reached, items are removed according to the eviction policy in place.
LRU Policy (Least Recently Used): Most caches use this as their default eviction policy. It is based on a linked list whose head points to the next item to be removed. On a cache hit, the hit item moves to the tail of the list, becoming the most recently used item. On a cache miss, the new element is added at the tail and, if the cache is full, one element is pushed out from the head. LRU suffers from false cache eviction: when a lot of new keys are requested, some popular keys may also get removed. For workloads where recently used keys tend to be used again, it is often a good approximation of the optimal policy.
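The list manipulation described for LRU maps directly onto Python's `collections.OrderedDict`, which keeps keys in order and can move one to the end. The sketch below is a minimal illustration; the `LRUCache` class and its method names are our own, with the front of the ordered dict playing the role of the list head.

```python
from collections import OrderedDict
from typing import Any, List


class LRUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str, default: Any = None) -> Any:
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)   # hit: becomes the most recently used
        return self._entries[key]

    def put(self, key: str, value: Any) -> None:
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value       # new keys are appended at the tail
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict from the head

    def keys(self) -> List[str]:
        """Keys ordered from least to most recently used."""
        return list(self._entries)
```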
LFU Policy (Least Frequently Used): This policy should be used when the most popular keys must be kept at all times. It solves the problem of false cache eviction but is more complex than LRU. Every key has a counter that is incremented on each cache hit. On a cache miss, when the cache is full, the key with the lowest counter value is evicted and the new element is added with a zero counter. Maintaining the counters is an overhead, but as always in design, it is a tradeoff.
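As a minimal sketch of LFU in its usual form, the class below counts hits per key and evicts the key with the fewest hits. The `LFUCache` name is hypothetical, eviction scans all counters (fine for a demo, too slow for a large cache), and ties are broken in favour of evicting the key that was inserted earliest.

```python
from typing import Any, Dict


class LFUCache:
    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._values: Dict[str, Any] = {}
        self._hits: Dict[str, int] = {}   # per-key hit counter

    def get(self, key: str, default: Any = None) -> Any:
        if key not in self._values:
            return default
        self._hits[key] += 1              # every hit makes the key more "popular"
        return self._values[key]

    def put(self, key: str, value: Any) -> None:
        if key not in self._values and len(self._values) >= self.capacity:
            # Evict the least frequently used key (lowest hit counter).
            victim = min(self._hits, key=self._hits.get)
            del self._values[victim]
            del self._hits[victim]
        self._values[key] = value
        self._hits.setdefault(key, 0)     # new keys start with a zero counter
```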
MRU Policy (Most Recently Used): This policy evicts the most recently used item first. Let’s consider Tinder as a use case. Suppose it stores a user’s matching profiles in a cache; once you have taken an action on a profile (left or right swipe), that data is no longer needed, so it can be evicted to free up space.
FIFO (First In First Out): Unlike MRU, which looks at the order of access, FIFO follows the strict order of insertion: the item that was inserted first is evicted first, regardless of how recently or how often it was accessed.
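MRU and FIFO differ only in which end of the ordering they evict from and in whether a hit changes that ordering. The two small classes below are an illustrative sketch built on `collections.OrderedDict`; the class names are our own.

```python
from collections import OrderedDict
from typing import Any


class MRUCache:
    """Evicts the most recently used entry when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str, default: Any = None) -> Any:
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)         # hit: now the most recently used
        return self._entries[key]

    def put(self, key: str, value: Any) -> None:
        if key not in self._entries and len(self._entries) >= self.capacity:
            self._entries.popitem(last=True)   # drop the most recently used entry
        self._entries[key] = value
        self._entries.move_to_end(key)


class FIFOCache:
    """Evicts in insertion order; hits never change the order."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._entries: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, key: str, default: Any = None) -> Any:
        return self._entries.get(key, default)  # no reordering on access

    def put(self, key: str, value: Any) -> None:
        if key not in self._entries and len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)  # drop the oldest inserted entry
        self._entries[key] = value             # updating a key keeps its position
```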
Based on your application requirements you may use a combination of the strategies.
Why use cache?
Caching improves latency and can reduce the load on the origin (servers and databases). Caching provides the following benefits:
1. Availability of content during network interruptions: If the content is not accessible from the origin server, the user may still be able to access the data from the cache.
2. Increased Read Performance (lower latency): More performance can be gained by using aggressive caching.
3. Responsiveness: The content is retrieved faster because the entire round trip is not required, and the cache can be maintained close to the user, like a browser cache.
4. Increased Throughput: Caching reduces the load on the origin server, so it can serve more requests.
5. Decreased network cost: The requested content can be cached at various points in the network path, so a request served from a cache causes no network activity beyond that cache.
Cache Use Cases
1. Token Caching: Caching the token will give high-performance user authentication and validation.
2. Session Store: When there is a spike in user requests, many of them may end up as read queries against the database; non-critical dynamic data can be stored in the cache instead.
3. Web Page Caching: During a spike, we can store full pages or page fragments and serve them from the cache.
4. Counter / ID Generation: When we have a set of relational data and want to generate a unique ID for it, we can fetch it into the cache and work with it at scale.
5. Speeding up RDBMS: An RDBMS becomes slower when working with millions of rows. Storing unnecessary or historical data for a long time can make indexing slow. In this scenario, we can cache the select queries and invalidate them after some period of time. Many RDBMSs come with their own internal caching, but that is limited by memory, so it is better to use an external system for such caching.
6. In-Memory Data Lookup: Cache can be used to store information like static data, responses, historical data, or any other data depending on your use case. It can be used in dynamic problem solutions.
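To make the "Speeding up RDBMS" use case concrete, here is a minimal sketch of caching select queries and invalidating them after a period of time. It uses Python's built-in `sqlite3` module as a stand-in database; the `QueryCache` class, its `clock` parameter, and the `db_reads` counter are hypothetical names for this example.

```python
import sqlite3
import time
from typing import Any, Callable, Dict, List, Sequence, Tuple


class QueryCache:
    def __init__(self, connection: sqlite3.Connection, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._conn = connection
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        # (sql, params) -> (rows, expires_at)
        self._entries: Dict[Tuple[str, tuple], Tuple[List[tuple], float]] = {}
        self.db_reads = 0                # how many times the database was queried

    def select(self, sql: str, params: Sequence[Any] = ()) -> List[tuple]:
        key = (sql, tuple(params))
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[1]:
            return entry[0]              # served from cache, possibly stale
        # Miss or expired entry: run the query and cache the result set.
        self.db_reads += 1
        rows = self._conn.execute(sql, params).fetchall()
        self._entries[key] = (rows, now + self._ttl)
        return rows
```

Within the TTL window, writes to the table are not visible through the cache, so the TTL is effectively the maximum staleness you are willing to accept.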
Cache Data Types
1. Object store: It is suitable for unmodifiable data such as database result sets, HTTP responses, rendered HTML pages, etc.
2. Key-Value Store: The data is stored as simple key-value pairs. This is supported by any cache provider.
3. Native data structure cache: It allows storing and retrieving data using natively supported data structures.
4. In-Memory caching: This is suitable for key-value data or objects that are stored in memory and accessed by the same node. If your app instances run on various nodes and require the same data, this will lead to cache duplication.
5. Static File Cache: This type of cache is suitable for storing static files such as images, GIFs, CSS, JS files.
Single Node (In-Process) Caching
This is useful for non-distributed standalone systems. Here the app itself is responsible for instantiating and managing its own 3rd party cache objects.
Common use cases for such caching include creating a cached object pool for the most recently used network connections and caching database entities.
Also, this can be used when working with standalone mobile and web apps where we want to temporarily store some data which we get from our APIs or static files like images, CSS, JS, etc. This may also be used as a way to share already created objects between various methods in the backend.
Here the data is locally available, so the cache is easy to maintain and incredibly fast. However, the node's memory can be used up by the cache as it grows, and if multiple apps rely on the same set of data, the cached data is duplicated.
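In Python, the simplest in-process cache is the standard library's `functools.lru_cache` decorator, which memoizes a function's results inside the current process. The sketch below is illustrative: the `country_name` function, the `_COUNTRY_TABLE` dict, and the `lookups` list are hypothetical stand-ins for an expensive lookup.

```python
from functools import lru_cache

_COUNTRY_TABLE = {"IN": "India", "DE": "Germany"}  # hypothetical static data
lookups = []                                        # records every real lookup


@lru_cache(maxsize=128)
def country_name(code: str) -> str:
    # Runs only on a cache miss; repeated calls with the same code are served
    # from the in-process cache without executing the function body again.
    lookups.append(code)
    return _COUNTRY_TABLE.get(code, "unknown")
```

Calling `country_name.cache_info()` reports hits, misses, and the current size, and `country_name.cache_clear()` empties the cache. Each process running this code holds its own copy of the cached data, which is the duplication described above.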
When we have to scale our app to millions of requests per minute, we deal with huge amounts of data that cannot be handled by a single node. We need clusters of several machines, and several such clusters, to operate at this scale. Any good distributed caching solution must meet the requirements below:
Availability: High availability is very important. Depending on the use case, it might be okay to get stale data, but unavailability is not desirable. The cache needs to be available at all times, even when a data center is down or there is a natural calamity.
Manageability: It should be easy to deploy and monitor, with a dashboard that is easy to use and real-time metrics that SREs can use.
Scalability: The system should deliver steady performance even under load. The distributed cache system must be able to scale up and down elastically.
Affordability: Cost is always an important part of the decision, and this includes both the upfront cost and all ongoing costs. The evaluation should be based on the total cost of ownership (license, services, maintenance, hardware, and support).
Simplicity: This is also an important consideration, as adding a new cache to an existing deployment should not add complexity to the solution.
Performance: The cache should be able to meet the required throughput at all times for read and write operations.
Caching is easy to understand and simple to implement, and the results are often dramatic. I prefer Redis to Memcached because of the extra database features Redis provides, such as persistence and built-in data structures like lists and sets. In the next part, we will compare the available caching solutions in detail.