Whether you’re new to Redis or just need a refresher on the features available, this guide can help you understand all the data structures that Redis provides.
Redis data structures are simple, so none of them is likely to be a perfect match for the problem you’re trying to solve. But if you pick the right initial structure for your data, Redis commands can guide you toward efficient ways to get what you need. Many Redis commands have a prefix that indicates the type of underlying data structure they are meant to work with.
Data Structure | Prefix | Uses | Caveats and Common Misuses |
---|---|---|---|
Hash | H | Object storage; anything you would use a dictionary object for | Just like Redis itself, one of the largest problems users encounter with Hashes is key accounting. Have a system in place to keep your hashes from filling up with fields and values you don't need. |
List | L | Queues, stacks and cyclical lists | Don't try to do random access on large lists, and be careful about queues backing up. |
Set | S | Tagging systems, buckets for time-series data | Intersections and unions must be used with care: keep your sets small. Sets are also not the best way to count uniques. If an approximate count is good enough, use HyperLogLog instead! |
Sorted Set | Z | Scoreboards, lexicographical searches, random-access lists | Zsets are one of the most versatile structures, and also one of the most expensive. Keep a close eye on your big O! If you're using sorted sets for autocomplete, you might investigate ZRANGEBYLEX, which handles lexicographical searching (but be careful with UTF-8, since members are compared as raw bytes rather than characters!) |
Stream | X | Time series, queues, message distribution | Failing to cap streams is a common mistake. Their capabilities also sometimes get confused with those of distributed messaging systems (specifically Apache Kafka). |
String | (none) | Simple caches, counters, bit manipulation | Be careful with extremely large values: GETting or SETting them can introduce hard-to-find failures in certain networks and Redis clients. It is easy to hurt yourself by building a locking system on SETNX. |
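To make the ZRANGEBYLEX caveat concrete, here is a small in-memory Python model of a common autocomplete pattern: every term is added to one sorted set with the same score (Redis only orders members lexicographically when their scores are equal), and a prefix search becomes the inclusive range `[prefix` to `[prefix\xff`. This is a simulation, not a Redis client call, and the function name `zrangebylex_prefix` is hypothetical. Because Redis compares members as raw bytes, Python `bytes` comparison stands in for it.

```python
from bisect import bisect_left, bisect_right


def zrangebylex_prefix(members: list[bytes], prefix: bytes) -> list[bytes]:
    """Hypothetical model of ZRANGEBYLEX key [prefix [prefix\\xff
    on a sorted set whose members all share the same score."""
    ordered = sorted(members)  # byte-by-byte order, shorter string first on ties
    lo = bisect_left(ordered, prefix)  # inclusive lower bound: [prefix
    # 0xFF never occurs in valid UTF-8, so prefix + 0xFF sorts after
    # every UTF-8 string that starts with prefix.
    hi = bisect_right(ordered, prefix + b"\xff")  # inclusive upper bound
    return ordered[lo:hi]


terms = ["apple", "application", "apricot", "banana", "app", "café", "cafeteria"]
members = [t.encode("utf-8") for t in terms]

print(zrangebylex_prefix(members, b"app"))   # [b'app', b'apple', b'application']
print(zrangebylex_prefix(members, b"caf"))   # [b'cafeteria', b'caf\xc3\xa9']
print(zrangebylex_prefix(members, b"cafe"))  # [b'cafeteria']
```

The last two calls show the UTF-8 pitfall: byte order puts "café" after "cafeteria" (the first byte of é, 0xC3, sorts after "e"), and a search for "cafe" never matches "café". Normalizing terms before inserting them (for example, lowercasing and stripping accents) is one possible workaround, at the cost of storing the display form separately.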
Redis also provides operations for manipulating special cases of particular types. Under the hood, each of these (except Pub/Sub, which stores no data) is one of the general-purpose data types above:
Type | Prefix | Underlying Type | Uses | Caveats and Common Misuses |
---|---|---|---|---|
Bitmap | BIT | String | Space-efficient flags for analytics | Confusion about capabilities, or losing track of the complexities of accounting in a bitmap-based system. |
Counter | INCR, DECR | String | Space-efficient analytics | Incrementing or decrementing past the 64-bit signed integer range returns an error instead of wrapping around. |
Geohash | GEO | Sorted Set | Store locators, finding nearby users | Long-distance calculations, anywhere specialized projections or sophisticated geographical functions are needed. |
HyperLogLog | PF | String | Counting uniques for analytics | Merging multiple HLLs together can be costly. Be sure to cap the number of source keys you pass to a single PFMERGE. |
Pub/Sub | PUB, SUB, PSUB | (none) | Ephemeral message passing | Simple and fast, but with no durability: a subscriber that is not connected when a message is published never receives it. The Stream type is a better choice for most use cases. |
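The HyperLogLog caveats are easier to reason about with the algorithm in view. Below is a minimal Python sketch of a HyperLogLog. It is not Redis's implementation: it assumes SHA-1 as the hash and 4,096 registers, whereas Redis uses a different hash function and 16,384 registers (for a standard error of about 0.81%). It shows why a count is approximate, and why merging is a register-by-register maximum, so the work of a PFMERGE grows with the number of source keys. The class name `HLL` is hypothetical.

```python
import hashlib
import math

P = 12       # 2**12 = 4096 registers (an assumption; Redis uses 2**14)
M = 1 << P


def _hash64(item: str) -> int:
    # First 8 bytes of SHA-1 as a 64-bit integer (stand-in for Redis's hash)
    return int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")


class HLL:
    def __init__(self) -> None:
        self.registers = [0] * M

    def add(self, item: str) -> None:
        h = _hash64(item)
        idx = h >> (64 - P)               # top P bits pick a register
        rest = h & ((1 << (64 - P)) - 1)  # remaining 64 - P bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - P) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> int:
        alpha = 0.7213 / (1 + 1.079 / M)
        estimate = alpha * M * M / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * M and zeros:
            # Small-range correction (linear counting on empty registers)
            estimate = M * math.log(M / zeros)
        return round(estimate)

    def merge(self, other: "HLL") -> "HLL":
        # Merging touches every register once per source sketch
        merged = HLL()
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged


a, b = HLL(), HLL()
for i in range(10_000):
    a.add(f"user:{i}")
for i in range(5_000, 15_000):
    b.add(f"user:{i}")

# Each estimate should land close to 10,000, 10,000 and 15,000 respectively
print(a.count(), b.count(), a.merge(b).count())
```

Each sketch stores 4,096 small integers no matter how many items it sees, while an exact set must keep every member. That fixed size is what makes HyperLogLog attractive for counting uniques when an approximate answer is acceptable.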
Misusing a data structure’s access patterns is like misusing a dictionary: don’t flip through every page to look up a word! The time complexity of every Redis command is documented on redis.io.
Big O is a way of expressing the “limiting behavior” of how something will perform. A simple way to think of it is how an operation’s running time grows as the amount of data in the system grows. If an operation is O(1), it takes roughly the same amount of time to complete, no matter how much data is involved. An operation that is O(n) scales linearly with the data, and one that is O(n²) scales quadratically: doubling the data roughly quadruples the time.
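Counting steps instead of timing them shows how each class grows. The three functions below are toy stand-ins, not Redis commands: one does a fixed amount of work, one touches each element once, and one touches every pair of elements.

```python
def constant_steps(n: int) -> int:
    # O(1): e.g. a hash lookup, independent of n
    return 1


def linear_steps(n: int) -> int:
    # O(n): one step per element
    return sum(1 for _ in range(n))


def quadratic_steps(n: int) -> int:
    # O(n²): one step per pair of elements (nested loops)
    return sum(1 for _ in range(n) for _ in range(n))


for n in (100, 200):
    print(n, constant_steps(n), linear_steps(n), quadratic_steps(n))
# 100 1 100 10000
# 200 1 200 40000
```

Doubling n leaves the O(1) count unchanged, doubles the O(n) count, and quadruples the O(n²) count.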
For a more thorough treatment, see this easy introduction to big O with sample code and graphs.
Good system design takes more than basic algorithmic analysis, but it’s worthwhile to brush up on the basics from time to time, and think about the total complexity of data access in your app.
Redis executes commands on a single thread, which makes it easy to reason about but can also surprise people who are only familiar with multithreaded data access. Long-running operations are more dangerous in Redis than in other databases: since Redis executes only one command at a time, a long-running operation makes every other client wait, which can cause system-wide backlogs.
Keeping data access operations very granular is the key to good performance with Redis. 100,000 O(1) operations in Redis may well be preferable to a single O(n) operation on a 100,000-element collection, even though you’ll pay extra network and protocol-parsing overhead for each of them.
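The effect of one long command can be modeled with a toy single-threaded server that processes requests one at a time in arrival order. In this sketch, costs are abstract time units, network latency is ignored, and `finish_times` is a hypothetical helper. A 100,000-unit operation arrives just before a 1-unit GET from another client, first as one monolithic command and then split into 100 chunks of 1,000 units, similar to how the cursor-based SCAN, HSCAN, SSCAN and ZSCAN commands break iteration into small steps.

```python
def finish_times(jobs: list[tuple[str, int, int]]) -> dict[str, int]:
    """Simulate a single-threaded server.

    jobs holds (name, arrival_time, cost) tuples. Jobs run one at a time in
    arrival order, and none can start until the previous one has finished.
    Returns the time at which each job completes.
    """
    clock = 0
    done: dict[str, int] = {}
    for name, arrival, cost in sorted(jobs, key=lambda job: job[1]):
        clock = max(clock, arrival) + cost  # wait for the server, then run
        done[name] = clock
    return done


GET_ARRIVAL = 10

# One O(n) command over 100,000 elements, then a GET from another client
monolithic = finish_times([("big", 0, 100_000), ("get", GET_ARRIVAL, 1)])

# The same work split into 100 chunks, one arriving every 1,000 time units
chunks = [(f"chunk{i}", i * 1_000, 1_000) for i in range(100)]
chunked = finish_times(chunks + [("get", GET_ARRIVAL, 1)])

print("GET latency, monolithic:", monolithic["get"] - GET_ARRIVAL)  # 99991
print("GET latency, chunked:", chunked["get"] - GET_ARRIVAL)        # 991
print("Work finished at:", monolithic["big"], chunked["chunk99"])   # 100000 100001
```

The total work finishes at almost the same time either way, but the GET waits roughly a hundred times less in the chunked version, because the server gets a chance to serve other clients between chunks.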
How can you find out the pain-points in your system? Check SLOWLOG regularly (or if you’re a Memetria customer, just look at the “Slow Queries” tab on your dashboard). The longer a single operation runs, the more it slows down everything else your system is trying to do, so keep it short!
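When you do check SLOWLOG, it helps to aggregate entries by command. The sketch below assumes entries shaped like the raw `SLOWLOG GET` reply in Redis 4.0 and later (id, Unix timestamp, execution time in microseconds, command arguments, client address, client name); many client libraries reshape this reply, so adapt the indexing to yours. The helper name `summarize_slowlog` and the sample entries are hypothetical. Keep in mind that SLOWLOG records only the time spent executing a command, not network I/O, and only for commands slower than the `slowlog-log-slower-than` threshold (in microseconds).

```python
from collections import defaultdict


def summarize_slowlog(entries: list[list]) -> list[tuple[str, int, int]]:
    """Group raw SLOWLOG GET entries by command name.

    Assumes each entry looks like
    [id, unix_timestamp, duration_us, [command, *args], client_addr, client_name].
    Returns (command, calls, total_us) rows, highest total time first.
    """
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for entry in entries:
        duration_us, argv = entry[2], entry[3]
        name = argv[0]
        if isinstance(name, bytes):  # many clients return arguments as bytes
            name = name.decode("utf-8", errors="replace")
        command = name.upper()
        totals[command][0] += 1
        totals[command][1] += duration_us
    rows = [(cmd, calls, total) for cmd, (calls, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical entries for illustration only, not real measurements
sample = [
    [3, 1700000300, 15000, ["KEYS", "user:*"], "10.0.0.5:51234", ""],
    [2, 1700000200, 12000, [b"hgetall", b"session:big"], "10.0.0.6:40022", "web"],
    [1, 1700000100, 9000, ["KEYS", "cart:*"], "10.0.0.5:51234", ""],
]

for cmd, calls, total_us in summarize_slowlog(sample):
    print(f"{cmd}: {calls} call(s), {total_us / 1000:.1f} ms total")
# KEYS: 2 call(s), 24.0 ms total
# HGETALL: 1 call(s), 12.0 ms total
```

Sorting by total time rather than by the single slowest entry surfaces commands that are individually modest but run often enough to keep the server busy.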