Caching vs. Tiering: The Procrustean Bed Revisited

In science you need to understand the world; in business you need others to misunderstand it.
- Nassim Nicholas Taleb, “The Bed of Procrustes”

In Greek myth, Procrustes was an interesting character, albeit in a psychopathic way. He was a son of Poseidon and lived in Attica, between Athens and Eleusis. He would grab passersby and have them sleep in a special bed. He wanted the bed to fit them perfectly. So he chopped off the legs of travelers who were too tall, and stretched out those who were too short, ensuring a perfect fit. The “Procrustean bed” is a metaphor for arbitrarily forcing things to fit an unnatural scheme.

What does this have to do with caching and tiering? Read on.

Caching and tiering technologies have been part of storage systems for a long time. Caching has traditionally been used to improve performance by absorbing writes in, and copying hot read data to, a small amount of DRAM/NVRAM. Tiering has been used to save money by using migration software to periodically move cold data to cheaper storage.

Tiering has its roots in Hierarchical Storage Management (HSM). After HSM there was ILM (Information Lifecycle Management) and more recently AST (Automated Storage Tiering). The storage tiers changed over time as storage media performance grew incrementally: SATA, 7.2K SAS, 10K SAS, 15K SAS, FC, etc. Data migration across tiers evolved from manual migration to semi-automated to automated policy-based migration. Despite this evolution, the tiering approach has fundamental limitations. Unless the data access patterns are well understood or deterministic, data distribution across tiers will not be optimal, hurting performance and wasting expensive drives. Additionally, the data migration process, which periodically moves data in bulk, consumes significant array resources, further affecting performance.
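To make the limitation concrete, here is a toy sketch of periodic, policy-based tiering. It is an illustration only, not any vendor's implementation: the `TieredStore` class, its method names, and the "promote the most-accessed blocks" policy are all assumptions made for the example.

```python
from collections import Counter


class TieredStore:
    """Toy model of periodic, policy-based tiering (all names hypothetical)."""

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity  # number of blocks the fast tier holds
        self.fast_tier = set()              # block IDs currently on the fast tier
        self.access_counts = Counter()      # accesses since the last migration
        self.fast_hits = 0
        self.slow_hits = 0

    def read(self, block_id):
        # Placement is fixed between migrations; reads only update statistics.
        self.access_counts[block_id] += 1
        if block_id in self.fast_tier:
            self.fast_hits += 1
        else:
            self.slow_hits += 1

    def migrate(self):
        """Bulk job: place the hottest blocks of the last period on the fast tier."""
        hottest = {b for b, _ in self.access_counts.most_common(self.fast_capacity)}
        moved = len(hottest - self.fast_tier)  # blocks that must be copied up
        self.fast_tier = hottest
        self.access_counts.clear()
        return moved


store = TieredStore(fast_capacity=2)
for block in ["a", "a", "b", "b", "c"]:   # period 1: "a" and "b" are hot
    store.read(block)
store.migrate()                            # "a" and "b" are promoted
for block in ["x", "x", "y", "y", "a"]:   # period 2: the hot set shifts to "x" and "y"
    store.read(block)
print(store.fast_hits, store.slow_hits)
```

In this run only one of ten reads is served from the fast tier. The placement always reflects the previous period, so when the hot set moves, the expensive tier is holding yesterday's data until the next bulk migration.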

Enter virtualization. Virtual servers and server clusters need a lot of IOPS, orders of magnitude more than old disk-based architectures can provide. Flash storage can provide a ton of IOPS. However, flash is a lot more expensive than disk. So, incumbent vendors retrofitted flash as a high-end storage tier into their legacy disk-based architectures. A classic Procrustean move: force-fitting a potent next-gen technology into ancient, limiting architectures. In a virtual server environment where VM migrations are a regular occurrence, data access patterns are anything but predictable. Sure, there will be some performance gains, but at a disproportionate price. Not the best use of flash or cash. A bigger travesty is that vendors often charge extra for tiering software licenses.

Most new post-virtualization hybrid storage architectures use flash for data caching. SSDs are used as a second-level persistent cache behind DRAM. Built from the ground up with flash as a core component, the system caches fine-grained hot data in DRAM and flash, in real time. The idea is to serve a majority of the data requests from DRAM or SSD. The storage system de-stages writes and updates to disk asynchronously and automatically. Hard drives are used as dense, inexpensive, long-term storage. A vast majority of I/O to and from servers is handled from DRAM and flash, ensuring very high IOPS and low latency. The low cost of hard disks subsidizes the overall cost of the array. Tastes great, less filling. Moreover, there is no additional software to buy and no license fee for caching. It is a part of the core system.
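The read and write paths described above can be sketched in a few lines. This is a simplified model under stated assumptions, not a description of any particular product: the `HybridArray` and `LRU` classes are hypothetical, LRU eviction is an assumption (no eviction policy is named here), a plain dict stands in for the write buffer, and `destage()` is called by hand where a real system would run it in the background.

```python
from collections import OrderedDict


class LRU:
    """Small least-recently-used map (the eviction policy is an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert a block; return the evicted (key, value) pair, or None."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)
        return None


class HybridArray:
    """Toy hybrid array: DRAM cache, flash cache, disk (names hypothetical)."""

    def __init__(self, dram_blocks, flash_blocks):
        self.dram = LRU(dram_blocks)
        self.flash = LRU(flash_blocks)
        self.disk = {}
        self.dirty = {}  # acknowledged writes not yet on disk
        self.served_from = {"dram": 0, "flash": 0, "disk": 0}

    def _cache(self, block_id, data):
        evicted = self.dram.put(block_id, data)
        if evicted is not None:
            # Demote to flash. Anything flash evicts is simply dropped,
            # because the write buffer or the disk still holds a copy.
            self.flash.put(*evicted)

    def write(self, block_id, data):
        self.dirty[block_id] = data
        self.flash.items.pop(block_id, None)  # discard any stale flash copy
        self._cache(block_id, data)

    def destage(self):
        """Flush buffered writes to disk (asynchronous in a real system)."""
        flushed = len(self.dirty)
        self.disk.update(self.dirty)
        self.dirty.clear()
        return flushed

    def read(self, block_id):
        data = self.dram.get(block_id)
        if data is not None:
            self.served_from["dram"] += 1
            return data
        data = self.flash.get(block_id)
        if data is not None:
            self.served_from["flash"] += 1
        else:
            # Miss in both caches: fall back to the write buffer, then disk.
            data = self.dirty[block_id] if block_id in self.dirty else self.disk[block_id]
            self.served_from["disk"] += 1
        self._cache(block_id, data)  # promote on access, in real time
        return data


array = HybridArray(dram_blocks=2, flash_blocks=4)
for i in range(4):
    array.write(i, f"v{i}")
array.destage()
array.read(3)   # still in DRAM
array.read(0)   # demoted to flash earlier, promoted again on this read
array.read(0)   # now served from DRAM
print(array.served_from)
```

The contrast with tiering is that placement changes on every access rather than on a schedule, and the disks see writes only when `destage()` runs.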

Of course, there are nuances within the new generation of cached architectures that can translate to real differences in performance, longevity and overall TCO. That is another blog post for another day.

By the way, if you are wondering what happened to Procrustes, he was killed by Theseus, the Greek hero. Theseus forced Procrustes to lie in his own bed and decapitated him. Apparently the bed was not a perfect fit for Procrustes.

