SNIA: Avoiding tail latency by failing IO operations on purpose

SNIA logoConsistency and predictability matter. You expect Google to answer your search query within a second. If it takes two seconds, that is slow but ok. Much longer and you will probably hit refresh because ‘it’s broken and maybe that will fix it’.

There are many examples that could substitute the scenario above. Starting a Netflix movie, refreshing your Facebook timeline, or powering on an Azure VM. Or in your business: retrieving an MRI scan or patient data, compiling a 3D model, or listing all POs from last month.

Ensuring your service can meet this demand of predictability and consistency requires a multifaceted approach, both in hardware and procedures. You can have a modern hypervisor environment with fast hardware, but if you allow a substantially lower spec system in the cluster, performance will not be consistent. What happens when a virtual machine moves to the lower spec system and suddenly takes longer to finish a query?

Similarly, in storage, tiering across different disk types helps improve TCO. However, what happens when data trickles down to the slowest tier? Achieving that lower TCO comes with the tradeoff of less latency predictability.

These challenges are not new. If they impact user experience too much, you can usually work around them. For example, ensure your data is moved to a faster tier in time. If you have a lot of budget, maybe forgo the slowest & cheapest NL-SAS tier and stick to SAS & SSD. But what if the source of the latency inconsistency is something internal to a component, like a drive?

Continue reading “SNIA: Avoiding tail latency by failing IO operations on purpose”

Hardware has set the pace for latency, time for software to catch up

I can’t recall the last storage system installation that didn’t have some amount of solid state drives in its configuration. And for good reason: we’re rapidly getting used to the performance benefits of SSD technology. Faster applications usually result in real business value. The doctor treating patients can get his paperwork done faster and thus has time for more patients in a day. Or the batch job processing customer mailing lists or CGI renderings completes sooner, giving you a faster time to market.

To reduce the application wait times even further, solid state drives need to be able to achieve even lower latencies. Just reducing the media latency won’t cut it anymore: the software component in the chain needs to catch up. Intel is doing just that with Storage Performance Development Kit (SPDK).

Continue reading “Hardware has set the pace for latency, time for software to catch up”