EMC World – Day 4 – VNX: Skew & Data Placement

The last breakout session I attended at EMC World 2013 was titled “Leveraging SSD: Designing for FAST Cache & FAST VP on Unified Storage”. For me it was by far one of the best breakout sessions of the week: it was technical, and I can apply it in the field on pretty much a daily basis!

If you’ve got SSD drives in a VNX you can use them in two ways: as FAST Cache or as an extreme performance tier in your storage pools (FAST VP). Each of these implementations has advantages and disadvantages, and they can also be used concurrently (i.e. use FAST Cache AND FAST VP). It depends on what you want to do…

But let’s start at the beginning: first you need to figure out your skew! What?!

Skew

Yes, skew! Skew will indicate how much of your capacity is generating how much of your IOps. You need some sort of skew (i.e. hot vs cold data) before tiering makes any sense.

[Slide: skew graph]

A LUN on a VNX storage pool is built out of 1GB chunks. Let’s assume you had performance statistics for each of these chunks. If you sorted them in order of IOps you could end up with a chunk that generated 10% of total IOps, another chunk that generated 8%, another at 5%, 2%, etc. The very last chunk would generate next to no IOps. If you plotted them cumulatively in that order, as on the slide above, you’d end up with a similar graph. Of course EMC² has tools to generate such graphs… 😉

If you locate the point where the graph is at a 45 degree angle (or: fraction of IOps + fraction of capacity = 1), you’ll end up at your skew point. In the example above this is 70%: 70% of your IOps are generated by 30% of your capacity.
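To make the skew point concrete, here is a minimal Python sketch. It assumes you already have one IOps figure per 1GB chunk; the function name `skew_point` and the sample numbers are made up for illustration, and this is not how EMC's tools work internally. It sorts the chunks from busiest to quietest and walks down the list until fraction of IOps + fraction of capacity reaches 1.

```python
def skew_point(iops_per_chunk):
    """Return (fraction_of_iops, fraction_of_capacity) at the skew point.

    iops_per_chunk: one IOps figure per equally sized chunk (hypothetical input).
    """
    if not iops_per_chunk or sum(iops_per_chunk) <= 0:
        raise ValueError("need at least one chunk with non-zero IOps")

    busiest_first = sorted(iops_per_chunk, reverse=True)
    total = sum(busiest_first)
    count = len(busiest_first)

    running = 0
    for used, iops in enumerate(busiest_first, start=1):
        running += iops
        # Same test as running/total + used/count >= 1, but multiplied out
        # so that integer inputs are compared without rounding errors.
        if running * count + used * total >= total * count:
            return running / total, used / count


# Hypothetical pool of 10 chunks: the 3 busiest carry 70 of the 100 IOps.
print(skew_point([30, 20, 20, 5, 5, 5, 5, 4, 3, 3]))  # (0.7, 0.3)
# All chunks equally busy: a skew of 50%, so tiering buys you nothing.
print(skew_point([1] * 10))  # (0.5, 0.5)
```

The first call reproduces the 70% example: 70% of the IOps come from 30% of the capacity.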

Every environment has a different graph. Starting at the very worst of them (but certainly the easiest!): a skew of 50%. It would resemble a linear graph at a 45 degree angle, with half your capacity generating half your IOps. In essence all your data is about equally active. In this case tiering makes no sense: just size your entire pool with a suitable drive type and you’re done!

A better tiering example: a skew of 90%, where 10% of your capacity generates 90% of your IOps. These scenarios are the ideal use cases for tiering, because a small amount of capacity is generating a lot of IOps: put it on flash! Add a bit of SAS to bridge the gap between flash and NL-SAS and top off with NL-SAS to get the capacity you need.

Data placement

With your multi-tier storage pool, FAST VP will take care of the rest. It will monitor the 1GB chunks of data according to I/O intensity and assign them a temperature. Hot data will be moved to flash, warm data to SAS and cold data to NL-SAS. FAST VP will try to use your investment most efficiently: it will always try to fit as much of the data in the fastest tier before moving down. This means that even if your pool isn’t completely full, you will always be placing as much data as possible on flash (or SAS). It will fill each tier up to 90% before moving down to the next tier: the remaining 10% is left free to assign new chunks.
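As a rough illustration of that top-down fill, here is a simplified Python model. It is a sketch under assumptions, not the real FAST VP relocation algorithm: the function name `place_by_temperature`, the tier sizes and the temperatures are all hypothetical, and a tier is treated as one flat bucket of 1GB chunks.

```python
from collections import Counter

FILL_PERCENT = 90  # FAST VP fills a tier to 90% before moving down


def place_by_temperature(temperatures, tiers):
    """Map each chunk to a tier, hottest chunks first.

    temperatures: dict of chunk id -> temperature (higher is hotter).
    tiers: list of (name, capacity in 1GB chunks), fastest tier first.
    """
    hottest_first = sorted(temperatures, key=temperatures.get, reverse=True)
    placement = {}
    start = 0
    for name, capacity in tiers:
        usable = capacity * FILL_PERCENT // 100  # keep 10% free for new chunks
        for chunk in hottest_first[start:start + usable]:
            placement[chunk] = name
        start += usable
    if len(placement) < len(temperatures):
        raise ValueError("pool is too small for this many chunks")
    return placement


# Hypothetical pool: 10GB flash, 20GB SAS, 100GB NL-SAS, holding 40 chunks.
tiers = [("flash", 10), ("sas", 20), ("nl-sas", 100)]
temperatures = {chunk: chunk for chunk in range(40)}  # chunk 39 is the hottest
placement = place_by_temperature(temperatures, tiers)
print(Counter(placement.values()))
```

With these made-up numbers the 9 hottest chunks land on flash, the next 18 on SAS and the remaining 13 on NL-SAS, even though the pool is far from full.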

And in this assignment of new chunks lies a problem. In R31 you could choose between 4 tiering options per LUN: auto tier, highest tier, lowest tier or disable auto-tiering. Most people use auto tiering since this is the only option that really allows FAST VP to do its magic. However, if you choose auto tiering, the VNX will allocate new chunks on the private RAID group with the highest free capacity.

Ok, so we’ve got three tiers. FAST VP will fill the SSD and SAS tiers to 90%. NL-SAS will almost always have the highest percentage of free space since you like to keep some free space for ad-hoc LUN requests. And even if all tiers are exactly 90% full, the NL-SAS tier will still have the most free capacity in GB, because NL-SAS disks are a lot larger than SSD and SAS drives. And that is exactly what the data allocation algorithm looks for: the most free capacity. So your new chunks will almost always end up on NL-SAS, the lowest tier. If your high performance database is writing data and needs to allocate new chunks, these will end up on NL-SAS, even if this new data is very hot right from the start.
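A quick bit of arithmetic in Python shows why. The tier sizes below are hypothetical, and the sketch simplifies by comparing whole tiers rather than individual private RAID groups; `pick_tier_r31_auto` is an illustrative name, not a VNX command.

```python
def pick_tier_r31_auto(free_gb):
    """Return the tier with the most free capacity in GB."""
    return max(free_gb, key=free_gb.get)


# Hypothetical pool where every tier is exactly 90% full.
tier_size_gb = {"flash": 1000, "sas": 6000, "nl-sas": 30000}
free_gb = {tier: size * 10 // 100 for tier, size in tier_size_gb.items()}

print(free_gb)                      # {'flash': 100, 'sas': 600, 'nl-sas': 3000}
print(pick_tier_r31_auto(free_gb))  # nl-sas
```

Same fill percentage everywhere, yet NL-SAS wins on free GB simply because the tier is so much bigger.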

To remedy this, starting with R32 a new tiering option was implemented: start high then auto tier. New chunks will end up in the highest possible tier. If the highest tier is full it will pick the next one down; in our 3-tier example, that is SAS. In the next relocation window FAST VP will then relocate all chunks, rebalance the tiers to 90% max and you’re ready for the next day.

Be aware that if you allocate too many chunks in a 24 hour period you may still fill up the free space in your SSD tier and even your SAS tier, resulting in your new chunks still being allocated on NL-SAS. The new tiering option is a big improvement but cannot magically create space in a tier that’s full! 😉
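The spill-over behaviour can be sketched in a few lines of Python. Again this is a simplified model under assumptions: `allocate_start_high` and the free-space figures are hypothetical, and free space is counted in 1GB chunks per tier.

```python
def allocate_start_high(free_chunks, new_chunks):
    """Spread new chunks over tiers, highest tier first.

    free_chunks: list of (name, free 1GB chunks), highest tier first.
    Returns a dict of tier name -> number of new chunks placed there.
    """
    placed = {}
    remaining = new_chunks
    for name, free in free_chunks:
        take = min(free, remaining)
        if take:
            placed[name] = take
        remaining -= take
    if remaining:
        raise ValueError("pool is out of free space")
    return placed


# Hypothetical free space left after the last relocation window.
free_chunks = [("flash", 100), ("sas", 600), ("nl-sas", 3000)]

print(allocate_start_high(free_chunks, 50))   # {'flash': 50}
print(allocate_start_high(free_chunks, 800))  # {'flash': 100, 'sas': 600, 'nl-sas': 100}
```

The second call is the busy day: 800 new chunks exhaust the free space on flash and SAS, so the last 100 still end up on NL-SAS until the next relocation window.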

Stay tuned for part 2 of this “Leveraging SSD: Designing for FAST Cache & FAST VP on Unified Storage” session in an upcoming blog!
