FAST VP: Let it do its job!

FAST VP in action

Not all data is accessed equally: some data is popular, while other data is only touched infrequently. With the introduction of FAST VP in the CX4 & VNX series it is possible to create a single storage pool that contains multiple types of drives. The system chops your LUNs into slices and each slice is assigned a temperature based on the activity of that slice. Heavily accessed slices are hot, infrequently accessed slices are cold. FAST VP then moves the hottest slices to the fastest tier. Once that tier is full the remaining hot slices go to the second fastest tier, and so on. This does absolute wonders for your TCO: your cold data is now stored on cheap NL-SAS disks instead of expensive SSDs and your end-users won’t notice a thing. There’s one scenario that will get you in trouble though, and that’s infrequent, heavy use of formerly cold data…

FAST VP Slice Relocation

FAST VP cuts your LUNs into slices: a CX4 and first generation VNX use a 1GB slice size, a VNX2 with MCx technology uses 256MB slices. The array constantly monitors the amount of I/O going to each slice and assigns it a temperature. Once a day, during the relocation window, all slices are relocated to their respective tiers. FAST VP will always try to stuff as much data into the faster tiers as possible: if you have a storage pool with 10TB of EFDs and 20TB of SAS but only 5TB of data, all of it will end up in the EFD tier. FAST VP also tries to keep 10% of free space in every tier, so there’s enough room to create new slices when thin provisioned LUNs grow or when you create a new LUN.
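To make that placement logic a bit more tangible, here’s a minimal Python sketch of the behaviour described above: rank slices by temperature, fill the fastest tier first and keep roughly 10% of each tier free. The tier sizes, slice counts and helper names (Tier, relocate) are made up for illustration; this is a conceptual model, not EMC’s actual relocation code.

# Simplified sketch of FAST VP-style slice placement (illustrative only).
# Slices are ranked by temperature and poured into the fastest tier first,
# while each tier keeps ~10% free for new slice allocations.

from dataclasses import dataclass, field

@dataclass
class Tier:
    name: str
    capacity_slices: int          # total number of slices the tier can hold
    headroom: float = 0.10        # keep ~10% free for new slices
    placed: list = field(default_factory=list)

    @property
    def usable_slices(self) -> int:
        return int(self.capacity_slices * (1 - self.headroom))

    def has_room(self) -> bool:
        return len(self.placed) < self.usable_slices

def relocate(slices, tiers):
    """slices: list of (slice_id, temperature); tiers ordered fastest first."""
    for tier in tiers:
        tier.placed.clear()
    # Hottest slices first, fastest tier first.
    for slice_id, temp in sorted(slices, key=lambda s: s[1], reverse=True):
        for tier in tiers:
            if tier.has_room():
                tier.placed.append((slice_id, temp))
                break

# Example with 1GB slices (CX4/VNX1): 10TB EFD, 20TB SAS, 40TB NL-SAS, 5TB of data.
tiers = [Tier("EFD", 10_000), Tier("SAS", 20_000), Tier("NL-SAS", 40_000)]
slices = [(i, 1_000 - i) for i in range(5_000)]
relocate(slices, tiers)
for tier in tiers:
    print(tier.name, len(tier.placed), "slices")

With 5,000 slices of data and room for 9,000 slices in the EFD tier after the 10% headroom, everything lands on EFD, which matches the 5TB example above.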

Now picture the following scenario: a server or application is generating a lot of I/O. Slices are promoted to the faster tiers, the array is performing well and the response time to the application is OK. For the rest of the month the application isn’t used because the company isn’t generating any mailings. The data cools down and the slices are demoted to the capacity tier, onto slower NL-SAS drives. At the beginning of the new month the application is brought back to life and starts generating I/O again. FAST Cache might be able to quickly adjust to this changing workload, but FAST VP can only adjust during the relocation window, which runs once a day. For the rest of the day the storage response time for that application is slow (NL-SAS speeds), the app owners complain and you’re unhappy.

FAST VP Tiering Policy

To resolve these kinds of issues there’s a setting you can adjust on the LUN: the auto-tiering policy. Normally it should be in the automatic mode of “start high then auto-tier”. If your application behaves like the one described above you might be tempted to switch it to “Highest Tier” mode, forcing FAST VP to keep all of its slices in the highest tier. Be aware, though, that you are now claiming space in the highest tier for slices that might never need the performance, and other, hotter slices will be pushed down as a result. The problem becomes even more apparent if you have a three-tier pool of SSD, SAS and NL-SAS: there is (currently) no way to pin a LUN to the middle tier, so selecting the “Highest Tier” option in a three-tier pool moves all your data to SSDs, which are fast but also very expensive.
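If you want to reason about what pinning does to your top tier, here’s a tiny, hypothetical Python sketch: pinned slices claim top-tier space before any temperature ranking happens, so hotter un-pinned slices can get pushed down. The place function, slot counts and temperatures are invented purely for illustration.

# Sketch of how "Highest Tier" pinning eats into the fastest tier (illustrative only).
# Pinned slices are placed in the top tier before any temperature ranking,
# so hotter un-pinned slices can end up on the slower spill tier.

def place(slices, top_tier_slots, spill_tier):
    """slices: list of (slice_id, temperature, pinned). Returns {tier: [slice_ids]}."""
    placement = {"top": [], spill_tier: []}
    # 1. Pinned slices claim the top tier first, hot or not.
    pinned = [s for s in slices if s[2]]
    # 2. Remaining slices compete on temperature for whatever is left.
    unpinned = sorted((s for s in slices if not s[2]), key=lambda s: s[1], reverse=True)
    for s in pinned + unpinned:
        tier = "top" if len(placement["top"]) < top_tier_slots else spill_tier
        placement[tier].append(s[0])
    return placement

# 10 top-tier slots, a pinned LUN of 8 cold slices and 10 hot un-pinned slices:
cold_pinned = [(f"pinned-{i}", 5, True) for i in range(8)]
hot_free = [(f"hot-{i}", 95, False) for i in range(10)]
result = place(cold_pinned + hot_free, top_tier_slots=10, spill_tier="NL-SAS")
print(result["top"])      # 8 cold pinned slices and only 2 hot slices
print(result["NL-SAS"])   # 8 hot slices stuck on the capacity tier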

This week I was analyzing a customer array and I noticed a large amount of data being relocated each night. I knew the majority of the performance issues were spindle related: they’ve got quite a lot of high-capacity NL-SAS drives and relatively few SAS drives. It would have been easy to just recommend “buy more disks and expand your performance tier!”. Analyzing the graphs I noticed the following:

FAST VP Temperature by Tier

As you can see, the performance tier holds most of the red (hot) slices, a little bit of orange and some yellow and green. The capacity tier however ALSO has red slices, a lot of orange, some yellow, etc. Performance-wise those red and orange slices should be on the performance tier! Except they can’t move up, because that space is occupied by (pinned) green slices.

As a side note: even if all the hot slices on the capacity tier were allowed to move to the performance tier, they wouldn’t all fit: there are only about 8,000 slices of “not-red” space in the performance tier and over 10,000 slices of orange and red data in the capacity tier. So yeah… I still recommended buying some additional SAS disks.
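To put those slice counts into capacity terms, here’s a quick back-of-the-envelope calculation, assuming the 1GB slice size of a CX4/first generation VNX (with 256MB slices the numbers scale down accordingly):

# Back-of-the-envelope check, assuming 1GB slices (CX4/VNX1 slice size).
slice_size_gb = 1
free_ish_perf_slices = 8_000      # "not-red" space in the performance tier
hot_capacity_slices = 10_000      # orange/red slices sitting in the capacity tier

shortfall_gb = (hot_capacity_slices - free_ish_perf_slices) * slice_size_gb
print(f"Roughly {shortfall_gb / 1024:.1f} TB of hot data wouldn't fit")  # ~2 TB short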

Analysis of the FAST VP slices on a per-LUN basis

Now you might want to know which LUNs contain all that relatively cold data and are forcing it into the performance tier. Luckily there’s a different report for that: it shows the LUNs and their slices plus their temperatures. This way you can trace back which pinned LUNs hold the most cold data and thus waste the most space in your high tier. The image on the right shows the pinned LUNs. You can see that one LUN is completely red; this LUN would stay in the highest tier even if you turned auto-tiering back on. The two LUNs on the left have quite a lot of relatively cold (green) data; switching these LUNs back to auto-tiering would let those green slices fall back to the capacity tier and make room for other slices.
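If your reporting tool lets you export that per-LUN slice data to something like a CSV file, a few lines of Python are enough to rank the pinned LUNs by how much cold data they hold. The file name and column names below (slice_report.csv, lun_name, tiering_policy, slice_temperature) are assumptions for the sake of the example; adjust them to whatever your report actually contains.

# Hypothetical post-processing of a per-LUN slice report exported to CSV.
# Assumed columns: lun_name, tiering_policy, slice_temperature (0-100).

import csv
from collections import defaultdict

COLD_THRESHOLD = 20   # arbitrary cut-off for "relatively cold" slices

cold_slices_per_pinned_lun = defaultdict(int)
with open("slice_report.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["tiering_policy"] == "Highest Tier" and float(row["slice_temperature"]) < COLD_THRESHOLD:
            cold_slices_per_pinned_lun[row["lun_name"]] += 1

# The pinned LUNs wasting the most top-tier space float to the top of the list.
for lun, cold in sorted(cold_slices_per_pinned_lun.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lun}: {cold} cold slices pinned to the highest tier")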

Looking at the LUN sheet the other way around you could also ask: “There’s a lot of relatively hot (orange) data on the capacity drives and almost none on the performance tier. Which LUNs are these, and if I unpin those Highest Tier LUNs, which LUNs would benefit?”

FAST VP Slice analysis orange

Given sufficient space in the performance tier these slices would immediately move up during the next relocation. Further investigation showed that these LUNs are part of the Exchange environment and that it was indeed pretty slow… Also note that these LUNs have quite a lot of idle (grey) data. This is the best example that FAST VP actually works and saves you money: that idle data now sits on capacity drives instead of costing you a boatload of money on faster drives!

Storage pool design

Storage pools can be designed using one, two or three tiers. In the case of a perfectly predictable and constant workload a three-tier pool will always be best: a little bit of flash services the extremely hot data, the near-idle data sits on NL-SAS drives and everything in the middle is serviced by the SAS tier. Real life isn’t a lab however: workloads fluctuate, data alternates between hot and cold and slices get moved up and down across the tiers.

Be very aware of what your application demands from the storage. If you have one of those applications that bursts once a month, is idle for the rest of the month and absolutely needs SAS response times during that burst, my advice would be to put it in a pool that does NOT have NL-SAS. You could trick FAST VP and pin it to the highest tier, but if you have a three-tier pool with solid state drives this will result in the LUN claiming SSD capacity. There is currently no middle tier option in the tiering policy; there’s only auto-tier, highest tier, lowest tier and the no data movement option. *cough EMC request for enhancement cough*

This will most likely result in you designing a number of different pool layouts: for example one SSD & SAS pool for response-time-critical and fluctuating workloads, one three-tier pool for generic use and maybe even a SAS & NL-SAS pool for cheap data that doesn’t justify the cost of flash capacity. Unfortunately you now have to plan the capacity expansions for each of these pools very strictly: currently the only way to reclaim drives is to move all data off a storage pool, destroy the pool and recreate new pools or expand existing ones. An option to remove a private RAID group from a pool would be ideal and could perhaps easily work with the existing slice relocation process… *cough second EMC request for enhancement cough*
The alternative I see some customers use is to keep drives unbound and only add them to whichever pool needs extra capacity, when the need arises. The downside is that these unbound drives aren’t adding any performance to the pools in the meantime.

Moral of the story: let FAST VP do its job! The more LUNs you pin to the highest tier, the less space is left for FAST VP to use effectively for hot slices. You’re essentially telling FAST VP: “I know better than you!”. As you could see in the illustrations above, you usually don’t. Contrast this with a temperature overview from a pool that has fewer LUNs pinned to the high tier.

FAST VP Temperature by Tier no pinned LUNs

Almost no red or orange slices on the capacity tier and only a tiny bit of wasted green slices in the performance tier. You’ll immediately see this in the disk utilization figures: the NL-SAS drives are handling far fewer IOps and have some performance left to give.

So what are your thoughts on FAST VP and storage pools in general in a VNX? How do you design them? Do you pin a lot of LUNs to either the highest or lowest tier or do you just let FAST VP do its job autonomously? Leave a comment!