FAST VP: Let it do its job!

FAST VP in action

Not all data is accessed equally: some data is popular, while other data may only be accessed infrequently. With the introduction of FAST VP in the CX4 & VNX series it is possible to create a single storage pool that contains multiple types of drives. The system chops your LUNs into slices and each slice is assigned a temperature based on the activity of that slice. Heavily accessed slices are hot, infrequently accessed slices are cold. FAST VP then moves the hottest slices to the fastest tier. Once that tier is full, the remaining hot slices go to the second fastest tier, etc… This does absolute wonders for your TCO: your cold data is now stored on cheap NL-SAS disks instead of expensive SSDs and your end-users won’t know a thing. There’s one scenario that will get you in trouble though, and that’s infrequent, heavy use of formerly cold data…

FAST VP Slice Relocation

FAST VP cuts your LUNs into slices: a CX4 and first generation VNX use a 1GB slice size, a VNX2 with MCx technology uses 256MB slices. The array constantly monitors the amount of I/O going to each slice and assigns it a temperature. Once a day, during the relocation window, all slices are relocated to their respective tiers. FAST VP will always try to stuff as much data in the faster tiers as possible: if you have a storage pool with 10TB of EFDs and 20TB of SAS but only 5TB of data, all of it will end up in the EFD tier. FAST VP also tries to keep 10% free space in every tier, to make sure there’s enough room to create new slices for thin provisioned LUNs or for newly created LUNs.
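
To make the relocation logic a bit more tangible, here’s a small Python sketch of the general idea (my own simplification for illustration purposes; the tier sizes and temperatures are made up and this is in no way EMC’s actual algorithm): sort the slices by temperature and fill the fastest tier first, while keeping roughly 10% of every tier free.

# Toy model of FAST VP slice relocation - illustration only.
# Slices carry a temperature; tiers are filled hottest-first,
# keeping ~10% of every tier free for new slice allocations.

TIERS = [
    {"name": "EFD",    "capacity_slices": 4000},   # example sizes, expressed in slices
    {"name": "SAS",    "capacity_slices": 20000},
    {"name": "NL-SAS", "capacity_slices": 60000},
]

def relocate(slices, tiers=TIERS, headroom=0.10):
    """Return a mapping of tier name -> slices placed there after relocation."""
    placement = {tier["name"]: [] for tier in tiers}
    queue = sorted(slices, key=lambda s: s["temperature"], reverse=True)
    for tier in tiers:
        usable = int(tier["capacity_slices"] * (1 - headroom))
        while queue and len(placement[tier["name"]]) < usable:
            placement[tier["name"]].append(queue.pop(0))
    return placement

# A pool with only a handful of slices: everything lands in the EFD tier.
slices = [{"lun": 5, "slice": i, "temperature": t}
          for i, t in enumerate([98, 75, 40, 12, 3])]
print({tier: len(placed) for tier, placed in relocate(slices).items()})

With only a few slices of data, everything ends up in the fastest tier, just like the 5TB example above.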

Now picture the following scenario: a server or application is generating a lot of I/O. Slices are promoted to the faster tiers, the array is performing well and the response time to the application is OK. For the rest of the month the application isn’t used because the company isn’t generating any mailings. The data cools down and the slices are demoted to the capacity tier, onto slower NL-SAS drives. At the beginning of the new month the application is brought back to life and starts generating I/O again. FAST Cache might be able to quickly adjust to this changing workload, but FAST VP can only adjust during the relocation window, which runs once a day. For the rest of the day the storage response time for that application is slow (NL-SAS speeds), the app owners complain and you’re unhappy.

FAST VP Tiering Policy

To resolve these kinds of issues there’s a setting you can adjust on the LUN: the auto-tiering policy. Normally it should be in the automatic mode of “start high then auto-tier”. If your application behaves like the one described above, you might be tempted to put it in “Highest Tier” mode to force FAST VP to keep all of its slices in the highest tier. Be aware, though, that you are now claiming space in the highest tier for slices that might never need the performance; other, hotter slices will be pushed down as a result. The problem becomes even more apparent if you have a three-tier pool: SSD, SAS and NL-SAS. There is (currently) no way to pin a LUN to the middle tier. Selecting the “Highest Tier” option in a 3-tier pool moves all your data to SSDs, which are fast but also very expensive.
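
Extending the toy sketch from above: pinning effectively means that the pinned slices claim top-tier capacity before any temperature-based placement happens (again purely my own simplification, not how the array implements it).

# Extension of the toy model above: slices of "Highest Tier" LUNs claim the
# top tier first, no matter how cold they are; hot auto-tiered slices only
# get whatever capacity is left over.

def relocate_with_pinning(slices, tiers, headroom=0.10):
    """tiers: same list-of-dicts structure as TIERS in the previous sketch."""
    placement = {tier["name"]: [] for tier in tiers}
    pinned = [s for s in slices if s.get("policy") == "highest"]
    auto = sorted((s for s in slices if s.get("policy") != "highest"),
                  key=lambda s: s["temperature"], reverse=True)
    placement[tiers[0]["name"]].extend(pinned)   # pinned slices go up first...
    for tier in tiers:                           # ...then the hottest auto-tiered slices
        usable = int(tier["capacity_slices"] * (1 - headroom))
        free = max(usable - len(placement[tier["name"]]), 0)
        placement[tier["name"]].extend(auto[:free])
        auto = auto[free:]
    return placement

Every cold, pinned slice in the top tier is one slot a genuinely hot slice can no longer use; that’s exactly the pattern in the graphs below.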

This week I was analyzing a customer array and I noticed a large amount of data being relocated each night. I knew the majority of the performance issues were spindle related: they have quite a lot of high capacity NL-SAS drives and relatively few SAS drives. It would have been easy to just recommend “Buy more disks and expand your performance tier!”. Analyzing the graphs, I noticed the following:

FAST VP Temperature by Tier

As you can see, the performance tier holds most of the red (hot) slices, a little bit of orange and some yellow and green. The capacity tier, however, ALSO has red slices, a lot of orange, some yellow, etc. Performance-wise those red and orange slices should be on the performance tier! Except they can’t move up, because that space is occupied by (pinned) green slices.

As a side note: even if all the hot slices on the capacity tier were allowed to move to the performance tier, they wouldn’t all fit: there are only about 8,000 slices of “not-red” space in the performance tier and over 10,000 slices of orange and red data in the capacity tier. So yeah… I still recommended buying some additional SAS disks.

Analysis of the FAST VP slices on a per LUN basis

Now you might want to know which LUNs contain all that relatively cold data and are forcing it into the performance tier. Luckily there’s a different report for that: it shows the LUNs and their slices plus their temperatures. This way you can trace back which pinned LUNs hold the most cold data and thus waste the most space in your high tier. The image on the right shows the pinned LUNs. You can see that one LUN is completely red; this LUN would stay in the highest tier even if you turned auto-tiering back on. The two LUNs on the left have quite a lot of relatively cold (green) data; switching these LUNs back to auto-tiering would let those green slices fall back to the capacity tier and make room for other slices.

Looking at the LUN sheet the other way around, you could also ask: “There’s a lot of relatively hot (orange) data on the capacity drives and almost none on the performance tier. Which LUNs are these, and if I unpin those Highest Tier LUNs, which LUNs would benefit?”

FAST VP Slice analysis orange

Given sufficient space in the performance tier, these slices would move up during the next relocation. Further investigation revealed that these LUNs are part of the Exchange environment, which was indeed pretty slow… Also note that these LUNs hold quite a lot of idle (grey) data. This is the best example that FAST VP actually works and saves you money: this idle data now sits on capacity drives and isn’t costing you a boatload of money on faster drives!
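
If you want to eyeball this yourself instead of reading it from the report, the per-LUN analysis boils down to something like the sketch below (the LUN names, temperatures and thresholds are made up for illustration and don’t match the actual report columns):

# Sketch of the per-LUN slice analysis: which pinned LUNs park cold data in
# the higher tiers, and which LUNs have hot data stuck on the capacity tier.
# LUN names, temperatures and thresholds are hypothetical.
from collections import Counter

slices = [
    # (lun_name, current_tier, temperature 0-100, lun_tiering_policy)
    ("LUN_Archive", "SAS",     6, "highest"),  # cold but pinned high: wasted space
    ("LUN_Archive", "SAS",     9, "highest"),
    ("LUN_Exch_DB", "NL-SAS", 92, "auto"),     # hot but stuck on NL-SAS
    ("LUN_Exch_DB", "NL-SAS", 87, "auto"),
    ("LUN_Generic", "SAS",    55, "auto"),
]

wasting_space = Counter(lun for lun, tier, temp, policy in slices
                        if policy == "highest" and tier != "NL-SAS" and temp < 20)
starved = Counter(lun for lun, tier, temp, policy in slices
                  if tier == "NL-SAS" and temp > 80)

print("Cold slices pinned in the higher tiers:", dict(wasting_space))
print("Hot slices stuck on the capacity tier:", dict(starved))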

Storage pool design

Storage pools can be designed using one, two or three tiers. In the case of a perfectly predictable and constant workload, a three-tier pool will always be best: a little bit of flash will service the extremely hot data, the near-idle data will sit on NL-SAS drives and everything in the middle will be serviced by the SAS tier. Real life isn’t a lab however, so workloads will fluctuate and data will alternate between hot and cold, getting moved up and down across the tiers.

Be very aware of what your application demands from the storage. If you have one of those applications that bursts once a month, is idle the rest of the month and absolutely needs SAS response times during that burst, my advice would be to put it in a pool that does NOT have NL-SAS. You could trick FAST VP and pin it to the highest tier, but if you have a three-tier pool with solid state drives this will result in the LUN claiming SSD capacity. There is currently no middle tier option in the tiering policy; there’s only auto-tier, highest tier, lowest tier and the no data movement option. *cough EMC request for enhancement cough*

This will most likely result in you designing a number of different pool layouts: for example one SSD & SAS pool for response time critical and fluctuating workloads, one three-tier pool for generic use and maybe even a SAS & NL-SAS pool for cheap data that doesn’t justify the cost of flash capacity. Unfortunately you now have to plan the capacity expansions for each of these pools very strictly: currently the only way to reclaim drives is to move all data off a storage pool, destroy the pool and recreate new pools or expand existing ones. An option to remove a private RAID group from a pool would be ideal and could perhaps easily work with the existing slice relocation process… *cough second EMC request for enhancement cough*
The alternative, which I see some customers use, is to keep drives unbound and only add them to the pool that needs extra capacity once the need arises. The downside is that these unbound drives aren’t adding any performance to the pool in the meantime.

Moral of the story: let FAST VP do its job! The more LUNs you pin to the highest tier, the less space is left for FAST VP to use effectively for hot slices. You’re effectively telling FAST VP: “I know better than you!”. As you could see in the illustrations above, you usually don’t. Contrast this with a temperature overview from a pool that has fewer LUNs pinned to the high tier.

FAST VP Temperature by Tier no pinned LUNs

Almost no red and orange slices on the capacity tier and only a few wasted green slices in the performance tier. You’ll see this immediately in the disk utilization: the NL-SAS drives are handling far fewer IOPS and have some performance left to give.

So what are your thoughts on FAST VP and storage pools in general in a VNX? How do you design them? Do you pin a lot of LUNs to either the highest or lowest tier or do you just let FAST VP do its job autonomously? Leave a comment!

  • Allen Ward

    Great post Jon! FAST VP configuration is always a somewhat contentious issue but I agree with pretty much everything you have to say here. Yes, there are situations in which you need to customize and pin some data, but for the most part “get out of the way and let FAST VP do its job”.
    Let’s start with a resounding YES to both of your less than subtle enhancement requests. Those are both things that have been brought up before and I fail to understand why they haven’t been implemented yet.
    As for how we do things… I know we’ve had this discussion before, but I’ll post it again here in response to (and in the context of) your post.
    We generally run our VNX arrays with four basic pools.
    * Pool_0 – For transaction logs of databases and database type apps. This is pure SAS 15k in a RAID 1/0 (4+4) configuration
    * Pool_1 – For ESX clusters, covering almost all ESX allocations except for database type VMs. This is three tiered with EFD in RAID 5 (4+1), SAS 15k in RAID 5 (8+1), and NL-SAS in RAID 6 (14+2)
    * Pool_2 – For all other high performance demand applications (including the database table space for DB VMs). This pool is configured identically to Pool_1
    * Pool_3 – For all moderate to low performance demand allocations. This pool has no EFD in it but the SAS and NL-SAS are configured the same as Pool_1 & 2
    I’m often asked why Pool_1 & 2 are kept separate. That was a conscious choice early on in the design process, so that if we ever get to the point of allowing ESX to self-provision, it can only touch the disks we put in “their pool”. It isolates ESX capacity neatly.
    As for pinning anything, we just don’t. For our workloads there are not enough reasons to outweigh the negative aspects of pinning.

    • Jon Klaus

      Allen, thanks a lot for your praise and constructive feedback! I’m going to resubmit those RFEs via the EMC Elect channels; many of the customers I come across complain about these missing features, so there is definitely demand for them.
      As for your pool layout and specifically the separate ESX pool: makes perfect sense! Your Pool_2 does have NL-SAS though; don’t your DBAs complain about low performance once they start accessing formerly stale data?

      • Allen Ward

        Good question regarding the NL-SAS in Pool_2. The missing factor here is the Symmetrix VMAX we also use with FAST VP. Anything that isn’t a good fit for the VNX or requires the more stringent DR capabilities offered by SRDF sits on a VMAX 20K. Since FAST VP is a very different beast on Symmetrix I won’t go into the pool/policy config here, but it would make for a good discussion next time we sit down over a few drinks.

        • Sunil Gautam

          Hi,

          I have one question on FAST VP data movement. I have configured a different RAID level in all 3 pools:

          Pool_1 – EFD – RAID 10

          Pool_2 – FC – RAID 5

          Pool_3 – SATA – RAID 6

          I have set the FAST Policy to 50%/30%/20%.

          Let’s suppose a block residing in the FC pool needs to be promoted to the EFD pool, so data will be transferred from RAID 5 to RAID 10. How will that work?

          • Jon Klaus

            Hi Sunil,

            Since you’re configuring percentages in the FAST Policy, I’m assuming you’re using a VMAX instead of a VNX. Regardless of the system though, you can mix RAID levels in a FAST Policy or tier without problems.

            Once data needs to be promoted to a higher tier, that data is read from (in your example) the FC tier and rewritten to the EFD tier. The system doesn’t just move the RAID 5 tracks from the FC disks to the EFD disks: instead it distills the data out of the RAID level (reads the R5 blocks, glues them together, blablabla) and then writes it out as RAID 10 (one copy plus a mirror on another disk).

            Overly simplified: it reads the R5 tracks into memory (resulting in maybe a nice PowerPoint presentation) and then just writes that PowerPoint presentation as R10 blocks to the EFD tier. So the “conversion between RAID levels”, so to say, happens in the storage processor.
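
            In toy Python terms it looks roughly like this (purely illustrative, nothing like the actual microcode): the RAID layout is an internal detail of each tier, and a promotion simply reads the logical data out of one tier and rewrites it through the other.

            # Purely illustrative sketch: the RAID geometry is a detail of each tier,
            # so a promotion never has to "convert" RAID levels on disk.
            class Tier:
                def __init__(self, name, raid_level):
                    self.name, self.raid_level, self.store = name, raid_level, {}
                def read(self, slice_id):         # parity/mirror handling hidden in here
                    return self.store[slice_id]
                def write(self, slice_id, data):  # laid out again for this tier's RAID level
                    self.store[slice_id] = data
                def free(self, slice_id):
                    del self.store[slice_id]

            def promote(slice_id, source, destination):
                destination.write(slice_id, source.read(slice_id))  # move the logical data only
                source.free(slice_id)

            fc = Tier("FC", "RAID 5")
            efd = Tier("EFD", "RAID 10")
            fc.write("slice-42", b"hot block of data")
            promote("slice-42", fc, efd)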

            Does that clarify it or make it worse? 🙂

          • Sunil Gautam

            Thanks a lot for your explanation. And all of this is carried out by the FAST controller, right?
            Also, the performance data is collected by the microcode, analysed by the FAST controller and implemented by the FAST VP engine in the VMAX. So can you tell me where all the data collected during this process, like the performance metrics, is stored?

          • Jon Klaus

            Correct, that’s all done by the controller, completely transparent to the end-user and without input from the admin. The FAST controller looks at which data blocks are hot and distributes them across the tiers accordingly.
            I’m actually not entirely sure if and where the performance data is stored; I think the raw performance data is truncated after the FAST controller has determined a “block temperature”, keeping just the temperatures for the next round of calculations.

          • Sunil Gautam

            I think this data resides on the SFS volumes, as SFS volumes are created to improve system performance.

  • Christian Kotte

    Hello,
    How can I generate those slice temperature reports?
    Regards
    Christian

    • Jon Klaus

      Hi Christian, go to https://app.mitrend.com/emc/ and follow the instructions to gather the FAST VP slice information files from your array. Basically it comes down to the following commands:

      Gathering array configuration information:

      naviseccli -User <username> -Password <password> -Address <SP IP> -Scope 0 arrayconfig -capture -output arrayconfig.xml

      Gathering FAST statistics:

      naviseccli -User <username> -Password <password> -Address <SP IP> -Scope 0 analyzer -archive -fastdata

      Zip those files, upload the zip as an EMC VNX – Skew work package and you should receive a report a couple of hours later.