Recently I’ve ran a project for a new EMC Symmetrix VMAX 10k installation. The install was a breeze and the migration of data to the system fairly straightforward. The customer saw some good performance improvements on the storage front and there is plenty of capacity in the system to cater for immediate growth. Yet when I opened the Unisphere for VMAX interface and browsed to the performance tab, my heart skipped a beat. What are those red queue depth utilization bars? We were seeing good response times, weren’t we? Were we at risk? How about scalability? Lets dig deeper and find out.
Several weeks ago we performed a resiliency test on a VMAX10k that was about to be put into production. The customer wanted confirmation that the array was wired up properly and would respond correctly in case there were any issues like a power or disk failure. This is fairly standard testing and makes sense: better to pull a power plug and see the array go down while there is no production running against it, right?
We pulled one power feed from each system bay and storage bay. Obviously: no problem for the VMAX, it dialed out, notified EMC it lost power on a couple of feeds and that’s it. Next up we yanked out a drive: I/O continued, the array dialed out to EMC that something was wrong with a drive, but… we didn’t see anything in Unisphere for VMAX!
Not all data is accessed equally. Some data is more popular than other data that may only be accessed infrequently. With the introduction of FAST VP in the CX4 & VNX series it is possible to create a single storage pool that has multiple different types of drives. The system chops your LUNs into slices and each slice is assigned a temperature based on the activity of that slice. Heavy accessed slices are hot, infrequently accessed slices are cold. FAST VP then moves the hottest slices to the fastest tier. Once that tier is full the remaining hot slices go to the second fastest tier, etc… This does absolute wonders to your TCO: your cold data is now stored on cheap NL-SAS disks instead of expensive SSDs and your end-users won’t know a thing. There’s one scenario which will get you in trouble though and that’s infrequent, heavy use of formerly cold data…
MCx FAST Cache is an addition to the regular DRAM cache in a VNX. DRAM cache is very fast but also (relatively) expensive so an array will have a limited amount of it. Spinning disks are large in capacity and relatively cheap but slow. To bridge the performance gap there are the solid-state drives; both performance wise and cost wise somewhere between DRAM and spinning disks. There’s one problem though: a LUN usually isn’t 100% active all of the time. This means that placing a LUN on SSDs might not drive your SSDs hard enough to get the most from your investment. If only there was software that makes sure only the hottest data is placed on those SSDs and that will quickly adjust this data placement depending on the changing workload, you’d have a lot more bang for your buck. Enter MCx FAST Cache: now with an improved software stack with less overhead which results in better write performance.
When troubleshooting performance in a CLARiiON or VNX storage array you’ll often see graphs that resemble something like this: write cache maxing out to 100% on one or even two storage processors. Once this occurs the array starts a process called forced flushing to flush writes to disk and create new space in the cache for incoming writes. This absolutely wrecks the performance of all applications using the array. With the MCx cache improvements made in the VNX2 series there should be a lot less forced flushes and a much improved performance.