VNX2 hands-on (a.k.a. Who stole my SPS?!)

A VNX5400 installed in the rack

In September 2013 EMC announced the new generation VNX with MCx technology (or VNX2). The main advantage of the new generation is a massive performance increase: with MCx technology the VNX2 can effectively use all the CPU cores available in the storage processors. Apart from the performance boost there’s also a boatload of new features: deduplication, active-active LUNs, smaller (256MB) chunks for FAST VP, persistent hotspares, etc. Read more about that in my previous post.

It took a while before I could get my hands on an actual VNX2 in the field. So when we needed two new VNX2 systems for a project, guess which resources I claimed to install them. Me, myself and I! Only to have a small heart attack upon unboxing the first VNX5400: someone stole my standby power supplies (SPS)!

Bye SPS, hello BBU!

A quick RTFM later I was reassured (and my knowledge refreshed): the new VNX2 systems don’t use an SPS but instead a Li-Ion battery backup unit (BBU) installed in the DPE. The exception is the VNX8000, which still has an SPS due to its higher power demands; that SPS does use Li-Ion technology though!

VNX5400 DPE rear view with the BBUs in slots 1 and 9, for SPB and SPA respectively.

“Aha!”, I can hear you think. “Since the beginning of time EMC has been advising against keeping only the write cache powered in case of a power outage. They claim you’ll lose data when there’s an extended outage and that dumping the write cache contents to disk is far superior! And now they’re doing the same as all those competitors! BOO!”

Well… no. When the DPE loses power, the BBU will provide power to the SP CPU module, base module, fans and the first four disk drives (the vault drives). In practice this is the entire DPE with all drives included. It has enough power to perform two complete write cache dumps to vault. So in case of a power outage, your VNX2 write cache protection is identical to that of the VNX1; it just uses a smaller and lighter battery. Good news in my book, since I no longer need to haul lead bricks into the data center!

The VNX2 also handles power glitches a bit better, not triggering a write cache dump if the glitch is resolved in 15 seconds or less. This should help with transitions to generator power. I’m not entirely sure how useful this is since your data center should be on UPS power anyway…

Talking about cache…

We’ve seen a trend of storage systems doing things faster, better and easier for some time now. If you’ve managed CLARiiON systems and tried to squeeze every last bit of drive performance out of them, you’ve probably chopped LUNs into pieces, spread ’em across RAID groups and glued them together again using striped MetaLUNs. The CX4 generation introduced storage pools, effectively having FLARE or VNX OE do all that automatically and better.

Why stop there? Up to and including VNX OE 32 you had to specify the cache page size and ratio of write and read cache. Usually this meant maxing out write cache and configuring a couple of hundred MB of read cache, but depending on the workload it might be beneficial to change it to something else. This could go as far as reconfiguring the cache multiple times a day for specific workloads. Luckily this is no longer needed with the VNX2: the system will automatically manage the ratio depending on the workload, utilizing the SP memory most efficiently.

Cache management: the VNX1 on the left with its SP Cache and SP Memory tabs, the fully automated VNX2 on the right.
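
For old times’ sake, here’s roughly what that manual tuning looked like from the classic CLI on a VNX1 or CLARiiON. This is a sketch from memory using the legacy getcache/setcache commands against a hypothetical SP at spa.example.local; the exact flags and allowed values differ per FLARE/VNX OE release, so check the CLI reference for your array before copying anything:

    # show the current SP cache configuration (page size, read/write cache sizes)
    naviseccli -h spa.example.local getcache
    # write cache has to be disabled before it can be resized
    naviseccli -h spa.example.local setcache -wc 0
    # set the write cache size, per-SP read cache sizes (in MB) and an 8KB page size
    naviseccli -h spa.example.local setcache -wsz 4096 -rsza 512 -rszb 512 -p 8
    # re-enable write cache
    naviseccli -h spa.example.local setcache -wc 1

On a VNX2 none of this applies anymore: there’s no knob left to turn, the array balances its own cache.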

VNX2 Persistent hotspares

Something else you’ll probably notice is that you can no longer create hotspares. The VNX2 abandons the concept of dedicated, pre-configured hotspares and instead uses persistent hotspares. In the VNX1 generation, when a drive fails the hotspare kicks in, an operator replaces the faulty drive and the data is copied back from the hotspare onto the fresh drive in the original location. The hotspare then returns to its hotspare role, ready for the next drive failure anywhere in the array.

VNX2 persistent hotspares; any empty disk in the array is a hotspare.
Note: the 1 per 60 ratio is a bug; the proper ratio is 1:30. This will be fixed in an upcoming VNX OE release.

In the VNX2, any unused disk is eligible to act as a hotspare. When a drive fails the system waits five minutes (giving the drive a chance to come back); after that, a rebuild starts to an unused, eligible drive in the array.

VNX2 Drive Sparing Matrix

This drive becomes a permanent member of the RAID group in place of the failed drive. Once the original drive is replaced, the new drive is unused and therefore a valid candidate for hotsparing in the future. This does mean your physical drive layout might change over time; there are some options to copy the data back to the original location: either copy it back using the CLI with the copytodisk command or use the MCx DriveMobility feature. Read more about that in Dave Ring’s post.
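
If you prefer the data back on a drive in its original slot, the CLI route looks roughly like this. A minimal sketch, assuming the copytodisk command takes a source and a destination disk in Bus_Enclosure_Disk notation (the SP address and disk IDs below are made up; check the MCx CLI reference for the exact syntax on your release):

    # copy the data from the disk that took over after the failure (1_0_7)
    # back to the freshly replaced disk in the original slot (0_0_4)
    naviseccli -h spa.example.local copytodisk 1_0_7 0_0_4

The copy runs in the background; once it completes, the physical layout should match your original configuration again.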