Troubleshooting any system requires information about the configuration of the system and how it’s behaving over time. Unsurprisingly, this is also valid when you’re troubleshooting performance on a Dell EMC VNX. So help your storage engineer, and enable performance data logging on the VNX!
Excelero Storage launched their NVMesh product back in March 2017 at Storage Field Day 12. NVMesh is a software-defined storage solution using commodity servers and NVMe devices. Using NVMesh and the Excelero RDDA protocol, we saw some mind-blowing performance numbers, both in raw IOPS and in latency, while keeping hardware and licensing costs low.
I can’t recall the last storage system installation that didn’t have some amount of solid state drives in its configuration. And for good reason: we’re rapidly getting used to the performance benefits of SSD technology. Faster applications usually result in real business value. The doctor treating patients can get his paperwork done faster and thus has time for more patients in a day. Or the batch job processing customer mailing lists or CGI renderings completes sooner, giving you a faster time to market.
To reduce application wait times even further, solid state drives need to achieve even lower latencies. Just reducing the media latency won’t cut it anymore: the software component in the chain needs to catch up. Intel is doing just that with the Storage Performance Development Kit (SPDK).
Earlier this year Nimble Storage announced their all-flash array called the Predictive Flash Platform; you can read my thoughts on the launch over here. InfoSight is one of the core components of that announcement, which is why we had the opportunity for a fireside chat with the Nimble Storage data science team. We discussed the workings of InfoSight & VMVision and how this relates to actual benefits for an owner of a Nimble Storage array. This post will also touch on some of the key points discussed during the later Storage Field Day 10.
Recently I ran a project for a new EMC Symmetrix VMAX 10k installation. The install was a breeze and the migration of data to the system fairly straightforward. The customer saw some good performance improvements on the storage front and there is plenty of capacity in the system to cater for immediate growth. Yet when I opened the Unisphere for VMAX interface and browsed to the performance tab, my heart skipped a beat. What are those red queue depth utilization bars? We were seeing good response times, weren’t we? Were we at risk? How about scalability? Let’s dig deeper and find out.
Several weeks ago we performed a resiliency test on a VMAX10k that was about to be put into production. The customer wanted confirmation that the array was wired up properly and would respond correctly in case there were any issues like a power or disk failure. This is fairly standard testing and makes sense: better to pull a power plug and see the array go down while there is no production running against it, right?
We pulled one power feed from each system bay and storage bay. Obviously, this was no problem for the VMAX: it dialed out, notified EMC that it had lost power on a couple of feeds, and that was it. Next up we yanked out a drive: I/O continued, the array dialed out to EMC that something was wrong with a drive, but… we didn’t see anything in Unisphere for VMAX!
Not all data is accessed equally. Some data is popular, while other data may only be accessed infrequently. With the introduction of FAST VP in the CX4 & VNX series it is possible to create a single storage pool that has multiple different types of drives. The system chops your LUNs into slices and each slice is assigned a temperature based on the activity of that slice. Heavily accessed slices are hot, infrequently accessed slices are cold. FAST VP then moves the hottest slices to the fastest tier. Once that tier is full the remaining hot slices go to the second fastest tier, and so on. This does absolute wonders for your TCO: your cold data is now stored on cheap NL-SAS disks instead of expensive SSDs and your end-users won’t know a thing. There’s one scenario which will get you in trouble though and that’s infrequent, heavy use of formerly cold data…
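To make the slice placement idea a bit more concrete, here is a minimal Python sketch of "hottest slices go to the fastest tier, overflow spills to the next one". It is purely illustrative and not how FAST VP is actually implemented: the `Slice` class, the `place_slices` function, the numeric temperature scores, the tier capacities and the middle "SAS" tier are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    name: str
    temperature: float  # hypothetical activity score; higher means hotter


def place_slices(slices, tiers):
    """Assign slices to tiers, hottest first.

    `tiers` is a list of (tier_name, capacity_in_slices) ordered from
    fastest to slowest. Slices that do not fit in any tier are left
    out of the result (illustrative simplification).
    """
    ranked = sorted(slices, key=lambda s: s.temperature, reverse=True)
    placement = {}
    start = 0
    for tier_name, capacity in tiers:
        # Fill this tier with the hottest slices that are still unplaced.
        for s in ranked[start:start + capacity]:
            placement[s.name] = tier_name
        start += capacity
    return placement


# Hypothetical pool: one SSD slot, two SAS slots, plenty of NL-SAS.
slices = [Slice("a", 5), Slice("b", 90), Slice("c", 40), Slice("d", 1)]
tiers = [("SSD", 1), ("SAS", 2), ("NL-SAS", 10)]
print(place_slices(slices, tiers))
# {'b': 'SSD', 'c': 'SAS', 'a': 'SAS', 'd': 'NL-SAS'}
```

Note that the ranking depends entirely on past activity: a slice that has been cold for a while sits on the slowest tier, whatever happens to it next.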
MCx FAST Cache is an addition to the regular DRAM cache in a VNX. DRAM cache is very fast but also (relatively) expensive, so an array will have a limited amount of it. Spinning disks are large in capacity and relatively cheap, but slow. To bridge the performance gap there are solid-state drives, which sit somewhere between DRAM and spinning disks both performance-wise and cost-wise. There’s one problem though: a LUN usually isn’t 100% active all of the time. This means that placing a LUN on SSDs might not drive your SSDs hard enough to get the most from your investment. If only there were software that made sure only the hottest data is placed on those SSDs and that quickly adjusted this data placement as the workload changes, you’d get a lot more bang for your buck. Enter MCx FAST Cache: now with an improved software stack with less overhead, which results in better write performance.
When troubleshooting performance in a CLARiiON or VNX storage array you’ll often see graphs that look something like this: write cache maxing out at 100% on one or even both storage processors. Once this occurs the array starts a process called forced flushing to flush writes to disk and create new space in the cache for incoming writes. This absolutely wrecks the performance of all applications using the array. With the MCx cache improvements made in the VNX2 series there should be far fewer forced flushes and much improved performance.
“Check your email ;)”. That was the first Twitter DM I read one sleepy morning in June. It’ll suffice to say, a minute later I was wide awake: I was chosen to represent the EMC Elect at the EMC “Redefine Possible” MegaLaunch event in London (UK)! I knew about these launch events because my colleague Rob attended one last year in Milan. Excitement started building and a couple of hours later I figured out I wasn’t going alone…