Help your VNX storage engineer: Enable performance data logging!

Troubleshooting any system requires information about how the system is configured and how it behaves over time. Unsurprisingly, this also applies when you’re troubleshooting performance on a Dell EMC VNX. So help your storage engineer, and enable performance data logging on the VNX!

Enable performance data logging

The steps to enable performance data logging are very simple. Log in to Unisphere, and browse to System -> Monitoring and Alerts -> Statistics for Block. Next, click the link for Performance Data Logging. This will open an interface for Data Logging.

VNX performance data logging

You’ll see several fields in the interface:

  • Target: the system whose settings you’re adjusting. Use it to switch to other VNX systems in your Unisphere domain. Be careful when quickly switching between systems; the interface sometimes needs a second or two to refresh (yay for Java!).
  • Real Time Interval: the interval at which the system collects statistics and displays them in the performance charts. Default is 60 seconds (1 min).
  • Archive Interval: the interval at which the system collects statistics and stores them in archive files for later analysis. Default is 300 seconds (5 min).
  • Periodic Archiving: ensures that the array creates performance data archives (NAR files). It is enabled by default, so leave it enabled.
  • Stop automatically after: imposes a time limit on the performance data logging. It is typically enabled and set to somewhere between 1 and 7 days.

You can help your storage engineer by:

  • Keeping the interval settings at their defaults of 60s and 300s for the real time and archive intervals respectively. Decreasing the intervals gives you more granular information because short spikes of IO are captured better. As a downside, it also increases the CPU load of the storage processors and the size of the NAR files. Typically, with default intervals, the added CPU load is near negligible.
  • Keeping Periodic Archiving enabled.
  • Keeping the performance data logging running 24/7/365 by unchecking “Stop Automatically After” and clicking Start. The status should show “running”.
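If you keep notes on several arrays, the recommendations above are easy to turn into a small checklist. The following is only a sketch: `LoggingSettings` and `review` are hypothetical names made up for illustration, the values have to be read off the Data Logging dialog by hand, and nothing here talks to Unisphere or the array.

```python
from dataclasses import dataclass


@dataclass
class LoggingSettings:
    """Hypothetical record of the fields shown in the Data Logging dialog."""
    real_time_interval_s: int = 60
    archive_interval_s: int = 300
    periodic_archiving: bool = True
    stop_automatically: bool = True  # typically enabled, per the field list


def review(settings: LoggingSettings) -> list[str]:
    """Return the deviations from the recommendations in this post."""
    findings = []
    if settings.real_time_interval_s != 60:
        findings.append("real time interval is not the default 60 s")
    if settings.archive_interval_s != 300:
        findings.append("archive interval is not the default 300 s")
    if not settings.periodic_archiving:
        findings.append("periodic archiving is disabled")
    if settings.stop_automatically:
        findings.append("logging will stop automatically")
    return findings


if __name__ == "__main__":
    # Defaults, except that the time limit has been unchecked.
    for finding in review(LoggingSettings(stop_automatically=False)):
        print(finding)
```

An empty result means the settings match the three recommendations; anything else is a reminder of what to change in the dialog.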

This will give you a solid basis for troubleshooting performance on a VNX. A performance engineer can use the generated NAR files to look back at your system performance for quite some time, typically several weeks to months. The size of the NAR files depends on the number of objects (LUNs, disks, ports, etc.) in your system. Since space to store the NAR files is limited, at some point the VNX will start clearing old NAR files to make space for new ones.

There is one exception to the interval rule, which is when you’re hunting for intense, short spikes in IO. This will always be a “step 2” when troubleshooting, and in that case it’s acceptable to lower the intervals as far as needed. Keep in mind though that this might impact system performance, so consider re-enabling the “Stop Automatically After” option. Then manually restart logging with normal intervals afterwards.
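To get a feel for what lowering the interval costs, it helps to count samples. The sketch below is plain arithmetic, not a model of the NAR format: it assumes one sample per object per interval, and the function names are made up for illustration. Actual file sizes and CPU load will not scale exactly like this.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def samples_per_day(interval_s: int) -> int:
    """Number of collection points per object per day at a given interval."""
    if interval_s <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return SECONDS_PER_DAY // interval_s


def sample_growth(new_interval_s: int, default_interval_s: int = 300) -> float:
    """How many times more samples are collected compared to the default."""
    if new_interval_s <= 0:
        raise ValueError("interval must be a positive number of seconds")
    return default_interval_s / new_interval_s


if __name__ == "__main__":
    for interval in (300, 120, 60):
        print(interval, samples_per_day(interval), sample_growth(interval))
```

At the default 300s archive interval this is 288 samples per object per day; at 60s it is 1440, five times as many. That is the reason to treat short intervals as a temporary measure.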

My thoughts on VNX Performance Data Logging

It’s very frustrating to arrive at a customer site for performance troubleshooting and not have any historical information to work with. First of all, it makes it very hard to spot trends in storage utilization. This lowers the quality of the advice, and we usually want to help the customer as best we can. Secondly, if the problem description is vague, the lack of data can make the troubleshooting unnecessarily complex and time-consuming, and can lead to incorrect conclusions.

Even if you don’t have any pressing performance issues, you can also use the information to plan future expansions, for example to assess which type of disks to purchase for a specific FAST VP storage pool expansion. So keep that data logging active!

  • Brad

    Doesn’t this have a small performance impact trade-off on your VNX?

    • Jon Klaus

      Back in the CLARiiON days (about 10 years ago) it would increase SP utilization by around 1%. With the much faster CPUs and MCx multithreading nowadays, I expect that number has decreased even further. So in terms of real-life impact, it’s negligible.

      • Brad

        Got it. Thanks