CloudIQ is looking out for your storage system’s health

A few weeks ago we visited Dell EMC in Boston for Storage Field Day 16. Susan Sharpe presented CloudIQ to us. If you’re unfamiliar with CloudIQ: it keeps track of your storage system’s performance, health and capacity, and notifies you of any anomalies. If you’ve got a Dell EMC Unity storage system, you can already use it for free. And it’s also being actively developed, so expect many new features to come into production over time!

Spotting problems before they occur

CloudIQ functions like a fitness tracker for your storage systems. First, it uses proactive health scores to help you identify risks before they turn into real problems.

Second, it uses predictive analytics to project capacity and performance utilization. This also avoids problems, but more in the area of capacity and performance growth.
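To make the idea of projecting capacity a bit more concrete, here is a minimal sketch that fits a straight line through daily usage samples and estimates when the array would fill up. This is purely an illustration: the function name `days_until_full` and the linear-trend approach are my own assumptions, not how CloudIQ actually does its forecasting.

```python
from typing import Optional, Sequence


def days_until_full(used_tb: Sequence[float], capacity_tb: float) -> Optional[float]:
    """Project when usage reaches capacity, using a least-squares trend line.

    used_tb holds one sample per day, oldest first (hypothetical input format).
    Returns the number of days after the last sample, or None when usage is
    flat or shrinking and the trend never reaches capacity.
    """
    n = len(used_tb)
    if n < 2:
        raise ValueError("need at least two samples")

    # Ordinary least squares with x = day index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tb))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    slope = covariance / variance
    if slope <= 0:
        return None

    intercept = mean_y - slope * mean_x
    day_full = (capacity_tb - intercept) / slope
    # Clamp at zero: the trend line may already be past capacity.
    return max(0.0, day_full - (n - 1))


if __name__ == "__main__":
    # Growing 1 TB per day from 10 TB on a 20 TB array.
    print(days_until_full([10, 11, 12, 13, 14], 20))  # 6.0
```

A real forecast would have to cope with seasonality and sudden jumps (like a migration), which a straight line ignores.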

Lastly, it detects anomalies on your system, using machine learning. This could be a system that’s suddenly causing a big increase in IOPS to your array. Or it could be the opposite: IOPS drop because an application has gone down.

CloudIQ anomaly

In the example above I just started to migrate a bunch of VMs to a new, empty Unity storage system. This shows up as an anomaly, because CloudIQ got used to a system load of zero over the last 2 months.

Now if we switch to a system that has been in use for a few months, we can see a different type of graph.

CloudIQ no anomaly

Here, you can see that the bandwidth (blue line) falls within the historic seasonality. You can see that the system expects a spike in bandwidth around midnight (highest grey block), which is caused by additional backup jobs starting.

Over time, CloudIQ will learn and adjust based on the new workloads.
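If you want a feel for how a "historic seasonality" band could work, here is a small sketch. It builds an expected range per hour of the day from past samples and flags values outside it. The names `build_hourly_bands` and `is_anomaly`, the mean plus or minus k standard deviations rule, and the sample numbers are all assumptions for illustration; CloudIQ's actual machine learning model was not described in this much detail.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_hourly_bands(history, k=3.0):
    """Build an expected (low, high) range per hour of day.

    history is an iterable of (hour_of_day, value) pairs (hypothetical format).
    The band is mean +/- k population standard deviations for that hour.
    """
    by_hour = defaultdict(list)
    for hour, value in history:
        by_hour[hour].append(value)

    bands = {}
    for hour, values in by_hour.items():
        centre = mean(values)
        spread = pstdev(values)
        bands[hour] = (centre - k * spread, centre + k * spread)
    return bands


def is_anomaly(bands, hour, value):
    """Return True when value falls outside the learned band for that hour."""
    low, high = bands[hour]
    return not (low <= value <= high)


if __name__ == "__main__":
    # Made-up bandwidth samples: backups around midnight, quiet at noon.
    history = [(0, 900), (0, 1000), (0, 1100), (12, 90), (12, 100), (12, 110)]
    bands = build_hourly_bands(history)
    print(is_anomaly(bands, 0, 1000))   # False: the midnight spike is expected
    print(is_anomaly(bands, 12, 1000))  # True: same load at noon is unusual

    # An array that has been idle: any load at all is flagged.
    idle = build_hourly_bands([(h, 0) for h in range(24) for _ in range(60)])
    print(is_anomaly(idle, 9, 50))      # True
```

The idle-array case mirrors the empty Unity system from the first graph: with a history of zero load, the band collapses to zero and a migration immediately stands out. Feeding new samples into the history is what lets the band adjust over time.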

CloudIQ lives in the cloud!

CloudIQ uses ESRS to transfer vital information on the array’s health to the CloudIQ systems. Depending on the type of data, the collection interval varies. SPCollects show the inventory of the system and are more static: they are transferred once a day. Capacity and configuration changes are more dynamic and transferred every hour. Alerts and performance statistics are collected and transferred every 5 minutes, offering you a more granular set of data.
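The three intervals are easy to put side by side. The snippet below simply writes the intervals from the presentation down as a Python dictionary and works out how many transfers per day each one implies; the dictionary keys and the helper name `transfers_per_day` are my own labels, not CloudIQ or ESRS settings.

```python
from datetime import timedelta

# Intervals as mentioned in the presentation; the keys are informal labels.
COLLECTION_INTERVALS = {
    "inventory (SPCollect)": timedelta(days=1),
    "capacity and configuration": timedelta(hours=1),
    "alerts and performance": timedelta(minutes=5),
}


def transfers_per_day(interval: timedelta) -> int:
    """Number of whole collection intervals that fit in 24 hours."""
    return timedelta(days=1) // interval


if __name__ == "__main__":
    for name, interval in COLLECTION_INTERVALS.items():
        print(f"{name}: {transfers_per_day(interval)} per day")
    # inventory (SPCollect): 1 per day
    # capacity and configuration: 24 per day
    # alerts and performance: 288 per day
```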

All this data is being transferred to Southborough, into the Dell EMC IT Cloud. Following good IT practice, that data is replicated to North Carolina for DR. CloudIQ is built on Pivotal Cloud Foundry. Data is retained for 2 years, but it becomes progressively less granular than the initial 5 minute collection interval once you start to look at older data. That makes sense from a data collection standpoint; otherwise you’d end up with a gigantic database.
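Making older data less granular usually comes down to rolling fine-grained samples up into coarser averages. Here is a minimal sketch of that idea, assuming samples are (Unix timestamp, value) pairs. The function name `downsample` and the choice of a plain average per bucket are assumptions on my part; the actual rollup rules CloudIQ uses weren't covered.

```python
from collections import defaultdict


def downsample(samples, bucket_seconds):
    """Average (unix_timestamp, value) samples into fixed-size time buckets.

    Returns a list of (bucket_start, average) tuples sorted by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of its bucket.
        buckets[timestamp - timestamp % bucket_seconds].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


if __name__ == "__main__":
    # One hour of 5-minute samples (values 1..12), plus one sample in the next hour.
    five_minute = [(i * 300, i + 1) for i in range(12)] + [(3600, 100)]
    print(downsample(five_minute, 3600))  # [(0, 6.5), (3600, 100.0)]
```

Twelve 5-minute points collapse into a single hourly point here, which is the kind of reduction that keeps two years of history manageable. The trade-off is that short spikes get averaged away in the older data.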

Active development

CloudIQ currently supports only the Dell EMC Unity storage systems. Nevertheless, there are already 5000+ arrays connected, with about 100 new arrays connecting to CloudIQ every week.

Dell EMC sells more than just Unity systems, so it’s no surprise that additional array types are on the roadmap and coming soon: SC Series, XtremIO, VMAX and PowerMax. And the product is in active development. Susan explained that changes are pushed to the production environment every Tuesday and Thursday. At the very least, the CloudIQ team commits to a release every month.

And I can confirm this from experience. CloudIQ advises you on which storage/LUNs you might be able to reclaim. However, it incorrectly showed our LUNs that are replication targets as reclaimable. So I pressed the little button to submit a feature request. Within half a day I received an email confirmation (from an actual human being!), thanking me for reporting it and explaining that they would fix it. And lo and behold, about 1.5 months later the fix was in place.

This was a breath of fresh air. I like to help improve products, so I regularly submit enhancement or feature requests. Usually though, you get the feeling that they are transferred to /dev/null, or sucked into a black hole. The CloudIQ team instead welcomes your feedback, and actually does something with it. So if you’re missing a feature or spot a bug: use that feedback button!

My thoughts on CloudIQ

I use CloudIQ at least on a weekly basis to keep track of our storage systems. While I’m not involved in the day-to-day operations, I’m currently migrating a lot of data to Unity all-flash systems. So I need to be aware of when a target array is approaching its capacity or performance limitations. And it’s easy to pull this data out of CloudIQ.

CloudIQ is free. And you should be using ESRS anyway, from a support perspective. So the barrier to start using CloudIQ is low. And with a very responsive development team constantly adding new features, this could become a vital tool in your IT operations toolbox.

Check out the presentations over here (bottom two videos). And René was first to write a blog post about the Dell EMC presentations, which you can find here. And if you’ve got any questions or remarks, leave them in the comments below!

Disclaimer: I wouldn’t have been able to attend Storage Field Day 16 without GestaltIT picking up the tab for the flights, hotel and various other expenses like food. I was however not compensated for my time and there is no requirement to blog or tweet about any of the presentations. Everything I post is of my own accord and because I like what I see and hear.