Rebalance VNX storage paths without downtime

Recently I ran into an environment with a couple of VNX5700 systems that were attached to the front-end SAN switches with only two ports per storage processor. The customer was complaining: performance was fine most of the time, but at certain times during the day it dropped noticeably. Analysis revealed that the back-end was coping well with the workload (30-50% load on the disks and storage processors), but the front-end ports were a bit (over)loaded and spewing QFULL errors. Time to cable in some extra ports and rebalance the existing hosts over the new storage paths!

Analyzer Helper Front-End ports

You WILL have to do this at some time…

It’s a pretty common scenario. A new storage system is installed in an environment that has a workload that’s not too hard on the system. To save some FC ports on the SAN switches only a couple of ports are cabled in. Servers are zoned on these ports.

Over time the workloads increase, both in number and intensity. To make the above situation worse, a VNX uses the lowest initially installed FE ports for MirrorView. In this case that’s ports A0 and B0. These ports are often shared with host traffic.
If host traffic increases, replication traffic will usually increase as well. Replication traffic will have to contend with host traffic; if the port is overloaded, it's not unlikely that replication traffic will have to wait. With MV/S (synchronous replication), this immediately causes a (write) delay for the servers being replicated.

So at some point extra ports are needed and the servers need to be rebalanced across them. Additionally, we want to dedicate the replication ports to replication traffic alone. To make this happen, every server needs to be migrated to a new set of ports. Nobody likes downtime, so preferably this reconfiguration is done without any disruption to the applications. Luckily, this is possible!

No downtime needed!

The good news is that, if all things go well, you will not need downtime to perform the migration: you will be using the basic failover functionality that is part of storage networks.
The bad part is that you will be doing A LOT of checking, double checking and some more checking to top it off. You do not want to unzone the wrong server or click a wrong button and cause a server to crash because you’ve accidentally yanked the disks away.

To start I made an inventory of all the servers attached to the storage systems: this is reasonably easy since you can export this list from Unisphere itself. Add a whiff of Excel and five minutes later I had a list of hosts and their new port assignments. Physically cable in the ports and you’re ready to go.

The basic steps for the migration are as follows:

  • Add new storage paths.
  • Remove old storage paths.

Easy right?! No worries, we’ll look into it with a bit more detail below 😉

Add new ports to the server

First of all: make sure your server is healthy! It should have four paths to the storage system: in our case A0, A1, B0, B1. A functional piece of multi-pathing software is vital (PowerPath is very useful in this). No weird errors should be present, etc.
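If you have a lot of servers to check, a small script can help with that sanity check. Below is a minimal Python sketch that shells out to PowerPath's powermt CLI and simply counts alive versus dead path entries; the parsing is deliberately naive and the output format differs a bit between PowerPath versions, so treat it as an illustration rather than a finished tool.

```python
import subprocess

def count_path_states():
    """Count 'alive' vs 'dead' path entries in 'powermt display dev=all'.

    Assumes the PowerPath powermt CLI is installed and on the PATH.
    The parsing is intentionally naive (substring counting over the
    per-device path tables) and may need adjusting per PowerPath release.
    """
    out = subprocess.run(["powermt", "display", "dev=all"],
                         capture_output=True, text=True, check=True).stdout
    return out.count(" alive"), out.count(" dead")

if __name__ == "__main__":
    alive, dead = count_path_states()
    print(f"path entries alive: {alive}, dead: {dead}")
    if dead:
        print("There are dead paths - fix these before rebalancing anything!")
```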

Next, assign the new paths to the server. In our case all servers had to move, since we were freeing up A0 and B0 for replication. This left six ports available for host I/O: A1-A3 and B1-B3. Some servers needed two new paths (A2 & B2 or A3 & B3), others even a complete new set of paths (A2, B2, A3 and B3). I spread all servers across the available paths to make sure each FE port had approximately the same number of initiators zoned to it.
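The assignment itself is easy to script. The sketch below is only an illustration: it assumes a hypothetical hosts.csv export (one hostname per row, which is not the exact format Unisphere gives you) and deals every host two SP-A/SP-B port pairs round-robin, so each FE port ends up with roughly the same number of initiators. In reality some hosts kept their existing A1/B1 pair and only needed one new pair.

```python
import csv
from itertools import cycle

# Port pairs available for host I/O after freeing up A0/B0 for MirrorView.
# Adjust to whatever your environment actually has.
PORT_PAIRS = [("A1", "B1"), ("A2", "B2"), ("A3", "B3")]

def assign_ports(csv_path="hosts.csv"):
    """Assign each host two SP-A/SP-B port pairs (four paths) round-robin.

    'hosts.csv' is a hypothetical export with the hostname in the first
    column. Dealing the pairs out round-robin keeps the initiator count
    per FE port roughly equal.
    """
    pairs = cycle(PORT_PAIRS)
    with open(csv_path, newline="") as fh:
        hosts = [row[0] for row in csv.reader(fh) if row]
    return {host: (next(pairs), next(pairs)) for host in hosts}

if __name__ == "__main__":
    for host, (pair1, pair2) in assign_ports().items():
        print(f"{host}: zone to ports {pair1} and {pair2}")
```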

I performed this migration together with my colleague Rob so we could double-check each other's work. We opted to migrate one server at a time: this made sure we would minimize the impact if we ever did something wrong. I performed the zoning on the odd fabric, Rob took care of the even fabric. Additionally, I took care of the Unisphere and host reconfigurations, with Rob double-checking.

After zoning in the ports and rescanning on the host itself, you should see the new paths appear in Unisphere (Hosts -> Initiators).
Make sure you’re in engineering mode at this point, since you will need it for some of the displays and options. Enter engineering mode by pressing Ctrl+Shift+F12 and entering “messner” in the dialog box.

Register the storage paths

Notice the “Registered” column that displays “No” for the new paths. This means the new paths are not yet assigned to a host in Unisphere. You can either register them manually or do the sensible thing and let the Unisphere agent register them for you: restart the Unisphere/Navisphere agent on the server, wait a couple of minutes, then refresh Unisphere. (Note: any recent VMware version (4.0+) will autoregister the paths instantly.)
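If you prefer a CLI over clicking around, you can restart the agent and check the initiator records from a script as well. The sketch below assumes a Windows host where the service is called "Navisphere Agent" and a management station with naviseccli installed; the service name, SP address and WWPN are assumptions you'll want to verify in your own environment.

```python
import subprocess

SP_A = "10.0.0.1"          # assumed SP-A management IP - adjust for your array
HOST_WWPN = "50:06:01:60"  # (partial) WWPN of the HBA you just zoned in - placeholder

def restart_agent():
    """Restart the Navisphere/Unisphere host agent on a Windows server.

    The service name 'Navisphere Agent' is an assumption - check the
    actual name in services.msc on your hosts.
    """
    subprocess.run(["net", "stop", "Navisphere Agent"], check=False)
    subprocess.run(["net", "start", "Navisphere Agent"], check=True)

def show_initiator_records():
    """List the initiator records the array knows about and print ours."""
    out = subprocess.run(["naviseccli", "-h", SP_A, "port", "-list", "-hba"],
                         capture_output=True, text=True, check=True).stdout
    for block in out.split("\n\n"):
        if HOST_WWPN.lower() in block.lower():
            print(block)

if __name__ == "__main__":
    restart_agent()
    show_initiator_records()
```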

Once the paths are registered, they will still be in the “~management” storage group. You will need to enable them in the actual storage group that contains the host.

Enable the storage paths in the storage group

Go to the storage group that contains the server (hosts tab), press the Advanced button (only visible in Engineering mode). If you have multiple servers in the storage group, browse to the server in question. You will see that the new paths are unchecked: check them to enable access. In this example we added 4 new paths, bringing the total to 8.

Rescan on the host side to make the server aware of the extra available paths. If you have PowerPath installed, it might take a minute for it to realize things have changed and update the interface.
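For the Windows hosts the rescan can be scripted too. The sketch below pipes a rescan command into diskpart (run it from an elevated prompt) and then asks PowerPath to re-test its paths; again, this is a sketch of the idea rather than a polished tool.

```python
import subprocess

def rescan_windows_host():
    """Trigger a device rescan on a Windows host and refresh PowerPath.

    Feeds diskpart's 'rescan' command via stdin, then runs 'powermt restore'
    so PowerPath re-tests all paths and shows the new ones as alive.
    """
    subprocess.run(["diskpart"], input="rescan\nexit\n", text=True, check=True)
    subprocess.run(["powermt", "restore"], check=True)
    subprocess.run(["powermt", "display", "paths"], check=True)

if __name__ == "__main__":
    rescan_windows_host()
```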

Remove the old paths

If you expand the Storage Arrays in the PowerPath Console, you’ll now see the additional storage ports that are available to the server. Next step is removing the old paths and bringing the total available paths back to four.

Little sidestep: you don’t want more than four paths to a server unless there are specific requirements, for example bandwidth. More paths means more resources are consumed, and failover between paths may be slower since each and every available path will be checked. More information is available in the various whitepapers on PowerLink/EMC Support.

Gracefully phase out the old storage paths by switching them to standby mode.

I wanted to make this step of removing paths as graceful as possible. So instead of just yanking out the old paths by removing the zoning, I put them in standby mode first. This causes PowerPath to only use them once all active paths are lost. Hence, no I/O is in transit on the old paths.
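Before removing any zoning, it's worth verifying that nothing is left on the old ports. A quick way to do this is to parse powermt display dev=all and check that every path towards A0/B0 really reports standby. The sketch below does exactly that; the column layout of the powermt output differs slightly between versions, so adjust the string matching to what you actually see.

```python
import subprocess

OLD_PORTS = ("SP A0", "SP B0")   # the array ports being freed up for MirrorView

def verify_old_paths_standby():
    """Check that every path to the old FE ports is in standby before unzoning.

    Parses 'powermt display dev=all'; each path line shows the array
    interface (e.g. 'SP A0') plus its mode and state. Anything towards
    the old ports that is not in standby gets printed.
    """
    out = subprocess.run(["powermt", "display", "dev=all"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if any(port in line for port in OLD_PORTS):
            if "standby" not in line:
                print("NOT standby yet:", line.strip())

if __name__ == "__main__":
    verify_old_paths_standby()
    print("Done - any old-port path not listed above is already in standby.")
```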

Once all old paths are in standby mode, remove the zoning. It will take a minute or so for PowerPath to update the console. Clean up the PowerPath console by removing the old storage paths.

At that point you’ll be greeted with a couple of failed paths and degraded LUNs. To clean this all up (these paths will not come back online ever), select the dead paths, right-click and select “Remove from Config”. If all things go well, they will be removed from the console and PowerPath will stop complaining.
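If you'd rather do this cleanup from the command line, powermt can do roughly the same as the “Remove from Config” action in the console. A minimal sketch:

```python
import subprocess

def cleanup_dead_paths():
    """Remove the now-dead paths from the PowerPath config via the CLI.

    'powermt check' walks all paths and removes the dead ones; the 'force'
    keyword skips the per-path confirmation. 'powermt save' persists the
    cleaned-up config, and the final display is just a visual check.
    """
    subprocess.run(["powermt", "check", "force"], check=True)
    subprocess.run(["powermt", "save"], check=True)
    subprocess.run(["powermt", "display", "dev=all"], check=True)

if __name__ == "__main__":
    cleanup_dead_paths()
```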

If you get a “Could not complete the requested operation” popup, the dead path will stay in the console. This is purely cosmetic: your server will run pretty much indefinitely with the dead path still showing up. If you want to clear it you will probably have to reboot the server (or, worst case, reinstall PowerPath). During our migration, roughly 20 servers removed the old paths without a problem (the VMware servers didn’t have PowerPath installed), but a handful of Windows servers refused to remove one or more paths. Perhaps it’s a PowerPath bug or something else in the OS; not really an immediate issue though.

After cleaning up the host, you will have to clean up the old initiators/paths in Unisphere as well.
Remove the old storage paths (or initiators) from Unisphere on the VNX.

The old paths are still registered but logged out. Since they will never log in again, go ahead and deregister them using the obvious button. (Note: contrary to the old Connectivity Status in Navisphere, you can now select multiple paths and deregister them simultaneously).
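The same can be done from the CLI with naviseccli. The sketch below assumes an SP management IP and uses a made-up placeholder initiator UID; copy the real UIDs exactly as Unisphere shows them on the Initiators page.

```python
import subprocess

SP_A = "10.0.0.1"  # assumed SP management IP - adjust for your array

# Full initiator UIDs (node WWN : port WWN) of the logged-out records.
# These are made-up placeholders - replace with the real UIDs from Unisphere.
STALE_HBA_UIDS = [
    "20:00:00:00:C9:AA:BB:CC:10:00:00:00:C9:AA:BB:CC",
]

def deregister_stale_initiators():
    """Deregister logged-out initiator records via naviseccli.

    'port -removeHBA -hbauid <uid> -o' is the CLI counterpart of the
    Deregister button; '-o' suppresses the confirmation prompt.
    """
    for uid in STALE_HBA_UIDS:
        subprocess.run(["naviseccli", "-h", SP_A,
                        "port", "-removeHBA", "-hbauid", uid, "-o"],
                       check=True)

if __name__ == "__main__":
    deregister_stale_initiators()
```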

All done!

That’s it! Your server will now have access to the storage system over the new paths. No downtime was required (although it is advisable to still do this in a low I/O / maintenance window kind of time slot, if only from a business perspective in case you click the wrong button). To summarize, you will have to perform the following steps:

  1. Check your server: make sure it’s healthy!
  2. Zone the new storage paths to your server.
  3. Rescan on your host to have it login to the VNX on the new paths. 
  4. Restart Unisphere agent to register the new paths (or do it manually).
  5. Enable the new (now registered) paths in the storage group.
  6. Rescan again on your host to use the new paths.
  7. Put your old storage paths on standby to make sure there’s no active I/O on them.
  8. Remove the zoning for the old paths and perform another rescan.
  9. Clean up your PowerPath config.
  10. Clean up your Initiator config on the VNX.

That’s it! The above steps are for a Windows host with PowerPath installed. Other operating systems may have slightly different steps: for example VMware 4.0 and newer will autoregister so you can skip step 4 (and step 9 if you use native VMware MPIO).

If you have any comments or questions or might even want to share your quick manual for a different OS, feel free to comment down below!