For the last couple of months I’ve been busy consolidating a couple of European data centers into one location in the Netherlands. Technically this meant we had to migrate a large number of virtual machines, with as little downtime as possible, across WAN links with varying speeds (30Mbit up to 500Mbit). There are a number of ways to go about this, but we chose the vSphere Replication infrastructure, which is included in vSphere 5.x for free. Unfortunately there are a couple of downsides to the management interface which become a pain if you have to manage several hundred replications…
So why did we choose vSphere Replication?
We had a couple of criteria which needed to be met by the replication product:
- Each site was connected to the main site with a primary WAN link running all the production (=user) traffic and a secondary WAN link for backup purposes. We needed to be able to segregate the replication traffic from the production traffic to avoid dropping production traffic due to a full link. This was easy to do with vSphere Replication (VR): redirect the subnet with the VR appliances over the secondary link and Bob’s your uncle.
- The product needed to support a big bang migration instead of a piecemeal approach. This was primarily due to the network layout: after a machine was cut over to the new site it would automatically start using the secondary link and would thus be affected by later replications. With vSphere Replication we were able to do an initial sync of all the data for all the machines and then enter a maintenance cycle which only transferred the changed data. At a later point in time we would then cut over all the machines to the new site, “fix” the network and it would be business as usual again.
- The product needed to be resilient to link failures and other events: we couldn’t afford to start all over again if a link flaps after copying 30TB of data. vSphere Replication uses snapshots under the hood to make sure an incremental copy is either fully applied or not applied at all (hence preserving the old, known-good state).
- It needed to be cheap or free: the customer had Enterprise Plus licenses so that wasn’t an issue.
- And for the sites with the truly tiny WAN links, it would be nice if we could seed the replication in some way to speed things up (e.g. restore a virtual machine from backup at the destination site, sync the latest changes, cut over).
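To illustrate the resilience criterion above: the trick is never to modify the known-good replica in place, but to stage the incremental copy and swap it in atomically. The sketch below is only an illustration of that pattern under made-up assumptions (a plain file as the replica, a dict of byte offsets as the delta); vSphere Replication’s actual mechanism is internal and snapshot-based.

```python
import os
import shutil

def apply_delta_atomically(target_path, delta):
    """Apply a set of changed blocks so the result is either fully applied
    or not applied at all. `delta` maps byte offsets to replacement bytes.
    Illustrative only; not how VR is actually implemented."""
    # Stage the changes in a temporary copy first, leaving the replica alone.
    staging = target_path + ".staging"
    shutil.copy2(target_path, staging)
    with open(staging, "r+b") as f:
        for offset, data in delta.items():
            f.seek(offset)
            f.write(data)
    # os.replace swaps the file in one atomic step: a crash or link failure
    # mid-apply leaves the old, known-good replica untouched.
    os.replace(staging, target_path)
```

If the process dies anywhere before the final `os.replace`, the replica still holds the previous consistent state, which is exactly the property that let us survive link flaps mid-migration.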
We looked at several other products: Veeam Backup & Replication does the same thing as vSphere Replication but required a pretty steep license. Long-distance (storage) vMotions couldn’t do a big bang migration and also couldn’t recover seamlessly from a mid-migration link failure.
vSphere Replication works well, but…
…be prepared for a lot of mind-numbing clicking in the vSphere web client! For example: our customer wanted to move from traditional thick provisioning to thin provisioning. It isn’t possible to do this on a per-virtual-machine basis: instead you have to select the “advanced disk configuration” option and change it on a disk-by-disk basis.
In the image above you can see the interface for a virtual machine with 5 disks. Each time, the web client loads all the datastores registered under the destination vCenter server. For us this took 10-15 seconds each time due to the large number of datastores and the fact that this data apparently isn’t cached. If you want to put certain disks on different datastores (i.e. split a VM across multiple datastores), that triggers yet another loading cycle. So you’ll be staring a lot at an interface that is constantly loading… especially if you have virtual machines with a large number of disks.
Which brings me to point 2: say your source virtual machine is split across multiple datastores. It is entirely possible that some of those .vmdk files have identical names, since vSphere restarts its numbering on a per-datastore basis. Datastore 1 has ExampleVM.vmdk and ExampleVM_1.vmdk, and datastore 2 also has an ExampleVM.vmdk file. vSphere Replication will warn you about duplicate file names and default you to the “advanced disk configuration” option. It’s now up to you to make sure that you don’t end up with duplicate file names on the same datastore or in the same datastore folder: you can’t consolidate everything to one datastore and let vSphere Replication rename the files. If you make a mistake there’s no advance warning that this won’t work: vSphere Replication will get to work configuring the replication, throw an error that there are duplicate disk names, and you can start over again. If your VM has 11 disks spread over 4 datastores, better start whiteboarding to make absolutely sure you don’t have a mistake in there.
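Since the GUI only reports a naming mistake after it has already started configuring the replication, a pre-flight check of your planned disk placement can save you a round trip. This is a hypothetical helper, not part of any VMware tooling; the placement tuples are an assumed representation of your whiteboard plan:

```python
from collections import defaultdict

def find_vmdk_collisions(placement):
    """Given a planned placement as (datastore, folder, vmdk_name) tuples,
    return every destination that would receive duplicate file names.
    Hypothetical pre-flight check, not VMware tooling."""
    seen = defaultdict(int)
    for datastore, folder, name in placement:
        seen[(datastore, folder, name)] += 1
    # Any (datastore, folder, name) combination used more than once clashes.
    return [key for key, count in seen.items() if count > 1]
```

Run it over the planned mapping before touching the replication wizard: an empty result means no two disks land in the same datastore folder under the same name.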
You can set the interval at which vSphere Replication performs the incremental replication in the Recovery Settings; possible intervals range from 15 minutes to 24 hours. If you want to change this interval you will need to reconfigure the replication set. Guess what: these settings are on the last page, so you will have to traverse the disk configurations, which will again show you a loading popup before you can click next. As far as I’m aware there’s only a very limited CLI for vSphere Replication, so there’s no shortcut here… Groan!
Seeding virtual machines on the destination side
For the smaller sites we chose to pre-seed the replications by restoring an image-level backup on the destination cluster and then using those .vmdk files as the targets for the replication. vSphere Replication will checksum the files and only transfer the changed data; as a typical number, for a recent (2-3 day old) backup of a 100GB virtual machine this resulted in 2-4GB of changed blocks being transferred. A lot better than cramming 100GB over the WAN!
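Conceptually, checksum-based seeding works like the sketch below: hash the seed and the source block by block and transfer only the blocks that differ. This is a simplified illustration with an assumed 4KB block size; the actual vSphere Replication protocol is internal and certainly more sophisticated.

```python
import hashlib

def changed_blocks(seed_path, source_path, block_size=4096):
    """Compare two disk images block by block and return the byte offsets
    of blocks that differ -- the blocks a checksum-based sync would have
    to send over the WAN. Simplified illustration, assumed block size."""
    changed = []
    with open(seed_path, "rb") as seed, open(source_path, "rb") as src:
        offset = 0
        while True:
            a = seed.read(block_size)
            b = src.read(block_size)
            if not a and not b:
                break
            # Differing digests mean this block changed since the backup.
            if hashlib.sha256(a).digest() != hashlib.sha256(b).digest():
                changed.append(offset)
            offset += block_size
    return changed
```

With a 2-3 day old seed, only a small fraction of blocks differ, which is where the 100GB-becomes-2-4GB saving comes from.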
Be aware of the following two things:
- Your seeded .vmdks will need to have the same file names as the source VM’s disks. If they’ve got a different name you will have to rename them: the CLI command vmkfstools -E <old file name> <new file name> will do the trick.
- The disk UUIDs will have to be identical, or else you’ll get a “Target disk UUID validation failed” error. There’s a VMware KB article on how to change the disk UUIDs in the seeded .vmdks: basically, copy the UUID out of the source .vmdk descriptor file and paste it into the seeded/destination .vmdk.
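The UUID fix amounts to copying the ddb.uuid line from the source descriptor into the destination descriptor, which you can do with any text editor. As a sketch, a scripted version might look like this; the format shown is the standard .vmdk text descriptor, but treat the script itself as illustrative rather than supported tooling (it also assumes you point it at the small descriptor files, not the flat extents):

```python
import re

def copy_disk_uuid(source_descriptor, dest_descriptor):
    """Copy the ddb.uuid line from the source .vmdk descriptor into the
    seeded/destination .vmdk descriptor. Illustrative sketch only."""
    with open(source_descriptor) as f:
        match = re.search(r'^ddb\.uuid\s*=\s*"(.+)"$', f.read(), re.M)
    if match is None:
        raise ValueError("no ddb.uuid line found in source descriptor")
    with open(dest_descriptor) as f:
        contents = f.read()
    # Replace the destination's UUID line with the source's UUID.
    contents = re.sub(r'^ddb\.uuid\s*=\s*".*"$',
                      'ddb.uuid = "%s"' % match.group(1),
                      contents, flags=re.M)
    with open(dest_descriptor, "w") as f:
        f.write(contents)
```

As always with hand-edited descriptors: keep a copy of the original file before writing, and verify the replication configures cleanly afterwards.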
Afterwards you can configure the replication just like a fresh one. vSphere Replication will prompt you that it has found a file with an identical name and ask whether you’d like to use it as a seed. Be aware that you cannot change the disk format at this point, so if you’re seeding thick provisioned disks and want thin provisioned disks in the end state, you’ll have to run a Storage vMotion afterwards to correct this.
Recovering virtual machines
The cutover approach we used is simple:
- Power down the source virtual machine.
- Select the replication session, click Recovery, and follow the on-screen instructions (basically select the destination cluster and next, next, finish).
- Edit the settings of the virtual machine, select the correct virtual network for each NIC and connect it.
- Power on the destination machine.
There are a number of things to be aware of:
- Make sure the replication delta is as small as possible before you power down the virtual machine and start your cutover. If the replication delta is large it will of course take a while to transfer that data over. More importantly, it will also take a while to apply that set of changed blocks to the replication target: seeing your sync progress bar reach 100% doesn’t mean the virtual machine will immediately start cutting over. On especially big deltas (like 100GB for a big SQL server) you may have to wait 30 minutes to an hour for it to apply, just like with any big snapshot. This wastes valuable downtime.
- Additionally, it appears that while you’ve issued the Recover command for one virtual machine and vSphere Replication is syncing over the last data for that machine, it’s not possible to start another recovery. You can start the recovery in the GUI and the task will be created, but it will sit in a zombie state until the first recovery is finished. This is very confusing at first, especially if the first recovery shows it’s 100% synced but is still applying the delta to disk under the hood: you might think that vCenter has frozen. Whatever you do, do NOT restart the vCenter server while a recovery is active, because this could cause your replication to get stuck in the Recovering state. At that point you’ll have a hell of a job getting that replication back to a normal or error state so you can retry or cancel it.
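To budget your downtime window, a back-of-the-envelope estimate of the final sync helps: the time to transfer the remaining delta over the WAN plus the time to apply it at the target. The apply rate below is a made-up placeholder, not a VMware figure; measure your own environment before relying on numbers like these.

```python
def estimated_cutover_wait(delta_gb, link_mbit, apply_gb_per_min=2.0):
    """Rough estimate, in minutes, of the final sync during a cutover:
    WAN transfer of the delta plus applying it at the target.
    apply_gb_per_min is an assumed placeholder rate -- measure your own."""
    # GB -> gigabits -> megabits, divided by link speed, gives seconds.
    transfer_min = (delta_gb * 8 * 1024) / link_mbit / 60
    apply_min = delta_gb / apply_gb_per_min
    return transfer_min + apply_min
```

For example, a 100GB delta over a 100Mbit link comes out to roughly three hours with these assumptions, which is exactly why you want the delta as small as possible before powering the source machine down.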
My two cents
vSphere Replication is a cool product if you want to replicate a virtual machine across a WAN link to a different vSphere cluster, regardless of whether it’s managed by the same or a different vCenter server. It’s free and it’s remarkably efficient with your WAN bandwidth, since you can seed the replication from a backup. If your link fails during replication, it will pick up where it left off. So technically it’s just great and does the job remarkably well.
From an operational perspective there are quite a few possible improvements. For one, it would be nice if you could use a single toggle to fully thin provision your destination machine, instead of having to set this for each individual disk. The option to consolidate a VM back to one datastore would also be beneficial: there are plenty of cases where I don’t care if vSphere Replication renames the .vmdks. Also, having to click through each and every disk of a configured replication session, waiting for the loading screen each time, just to change the replication interval is cumbersome: an option to do this with one right-click or via a proper CLI would be useful. The same goes for the recovery side of a replication: better reporting of the recovery process (to avoid the “recovery at 100% but nothing is happening” confusion) and recovery parallelism wouldn’t hurt, even though some of these options would probably cut into the SRM market share.
If you’re migrating virtual machines I can recommend using vSphere Replication to do the job. Just be prepared to spend quite some time in the web client…