Our company recently replaced a lot of VNX storage with new Dell EMC Unity all-flash arrays. Since we are (or were) primarily a VMware hypervisor house, we decided to go ahead and create the new LUNs as VMware VMFS (Block) LUNs/datastores. This, however, resulted in us hitting a weird and unexpected replication limit at 64 sessions.
Dell EMC Unity Replication 101
So for those of you who are unfamiliar with Dell EMC Unity systems, a quick recap of the basics. The Dell EMC Unity is a Unified system, meaning it can present block storage (LUNs) and file storage (NFS exports and SMB shares). It can also present VVols to a VMware environment, over either block or file protocols.
A Unity offers a multitude of replication scenarios. You can replicate LUNs asynchronously and synchronously to other Unity systems, using the built-in replication software. If you prefer a continuous data protection (CDP) kind of solution, it will also accept RecoverPoint (or 3rd party) CDP software. There are a few limitations to these replication mechanisms. We primarily use synchronous replication to achieve an RPO (data loss) of zero. Synchronous replication with the Dell EMC Unity comes with the limitation that you can only create a 1:1 replication relationship between Unity systems. In plain language: Unity system A can only replicate synchronously with one other Unity system B. It’s also bi-directional, meaning B can replicate back to A. This limitation works fine for us; it simplifies replication topologies, and we manage to fill both the primary and secondary system up quite well. No worries there!
If you choose asynchronous replication, the accepted topology changes to N:M, meaning one Unity system can replicate to multiple other systems, and receive incoming sessions from other systems. Basically: whatever you want, as long as there’s an IP connection between the systems. There are some limitations to the maximum number of systems that can replicate to each other, but nothing we’re going to hit.
In our case we bought 7 systems. We spread 6 of those systems across two datacenters, paired them together for synchronous replication, and that’s it. Since we’re a VMware house, we decided to go with VMware VMFS block datastores. The primary reason we chose VMFS datastores in the Unity interface is that it puts these datastores in a separate screen, and a separate part of the capacity pie charts. This means that we can easily differentiate between VMFS and non-VMFS LUNs, e.g. CSV datastores for Hyper-V. Looking ahead, we expected to connect the VMware vCenter and Unity systems together, so we could get some additional integration between these layers of infrastructure. So it makes sense to prepare our infrastructure for this… right?
Then, several months into the migration, we ran into an issue. When creating a new set of replications between two systems, the Unity threw an error: replication limit reached. We verified the number of sessions: it was indeed 64. I remember thinking: “Hmm, that’s a nice round number (in binary at least), but it’s a bit too low for a session limit. Let’s check the replication limits again, because this doesn’t make sense.”
From the matrix I would expect to be able to create 750 synchronous replication sessions. We weren’t creating consistency groups (CGs), so we shouldn’t be hitting the 64 limit.
Long story short…
We reached out to Dell EMC tech support, who passed us on to Engineering pretty quickly. Unfortunately, it turns out that a VMFS datastore is treated as a consistency group in the Unity code, which means you can only create and replicate 64 VMFS datastores per pair of Unity systems. Remember that a system counts both incoming and outgoing replications against the same session limit. So you could replicate 64 VMFS datastores to the secondary system, or you could replicate 2×32 VMFS datastores in a bi-directional fashion, for example in an active-active datacenter construction. Equally unfortunately, according to Engineering it’s not a bug. They will be amending the documentation to better specify this limit.
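To make the arithmetic concrete, here is a small back-of-the-envelope sketch of that session budget. The constant and function names are my own, not anything from the Unity API; the only facts taken from the behaviour described above are that the consistency-group session limit is 64 per system, and that each replicated VMFS datastore consumes one session whether it is incoming or outgoing:

```python
# Hypothetical helper modelling the per-system session budget described above.
# Each replicated VMFS datastore counts as one consistency-group session on a
# Unity system, regardless of replication direction.

CG_SESSION_LIMIT = 64  # per Unity system, incoming + outgoing combined


def remaining_vmfs_sessions(outgoing: int, incoming: int) -> int:
    """Return how many more replicated VMFS datastores this system can host."""
    used = outgoing + incoming
    if used > CG_SESSION_LIMIT:
        raise ValueError("over the consistency-group session limit")
    return CG_SESSION_LIMIT - used


# One-way: 64 datastores to the peer exhausts the budget.
print(remaining_vmfs_sessions(outgoing=64, incoming=0))   # 0 left
# Active-active: 32 out + 32 in also exhausts it.
print(remaining_vmfs_sessions(outgoing=32, incoming=32))  # 0 left
```

This is why a bi-directional setup halves the effective number of replicated VMFS datastores per datacenter.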
For us, this means we’re in the unlucky situation where we migrated 70% of our data to the Unity systems, and now have to switch to regular block LUNs. It’s not the end of the world: we can resume assigning storage to VMware clusters. However, it does mean that we will have a mismatch in capacity reporting: 70% of our capacity will be flagged as VMFS datastores, and the remaining 30% will be hidden between the regular block LUNs for physical hosts, Hyper-V CSV datastores, etc. Which is kinda annoying, if you want to run a clean, predictable shop.
It is also not possible to easily convert a Unity VMFS datastore to a regular block LUN in the interface. You would have to create new block LUNs and Storage vMotion everything over. With >1PB of data to move, that’s not a job I’m looking forward to, from both a man-hours and a system-load perspective.
Moral of the story
Simple: don’t use the VMware VMFS datastores in the Unity. Replication limit aside, there are a few other limitations of VMFS datastores that we ran into. For one, you are unable to set a host LUN ID (HLU) for a VMFS datastore. While VMware isn’t too bothered by this, it can be confusing for an admin to find a VMFS datastore on HLU 10 on one host, and on HLU 5 on another.
Additionally, the concept of consistency groups does not exist for VMFS datastore replications, meaning you cannot group replication sessions together for simultaneous fractures and synchronizations. While we previously abused consistency groups a bit too much, the lack of them can give you real challenges with data consistency if a VM or vApp spans multiple VMFS datastores and you need to fracture the replications at exactly the same point in time.
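To illustrate why that matters, here is a toy simulation (not Unity API code; all names are made up) of fracturing sessions one at a time instead of as a group. Each replica ends up frozen at a slightly different moment, so a VM spanning those datastores is not crash-consistent across them:

```python
# Toy model: sequential fractures land at different points on a shared clock,
# unlike a consistency-group fracture, which would freeze all members at once.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    datastore: str
    fractured_at: Optional[float] = None  # seconds on some shared clock


def fracture_all(sessions: List[Session], start: float = 0.0,
                 per_session_delay: float = 2.0) -> List[Session]:
    """Fracture sessions one by one; each call lands later on the clock."""
    t = start
    for s in sessions:
        s.fractured_at = t
        t += per_session_delay
    return sessions


def consistency_skew(sessions: List[Session]) -> float:
    """Window between first and last fracture; 0 would mean a single point in time."""
    times = [s.fractured_at for s in sessions]
    return max(times) - min(times)


# A VM spanning three datastores, fractured sequentially:
spanning = fracture_all([Session("ds1"), Session("ds2"), Session("ds3")])
print(consistency_skew(spanning))  # 4.0 seconds of skew between replicas
```

With a real consistency group the array fractures all members atomically, which is exactly the capability VMFS datastore replications lack.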
Regular block LUNs will have to do for now! If you have any questions or comments, leave them down below.