Breaking down storage silos with Primary Data DataSphere

Primary Data unveiled their DataSphere product at VMworld US back in August 2015. With DataSphere, Primary Data virtualizes the different types of storage in the datacenter, creating a global dataspace and breaking down the traditional silos of storage. It attempts to do for storage what VMware did for computing: any piece of data can reside on any storage, movable at any time, without interruption. In essence, it increases data mobility by decoupling the logical storage from the physical hardware. The team gave us an update on their product at Storage Field Day 10, so here goes!

Different types of storage (block, NAS, object) naturally create data silos, and it’s difficult (if not impossible) to merge these technologies seamlessly. Primary Data examined the market: 51% of enterprises manage 10 or more storage systems, with older/larger enterprises unsurprisingly leading the pack. Nearly two thirds of enterprises manage 3 to 9 different storage vendors, which brings its own unique challenges. As Lance Smith, CEO of Primary Data, puts it: “Different flavors is good for ice cream, but not for storage management.”

DataSphere and DSX

The product consists of two components:

  1. DataSphere appliances (virtual or physical) that house the metadata, shares, filesystems, etc.
  2. DSX, which is the universal client that enables the server to access the blobs of storage.

The control plane uses pNFS, regardless of the underlying storage, which can be local storage in a server, a traditional SAN LUN or NFS filesystem, or an S3 object store. All the underlying storage in this single dataspace is presented to the clients as a single filesystem; yes, even a LUN shows up as an inode in that filesystem. For such a block LUN, however, the DSX client is able to map the inode and access the LUN as a traditional block device.

Primary Data Out of Band Storage Agnostic approach

Primary Data’s DataSphere does NOT sit in the data path, i.e. the client accesses the SAN or NAS filer directly. Traffic does not flow via the DataSphere appliances, so the client can utilize the full performance of the array. The protocols themselves aren’t changed either: with SMB, for example, the SMB features are unchanged, but the traffic is rerouted by the DSX filter driver to the appropriate location.

If this starts to sound too good to be true: remember that the DataSphere appliances store all the metadata and will be queried for all metadata open and close actions. Thus, if the DataSphere appliances are unavailable, the data is also unavailable. To prevent outages, the DataSphere appliances are highly available, clustered appliances. To provide an extra layer of defense against data loss, the metadata contained in the DataSphere appliances is periodically dumped: should all appliances crash, this metadata tarball can be used to rebuild the environment.
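
To make the out-of-band idea a bit more tangible, here is a minimal Python sketch. It is not Primary Data’s implementation or API: the class names (MetadataService, Client), the paths and the backend names are all hypothetical, and plain dictionaries stand in for real storage systems. It only illustrates the split described above: the lookup goes to the metadata service, the read goes straight to the backend, and a lookup failure makes the data unreachable even though the backend itself is fine.

```python
from dataclasses import dataclass, field


class MetadataUnavailable(Exception):
    """Raised when the metadata service cannot answer a lookup."""


@dataclass
class MetadataService:
    # Hypothetical stand-in for the clustered metadata appliances.
    # Maps a path in the global namespace to (backend name, backend key).
    layouts: dict = field(default_factory=dict)
    online: bool = True

    def lookup(self, path):
        if not self.online:
            raise MetadataUnavailable(path)
        return self.layouts[path]


@dataclass
class Client:
    # Hypothetical stand-in for a client: control traffic goes to the
    # metadata service, data traffic goes directly to the backend.
    metadata: MetadataService
    backends: dict  # backend name -> dict acting as a storage system

    def read(self, path):
        backend_name, key = self.metadata.lookup(path)  # control path
        return self.backends[backend_name][key]         # data path, direct


# Illustrative data only: two made-up backends behind one namespace.
backends = {
    "nas01": {"/vol/a/report.doc": b"hello"},
    "s3": {"bucket/obj1": b"archive"},
}
meta = MetadataService(layouts={
    "/global/report.doc": ("nas01", "/vol/a/report.doc"),
    "/global/old.tar": ("s3", "bucket/obj1"),
})
client = Client(meta, backends)
```

Setting meta.online to False in this toy model makes every read fail with MetadataUnavailable, which is exactly why the real appliances need to be clustered and their metadata backed up.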

Service Level Objectives

Now that we’ve got a single global dataspace, it would be ideal to move data to the right underlying storage technology, so that each device serves the workload it suits best. Primary Data calls this Dynamic Data mobility: aligning data needs (objectives) with storage capabilities (service levels). It is done through automated mobility (part of the DSX client), arbitrated by economic value and reported as compliance.

There are three basic service level objectives:

  • Protection: characteristics such as availability, access control and security.
  • Performance: based on IOPS, bandwidth, latency, read:write ratio or sustained vs burst speeds.
  • Price/Penalty: counted per file, per byte or per operation.

For example, you could create a service level objective for high speed block storage from an XtremIO box and another for low cost S3 object storage somewhere in the cloud.
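
As a rough illustration of matching objectives to service levels, the Python sketch below picks the cheapest storage tier that still satisfies an objective. This is an assumption about how such arbitration could work, not DataSphere’s actual algorithm; the StorageTier and Objective types, the place function and all the numbers are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageTier:
    # Hypothetical description of what a storage system can deliver.
    name: str
    iops: int
    latency_ms: float
    cost_per_gb: float


@dataclass(frozen=True)
class Objective:
    # Hypothetical description of what a piece of data needs.
    name: str
    min_iops: int = 0
    max_latency_ms: float = float("inf")


def place(objective, tiers):
    """Return the cheapest tier that meets the objective, or None."""
    candidates = [
        t for t in tiers
        if t.iops >= objective.min_iops
        and t.latency_ms <= objective.max_latency_ms
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda t: t.cost_per_gb)


# Made-up figures, purely for illustration.
tiers = [
    StorageTier("flash_block", iops=200_000, latency_ms=0.5, cost_per_gb=2.0),
    StorageTier("cloud_object", iops=500, latency_ms=80.0, cost_per_gb=0.02),
]
fast = Objective("high_speed", min_iops=50_000, max_latency_ms=1.0)
cheap = Objective("low_cost")

fast_tier = place(fast, tiers)
cheap_tier = place(cheap, tiers)
```

With these figures the high-speed objective can only land on the flash tier, while the low-cost objective has no performance floor and therefore ends up on the cheaper object store.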

Where it gets interesting is with the Smart Objectives: these allow you to automatically optimize the objectives for the mix of applications in your storage ecosystem. For example, if files with the .psd extension are detected, you can automatically associate them with the application Photoshop and store them on a fast NAS filer somewhere. There’s a whole range of criteria available here, including I/O patterns, allowing you to tweak and tinker with the data placement as you like.
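
The .psd example boils down to a rule that maps a file attribute to an objective. A minimal Python sketch of such a rule follows; the rule table, the objective names and the objective_for function are hypothetical and only cover matching on the file extension, not the I/O-pattern based criteria mentioned above.

```python
from pathlib import PurePosixPath

# Hypothetical rule table: file extension -> objective name.
RULES = {
    ".psd": "fast-nas",
    ".log": "low-cost-object",
}
DEFAULT_OBJECTIVE = "general-purpose"


def objective_for(path):
    """Pick an objective name based on the file extension (case-insensitive)."""
    suffix = PurePosixPath(path).suffix.lower()
    return RULES.get(suffix, DEFAULT_OBJECTIVE)
```

A call such as objective_for("/home/anna/poster.PSD") would return "fast-nas" in this sketch, and anything without a matching rule falls back to the default objective.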

My thoughts on Primary Data DataSphere…

I’ve participated in and run enough data migration projects over the last couple of years to realize that data migrations can be challenging. Not so much the datastores with virtual machines (it’s easy to Storage vMotion these to a new datastore), but usually the NAS shares that are used by applications like medical imaging or just all the home folders for all the users in the company.

If you’ve got a wide range of storage products in your infrastructure, DataSphere can help make the day-to-day management a bit less complex and enable you to move your data to other storage products and technologies with less effort. This could be ideal for companies involved in mergers: these types of business events always drastically increase the number of storage products and technologies in use.

I haven’t quite made up my mind yet whether I’d position Primary Data’s DataSphere as a product I’d temporarily use to ease a transition period, or as a strategic, long-term component in my storage infrastructure.

If you’re building a new storage platform, there are a number of storage arrays that offer offloading to the cloud or tiering across products. Alternatively, the data mobility part may be handled in a different layer, such as the hypervisor. In these cases, I wouldn’t really factor in DataSphere as a necessary component I’d want to spend money on.

That said, from a contract negotiation perspective it’s always beneficial to avoid vendor lock-ins. With DataSphere in your infrastructure, you can easily stop buying one vendor/product and replace it with something cheaper/better/faster, without worrying about maintaining a homogeneous farm of storage arrays. Or if your application doesn’t have data mobility features included, it might still make your future tech refreshes a lot easier. Let’s see how the product develops and is received in the market…

You can watch Primary Data’s presentation at SFD10 over here. Max Mortillaro wrote a good piece about Primary Data already, which you can find here.

Disclaimer: GestaltIT paid for the flight, hotel and various other expenses to make it possible for me to attend SFD10. I was however not compensated for my time and there is no requirement to blog or tweet about any of the presentations. Everything I post is of my own accord.