Cloudian HyperStore: manage more PBs with less FTE

If you want to build a private S3 object store, Cloudian HyperStore might be the product for you. Using commodity servers in a scale-out architecture, you can build your own fully S3-compliant object storage, located in your own datacenter. If you don’t want to supply your own servers, you can opt for the Lenovo Storage DX8200C appliance, powered by Cloudian!

Cloudian started in 2011 and has since grown to a company with 80 employees and over 65 customers, a growing share of them high-end customers that need the scaling aspects of the system. Some of these customers have Cloudian object stores in the range of 10-15PB, whereas others have “only” several hundred TB. HyperStore 6.0 went GA a couple of weeks before Storage Field Day 10 and it contained a lot of features that these big customers requested.

More with less

How do you make a storage system that’s easy to use? This is not a new theme in the storage market. In fact, this has been a primary objective for a lot of storage vendors and has always taken a front-and-center role in their marketing campaigns. The reason behind it is simple: the total amount of data out there is growing rapidly (doubling every 2 years). 80% of that data is unstructured data. That has a couple of consequences, first and foremost that admins will have to be able to manage a LOT more data!

Cloudian HyperStore: more data needs to be managed by fewer storage FTEs

Or, using a different slide to show the same underlying problem:

Cloudian CAPEX is no longer the biggest cost; OPEX is.

Admittedly, there are a few problems with these slides, even with the Gartner stamp:

  1. There is a vast difference between managing block, object and file data.
  2. Is this actual usable capacity or raw capacity on the array? Does it include replicas? And backups? Is it only the storage admin hours or also the Windows DFS/shares/security admin? etc…

Numbers may vary in your own environment, but the general idea is simple. Storage needs to become simpler to manage, from cradle to grave. So what has Cloudian built in their HyperStore product to achieve that?


HyperStore is a scale-out object store built on commodity servers. If you don’t want to supply your own servers, you can buy a combined package of hardware and software from Lenovo as of June 6th.

Cloudian and Lenovo

HyperStore is a peer-to-peer system, with all nodes sharing all services at all times. There is for example no dedicated naming node. This architecture has big advantages for both performance scaling and availability. Cassandra technology is a key component in the HyperStore software, and it can potentially scale up to thousands of nodes in a single cluster using a pretty efficient p2p protocol called gossip.
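
To get a feel for why gossip scales so well, here is a minimal Python sketch of a generic push-style gossip round. It is a toy model, not Cassandra's or HyperStore's actual protocol: the function name `simulate_gossip`, the `fanout` parameter and the idea that a node "knows" a single update are all assumptions made for illustration.

```python
import random


def simulate_gossip(n_nodes, fanout=1, seed=0):
    """Return the number of rounds until every node has heard the update.

    Toy model: node 0 learns something new, and in every round each node
    that already knows it pushes it to `fanout` randomly chosen peers.
    """
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the new state
    rounds = 0
    while len(informed) < n_nodes:
        newly_informed = set()
        for _node in informed:
            for _ in range(fanout):
                # A node may pick itself or an already informed peer;
                # that wasted message is part of the model.
                newly_informed.add(rng.randrange(n_nodes))
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    for size in (10, 100, 1000):
        print(size, "nodes:", simulate_gossip(size), "rounds")
```

The point of the sketch is the growth pattern: the number of informed nodes can roughly double each round, so the number of rounds grows far more slowly than the number of nodes, and no central naming node is needed.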

You can spread nodes across geographically disparate locations in a distributed and elastic geo cluster. There is no single point of failure, with data, metadata and configuration data distributed across all the nodes. You can control where your replicas are placed, to make sure data is spread across multiple locations. If desired, you can auto-tier data between the Cloudian system and any S3-compatible cloud.
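
As an illustration of what such a placement policy could look like, the following Python sketch picks replica nodes per site from a "replicas per site" rule. Everything in it is hypothetical: the site names, the node names, the `place_replicas` function and the rendezvous-style hashing used to pick nodes are my own assumptions, not Cloudian's implementation.

```python
import hashlib


def _score(key, node):
    """Deterministic pseudo-random score for a (key, node) pair."""
    digest = hashlib.sha256(f"{key}:{node}".encode()).hexdigest()
    return int(digest, 16)


def place_replicas(key, nodes_by_site, replicas_per_site):
    """Pick the nodes that hold replicas of `key`, site by site.

    nodes_by_site:     {"site": ["node", ...]}  (hypothetical topology)
    replicas_per_site: {"site": replica_count}  (hypothetical policy)
    """
    placement = {}
    for site, count in replicas_per_site.items():
        candidates = nodes_by_site[site]
        if count > len(candidates):
            raise ValueError(f"site {site!r} has fewer than {count} nodes")
        # Rank the site's nodes by their score for this key and take the
        # first `count`; every client computes the same answer.
        ranked = sorted(candidates, key=lambda node: _score(key, node))
        placement[site] = ranked[:count]
    return placement


if __name__ == "__main__":
    topology = {
        "amsterdam": ["ams-1", "ams-2", "ams-3"],
        "new-york": ["nyc-1", "nyc-2"],
    }
    policy = {"amsterdam": 2, "new-york": 1}
    print(place_replicas("photos/holiday.jpg", topology, policy))
```

With a policy like this, losing a whole site still leaves at least one copy of every object in the other location.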

To simplify storage management, HyperStore has a number of important features, such as easy addition of drives and nodes: the system will automatically rebalance data across the newly available capacity. Likewise, if you need to remove a node (for example because it’s end of life), this can be done on the fly. Cloudian HyperStore supports a multi-tenant environment, where users can provision storage using a self-service approach, not unlike AWS or Azure solutions.
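
One common way to make this kind of rebalancing cheap is consistent hashing with virtual nodes. The Python sketch below shows the general technique under my own assumptions (the `HashRing` class, 64 virtual nodes per server and MD5 as the hash are illustrative choices); it does not claim to be how HyperStore does it internally.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer token."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (token, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each physical node owns many small token ranges on the ring.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def owner(self, key):
        # The first virtual node at or after the key's token owns the key;
        # wrap around to the start of the ring if we run off the end.
        idx = bisect.bisect_left(self._ring, (_hash(key),)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    keys = [f"object-{i}" for i in range(1000)]
    ring = HashRing(["node1", "node2", "node3", "node4"])
    before = {key: ring.owner(key) for key in keys}
    ring.add_node("node5")
    moved = [key for key in keys if ring.owner(key) != before[key]]
    print(f"{len(moved)} of {len(keys)} objects moved to the new node")
```

The useful property is that adding a node only moves the objects that the new node takes over; everything else stays where it was, which is what makes adding capacity on the fly practical.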

In case one of the clusters runs into problems, Cloudian support will proactively contact the customer. And unfortunately, I have to admit I’m slightly amazed by this: the support engineer will be aware of the customer topology straight from the start. Nothing is more annoying than having a support engineer on the phone, only to have them ask the most basic questions. This is basic information that should be periodically polled from the system and accessible to the vendor support department. Kudos Cloudian!

It’s all about the metadata

Photographers know it, the NSA knows it, Cloudian knows it: it’s all about the metadata.

It’s impossible to sift through a petabyte of photos when you’re looking for a specific genre. Instead, with Cloudian ElasticSearch you will be able to query the metadata of all the objects and (provided the metadata is correct), request the number of photos with a car.
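
To make the idea concrete, here is a small Python sketch of a metadata index that answers "which objects are tagged as a car?" without touching the objects themselves. It is an in-memory toy, not the Elasticsearch API and not Cloudian's prototype; the `MetadataIndex` class and the `subject` and `camera` fields are made up for the example.

```python
from collections import defaultdict


class MetadataIndex:
    """Toy inverted index from (field, value) pairs to object keys."""

    def __init__(self):
        self._index = defaultdict(set)

    def put(self, object_key, metadata):
        """Index the user-defined metadata stored alongside an object."""
        for field, value in metadata.items():
            self._index[(field, value)].add(object_key)

    def search(self, **criteria):
        """Return the keys of objects matching ALL given field=value pairs."""
        matches = None
        for field, value in criteria.items():
            keys = self._index.get((field, value), set())
            matches = set(keys) if matches is None else matches & keys
        return matches or set()


if __name__ == "__main__":
    index = MetadataIndex()
    index.put("img/001.jpg", {"subject": "car", "camera": "d750"})
    index.put("img/002.jpg", {"subject": "landscape", "camera": "d750"})
    index.put("img/003.jpg", {"subject": "car", "camera": "x100"})
    print(len(index.search(subject="car")))  # prints 2
    print(sorted(index.search(subject="car", camera="d750")))
```

The query only walks the index, so its cost depends on the amount of metadata rather than on the petabyte of photos behind it. It also shows the caveat from above: a photo that was never tagged with `subject=car` will simply not be found.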

Cloudian ElasticSearch

Cloudian hasn’t released this functionality into the final HyperStore product just yet, but showed us the prototype during Storage Field Day 10. While we wait for the official release, you can read Enrico’s thoughts or scroll down to the bottom of this post for the link to the videos.

My thoughts on Cloudian

With HyperStore, Cloudian builds a class-leading, scale-out, S3-compatible object store that is simple to manage. This in turn enables storage admins to manage more storage at a better FTE:TB ratio. This goal is achieved with technology (for example SmartRepair and the easy cluster operations), but also in the actual management of the system, with a simple-to-use interface (for example for the replication scheme).

While the Gartner numbers can be debated, I echo the Cloudian message that admins should be able to manage more data. This can only be achieved with systems that don’t need constant turning of knobs and tweaking of parameters. A storage system should run itself and the infrastructure around it should be as automated as possible, with some human verification down the line. Cloudian definitely got that message and is working in the right direction!

You can find the SFD10 videos over here. I’m not the only one who attended the presentation: you can read Dan’s post here and check Cloudian’s S3 compliance compared to AWS in Chris’s post here.

Disclaimer: GestaltIT paid for the flight, hotel and various other expenses to make it possible for me to attend SFD10. I was however not compensated for my time and there is no requirement to blog or tweet about any of the presentations. Everything I post is of my own accord.