Coho Data: storage transformation without disruption

Coho Data was first to present at Storage Field Day 8 last October, with Andy Warfield (co-founder and CTO) running the entire presentation start to finish. We knew what to expect: last year Andy also presented at SFD6 and cooked the brains of half the delegates. So this time we came prepared, with ample coffee and not too much breakfast in our stomachs. And we weren’t disappointed: Andy gave us a crystal-clear company mission (NOT focused on the ever-so-hyped “disruption”, but instead on transformation), backed by plenty of shiny tech and intelligence inside the Coho array. So what is Coho Data trying to do, and how?

Coho Data aims to build a private cloud environment for medium to large companies. IT departments can then cost-efficiently and quickly share out these storage resources to the other departments within the company. In effect, the IT department takes a CAPEX investment (=buy hardware/software) and then, just like a service provider, turns this CAPEX investment into an OPEX (=monthly fee) charge-out to the departments.
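To make that charge-out model concrete, here is a minimal sketch with made-up numbers (not Coho pricing) of how a CAPEX purchase could be turned into a monthly OPEX charge per department:

```python
# Hypothetical numbers for illustration only, not Coho pricing.
capex = 250_000.00          # up-front hardware/software purchase
amortization_months = 36    # write the investment off over 3 years
overhead_pct = 0.15         # power, cooling, admin time, support contracts
departments = 5             # internal "customers" sharing the system

monthly_cost = capex / amortization_months * (1 + overhead_pct)
charge_per_department = monthly_cost / departments

print(f"Monthly charge-out per department: ${charge_per_department:,.2f}")
# -> Monthly charge-out per department: $1,597.22
```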

There are a couple of challenges these IT departments need to tackle when building private clouds. Investing in a massive storage system upfront could be a risk if your data growth suddenly slows down: you’ve paid for a system that’s too large for your demand. Furthermore, there’s a massive amount of development going on in the market: commodity hardware is overtaking proprietary hardware, new flash media (like NVMM) are being invented and existing devices are dropping in price rapidly, causing ‘old’ hardware to be deprecated in something like 18-24 months.

Coho also does not believe in “disruption” – instead it wants to help customers transform from the “old, boring but necessary” hardware storage environments to the “easy to scale, flexible cloud” like AWS, and also allow the customer to get started with emerging New IT technologies like Docker and Hadoop. In effect, it builds a bridge between old and new IT.

Coho Data transformation

So how does Coho Data do all that?

Easy: they build rack-scale, scale-out, hybrid storage systems! Scale-out means that you can add capacity and performance in small steps instead of having to invest in one big chassis upfront. Coho systems are not meant to be placed in a rack at the end of your data center row. Instead, they are spread across your compute racks, keeping the computing power as close as possible to the storage. This cuts down on a lot of the traffic between compute racks and the “end of the row” storage rack.

Coho Data Architecture

The current Datastream products allow customers to mix enterprise NVMe, SSD and spinning disk in one system, either as a flash+spinning disk system or as an all-flash array utilizing just the NVMe and SSDs. The NVMe is used as primary storage (not cache!) and absorbs all incoming writes, allowing the data to cool off a bit before destaging it to the slower SSD drives or spinning disk. One NVMe device is so fast that it will easily flood a 10Gbit port, so Coho had to pretty much rewrite the Ethernet stack to speed up transfers: the SFD6 presentation dove into this in great detail. Since the Ethernet component is so vital to the system, Coho Data usually provides the SDN switches itself, although it does allow customers to choose their own Arista switches.
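As a rough illustration of the write-absorb-then-destage idea (a simplified two-tier sketch, not Coho’s actual data path), the mechanism looks something like this:

```python
import time

# Simplified two-tier model: fast NVMe tier absorbs writes, cooled-off
# data is destaged to a slower SSD/HDD capacity tier. Illustration only.

COOL_OFF_SECONDS = 300          # data untouched for 5 minutes counts as "cold"

nvme_tier = {}                  # block address -> (data, last_access_time)
capacity_tier = {}              # block address -> data

def write(addr, data):
    """All incoming writes land on the NVMe tier first."""
    nvme_tier[addr] = (data, time.time())

def read(addr):
    """Serve from NVMe if present, otherwise from the capacity tier."""
    if addr in nvme_tier:
        data, _ = nvme_tier[addr]
        nvme_tier[addr] = (data, time.time())   # refresh the heat
        return data
    return capacity_tier[addr]

def destage():
    """Move blocks that have cooled off down to the capacity tier."""
    now = time.time()
    for addr, (data, last_access) in list(nvme_tier.items()):
        if now - last_access > COOL_OFF_SECONDS:
            capacity_tier[addr] = data
            del nvme_tier[addr]
```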

These NVMe cards can be compared to diamonds: they are (relatively speaking) still incredibly expensive and hard to divide into smaller pieces (i.e. share). If you put one of those cards in a regular server, it will be under-utilized, since neither the connectivity to that server nor the amount of CPU power in the server can keep up with the NVMe card. Putting one or more NVMe cards in a single server is therefore a terrible waste of money: you want to share these cards across more than one server. Hence Coho puts them in its arrays.
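A quick back-of-the-envelope calculation shows why, using assumed ballpark figures for the devices involved:

```python
# Ballpark figures, for illustration only.
nvme_read_throughput_gbs = 3.0        # GB/s a fast NVMe card can sustain
port_bandwidth_gbs = 10 / 8           # a 10 Gbit/s link moves ~1.25 GB/s

ports_needed = nvme_read_throughput_gbs / port_bandwidth_gbs
print(f"10GbE ports needed to drain one NVMe card: {ports_needed:.1f}")
# -> 10GbE ports needed to drain one NVMe card: 2.4
```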

Speaking of CPU limitations: storage media response times are dropping incredibly fast. With the storage media no longer being the bottleneck, the channels piping data to the media have to work harder to keep up, which means data forwarding decisions have to be made faster. As a result, storage is becoming CPU bound. This puts hyper-converged storage in a bind between high performance and ease of use: if you drive your storage hard, there are no compute resources left for the apps. This does not mean that hyper-converged has no future: if ease of use is what you want, it can deliver that. But combining ultra-high-performance storage and hyper-converged is tricky, to say the least…
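A rough estimate with assumed numbers (purely illustrative) makes the CPU cost visible:

```python
# Rough, assumed numbers; purely illustrative.
target_iops = 1_000_000          # what a couple of NVMe devices can deliver
cycles_per_io = 20_000           # network + storage stack work per request
core_clock_hz = 2.5e9            # a 2.5 GHz core

cores_needed = target_iops * cycles_per_io / core_clock_hz
print(f"Cores consumed just moving data: {cores_needed:.1f}")
# -> Cores consumed just moving data: 8.0
```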

With this in mind, a lot of attention has to be given to how a storage system places data and spreads the network load. Ideally you want the system to figure this out itself based on the workload and dynamically adjust the data placement if the workloads change. In the Coho Data systems, this is done by Coho FlashFit. Additionally, Coho OnStream will (if you allow it) send data about both the system and the workloads back to Coho for support and product improvements.

HyperLogLog

Now to the absolute coolest part of the presentation: HyperLogLogs or HLLs. What’s a HyperLogLog? It’s a small (bytes to kilobytes in size) data structure. You can put some data in there and then ask it how many unique entries it contains. If you insert the addresses of the blocks that you access on the storage system, you can determine the working set size. For example: hit the same 10 addresses a million times in a row and the HLL will report back a working set size of roughly 10 addresses. Write a million unique addresses to it and you’ll get ‘~1 million’ back upon querying the HLL.
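To get a feel for how little state this takes, here is a minimal, simplified HyperLogLog in Python. It is a toy version of the published algorithm, not Coho’s implementation:

```python
import hashlib
import math

class HyperLogLog:
    """Toy HyperLogLog: approximate unique-count estimates from ~1 KB of registers."""

    def __init__(self, p=10):
        self.p = p                        # 2**p registers (1024 by default)
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash of the item (e.g. a block address).
        h = int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                        # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining 64-p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:          # small-range correction
            estimate = self.m * math.log(self.m / zeros)
        return int(estimate)

# Hit the same 10 block addresses over and over: the estimate stays ~10.
hot = HyperLogLog()
for _ in range(1000):
    for addr in range(10):
        hot.add(addr)
print(hot.count())          # ~10: the working set, no matter how often it is hit

# Insert 100,000 unique addresses: the estimate lands close to 100,000.
wide = HyperLogLog()
for addr in range(100_000):
    wide.add(addr)
print(wide.count())         # roughly 100,000 (within a few percent)
```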

If you create new HLLs over time and link them to the old HLLs, you get a sliding time window that shows you how your working set changes over time. It also shows you whether the current working set is a subset of the working set at an earlier point in time. To top things off, Coho snapshots these HLLs to allow working set analysis over time. This whole linking and snapshotting of HLLs is called Coho CounterStacks, and it is done per VMDK object at a 4KB granularity.
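A nice property of HLLs is that they can be merged by taking the register-wise maximum, which is what makes this kind of windowing cheap. A rough sketch of the idea, reusing the toy HyperLogLog class from the snippet above (again just an illustration, not Coho’s CounterStacks code):

```python
# Assumes the toy HyperLogLog class from the sketch above.

def merge(a, b):
    """Union of two HLLs with the same register layout: register-wise max."""
    out = HyperLogLog(p=a.p)
    out.registers = [max(x, y) for x, y in zip(a.registers, b.registers)]
    return out

# One HLL per hour (hypothetical intervals) of observed block addresses.
hourly = []
for hour in range(6):
    h = HyperLogLog()
    for addr in range(hour * 1000, hour * 1000 + 5000):   # overlapping address sets
        h.add(addr)
    hourly.append(h)

# Working set over the last 3 hours = union of the last 3 sketches.
window = hourly[-3]
for h in hourly[-2:]:
    window = merge(window, h)
print(window.count())     # ~7000 unique addresses seen in the 3-hour window
```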

Coho Data HLL

Judging by Andy’s epic smile after baffling the delegates and receiving a small round of applause from Howard and several “oh sh*ts”, he knows this is gold. Why?

First of all, knowing the working set size is vital for a hybrid array, since that’s what you size your fastest tier(s) on. But by knowing the working set and how it changes over time, the array can also “correct” the balance of data across SSDs and spinning disk after a disruptive application (like backup) messes it all up. The analytics could also try to find periodic workloads, like the monthly reporting or mailing list generation, and adjust the data placement accordingly. This pre-fetching mechanism isn’t ready yet, but Andy should have more information on it at SFD10… so I’d better make sure I’m attending that session in person!
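For example, once you have per-interval working-set estimates, sizing the fast tier and spotting outliers like that monthly job becomes a simple calculation over the history. A hypothetical sketch with made-up numbers:

```python
# Hypothetical per-day working-set estimates in GB, e.g. derived from HLLs.
working_set_gb = [400, 420, 410, 390, 405, 1200, 415, 400]   # day 6: month-end reporting

# Size the flash tier so the normal working set fits, then check whether
# the outlier would also fit or needs special handling (e.g. pre-fetching).
sorted_ws = sorted(working_set_gb)
typical = sorted_ws[len(sorted_ws) // 2]          # middle of the sorted history: 410 GB
peak = max(working_set_gb)                        # 1200 GB

headroom = 1.3                                    # assumed safety margin
flash_tier_gb = typical * headroom
print(f"Flash tier sized at {flash_tier_gb:.0f} GB; "
      f"peak working set is {peak} GB ({peak / flash_tier_gb:.1f}x the tier)")
# -> Flash tier sized at 533 GB; peak working set is 1200 GB (2.3x the tier)
```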

My two cents…

Coho’s focus on transformation instead of disruption is absolutely golden. In my (non-native English speaking) opinion, disruption has a mainly negative connotation. I don’t care how cool your product is: if you start promising disruption, I expect a pause in the flow of data, or I expect to need a completely new skill set to manage the system. That’s bad and will result in raised eyebrows when you try to sell or implement the product. And the majority of the time it’s just a blatant lie: dig deep enough and even the newest of technologies builds on the successes or failures of a previous attempt.

With this SFD8 presentation and compared to last year, Coho Data looks a lot more mature and capable of executing. They’ve added a number of enterprise features to their systems that tick the RFP boxes, and they are steadily working on some pretty cool tech that you don’t hear other companies talking about, like the HLLs and the pre-fetching possibilities they bring. I work with hybrid systems a lot, and if there’s one thing that causes problems in hybrid systems, it’s unexpectedly changing workloads or workloads with IO trends that span more than a couple of days. If Coho can successfully tackle that challenge with CounterStacks, it would make troubleshooting performance drops more efficient and make the system more capable of absorbing infrequent workload spikes. That would allow you to position hybrid arrays in areas where you might otherwise choose an all-flash array.

Watch the Coho Data recordings over here: they are really worth your time! Additionally, read Ray’s, Enrico’s, Scott’s and Mark’s posts to see their sides of the story. In the meantime I’m already looking forward to next year and hope to run into these systems in the field!

Coho Datastream 0.002f

Disclaimer: GestaltIT paid for the flight, hotel and various other expenses to make it possible for me to attend SFD8. I was, however, not compensated for my time and there is no requirement to blog or tweet about any of the presentations. Everything I post is of my own accord. Coho Data provided us with a Datastream 0.002f to tinker with, which is a Raspberry Pi.