Storage Field Day 6 Day 2 – Coho, Nexenta and Pure Storage

Storage Field Day 6, Day 2! Having recovered somewhat from yesterday's program, we all hopped in the limo and drove to Sunnyvale, CA. Breakfast was served at Coho Data, who also gave the first presentation of the day. The second presenter of the day was Nexenta Systems, talking about their SDS solutions. Another limo ride later we arrived at Pure Storage, who talked about their all-flash arrays (AFAs). The day concluded with some video games, dinner and a trip in a time machine. Let's go!

Coho Data

Coho Data targets mid-size to large enterprises, ISPs and application service providers, and wants to give them a dead simple storage system by removing all the complexity. The DataStream series is a scale-out storage system (the individual nodes are called MicroArrays) that currently offers NFS as its front-end protocol. I have seen a slide listing SMB, iSCSI and HDFS as protocols. Those would be very welcome: with the current offering of "just vmNFS/NFSv3", the system is only aimed at VMware environments.

The DataStream user interface is very simple, very clean and offers ample opportunity for reporting. As a bit of a pun on Chris Wahl's blog post about reporting via Twitter DMs, Andy demonstrated the Coho Twitter feed which one of their systems uses to report events.

Coho builds hybrid arrays with two tiers: PCIe flash cards and 3TB NL-SAS drives. Data enters the array on the flash tier and is demoted to spinning disk when space needs to be freed up in the flash tier. According to the DataStream datasheet, the Coho arrays do not simply use an LRU algorithm but instead use analytics to decide which block to demote. The datasheet also states that the array can look for predictable periodic patterns. I'm VERY interested to see what the boundaries or limitations of this Coho approach are: would it behave well with, for example, an application that only springs back to life once a month? Or would the pattern have to be more frequent than that, like a daily login storm? Workloads that unexpectedly have to be served from NL-SAS are currently the main problem I run into in the field.
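To make the "write to flash, demote when full" idea concrete, here is a minimal sketch of a two-tier store with plain LRU demotion. This is NOT how Coho does it (the datasheet says they go beyond simple LRU); it is just the baseline behaviour that their analytics would improve upon. The class and method names are made up for illustration, and there is no promotion back to flash on a disk read.

```python
from collections import OrderedDict


class TwoTierStore:
    """Toy hybrid store: a small flash tier in front of a large disk tier.

    Hypothetical illustration only; plain LRU demotion, no promotion.
    """

    def __init__(self, flash_capacity):
        self.flash_capacity = flash_capacity  # number of blocks that fit in flash
        self.flash = OrderedDict()            # block id -> data, least recently used first
        self.disk = {}                        # block id -> data

    def write(self, block, data):
        # Every write lands in flash; a stale copy on disk is dropped.
        self.disk.pop(block, None)
        self.flash[block] = data
        self.flash.move_to_end(block)
        self._demote_if_full()

    def read(self, block):
        # Returns (data, tier) so the caller can see where the read was served from.
        if block in self.flash:
            self.flash.move_to_end(block)     # mark as most recently used
            return self.flash[block], "flash"
        return self.disk[block], "disk"       # KeyError if the block was never written

    def _demote_if_full(self):
        # Push the least recently used blocks down to disk until flash fits again.
        while len(self.flash) > self.flash_capacity:
            victim, data = self.flash.popitem(last=False)
            self.disk[victim] = data


store = TwoTierStore(flash_capacity=2)
store.write("a", b"1")
store.write("b", b"2")
store.read("a")          # "a" is now more recently used than "b"
store.write("c", b"3")   # flash is full, so "b" gets demoted
print(store.read("b"))   # served from the disk tier
```

The monthly-batch-job problem shows up immediately in a model like this: a block that has not been touched for a while is always the first victim, no matter how predictable its next access is.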

The obvious question is then: how much flash should I add? Simply adding more and more flash will of course produce a higher flash hit ratio, but with diminishing returns and a much higher effective cost per "extra" flash hit.
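The diminishing returns are easy to reproduce with a toy simulation. The sketch below replays a synthetic, skewed access pattern against LRU caches of increasing size and prints the hit ratio plus the gain per added block. Everything here is an assumption for illustration: the workload is made up (popularity falls off as 1/rank), the sizes are arbitrary, and plain LRU stands in for whatever Coho actually uses.

```python
import random
from collections import OrderedDict


def lru_hit_ratio(trace, capacity):
    """Fraction of accesses in `trace` served from an LRU cache of `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # refresh recency on a hit
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace)


def skewed_trace(num_blocks, length, seed=0):
    """Synthetic workload: block popularity falls off as 1/rank (made-up assumption)."""
    rng = random.Random(seed)
    weights = [1.0 / (rank + 1) for rank in range(num_blocks)]
    return rng.choices(range(num_blocks), weights=weights, k=length)


trace = skewed_trace(num_blocks=10_000, length=50_000)
sizes = [250, 500, 1000, 2000, 4000, 8000]
ratios = [lru_hit_ratio(trace, size) for size in sizes]

previous_size, previous_ratio = 0, 0.0
for size, ratio in zip(sizes, ratios):
    gain_per_block = (ratio - previous_ratio) / (size - previous_size)
    print(f"{size:>5} blocks: hit ratio {ratio:.3f}, gain per added block {gain_per_block:.6f}")
    previous_size, previous_ratio = size, ratio
```

With LRU the hit ratio never goes down when you add capacity, so the interesting number is the last column: how much each extra block of flash still buys you.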

The DataStream UI lets you monitor the cache usage and will actually tell you when you need to add more hardware (e.g. flash drives) to improve the application experience, or when the workload is pretty much uncacheable and would not benefit from more cache.

This is only a tiny part of what was discussed during the Coho Data meeting. You should know that Andy usually teaches PhD students and that (here come the excuses!) English is not my native language + I had a short, jetlagged night. My ears started bleeding somewhere halfway through the presentation. I will not even attempt to transcribe the video: watch it yourself. Then watch it again.

I’m curious to see how they match up against other hybrid arrays in the market so I’ll definitely keep my eye on them in the future. For now, look at Enrico’s post, John’s post, or the post from Dan.

Nexenta Systems

Second to present on Storage Field Day 6 was Nexenta. IT is always struggling to run its apps and services at a low cost. Nexenta aims to help the business achieve this goal by lowering TCO, both compared to on-site storage systems and compared to cloud offerings from, for example, Amazon.

Nexenta currently offers NexentaStor as the true SDS platform. Nexenta sells the software, which the customer can run on any commodity x86 hardware they wish, to create unified (block and file) software-defined scale-up storage that can scale to roughly 1.5PB. It ships with two types of licenses: Community (18TB max, no commercial support, no FC and no HA plugins) and the fully featured Enterprise version. They quoted 5500 customers in total, of which 50% run NexentaStor under the Enterprise licensing model. In total that's about an exabyte of storage, with one customer running >100PB. Impressive numbers!

Nexenta also offers NexentaConnect: this piece of software adds file functionality to an existing VMware VSAN deployment (which is block-only). Now while this sounds very useful in some cases, the major discussion during the presentation was how this will fit in the financial big picture. For big customers the licensing cost for a Windows license is next to nothing, so why purchase NexentaConnect instead of deploying a Windows VM that will then offer SMB and NFS shares/exports? Maybe as a stripped down, don’t-have-to-patch-Windows-every-month method to build a file share? I’m not sold yet, I’d have to see some financial numbers to make sense of this.

[Photo: Nexenta cupcakes at Storage Field Day 6. Scale-out Nexenta cupcakes instead of a scale-up pie!]

New for 2015 will be NexentaEdge, which will add object storage to the existing block & file mix. It will be a scale-out solution which should scale from hundreds of TBs to hundreds of PBs and will have Amazon S3 and OpenStack Swift integration. Sounds perfect to me and should nicely complement NexentaStor… think NexentaStor for branch offices, NexentaEdge for the large enterprise DC with added object store functionality and offloading to the cloud?

Which leaves NexentaFusion, the automation, orchestration, single-pane-of-glass Nexenta tool for everything SDS related, which is also slated for release sometime in 2015.

If you want to make up your own mind, watch the Nexenta SFD6 videos, read Nigel’s post or the post from Dan.

Pure Storage

A fridge. Make that two fridges. A 19″ rack that has been modified to hold 4 beer taps. A freezer filled with ice cream. Enough candy to cause instant diabetes. And a big pirate flag hanging from the ceiling. Can you say start-up? Pure Storage was founded in 2009 and launched their first all-flash array (AFA) in 2012. They have a pretty solid vision though and have already shipped more than 1,000 systems.

First of all, flash arrays should perform well. By designing an all-flash array you avoid the performance dip that is standard on ANY hybrid array as soon as your dataset isn't in flash. They apply deduplication and compression to reduce the dataset and keep the cost low, which is obviously also an important aspect. Furthermore, the systems should be always on, easy to manage for the storage admin and transparent for anyone above the storage admin (e.g. DBAs, VMware admins, etc).


The arrays themselves consist of two x86 servers (controllers) connected with InfiniBand. The flash shelves, filled with consumer-grade MLC drives (cMLC), are attached to these controllers via SAS. All deduplication and compression happens inline and the array is bulletproof: yank cables and modules and your data is still safe.
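For anyone wondering what "inline" deduplication and compression boils down to, here is a heavily simplified sketch: every incoming chunk is fingerprinted, only chunks with a new fingerprint are compressed and stored, and the logical volume is just a list of fingerprints. This is my own illustration, not Pure Storage's implementation. The class name, the SHA-256 fingerprint and the use of zlib are all assumptions picked because they ship with Python.

```python
import hashlib
import zlib


class InlineReducer:
    """Toy inline data reduction: dedupe by content hash, then compress.

    Hypothetical illustration only; real arrays use far more refined schemes.
    """

    def __init__(self):
        self.chunks = {}        # fingerprint -> compressed chunk (stored once)
        self.volume = []        # logical layout: one fingerprint per written chunk
        self.logical_bytes = 0  # bytes as seen by the host

    def write(self, chunk):
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if fingerprint not in self.chunks:
            # Only unseen content costs physical capacity.
            self.chunks[fingerprint] = zlib.compress(chunk)
        self.volume.append(fingerprint)
        self.logical_bytes += len(chunk)

    def read(self, index):
        return zlib.decompress(self.chunks[self.volume[index]])

    def stored_bytes(self):
        return sum(len(data) for data in self.chunks.values())

    def reduction_ratio(self):
        # Logical bytes written divided by bytes actually stored.
        return self.logical_bytes / self.stored_bytes()


reducer = InlineReducer()
reducer.write(b"A" * 4096)
reducer.write(b"B" * 4096)
reducer.write(b"A" * 4096)   # duplicate of the first chunk, stored only once
print(len(reducer.chunks), "unique chunks, reduction ratio",
      round(reducer.reduction_ratio(), 1))
```

The reduction ratio is what drives the cost story: the higher it is, the less raw flash you need per usable TB.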

Pure Storage emphasized their “different than the other vendors” approach to customer experience. The Pure Storage arrays send telemetry back to HQ which is proactively mined for faults or potential optimizations. If your array misbehaves Pure support will contact you (the customer) instead of you having to reach out to them. I really like this approach and I think a lot of vendors could learn something from this.

Check out the Storage Field Day 6 Pure Storage presentations, this post from Enrico or another one from Dan. Jarret also has one of these arrays at home so do read about his experiences.

My thoughts on day 2…

The simplicity theme returned today: easy, customizable reporting from the Coho Data DataStream architecture and proactive support from Pure Storage as a company. Nexenta completed the day with their SDS solutions to drive down TCO: something that's always on the agenda of EVERY IT department.

Later that evening we had dinner at an Italian restaurant in Mountain View and afterwards went to the Computer History Museum. Which is just.. plain… awesome! I know my computer history, although maybe not in as much detail as the ol.. more experienced people in the SFD group 😉 . It's good fun to see where we came from: compute and storage that once occupied an entire building now fits in the palm of your hand with the latest smartphone.

Thoughts, comments, questions? Leave them down here! Or continue reading about the third and last day of SFD6 (boo!).

Disclaimer: Even though GestaltIT paid for the flight, hotel and various other expenses, I was not compensated for my time and there is no requirement to blog or tweet about any of the presentations. Everything I post is of my own accord.

  • Disclaimer: I work as the Sr. Technical Marketing Engineer at Coho Data.

    Hey Jon,

    First of all, thanks for the post; great summary of what was presented last week.

    I just wanted to clarify one point from your analysis… You stated that we include “some flash as cache to accelerate everything”, but this is not actually the case. Unlike some of our competitors (i.e. other presenters at Storage Field Day 6, among others) we use flash as a “Tier 0” read/write layer, so everything is written to flash and then demoted to spinning disk based on an LRU algorithm. See the following post for a bit more information regarding “Tiering vs. Caching”:

    Thanks again!

    • Jon Klaus

      Hi Chris,

      You’re absolutely right! I was actually digging through the DataStream datasheet later last night to look for more information on the data movement in Coho arrays in preparation for a different post, but neglected to update this one as well. Whoops, I have updated the post!

      So according to the datasheet the Coho systems actually do not use an LRU algorithm but use more complex analytics to determine which block to move up or down. Of course there will also be some “LRU”-ness in that algorithm, but can you disclose a bit more about it? What can we expect from it and what is out of bounds? I work with multi-tier hybrid arrays all day long so I’m all too familiar with apps that have their data on the “wrong” tier when it is actually needed.

