Isilon OneFS Upgrade HowTo

You should be seeing this during the upgrade process...

Every IT system needs a software upgrade once in a while, either to enable additional functionality or to patch security holes. Yes, even an Isilon scale-out NAS. Good news: performing an Isilon OneFS upgrade is peanuts! Including the pre-checks and post-checks, our 3-node cluster was upgraded in less than 2 hours without downtime. Curious how to do this?

Let's get started!

First of all, let’s get the formalities out of the way. If you’re unsure what you’re doing, stop and let EMC do the upgrade. You’re performing an upgrade on the most vital part of the Isilon, the part that controls how your data is stored and accessed: OneFS. This is a condensed howto describing how I upgraded from 7.0.2.1 to 7.0.2.7. Major or different release upgrades may have different checks, steps, restrictions or dependencies. Always check the latest procedure for your specific system before committing to the upgrade! If you end up with an Isilon that has 100% free space, don’t blame me. Instead, thank me for all the data you can now store on that brand spanking new Isilon (and while you’re at it, send me some of that IT budget you’ve just saved!).

Aaahh, that feels much better! To start, head over to support.emc.com and download the latest release notes of the OneFS release you want to upgrade to. Trawl through the new features and bugfixes and more importantly, the release dates. I usually adhere to a “60 days after GA” policy, meaning I won’t upgrade to a release that’s only a couple of days old. Decide which release you want, download the OneFS installation image and upload it to the /ifs/data directory on your Isilon cluster.
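
If you want to make that “60 days after GA” rule a bit less hand-wavy, here is a minimal Python sketch of the date check. The 60 days are my own habit rather than an EMC requirement, and the function name and the dates below are made up for illustration; take the real GA date from the release notes.

```python
from datetime import date, timedelta

# Assumption: 60 days is a personal waiting period, not a vendor rule.
MIN_AGE = timedelta(days=60)


def old_enough(ga_date: date, today: date, min_age: timedelta = MIN_AGE) -> bool:
    """Return True if the release reached GA at least `min_age` ago."""
    return today - ga_date >= min_age


# Hypothetical GA date, for illustration only.
print(old_enough(date(2014, 1, 10), today=date(2014, 2, 1)))  # False
print(old_enough(date(2014, 1, 10), today=date(2014, 4, 1)))  # True
```

Passing today in explicitly (instead of calling date.today() inside the function) keeps the check easy to test.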

Isilon OneFS Pre-check progress

Next up is a round of pre-checks. First of all, do a manual check of your cluster. Is it healthy or in an attention/critical state? Is there still free space or are you at 99% utilization? If you see a healthy Isilon cluster, continue with the automated pre-check. This will check a couple of things you’ve just checked yourself, plus things like disk utilization, drive stalls and various legacy configuration items. Log in with SSH as root and run the following command: isi update --check-only. When asked for the update image, point it to /ifs/data/<OneFSreleaseName>.tar.gz. Let the pre-check run: if there are errors, resolve them first. Warnings are informational so you can safely ignore those. If there’s no error or warning at all: great job, let’s continue!

Next on the list is checking for running jobs. Either run isi stat or navigate to Cluster Management > Operations. If FlexProtect or FlexProtectLin are running you’ll have to wait for those to complete: do not cancel them! Cancel the rest of the jobs to continue.
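
The rule above (wait for FlexProtect and FlexProtectLin, cancel everything else) can be sketched in a few lines of Python. This is only an illustration of the decision: it does not talk to the cluster, the function name is my own invention, and the job list is something you would type in by hand from the status output.

```python
# Jobs that must be allowed to finish, per the rule described above.
MUST_FINISH = {"FlexProtect", "FlexProtectLin"}


def triage_jobs(running):
    """Split running job names into (wait_for, cancel) lists."""
    wait_for = [job for job in running if job in MUST_FINISH]
    cancel = [job for job in running if job not in MUST_FINISH]
    return wait_for, cancel


# Hypothetical list of running jobs, for illustration only.
wait_for, cancel = triage_jobs(["MediaScan", "FlexProtect", "SmartPools"])
print("wait for:", wait_for)  # wait for: ['FlexProtect']
print("cancel:", cancel)      # cancel: ['MediaScan', 'SmartPools']
```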

Enough with the checks, let’s go already!

Okay okay! You can upgrade via the CLI or via the GUI; I opted for the GUI. Navigate to Help > About this Cluster and click the upgrade button. Browse to the image, which should still be in /ifs/data, and click submit. It will load the image and return a nice display with the available upgrade types:

Screen displaying the available Isilon OneFS upgrade types

There are two major options: a simultaneous upgrade and a rolling upgrade. The first one will upgrade all nodes at the same time, but your data will not be available during this upgrade since all nodes will be down. The latter (a rolling upgrade) will upgrade the Isilon grid node by node: this means the upgrade process as a whole will take longer but your data is always available. Keep in mind a rolling upgrade might not be possible in all OneFS releases; anything older than 7.0.0 only does big-bang upgrades.

With rolling upgrades you have some options for handling client TCP connections. Keep in mind a node needs to reboot, so user sessions on that node will be terminated. Depending on the application and protocol (NFS or SMB) this might be disruptive to the end-user. If you know all those user sessions are relatively short-lived, you could choose to wait for them to expire on their own instead of forcefully terminating them. The downside is that if a user session does not expire before the specified time-out is reached, the upgrade will fail. In my upgrade scenario I could not wait for the TCP sessions to expire on their own, so I chose to terminate them straight away.

The final option is whether you want to manually control when a node reboots or let the upgrade process reboot it immediately once the OneFS upgrade has completed. I chose the manual option just because I wanted to see what that dialog looks like…

Isilon OneFS upgrade reboot node dialog
Mystery solved…

The node you’re connected to will be upgraded and rebooted last. This will invalidate your SSH session and/or GUI session. After logging back in, either run isi stat or look in the upper right corner of the GUI for the Isilon OneFS version. It should be the new version and your Isilon cluster should be healthy and green. There will be a number of jobs running (for example MediaScan): this is normal and your data is available.
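
If you would rather not eyeball version strings, a small Python sketch can do the comparison. It assumes the release is a purely numeric dotted string like the 7.0.2.1 and 7.0.2.7 used in this post; the function names are mine, and you would paste in whatever version the cluster reports.

```python
def parse_version(text):
    """Turn a dotted release string such as '7.0.2.7' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_newer(old, new):
    """Return True if release `new` is strictly newer than release `old`."""
    return parse_version(new) > parse_version(old)


print(is_newer("7.0.2.1", "7.0.2.7"))   # True
print(is_newer("7.0.2.7", "7.0.2.10"))  # True
print("7.0.2.10" > "7.0.2.7")           # False
```

The last line shows why the versions are compared as tuples of integers: a plain string comparison would rank a hypothetical 7.0.2.10 below 7.0.2.7.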

Final steps

We’re almost there. You’ve already confirmed that the cluster is healthy. Remove the Isilon OneFS upgrade package from /ifs/data. As a final step, make sure EMC knows about your upgrade by running the Isilon Gather process, either from the CLI with isi_gather_info or in the GUI via Help > About this Cluster. This uploads the latest logfiles to EMC. Once that’s done: congratulations, you’ve just performed an Isilon OneFS upgrade! Peanuts, right?

While you’re at it, you might want to upgrade the Isilon node firmware or upgrade InsightIQ. Or leave a question or comment, it’s free!

  • dynamox

    Excellent post Jon.

    One note: OneFS 6.5 can be upgraded in-family just like 7.x (rolling upgrade); it’s when you are upgrading between families (6.5 > 7.x) that you have to schedule a complete outage.

    Another note for folks that are reading this: make sure that CIFS clients connect to a pool that uses “Static” IP allocation and NFS clients to a pool that uses “Dynamic”. This is very important during upgrades, as an IP configured in a Static pool will simply go away when the node reboots, whereas an IP configured in a Dynamic pool will transition to another node in the cluster.

    • Jon Klaus

      Thanks Sergey and excellent addition!