Implementation Day 2 – Start the Isilon

(Image: you should see something like this after you start the Isilon.)

Yesterday we racked and stacked the EMC Isilon systems, ran most of the cabling and pretty much got everything ready to start the Isilon systems. Which is fairly uneventful, considering we’d been dragging hundreds of kilograms of equipment around all day yesterday… The whole process can pretty much be split into four parts: configure the cluster and the initial node, join the remaining nodes, configure the network, and configure the rest.

Create the cluster

Assuming all the cabling is in place, you should have an Isilon cluster that is not generating any noise or heat and is full of blinking green lights at the back of each node. This means the systems are powered but not yet turned on.

The power button is on the back of the system, near the power supplies. It’s tiny so look for it. Actually a pretty neat place to put it: you can’t really accidentally press it in that location. Push the button once on every node to start the Isilon nodes. Nice to know: if you ever want to gracefully shut down a node, press the button quickly three times in a row. If you want to give it a hard, not-so-graceful shutdown, keep pressing it till the noise stops…

Connect your serial cable to the first node; the settings for this connection are in the procedure generator. Use the on-screen menu to configure the new cluster. It’s pretty straightforward and basically guides you through the configuration of the internal networks, external networks, NTP and time settings, cluster name, cluster join mode and some other bits and bobs. All in all, if you’ve done your preparation right, it should take you perhaps 5-10 minutes to get the first node up and running. Peanuts compared to racking, stacking and cabling… 🙂
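
For reference, connecting from a Linux laptop looks roughly like the sketch below. The device name and the 115200 baud, 8-N-1 settings are assumptions on my part: take the real values from the procedure generator for your node type.

# Open a serial console to the first node. /dev/ttyUSB0 and 115200/8-N-1
# are assumptions; verify them against the procedure generator.
screen /dev/ttyUSB0 115200
# The configuration wizard then walks you through the internal/external networks,
# NTP, cluster name and join mode mentioned above.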

Join the remaining nodes to the cluster

Now that you’ve got the first Isilon node, and thus a basic cluster, configured, it’s time to join the remaining nodes. We configured the Isilon cluster in Manual join mode instead of Secure mode. The difference is in the amount of freedom you get when adding nodes: in Manual mode any node can join the cluster, in Secure mode a node has to be invited to the cluster. Since this will be the only cluster on the Infiniband switches, Manual mode is perfectly acceptable and nice and easy.

Joining the nodes is pretty straightforward and can be done in three different ways: via the CLI, via the front panel and via the GUI. We added the first nodes via the CLI: move your serial cable to the next node in line, select the “Join cluster” option, acknowledge that you want to join the previously created cluster and press Enter. That’s it… <60 seconds later it should be done!
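
To double-check that a node actually made it in, a quick status check from a node that is already part of the cluster does the trick. isi status is a standard OneFS command, though the exact output varies a bit between versions:

# Show cluster health and membership; every joined node should be listed
# with its node ID, internal IP and an OK status.
isi status -q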

Method two is via the front panel. Press the little arrow to the left: never EVER press the square/enter button on the front panel if the front panel isn’t showing anything. The node could be in some sort of menu option: if you press the square it will perform that action and you have no idea what that is (the front panel isn’t showing anything, right?!).
So pressing the little arrow to the left will “exit the screensaver” for lack of a better description. Navigate the menu and select Join cluster. Acknowledge that you want to join the previously configured cluster. Wait <60 seconds aaaaaand you’re done!

The last method is via the GUI. Depending on where you’re located (in the DC or at your desk) and what access you’ve got, this probably takes more time than pressing the buttons. But then again: if you’ve configured the cluster in Secure join mode, this might be the only method that works… We couldn’t try it since we ran out of nodes to add… 🙁

Very important: we tried to add the third node while the second node was still joining the cluster. This resulted in a “no quorum found” error and kept the third node in “Please wait…” mode indefinitely. Just connect a serial cable to the node in question, press Enter to acknowledge the error and try again (either via the CLI or via the front panel). And have some more patience next time!

Configure the network

(Image: Isilon interfaces and their aggregation combinations.)

Now that all nodes are online, it’s time to configure the network. If you want some sort of redundancy and/or performance, you will probably be aggregating your physical interfaces. We were using four 1GigE interfaces per node. When configuring the subnets and assigning interfaces to the pool(s), you have a number of interfaces listed:

  • The individual interfaces, i.e. ext-1 through ext-4. These do not employ any channeling protocol. 
  • ext-agg: this is the aggregate of ext-1 and ext-2. The node will channel these according to the protocol you selected further up on the screen (in our case LACP).
  • ext-agg-2: this is the aggregate of ext-3 and ext-4. Again, the node will channel these according to the protocol you’ve selected.
  • ext-agg-3: this is the aggregate of ext-1, ext-2, ext-3 and ext-4. Basically all interfaces!

In our case the primary Isilon is connected to two completely separate switches to provide some added resilience against a switch-stack reboot. Since a LACP channel cannot span two switches, we needed two separate tunnels. In this case we added the ext-agg and ext-agg-2 interfaces to the pool, as shown in the screenshot above.
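
We did this part in the GUI, but the same thing can be done from the CLI. The sketch below uses the OneFS 7.x-era isi networks syntax from memory, with placeholder subnet/pool names, so treat the exact flags as an assumption and check isi networks --help (or just use the GUI) on your version:

# List the external interfaces per node, including the ext-agg aggregates.
isi networks list interfaces

# Roughly: add the two 2-port aggregates of nodes 1-4 to an existing pool.
# "subnet0" and "pool0" are placeholder names; flag names differ between OneFS versions.
isi networks modify pool --name subnet0:pool0 --add-ifaces 1-4:ext-agg,1-4:ext-agg-2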

The secondary Isilon does not have this added network redundancy and is connected to only one switch. In this case only one channel is needed per node: we selected ext-agg-3 accordingly.

Make sure your switch config is prepared in advance: LACP needs some configuration on the switch side as well. Also, the phone number of your trusty network team could come in handy. In our case we ran into some problems during the configuration of the primary Isilon. The IP addresses behind the ext-agg-2 interfaces could be pinged from the entire company network, but the IP addresses behind the ext-agg interfaces could only be pinged from inside the VLAN itself. This usually reeks of gateway issues, except that the gateway is an Isilon global setting which we had configured correctly.

After some intense checking and double-checking with our network guru Rob Heygele (channel states, port states, correct cabling, etc.), we eventually removed all the interfaces from the Isilon config, re-added them and everything was fine. While doing this, make sure you still have at least one pool with some active interfaces and IP addresses, or you’ll end up with an unreachable cluster and will have to do some serial cable magic again! 😉
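
For reference, the switch side of one of those channels could look something like the snippet below on a Cisco IOS-style switch. The port numbers, channel ID and VLAN are purely illustrative, your network team will have their own standards, and other vendors use different syntax:

! Hypothetical example: one 2-port LACP channel towards ext-1/ext-2 of an Isilon node.
! "channel-group ... mode active" means LACP; ports, channel ID 10 and VLAN 100 are placeholders.
interface range GigabitEthernet1/0/1 - 2
 description Isilon node 1 ext-1/ext-2
 switchport mode access
 switchport access vlan 100
 channel-group 10 mode active
!
interface Port-channel10
 description Isilon node 1 ext-agg
 switchport mode access
 switchport access vlan 100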

Configure the Rest

With all the networking squared away, it’s time to configure the rest of the Isilon: timezones, AD membership, protection, licenses, etc. Even though we did this today, I’ll make a separate post out of it. To be continued! Let me know in the comments how starting your Isilon cluster went!

  • dynamox

    One interesting thing to note: sometimes you will get nodes with different OneFS code, mostly when you are doing an expansion rather than setting up a brand new cluster. In my case the new nodes were running older code, so as soon as they joined the cluster they would automatically be upgraded to match the current version of the cluster. If I am not mistaken the same thing would happen to a new node with a higher version: it would be downgraded.

    • Michael

      Correct, make sure you know the version of OneFS running on all nodes before you configure the first node. You wouldn’t want the odd man out to be the first node and have to upgrade/downgrade the remaining nodes as you join them to the cluster – or end up on a version different from your target.

      • dynamox

        Sometimes you can’t avoid it. I just got 10 new nodes for a brand new cluster. Out of the box they are running 7.1.0.1, so the first node had to be upgraded manually to 7.1.0.3; the rest then get upgraded as they join the cluster, before the format starts.

  • Rob Heygele

    Maybe a small addition to your post: while troubleshooting the “network” issue we shut down the LACP port-channels. This made no difference; only after shutting down and re-enabling the individual ports of the channels did the Isilon boxes become pingable!

    • Jon Klaus

      Excellent Rob, thanks for commenting! We also shoved some interfaces around in the Isilon GUI (removing and re-adding), so I suppose the combination of actions on the switches and the Isilons resolved the issue. Cheers!

  • Rob Koper

    Nice blog posts, Jon! Really shows in great detail what Isilon is about. Also the tip about pressing that front panel button while the screen is still in “screensaver” mode is a good one!

  • higkoohk

    How to restart the FTP service and the httpd service?

    • venkat

      isi services ftp enable
      isi services ftp disable
      isi services httpd enable
      isi services httpd disable

  • Yoga

    Hi Jon

    Could you explain a bit on this point: “Since a LACP channel cannot span two switches, we needed two separate tunnels.” How do we create two separate tunnels? We have an accelerator node with 4 ext ports and are planning to have 2 ext ports to switch-1 and the other 2 ext ports to switch-2.

    • Jon Klaus

      Hi Yoga, sure thing. An Isilon node has 4 ext ports. You can either connect them without LACP channeling, or you can use LACP channeling in three different modes:
      ext-agg-3 is one big LACP channel of all four ext ports, which you can’t use in your example since you want to use multiple switches.
      ext-agg is the aggregate of ext1 + ext2.
      ext-agg-2 is the aggregate of ext3 + ext4.

      In your case you’re basically doing the exact same thing as we did on our primary Isilon system. So cable ext1 + ext2 to switch 1, and ext3 + ext4 to switch 2. Next, switch on LACP on those ports on your switch, throw ext-agg and ext-agg-2 in the pool and you’re done!

      • Yoga

        Thank you for the explanation Jon.
        I suggested the same thing to the client yesterday, looking at your blog. But they took another direction by doing ext-1 of nodes 1 and 2 to switch-1 and ext-2 of nodes 1 and 2 to switch-2, saying they need switch-level redundancy and that they have virtual port channeling configured on the Nexus 5k, which I am not familiar with.

        • Jon Klaus

          This is perfectly fine and they could do the same for ext-3 and ext-4. LACP is not needed; if you’ve got SmartConnect Advanced licensed, failover of NFS connections will be pretty much non-disruptive should an interface fail since the IP addresses will move over to the surviving nodes and interfaces.

          • Yoga

            Thank you Jon. I am working in an SMB environment and, like you said, there’s more advantage with NFS3.