Yesterday we racked and stacked the EMC Isilon systems, ran most of the cabling and got just about everything ready to power up the Isilon systems. Pretty uneventful, if you don’t count dragging hundreds of kilograms of equipment around all day… The whole process can be split into four parts: configure the cluster and initial node, join the remaining nodes, configure the network, and configure the rest.
Create the cluster
Assuming all the cabling is in place, you should have an Isilon cluster that is not generating any noise or heat and is full of blinking green lights at the back of each node. This means the nodes have power but are not yet switched on.
The power button is on the back of the system, near the power supplies. It’s tiny, so look carefully. Actually a pretty neat place to put it: you can’t really press it by accident in that location. Push the button once on every node to start the Isilon nodes. Nice to know: if you ever want to gracefully shut down a node, press the button quickly three times in a row. If you want to give it a hard, not-so-graceful shutdown, keep the button pressed until the noise stops…
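On that note: once a node is up, you can also shut it down from software instead of walking to the rack. Roughly like this, assuming the shutdown command is available in the isi config console of your OneFS release (check the console’s help before trusting me on this):

    isi config          # opens the cluster configuration console
    >>> shutdown 3      # gracefully shuts down node 3; 'shutdown all' takes down the whole cluster
    >>> exit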
Connect your serial cable to the first node; the settings for this connection are in the procedure generator. Use the on-screen menu to configure the new cluster. It’s pretty straightforward and basically guides you through the configuration of the internal networks, external networks, NTP and time settings, cluster name, cluster join mode and some other bits and bobs. All in all, if you’ve done your preparation right, it should take you perhaps 5-10 minutes to get the first node up and running. Peanuts compared to racking, stacking and cabling… 🙂
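For reference, the serial settings in the procedure generator are normally the standard Isilon ones (115200 baud, 8 data bits, no parity, 1 stop bit), but trust the procedure generator over me. From a Linux or macOS laptop with a USB-to-serial adapter it could look something like this (the device name is just an example, yours will differ):

    # open the serial console at 115200 baud
    screen /dev/ttyUSB0 115200

On Windows, PuTTY with the same settings does the job just as well.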
Join the remaining nodes to the cluster
Now that you’ve got the first Isilon node, and thus a basic cluster, configured, it’s time to join the remaining nodes. We configured the Isilon cluster in Manual join mode instead of Secure mode. The difference is in the amount of freedom you get when adding nodes: in Manual mode any node can join the cluster, in Secure mode a node has to be invited to the cluster. Since this will be the only cluster on the InfiniBand switches, Manual mode is perfectly acceptable and nice and easy.
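By the way, the join mode isn’t set in stone: it lives in the isi config console and can be changed afterwards. Roughly like this, although the exact prompts differ a bit per OneFS version:

    isi config              # opens the cluster configuration console
    >>> joinmode manual     # or 'joinmode secure' if you want nodes to be invited
    >>> commit              # save the change and leave the console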
Joining the nodes is pretty straightforward and can be done in three different ways: via the CLI, via the front panel and via the GUI. We added the first nodes via the CLI: move your serial cable to the next node in line, select the “Join cluster” option, acknowledge that you want to join the previously created cluster and press Enter. That’s it… less than 60 seconds later it should be done!
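Once the node reports that it has joined, a quick sanity check from any node never hurts: isi status lists every node with its ID and health, so you can watch the newcomer appear.

    isi status    # the new node should show up in the list with a healthy status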
Method two is via the front panel. Press the little arrow to the left: never EVER press the square/enter button on the front panel when the display isn’t showing anything. The node could be sitting in some menu option: if you press the square it will perform that action, and you have no idea which one that is (the display isn’t showing anything, right?!).

So pressing the little arrow to the left will “exit the screensaver”, for lack of a better description. Navigate the menu and select “Join cluster”. Acknowledge that you want to join the previously configured cluster. Wait less than 60 seconds aaaaaand you’re done!
The last method is via the GUI. Depending on where you’re located (in the DC or at your desk) and what access you’ve got, this probably takes more time than pressing the buttons. But then again: if you’ve configured the cluster in Secure join mode, this might be the only method that works… We couldn’t try it, since we ran out of nodes to add… 🙁
Very important: we tried to add the third node while the second node was still joining the cluster. This resulted in a “no quorum found” error and left the third node stuck in “Please wait…” mode indefinitely. Just connect a serial cable to the node in question, press Enter to acknowledge the error and try again (either via the CLI or via the front panel). And have some more patience next time!
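The easy way to avoid this: check that the previous join has completely finished before touching the next node. Something along these lines, where the sysctl is an under-the-hood OneFS detail that may differ per release, so treat it as a bonus check:

    isi status              # wait until the freshly joined node shows up healthy
    sysctl efs.gmp.group    # shows the current group/quorum membership of the cluster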
Configure the network
Now that all nodes are online, it’s time to configure the network. If you want some sort of redundancy and/or performance, you will probably be aggregating your physical interfaces. We were using four 1GigE interfaces per node. When configuring the subnets and assigning interfaces to the pool(s), you have a number of interfaces to choose from (you can also list them from the CLI, see the sketch below this list):
- The individual interfaces, i.e. ext-1 through ext-4. These do not employ any channeling protocol.
- ext-agg: this is the aggregate of ext-1 and ext-2. The node will channel these according to the protocol you selected further up in the screen (in our case LACP).
- ext-agg-2: this is the aggregate of ext-3 and ext-4. Again, the node will channel these according to the protocol you’ve selected.
- ext-agg-3: this is the aggregate of ext-1, ext-2, ext-3 and ext-4. Basically all interfaces!
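If you prefer the CLI over the GUI for this part, the same interfaces (including the ext-agg variants) can be listed per node. The command below is OneFS 7.x style syntax; newer releases moved to isi network interfaces list, so check the help for your version:

    isi networks list interfaces    # lists the external interfaces of every node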
In our case the primary Isilon is connected to two completely separate switches to provide some added resilience against a switch-stack reboot. Since a LACP channel cannot span two switches, we needed two separate channels per node. In this case we added the ext-agg and ext-agg-2 interfaces to the pool, as shown in the screenshot above.
The secondary Isilon does not have this added network redundancy and is connected to only one switch. In this case only one channel is needed per node: we selected ext-agg-3 accordingly.
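For reference, the same pool assignment can also be done from the CLI. This is a rough sketch in OneFS 8.x style syntax; the groupnet, subnet and pool names are just examples and the exact flags may differ in your release, so check isi network pools modify --help first:

    # primary cluster: two LACP channels per node, one towards each switch
    isi network pools modify groupnet0.subnet0.pool0 \
        --add-ifaces 1:ext-agg,1:ext-agg-2,2:ext-agg,2:ext-agg-2 \
        --aggregation-mode lacp

    # secondary cluster: one LACP channel per node across all four ports
    isi network pools modify groupnet0.subnet0.pool0 --add-ifaces 1:ext-agg-3,2:ext-agg-3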
Make sure your switch config is prepared in advance: LACP needs some configuration on the switch side as well. Also, the phone number of your trusty network team could come in handy. In our case we ran into some problems during the configuration of the primary Isilon: the IP addresses behind the ext-agg-2 interfaces could be pinged from the entire company network, but the IP addresses behind the ext-agg interfaces could only be pinged from inside the VLAN itself. This usually reeks of gateway issues, except that the gateway is an Isilon global setting which we had configured correctly. After some intense checking and double-checking with our network guru Rob Heygele (channel states, port states, correct cabling, etc.), we eventually removed all the interfaces from the Isilon config, re-added them and everything was fine. If you try this yourself, make sure you still have at least one pool with some active interfaces and IP addresses at that point, or you’ll have an unreachable cluster and will have to do some serial cable magic again! 😉
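For what it’s worth, the remove-and-re-add dance we ended up doing can also be done from the CLI (again a sketch in OneFS 8.x style syntax, with example names), as long as the pool you’re not touching keeps its interfaces and IP addresses in the meantime:

    # pull the misbehaving aggregates out of the pool...
    isi network pools modify groupnet0.subnet0.pool0 --remove-ifaces 1:ext-agg,2:ext-agg
    # ...and put them straight back in
    isi network pools modify groupnet0.subnet0.pool0 --add-ifaces 1:ext-agg,2:ext-agg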
Configure the rest
With all the networking squared away, it’s time to configure the rest of the Isilon: timezones, AD membership, protection, licenses, etc. Even though we did this today, I’ll make a separate post out of it. To be continued! Let me know in the comments how the start of your Isilon cluster went!