I recently installed a new Data Domain DD6300. Part of the installation procedure is to run a DD OS upgrade to bring the system up to the target DD OS release. You can find the target releases over here. While running the upgrade to 6.1.2.20, the Data Domain correctly rebooted as part of the upgrade. After logging back in, though, the system GUI kept throwing an “Upgrade in progress” popup, blocking everything else in view. There was also an alert stating “DD OS Upgrade is in progress. The system will not be available for backup and restore operations. The alert will be cleared after the upgrade operation is complete.” Which I guess is NEVER when the upgrade is hung…
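When the GUI is wedged like this, the CLI is the quickest way to see what the upgrade is actually doing. These are the checks I tend to run over SSH; the syntax is from memory and may vary slightly between DD OS releases:

    # Current DD OS version and the state of the running upgrade
    system show version
    system upgrade status

    # Outstanding alerts, including the stuck "upgrade in progress" one
    alert show current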
I’ve installed quite a few new Isilon clusters in 2019. All of them are generation 6 clusters (H400, H500, A200), using the very cool 4-nodes-in-a-chassis hardware. Common to all these systems is a 1GbE management port next to the two 10GbE ports. While Isilon uses in-band management, we typically use those UTP ports for management traffic: SRS, HTTP, etc. We assign those interfaces to subnet0:pool0 and make it a static SmartConnect pool. This assigns one IP address to each interface; if you do it right, these should be sequential.
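For illustration, this is roughly what that looks like on the OneFS CLI. The interface name mgmt-1 and the groupnet0.subnet0.pool0 path are assumptions for a default Gen 6 setup, so verify your own names with the first command:

    # Show the interface names per node (the 1GbE port is assumed to be mgmt-1 here)
    isi network interfaces list

    # Add the management interfaces of nodes 1-4 to pool0 and make the allocation static
    isi network pools modify groupnet0.subnet0.pool0 --add-ifaces=1-4:mgmt-1 --alloc-method=static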
A recent addition to my install procedure is creating DNS A-records for those management ports. This makes it a bit more human-friendly to connect your browser or SSH client to a specific node. In line with the Isilon naming convention, I followed the -# suffix format: if the cluster is called cluster01, node 1 is cluster01-1, node 2 is cluster01-2, and so on. However, it turns out this messes up your SyncIQ replication behavior!
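As an example, the records end up looking something like this (hypothetical domain and IP addresses, checked here with the standard host utility):

    # One A-record per node, following the <cluster>-<node number> convention
    host cluster01-1.example.local   # -> 192.168.10.11
    host cluster01-2.example.local   # -> 192.168.10.12
    host cluster01-3.example.local   # -> 192.168.10.13
    host cluster01-4.example.local   # -> 192.168.10.14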
I recently expanded two 3-node Isilon X210 clusters with one additional X210 node each. The clusters were previously installed with OneFS 7.x and upgraded to OneFS 8.1.0.4 somewhere in late 2018. A local team racked and cabled the new Isilon nodes, after which I added them to the cluster remotely via the GUI. Talk about teamwork!
A little while later the node showed up in the isi status output. As you can see in the picture to the right, something was off: the SSD storage didn’t show up as Isilon L3 cache. A quick check showed that the hardware configuration was consistent with the existing nodes. The SmartPools settings/default policy was also set up correctly, with SSDs employed as L3 cache. Weird…
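If you prefer to check this from the CLI instead of the GUI, the commands below show the node pool details and the global default; the last one is only a sketch, with a placeholder for the node pool name:

    # Per-node-pool details, including whether L3 cache is enabled
    isi storagepool nodepools list -v

    # Global default for new node pools (SSDs should be used as L3 cache)
    isi storagepool settings view

    # Flip an existing node pool to L3 explicitly (replace the placeholder with your pool name)
    isi storagepool nodepools modify <nodepool_name> --l3 true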
A while ago I installed two new Isilon H400 clusters. With any IT infrastructure, consistency and predictability are key to a trouble-free experience in the years to come. Cables should be neatly installed, labeled, and predictable. When wiring the internal network cables, it helps if nodes 1 through 4 are connected to switch ports 1 through 4 in order, instead of 1, 4, 2, 3. While some might consider this OCD, it’s the attention to detail that makes later troubleshooting easier and faster. As a colleague once said: “If someone pays enough attention to the little details, I can rest assured that he definitely pays attention to the big, important things!”
So I installed the cluster, configured it, then ran isi status to verify everything. Imagine my delight when I saw this:
Aaargh!
In my previous post I described how to reformat an Isilon node if for some reason the cluster creation is defective. After we got our new Gen 6 clusters up and running, we ran into another peculiar issue: the Isilon nodes lose network connectivity after a reboot. If we then unplugged the network cable and moved it to a different port on the Isilon node, the network would come back online. Move the cable back to the original port: connectivity OK. Reboot the node: “no carrier” on the interface, and no connectivity.
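A quick way to see the symptom without walking to the datacenter, assuming you can still SSH into the affected node over another interface:

    # OneFS view of the external interfaces and their link status
    isi network interfaces list

    # OneFS runs on FreeBSD, so ifconfig on the node shows the dead port as "status: no carrier"
    ifconfig -a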