Last week we migrated several Oracle databases to a new DBaaS platform. The company I’m working for is in the midst of a datacenter migration to a new cloud provider. Since the Oracle databases were running on old and very expensive Oracle hardware, we looked for opportunities to optimize and reduce costs. After much debate, we decided to move all databases to a shared Oracle Exadata platform. Much faster, and much cheaper: the hardware itself costs more, but you earn that back through lower licensing costs (fewer sockets in use).
All the Oracle database migrations went pretty well: stop the app, export the database, transfer it to the new DC, import and start the database. The app teams updated their connection strings and tested their apps. Pretty painless! However, there were also some scripts working alongside the databases, mainly for data loads. Server names changed, and some scripts had to be moved from the old database servers to the application servers.
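For those who haven’t done one of these: assuming Oracle Data Pump was the export/import tool (the actual tooling isn’t named above), the core of such a migration looks roughly like this. Database names, dump file names and transfer details are illustrative only:

```bash
# Stop the application first, then take a full export with Data Pump.
# OLDDB/NEWDB and the dump file names are made up for this sketch.
expdp system@OLDDB full=y directory=DATA_PUMP_DIR \
      dumpfile=olddb_full.dmp logfile=olddb_exp.log

# Transfer the dump file to the new datacenter (scp, rsync, ...) into
# the new server's DATA_PUMP_DIR location, then run the import:
impdp system@NEWDB full=y directory=DATA_PUMP_DIR \
      dumpfile=olddb_full.dmp logfile=olddb_imp.log

# Finally, point the app at the new connection string and start it.
```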
The working solution
This setup has been working for quite some time. There are three folders:
- A files folder which receives the incoming files (who would have guessed).
- A logs folder which collects logs generated by the import process.
- A backups folder which saves the imported files for auditing and analysis reasons.
Pretty straightforward, except that someone during the initial install decided to make the logs and backups folders subfolders of the files folder. From a scripting point of view, even I can see this is a pretty bad idea. If you fail to lock your script to just the “files” folder, it might go a level deeper and start processing the “backups” folder. Best case, it throws some errors and stops; worst case, it overwrites vital data and nobody ever knows. A better setup would be to have all three folders at the same level.
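That said, locking a script to the top level of the files folder isn’t hard, which is probably why this layout survived for years anyway. A minimal shell sketch of the defensive approach; the path and the load step are my assumptions, not the team’s actual script:

```bash
#!/bin/bash
FILES_DIR=/data/files   # assumed path; nested layout as described above

# Placeholder for the real data load step.
load_into_db() { echo "loading $1"; }

# -maxdepth 1 stops find from descending into the nested logs/ and
# backups/ subfolders; -type f skips the directories themselves.
find "$FILES_DIR" -maxdepth 1 -type f | while IFS= read -r f; do
    load_into_db "$f"
    mv "$f" "$FILES_DIR/backups/"   # archive the processed file for auditing
done
```

A recursive walk (find without -maxdepth, or a bash glob like **/*) is exactly the failure mode described above: it would pick the archived backups up again as fresh input.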
Don’t adjust the scope of your change!
But that wasn’t the change for the weekend: we only had authorization to move the database. Moving the files folder was out of the question: the third party may well have hardcoded the target location for their files, so changing it would mean additional testing and reconfiguration by parties not involved in this change. Even moving the other folders might cause problems if some other, unknown process was accessing them for auditing purposes.
But the app engineer insisted: the current setup “wasn’t working” and we had to change it, period. After a brief internal conference call, we accepted the risk of moving the backups and logs folders one level up. The app engineer reconfigured the script and, boom!
This started to look like it would escalate into a full reconfiguration of the script and the Exadata mount setup, so I asked the one question I should have asked much sooner: “Which error is the script throwing in the old setup?” The answer: “Well, everything is working, but subfolders are not supported.”
After counting to ten, I asked everyone to roll back to the working folder structure.
My thoughts on support status and change scopes
Support status and “is it working” are two very different things. In an ideal situation, your application is both working and supported by the vendor. If you’re building a new application or piece of infrastructure, you should always ensure it’s supported.
In this case, however, we had a working application that broke while we were trying to get it into a supported state. The team on call didn’t have enough history with the application to assess whether the folder structure had been created that way on purpose. For an application in production use, “working” always trumps “supported”.
This doesn’t mean we shouldn’t try to get the application back into a supported state. But there are multiple ways to get there. Maybe the vendor will say, “It looks like this has worked for several years, so we’ll accept the deviation from the standard and support you.” Or maybe we really do have to change the folder structure, but then it should be a well-prepared change, with all parties informed; not something done in a rush without fully understanding the consequences.