Last week at the MIT Sloan CIO Summit in Cambridge, MA, panelists discussing the role of the CIO were asked to describe a big failure from their careers. Fidelity Enterprise CTO Stephen Neff told a story that holds a lesson for every IT pro.
You're never as secure as you think you are.
Neff shared a story from earlier in his career, when he worked at Salomon Brothers, the now-defunct Wall Street investment firm. When he joined the firm, it had a double backup system in place. That meant there was a backup of the backup, so clearly if anything went wrong he was covered, because you couldn't corrupt two backups, right?
Well, no. As it turned out, even though Neff, then a young IT manager, thought he was covered, he learned that no system is foolproof. The experience drove home early on that he should never make assumptions and should test every system.
That's because he discovered that his mirrored site was corrupted, and the backup behind it hadn't been updated with a crucial piece of software, so it wasn't actually backing anything up as he had believed.
"We didn't know the entire system was down," Neff told the audience.
The good news, though, was that he learned about the issue before anything bad happened, and he was able to hire a company to recover the data from his disks. It was still a valuable lesson.
"Stability is not a given. You might think you're stable, but you have to test this stuff constantly," he explained.
He added, "Operational excellence is table stakes, the cost to get into the game."
This story illustrates that even the best-laid security and backup plans can be undone by unknown factors that are out of your control. I've heard IT pros say they won't go to the cloud because they can't trust what they don't control.
But this story clearly shows that even when the systems are right under your nose, things can and do go wrong for a lot of reasons you might not have thought about. IT pros should keep that in mind and understand that it's not just about the systems you have in place.
You have to test them, and you have to take into account human error, software issues, mechanical problems, you name it. The fact is you can't take anything for granted, whether you're a cloud vendor or running your own private data center.
What Neff's lesson shows is that you can never assume that everything is fine because you've built in redundancies. That alone isn't enough.
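One way to act on that advice is to check, on a schedule, that a backup really matches the data it is supposed to protect. What follows is a minimal Python sketch of that idea, not anything Neff described: the function names `sha256_of` and `verify_backup` and the two directory arguments are hypothetical, and it assumes the backup is a plain file-by-file copy you can read from the same machine.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir: Path, backup_dir: Path) -> list[str]:
    """Compare every file under source_dir with its copy under backup_dir.

    Returns a list of human-readable problems. An empty list means every
    source file has a byte-identical copy in the backup (hypothetical
    layout: the backup mirrors the source's relative paths).
    """
    problems = []
    source_files = sorted(p for p in source_dir.rglob("*") if p.is_file())
    for source_file in source_files:
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing from backup: {relative}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"content differs: {relative}")
    return problems
```

A check like this only proves the copy matches at the moment it runs. It says nothing about whether the backup software is still running or whether a restore would succeed, so it would likely sit alongside periodic test restores rather than replace them.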
Eighteen years into his tenure at Fidelity Investments, Neff has clearly absorbed these lessons. He still keeps those early mistakes in the back of his mind, as all good managers do, because he understands that you have to stay vigilant or everything could come apart in a quick minute.
Photo Credit: (c) Can Stock Photo