Time to stop "just getting by" with Tech Cutovers
by Andy Smith, on 30/07/18 14:13
This recent post on LinkedIn really caught my attention, with news that the Bank of England (BoE) and the UK Regulators are placing new requirements on all Financial Services companies to develop and demonstrate improved Operational Resilience.
It's easy to see where the driver for this initiative comes from. The news has recently been full of technology failures and their impact, particularly in the Financial Sector.
The BoE approach recognises that key services will inevitably fail at some point, for a myriad of reasons. The requirement challenges UK Financial Market Infrastructure (FMI) firms to have proven, rehearsed contingency and recovery plans. This resonated with me and prompted some reflections on where, when and why the idea for ICEFLO first came about.
Where Our Story Begins...
In 2006, before ICEFLO was created, I founded Agenor Technology, a technology consulting organisation. The business started with one global bank as our customer, where I’d worked as an independent consultant for almost a decade. I guess I was a well-known face, trusted by senior management to get some of the tougher projects delivered.
Most of the projects we were involved in related to the refresh or maintenance of existing Production services. This meant delivering to numerous short project timelines on systems that were fundamentally critical to the bank, including:
- Payment Systems
- Internet Banking systems
- Call Centres
- Reporting systems
The one characteristic all these projects shared was that they inevitably meant working with our Customer to deliver high-risk, time-constrained Production changes, also known as Cutovers.
As our business grew, so too did the frequency of these Cutovers.
Within a year or so, it felt like we were running a Cutover almost every weekend. As anyone who's been involved in these Cutovers will know, they are stressful events that typically involve long shifts during unsociable hours. There's a lot riding on the success of a Cutover and the impact of failure is serious. No-one wants to be on that Monday morning call with Senior Execs when the payment systems have been impacted by your change.
I found myself increasingly anxious when "our guys" were running these Cutovers, knowing that the consequences of failure would be serious for our customer and for our growing business. I had the overwhelming sensation of too much risk, not enough visibility and no way of getting comfortable while these changes were in flight. I’d find myself joining conference calls at 3:00am on a Sunday morning just to check that everything was on track.
The "Near-Miss" Experience
It was Sunday 30th September 2007 when it became obvious that one of our cutovers was in deep trouble. We were upgrading a reporting system required by the UK Regulator, with important reports due by noon the following day.
The 22:00 checkpoint had suggested we'd be finished by 04:00, with 8 hours of contingency in our timeline. Time for bed and a decent sleep. At the 06:00 checkpoint, however, the picture had dramatically deteriorated. Technical issues meant that the change was now running way behind schedule, and there was a real prospect that we wouldn't have the Reporting system up and running in time to ship these critical reports to the UK Regulator.
For the sake of this blog post I'll spare you the colourful language used when we made that discovery. Let's put it this way: Monday was not going to be a good day. Not for Agenor Technology, and not for our Customer.
Fortunately, after a massive recovery effort by a much broader team, the story ended with the full restoration of the Reporting service, just in time to meet the Regulator’s expectations. The stereotypical "near-miss".
I'd long been uncomfortable with the basic tools used to build the runbooks that manage these Cutovers.
As a bit of a spreadsheet geek, I'd created some fairly fancy formulas to try to give us what we needed. "Half-man-half-spreadsheet" was a nickname I'd picked up over the years! However, no matter what I tried, the spreadsheet solution was never good enough for the task at hand.
Everyone who's been involved in a cutover will recognise these all-too-common problems:
- Several spreadsheets, with no connection between them
- Multiple copies that are difficult to update
- Status and timing details that are always out of date
- No single, real-time version of the truth
- Error-prone forecasting, often losing sight of an accurate end time
As the old adage states, "there had to be a better way."
ICEFLO, at its core, was conceived as a risk mitigation solution. The goal was to meet the functional requirements of a very specific problem domain and the emotional needs of one person: me.
The brief I gave to my team was, in hindsight, very modest and simplistic:
The Eureka Moment
The team - one designer, one developer and one DBA - set about this challenge with zest. Within a couple of months, the team were ready to show me the prototype of our 'experiment'. I will never forget the buzz I got watching the first demo; the forecast end time remorselessly grinding on as a task continued to run and run and run... way beyond its forecast duration.
The "Eureka" moment came when I saw the status of the runbook change from Green to Amber as the forecast end of the Cutover moved beyond a predetermined time! Traffic-light health status, demonstrated.
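The idea behind that first demo can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not ICEFLO's implementation: it assumes a strictly sequential runbook, and the `Task` class, the `forecast_end` and `rag_status` functions, and the amber/red thresholds are all invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Task:
    """One step in a hypothetical, strictly sequential runbook."""
    name: str
    planned: timedelta                   # planned duration of the step
    started: Optional[datetime] = None   # None means not yet started
    finished: Optional[datetime] = None  # None means not yet finished


def forecast_end(tasks: List[Task], now: datetime) -> datetime:
    """Forecast when the runbook will finish, assuming tasks run one after another."""
    cursor = now
    for task in tasks:
        if task.finished is not None:
            continue  # completed work adds nothing to the remaining forecast
        if task.started is not None:
            # A running task is expected to end at its planned finish time; once it
            # overruns, it is assumed to finish "now", so the forecast end keeps
            # moving later for as long as the task keeps running.
            cursor = max(cursor, task.started + task.planned)
        else:
            cursor += task.planned  # pending tasks queue up behind the current one
    return cursor


def rag_status(end: datetime, amber_at: datetime, red_at: datetime) -> str:
    """Traffic-light status from hypothetical thresholds (e.g. planned end, hard deadline)."""
    if end <= amber_at:
        return "GREEN"
    if end <= red_at:
        return "AMBER"
    return "RED"


# Hypothetical example: a 6-hour cutover starting at 22:00.
start = datetime(2024, 1, 6, 22, 0)
tasks = [
    Task("Stop services", timedelta(hours=1), started=start, finished=start + timedelta(hours=1)),
    Task("Upgrade reporting DB", timedelta(hours=3), started=start + timedelta(hours=1)),
    Task("Smoke test", timedelta(hours=2)),
]
amber_at = start + timedelta(hours=6)   # planned end, 04:00
red_at = start + timedelta(hours=14)    # hypothetical hard deadline, noon

now = start + timedelta(hours=5, minutes=30)  # 03:30; the upgrade has overrun by 1.5 hours
end = forecast_end(tasks, now)
print(end.strftime("%H:%M"), rag_status(end, amber_at, red_at))  # 05:30 AMBER
```

The key design choice is treating an overrunning task as finishing at the current time: the forecast end then advances with the clock until the task completes, and the status flips from Green to Amber as soon as that forecast crosses the threshold.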
That's how we started what has been a decade-long journey of iteration, enhancements, lessons learned and applied.
71 releases and millions of pounds of investment later, ICEFLO is now a sophisticated solution for even the most complex of Cutover challenges. ICEFLO has been trusted by our Customers for hundreds of successful Cutovers, in various technology scenarios.
In 2016, ICEFLO was recognised by Gartner as a "Cool Vendor", bringing something new and powerful into the emerging space known as "DevOps".
Cutovers in Today's World
Looking back, it's interesting to note how much the world has changed in that period.
Technology is now at the core of everything we do. We live in a 24/7/365, "always-on" world. Customers expect reliable access to the digitally delivered financial services they use in their personal and business lives.
The teams implementing cutovers are disparate groups, located all around the world and typically made up of a mix of staff and third-party providers. As the complexity of the challenge has grown, so has the impact of any technology failure. Brands, reputations and entire businesses can be seriously damaged by a significant technology outage. In fact, research has shown that 40 to 50% of businesses that have suffered major service outages never fully recover.
In response to this reality, and bringing me back to the opening of this blog, Regulators across the world are demanding greater operational resilience, specifically of core technology systems.
If you are a C-level leader, my guess is that you are experiencing many of the same feelings that I did back in 2007, albeit on a grander scale. When cutovers loom on the horizon, with all the attendant risks that major events entail, you crave the 4Cs - a sense of
To get these 4Cs, you need visibility, early warnings and audit trails.
The era of "just getting by", using traditional runbook spreadsheets, is surely over. Regulators expect nothing less than better planning and execution of complex technology change.
Contact us today to learn how ICEFLO can help you and your company get the cutover job done safely, with lower risk, lower costs and much less stress.