In functional terms, delivering a through-life operation requires a thorough understanding of the business and an intimate working knowledge of the design, build and technology of the IT systems that underpin it. A close working relationship with core staff, up to and including board level and extending down to key suppliers of hosting and communication services, is essential. Continuity of power, web and other communications, together with specialist 2nd or 3rd line technical support, is fundamental; it all sounds obvious, but it is often not the case.
At the operational level, and depending on the size of the operation, the ‘Live System’ should be considered operational, fully documented and locked. A separate “Test and Reference” rig should be available alongside the Live System for testing and development, and to enable a complete ‘roll back’ to the last known good point if unforeseen conditions occur. I cannot stress the importance of this too strongly. The test rig may be smaller than Live, provided it is an exact replica in every other respect: its hardware, software, database and any other key components must replicate the Live System in miniature, and it must be able to ‘scale’.
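One practical way to keep the test rig an exact replica is to compare component versions on both systems routinely. The Python below is a minimal sketch, assuming each system can export a simple manifest mapping component names to versions; the `config_drift` function and the component names are hypothetical illustrations, not any particular tool.

```python
from __future__ import annotations


def config_drift(live: dict[str, str], test_rig: dict[str, str]) -> dict[str, tuple]:
    """Hypothetical check: report components whose versions differ between
    Live and the test rig, or that exist on only one of the two systems.

    Each value in the result is (live_version, test_rig_version); None means
    the component is missing on that side.
    """
    drift = {}
    for component in sorted(live.keys() | test_rig.keys()):
        live_version = live.get(component)
        rig_version = test_rig.get(component)
        if live_version != rig_version:
            drift[component] = (live_version, rig_version)
    return drift


# Illustrative manifests (hypothetical component names and versions).
live = {"os-kernel": "5.14.0", "database": "15.4", "application": "2.1.0"}
rig = {"os-kernel": "5.14.0", "database": "15.2", "application": "2.1.0", "debug-agent": "1.0"}
print(config_drift(live, rig))
```

An empty result means the rig matches Live as far as the manifest can tell; anything else is drift to be corrected before the rig is trusted for testing or roll back.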
Software is the functional brain of any modern technology, and some of it is constantly being tweaked and improved to meet market demand or business opportunity: that is what it is meant to do. From a purely operational perspective, a good CIO will have to ask the awkward question: “where have all of the component parts of this software come from?” Software produced cheaply, quickly and under commercial pressure invites shortcuts, with various components and routines downloaded from the web, any of which could contain a backdoor. My message to operational managers is: be careful! Even internationally reputable suppliers are discovering ‘malign’ pieces of software buried in their own operating systems, the cyber-implications of which are only now becoming apparent.
Any development work, upgrades, patching or suppliers’ fixes for vulnerabilities (once carefully considered and authorised in the light of all their implications) should be installed and thoroughly tested only on the test and reference rig, away from the Live System. Only when given a clean bill of health on the test rig should ‘upgrades’ be moved to the Live System. Not doing this is, frankly, playing Russian roulette with your business system.
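The promotion rule above can be expressed as a simple gate. As a minimal sketch, the Python below models a change as authorised or not, with a record of checks run on the test and reference rig, and refuses promotion to Live unless every required check has passed. The `Change` class, the `may_promote_to_live` function and the three check names are hypothetical illustrations, not any particular change-management product.

```python
from dataclasses import dataclass, field

# Hypothetical minimum set of checks that must pass on the test rig.
REQUIRED_CHECKS = {"functional", "regression", "rollback"}


@dataclass
class Change:
    """A patch, upgrade or supplier fix awaiting deployment (hypothetical model)."""
    name: str
    authorised: bool
    # Check name -> True if it passed on the test and reference rig.
    test_results: dict = field(default_factory=dict)


def may_promote_to_live(change: Change) -> bool:
    """Return True only for an authorised change with a clean bill of health."""
    if not change.authorised:
        return False
    # A required check that was never run is treated as a failure.
    if REQUIRED_CHECKS - set(change.test_results):
        return False
    return all(change.test_results.values())


patch = Change("supplier-fix", authorised=True,
               test_results={"functional": True, "regression": True, "rollback": False})
print(may_promote_to_live(patch))
```

Here the patch is refused because its roll back test failed; a missing check is treated the same way, so the absence of evidence is never taken as a pass.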
Managing the operational system, in all its varying processes and procedures, requires an operational focus. This may be a Network or Systems Operations Centre (NOC or SOC) with qualified and empowered staff and a capable manager on shift. The details depend on budget and on the scale of the operation, but with careful planning the function can be scaled down. Most importantly, its staff need up-to-date ‘as built’ documentation and configuration records for the complete system and all of its components (operating system, application software and database(s)), backed by on-call specialists.
Crucially, they need to watch and monitor the live system themselves, rather than watching TV and relying on hardware or software devices to act as doorkeeper and sentry. CIOs and CISOs can spend huge sums on products to defend against ‘cyber’-threats and vulnerabilities; CISOs are always looking for products that reduce exposure, and suppliers are happy to help. The master question has to be: “do these help the operations staff know what a probe or an attack on their system looks like?” Not all ‘attacks’ are what they seem. Recently I looked at the network logs on a large operational system where the shift had reported 10 minor probes, as indicated by their very sophisticated and expensive Intrusion Detection and Prevention setup. The logs told the real story: 410 probes. The IDS/IPS had not been configured properly, and training had been skimped.
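The gap between alerts and logs can be checked independently of the IDS/IPS. The sketch below assumes a hypothetical firewall log format and a deliberately simple port-scan heuristic (one source making denied connections to several distinct ports counts as a probe); real log formats and detection rules will differ, and the functions `find_probes` and `unreported_probes` are illustrative only.

```python
import re
from collections import defaultdict

# Hypothetical log line format, e.g.
# "2024-05-01T02:13:07 SRC=203.0.113.9 DPT=22 ACTION=DENY"
LINE = re.compile(r"SRC=(?P<src>\S+) DPT=(?P<port>\d+) ACTION=(?P<action>\w+)")


def find_probes(lines, min_ports=3):
    """Return sources whose denied attempts touched at least min_ports distinct ports."""
    ports_by_src = defaultdict(set)
    for line in lines:
        match = LINE.search(line)
        if match and match["action"] == "DENY":
            ports_by_src[match["src"]].add(int(match["port"]))
    return sorted(src for src, ports in ports_by_src.items() if len(ports) >= min_ports)


def unreported_probes(lines, alerted_sources, min_ports=3):
    """Probes visible in the logs that the IDS/IPS never alerted on."""
    alerted = set(alerted_sources)
    return [src for src in find_probes(lines, min_ports) if src not in alerted]


log = [
    "2024-05-01T02:13:07 SRC=203.0.113.9 DPT=22 ACTION=DENY",
    "2024-05-01T02:13:08 SRC=203.0.113.9 DPT=23 ACTION=DENY",
    "2024-05-01T02:13:09 SRC=203.0.113.9 DPT=80 ACTION=DENY",
    "2024-05-01T02:14:00 SRC=198.51.100.4 DPT=443 ACTION=ALLOW",
]
print(unreported_probes(log, alerted_sources=[]))
```

A non-empty result is exactly the kind of discrepancy the shift should be trained to investigate, rather than accepting the IDS/IPS count at face value.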
The CIO must be able to spot and forestall problems early, and to inform senior managers promptly when they need to know. Operational challenges are not all cyber. Systems can fail because of power or network failure with no obvious standby alternatives. Recognising that you have been hacked or breached is important and very topical, and dealing with the recovery and fallout is a whole subject in itself. You may not have been hacked at all; system failure can stem from the simple failure of an obscure but vital component, or from an accumulation of operator errors.
As an example, consider the complete failure of a highly acclaimed and innovative system that had simply switched itself off and would not restart. The operators had failed to recognise a problem for what it was: on the night shifts the system had been raising “impending out of date licence countdown” warning flags, which were simply deleted as an aberration. After six months the system announced that the Beta licence had expired and immediately shut down in mid-operation. When the supplier was contacted, it transpired that, yes, Beta software had been installed for ‘go live’ and never replaced with a fully licensed copy. Cutting corners to go live! Operations staff must not simply dismiss an indicator, but follow it through.
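A licence countdown is a simple calculation; the lesson is that its output should be escalated rather than deleted. The sketch below is a minimal illustration, assuming the licence expiry date is known and a 30-day warning window; the `licence_warning` function and the window length are hypothetical.

```python
from datetime import date


def licence_warning(expiry: date, today: date, warn_days: int = 30) -> str:
    """Classify a licence's status (hypothetical rule).

    Anything inside the warning window is marked for escalation to the shift
    manager, so it cannot be treated as noise and deleted.
    """
    days_left = (expiry - today).days
    if days_left < 0:
        return "EXPIRED"
    if days_left <= warn_days:
        return f"ESCALATE: licence expires in {days_left} days"
    return "OK"


print(licence_warning(date(2024, 6, 30), today=date(2024, 6, 1)))
```

In practice such a check would feed the shift log and the handover, so that a warning seen on a night shift is still visible, and owned, the next morning.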
As a final note, many problems are not ‘logical’, that is, purely technological. They can also be physical, due to poor security, or human: people who do not follow the rules and corporate policies may be malign (insiders) or just plain stupid. There can also be procedural, structural and organisational issues that get in the way of good operational and through-life management, which can be very testing all round but also great fun.
Contributed by Tony Collings, chairman, The ECA Group Limited