The 20 second rule is not some sort of lewd joke I espouse about my “performances” but is rather something I use in presentations to talk about effective systems management.
It is 3am. An online banking system that one of your team members set up has crashed or is throwing an exception, and you’re on call.
You need to either talk with 1st level support personnel to identify, quantify and rectify the issue (quickly) or actually log onto the server itself and diagnose in situ.
We manage over 6,500 applications residing across some 1,650 databases, so you know things go bump from time to time. We train our 1st level support staff (2 people on shift, covering 24 x 7) to resolve around 95% of all issues before escalating to 2nd/3rd level support.
And we can. When I was on call I used to have a 2 minute rule: within 2 minutes of listening and asking pertinent questions I could resolve the issue and go back to sleep, or whatever I was doing at 3am.
That worked because of standards and, where things fell outside of standards, precise, concise documentation. Where the documentation fell down or wasn’t clear, we all made a point of updating it, because if I write something it’s in my “dialect” and if you read it you might not understand it. So it’s important to peer review things AND to continuously review and update documentation.
Too often when I visit a non-managed client on a consulting gig, the first 40 minutes to an hour is spent discovering where the config files are and what app talks to what app/database.
I’m not making this up.
So let’s talk about the standards that have been in use over the 38 years my company has been making stuff go.
We treat a database and its associated applications as an “environment” or “system”.
So let’s say we’ve developed it for PWC and it’s a Web Application Gateway system of engagement (mobile app) into their legacy back end system.
So we have a code for the client, PWC, and we generally shorten the application name down to something meaningful or memorable. In this case let’s call it WAG.
Following some of the methodologies of Continuous Delivery we’re going to have levels of the application:
We name EVERYTHING associated with the environment using this naming standard.
This means that even if things are multi-tenanted on a server – at least I know by looking at the root directory what we have on the server. Immediately.
This means I know the usercode for the applications/services and connections to the database is pwcPwag, and if I need to I can generate the secure 26 character password (BTW only a few of us can, or need to, get said password). All environments have a unique secure password that is NOT visible in config files (more on this later).
This means I know what Active Directory groups have READ access to files, or have READ access to a database and what Active Directory groups have MODIFY rights.
All within 20 seconds.
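As a rough illustration of how one naming standard can drive everything else, here is a minimal Python sketch. It assumes the pattern suggested by the `pwcPwag` usercode above: client code in lower case, a single upper-case level code (I am assuming `P` stands for PROD), then the application name in lower case. The `Environment` class and the `_READ` / `_MODIFY` group suffixes are hypothetical, invented purely for the example, and are not our actual naming scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """One database plus its associated applications."""
    client: str      # client code, e.g. "PWC"
    app: str         # shortened application name, e.g. "WAG"
    level_code: str  # one-letter level code, e.g. "P" (assumed to mean PROD)

    @property
    def usercode(self) -> str:
        # Assumed pattern, inferred from "pwcPwag":
        # lower-case client + upper-case level code + lower-case app name.
        return f"{self.client.lower()}{self.level_code.upper()}{self.app.lower()}"

    @property
    def read_group(self) -> str:
        # Hypothetical Active Directory group with READ access.
        return f"{self.usercode}_READ"

    @property
    def modify_group(self) -> str:
        # Hypothetical Active Directory group with MODIFY rights.
        return f"{self.usercode}_MODIFY"


env = Environment(client="PWC", app="WAG", level_code="P")
print(env.usercode)      # pwcPwag
print(env.read_group)    # pwcPwag_READ
print(env.modify_group)  # pwcPwag_MODIFY
```

The point is not the exact format but that every name is derived from the same three inputs, so nobody has to look anything up at 3am.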
By knowing what is already set up, I can look for the anomalies and resolve them very quickly.
Of course standards change – and that is why I work in IT – because it is forever moving/changing/improving. Standards need to cope with this. Standards need to be measured against the ever changing landscape.
But the cool thing is:
For an application based in Elastic Beanstalk, I still know its name relates to client XXX and I still know that it will have certain characteristics that it would have if it resided on a hosted server within our data centre.
Yes we had to change some of the ways we manage said application but for the most part – something in AWS is not that different to something residing on a VM that we built via our standard build scripts.
And this leads me to automation, or “infrastructure as code”: all of our databases, applications, servers and network devices are scripted. The process of creating them is standard and the scripts are stored in source control. We in operations took some of the stuff developers had been doing for years, and we’re far more agile than we were even 3 years ago. Right here is where Marketing say “DevOPs!!”. Yeah it is. I just hoped I didn’t have to state it…
Our deployments to applications are standard – we repeat them to make them reliable.
Standardising the deployment means that what used to take a human 4 hours to build (and get wrong) now takes 10-20 seconds to build.
We had been using TeamCity for our automated standard builds but we were missing something to get the published packages out to the 8 different levels of environments (DEV to PROD).
Enter Octopus Deploy in 2014. It was the answer to our operational deployment woes (admittedly I had written deployment scripts in 2012, but they still required manual intervention: picking up the files and running/scheduling them).
- No more hand editing of config files.
- No more logging onto servers to look up config file settings.
- No more people watching deployments at 3am – just in case of unknown changes.
- No more monthly deploys – we started deploying multiple times a DAY..!!
- No more multiple config files on a server for an application.
- No more wondering if a service was set up correctly by operations.
Octopus Deploy allowed us to have a central repository of our config file contents:
Variables for ALL environments are stored centrally and securely
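To show the idea of one config template plus centrally stored, per-environment values, here is a small Python sketch. It is not how Octopus Deploy is implemented. The `#{Name}` placeholder style mirrors Octopus's variable substitution syntax as I understand it, but the variable names, server names and the `render` function are all made up for illustration.

```python
import re

# Hypothetical central variable store: one value per variable per level.
VARIABLES = {
    "DatabaseServer": {"DEV": "dev-sql01", "PROD": "prod-sql01"},
    "LogLevel": {"DEV": "Debug", "PROD": "Warning"},
}

# One template for every environment; nobody edits it by hand on a server.
TEMPLATE = "server=#{DatabaseServer};logLevel=#{LogLevel}"

PLACEHOLDER = re.compile(r"#\{(\w+)\}")


def render(template: str, level: str) -> str:
    """Replace each #{Name} placeholder with the value scoped to `level`."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        try:
            return VARIABLES[name][level]
        except KeyError:
            # Fail loudly rather than deploy a half-filled config file.
            raise KeyError(f"no value for {name!r} in {level!r}") from None

    return PLACEHOLDER.sub(lookup, template)


print(render(TEMPLATE, "DEV"))   # server=dev-sql01;logLevel=Debug
print(render(TEMPLATE, "PROD"))  # server=prod-sql01;logLevel=Warning
```

Failing on a missing value is a deliberate choice in this sketch: a deployment that stops is easier to deal with than one that quietly ships a placeholder into PROD.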
Octopus Deploy meant that DEV didn’t have to see if OPs were available to deploy to environments – they just could. It meant that once a service/application was verified in CI and Build then it could be packaged and installed into DEMO, UAT etc without ANY unknowns.
Automated, reliable, repeatable deployments.
We still had change control processes for prePROD and PROD but it also meant the sign off process was more streamlined as we knew the deploy would work. Every time. We took the standardising we had to do for Continuous Integration (for DEV) and applied it to our Continuous Delivery processes (for OPs).
Using our existing standards and applying them in new, innovative ways has allowed us to continue (and sometimes improve) my 20 second rule.
Which is why standards matter.