dwilson
on Fri May 25 2007 09:58:58 GMT+0100 (GMT)
Contrary to what my previous sysadmin blog posts might suggest, we
(unfortunately) don't spend all our time in the office trying new
software and evaluating new hardware - even we need a lunch break.
Instead, we're working behind the scenes on Zimki and our other websites
to keep things running happily, teaming up with our developers to answer
any support issues that you lovely customers send our way and, sometimes,
even completing milestones in our longer-running projects. Honest.
Despite the often-recited "herding cats" analogy, creating an operationally
efficient systems team is a pretty straightforward thing to do. Notice that
I didn't say it'd be easy. It mostly requires an understanding that our
workload takes two main forms: tactical and strategic. I'm not
including firefighting here - that's a topic for something longer than even
one of my blog posts.
Tactical work is what most sysadmins spend their days doing: helping
customers (a much nicer term than users), fixing problems as they appear,
and making small tweaks and changes. These tasks are often important to
other people, but they rarely help us achieve our own goals or complete
our project work. The projects themselves, which make up the strategic
part of the workload, are every bit as important as user requests - they're
just not as visible.
At Fotango the systems team is currently four people (we're looking for
a fifth), and the work breakdown on a typical day looks a lot like this:
One person monitors the request tracking system. We track all internal
requests as well as customer issues escalated by our excellent (and very
patient) front-end support.
We've found that people expect a certain response time for their
requests. By assigning a dedicated person we keep our response time low
without constantly interrupting anyone working on a more involved task.
The systems support person can also help with other, less focus-demanding
tasks.
The second tactical role is the on-call bunny. She's the first line of
response for issues raised by our systems themselves. Problems detected
via Nagios, suspicious lines in logs, performance bottlenecks and load
spikes are all part and parcel of this role.
The other half of the team mostly work on our longer-term projects,
attend meetings that require a sysadmin to be present, or perform daily
maintenance. These are often concentration-demanding tasks (apart from
the meetings) that are made much easier by having the other two
providing an interruption shield. Of course, a big problem will drag
them back into the trenches, but there shouldn't be enough big
problems to make this a real issue.
So now you know it's not all glamour in the systems team. Next time your
page loads quickly and without problems, spare a thought for the effort
we've put in so you don't have to.