The development of the SRE role in DevOps

Richard Hoedeman
Dec 14, 2021

In DevOps we connect development (Dev) with operations (Ops). Development stands for “Change”: improvement and innovation. Operations stands for “Run”: stability, continuity and reliability. DevOps removes the barriers between the two by focusing mainly on why this is necessary, and from there explaining what needs to be done.

How nice would it be if you could use the mindset of the creative, innovative developer to further improve the reliability of operations? And voilà, that’s where the SRE, the Site Reliability Engineer, comes in as a “new” role to give substance to this. SRE is how the DevOps philosophy is put into practice.

O’Reilly — Site Reliability Engineering — book cover

When I made the switch from software application programmer to system programmer at the beginning of my career, I didn’t realize that I was becoming a Site Reliability Engineer (SRE). That role didn’t exist back then. Recently I started to study the role of the SRE, because it comes up more and more frequently in DevOps training and implementations. I mainly get my information from Google, from their dedicated platform, https://sre.google/.

What is Site Reliability Engineering?

According to Google, “SRE is what you get when you treat operations as if it’s a software problem. Our mission is to protect, provide for, and progress the software and systems behind all of Google’s public services (Google Search, Ads, Gmail, Android, YouTube, and App Engine, to name just a few) with an ever-watchful eye on their availability, latency, performance, and capacity.”

With SRE you can make a very valuable contribution in a DevOps organization. The two are certainly not the same, and SRE is not a replacement for DevOps. DevOps focuses on Culture, Automation, Lean processes, Measurement and Collaboration / Sharing (CALMS). DevOps can be seen as a philosophy; SRE focuses on the implementation of that philosophy.

What does an SRE do?

An SRE uses development skills to make the system (platform, infrastructure, environment) more reliable. In this case, making a system reliable means taking responsibility for the availability, performance, efficiency, change management, monitoring, emergency response and capacity planning of the service(s).

SRE teams adhere to the following principles:

  • embrace risks,
  • work with Service Level Objectives (SLOs),
  • eliminate mundane repetitive operational work (Toil),
  • monitor as much as possible,
  • automate as much as possible,
  • make sure releases are consistent,
  • and lastly, keep it simple!
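
To make the SLO principle a bit more concrete, here is a minimal sketch in Python. It is my own illustration, not taken from Google's material: the function names `allowed_downtime` and `slo_met` are hypothetical, and it assumes an availability-style SLO expressed as a fraction (for example 0.999) that is measured either over a time window or as a share of successful requests.

```python
from datetime import timedelta


def allowed_downtime(slo_target: float, window: timedelta) -> timedelta:
    """Return how much unavailability an availability SLO tolerates in a window.

    slo_target is a fraction such as 0.999 (hypothetical example value).
    """
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in the range (0, 1]")
    # The part of the window that is NOT covered by the target may be "spent".
    return window * (1.0 - slo_target)


def slo_met(good_requests: int, total_requests: int, slo_target: float) -> bool:
    """Check a request-based SLO: the share of good requests must reach the target."""
    if total_requests == 0:
        # Assumption for this sketch: no traffic means nothing was violated.
        return True
    return good_requests / total_requests >= slo_target


if __name__ == "__main__":
    budget = allowed_downtime(0.999, timedelta(days=30))
    print(f"Tolerated downtime in 30 days: {budget.total_seconds() / 60:.1f} minutes")
    print("SLO met:", slo_met(good_requests=9_995, total_requests=10_000, slo_target=0.999))
```

With a target of 99.9% over 30 days, the tolerated downtime comes out at about 43 minutes. A number like that makes "embrace risks" tangible: as long as the service stays within it, there is room for change and innovation.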

How do you become an SRE?

It starts with drive: you must want to discover where things can be improved and take that up as a challenge. You have to want to improve continuously. You are curious about the (technical) possibilities and use them to improve and further develop the current situation.

In addition to training SREs, advice is also needed on how this role / function can be deployed in the (DevOps) organization. The SRE must of course be recognized and become the most valued colleague.

Written by Richard Hoedeman

Accredited trainer / coach for Sustainable Green Leadership, Design Thinking, Lean-IT, DevOps, PRINCE2, Agile PM, PRINCE2 Agile, OBM and Scrum.