What is Site Reliability Engineering and How Does it Differ From DevOps?

October 8, 2020

Unsure of the differences between Site Reliability Engineering and DevOps? You’re not alone.

Reduced to the basics: DevOps has a strategic, tactical and operational focus, whereas the focus of the Site Reliability Engineering (SRE) discipline is mainly operational. SRE feeds its monitoring input into the feedback loop, which is an integral part of a healthy DevOps culture and organization. Read on for the context and the differences.

Most people involved in software development have come across the word “DevOps.” My own unqualified guess is that most of them also know that it is a contraction of the words Development and Operations.

Some probably regard it as modish marketing that simply states the banal: that developing a piece of software and operating it are different disciplines.

They are inextricably linked, as each discipline is a prerequisite for the other. Development and operations merge, hence “DevOps.”

The Agile Manifesto

I have a deep respect for marketing, so I don’t want to judge whether it is modish or not.

I will only note that the term DevOps has gained ground, and that with “DevOps,” the industry has found another, and for some a better, way to articulate the same values that were stated in the Agile Manifesto back in 2001.

The sentence “Individuals and interactions over processes and tools,” as it appears in the Agile Manifesto, can come across as an abstract non-sentence unless you’re intimately familiar with the pitfalls of a software development project.

It’s perhaps easier for some to understand a paradigm like DevOps, which clearly explains that development and operation are connected, and that you must therefore adapt accordingly in your planning and your organization.

DevOps? Site Reliability Engineering? It Gets a Little Confusing

It’s my experience that not everyone who knows about DevOps has heard of Site Reliability Engineering (SRE).

When thinking about the concept and exploring it a bit deeper, you may get a little confused, because it looks, smells and tastes almost like DevOps, when in fact, it is not the same.

The confusion is understandable as DevOps and SRE overlap.

I will do my best to explain and hopefully solve a few mysteries for the curious.

 

In the Slipstream

DevOps emerged from the slipstream of the agile movement that took off in the 2000s.

The concept of DevOps dates back to 2009, cf. Wikipedia.

Since 2014, a team of authors and companies selling DevOps software have released a ‘State of DevOps’ report outlining how DevOps is being implemented by companies around the globe.

If you’re looking for a manifesto for DevOps, you will search in vain.

I suggest thinking about DevOps as a kind of industry standard that everyone agrees makes sense and that is open to interpretation. For agile, by contrast, one can in principle just go to www.agilemanifesto.org and read what a group of extremely intelligent people in the software industry concluded after an extended weekend in Utah back in 2001.

DevOps Frames a Set of Values

DevOps frames a set of values that differ from agile values, although they do draw on the Agile Manifesto and Lean principles.

But it is important to understand that there is no definitive list of the values that DevOps incorporates – there is no manifesto.

Instead, the values of DevOps came into being in the 2000s, when large software developers needed tools and methodologies that could help them scale both their organizations and the software supporting hypergrowth at companies like Google, Amazon, Netflix and others.

In short, they needed methods and process frameworks that could turn the abstract paradigms of the Agile Manifesto into understandable, action-oriented practices that could be implemented in the organization.

 

Tools and disciplines such as continuous integration, continuous deployment, continuous testing, configuration management, application performance monitoring etc. are all concepts that are rooted in agile values.

As such, they are not new disciplines, but the frontrunners of DevOps have interpreted, matured and further developed both the disciplines and the tools to match the needs that quickly become apparent when an organization chooses to follow the DevOps path.

Agile and DevOps focus on both the organization and software development disciplines. On the other hand, Site Reliability Engineering primarily concentrates on software development and software operation without underestimating or downplaying the values behind DevOps and agile development.

 

Since everything is connected, it would be a misunderstanding if someone were to claim that ‘we are an SRE team/organization, DevOps and agile are not relevant to us’.

SRE doesn’t make much sense if the organization or team doesn’t recognize and try to live by agile values – just as DevOps doesn’t make sense if you establish a separate DevOps department to link the development and operations departments together.

Then you just end up with three departments that don’t talk to each other.

And those kinds of organizational shortcomings don’t increase the quality of the end product at all.

SRE Without Knowing It

You can easily have a team doing Site Reliability Engineering without really knowing it, because SRE is, in essence, what happens when you hire a software developer to do what was once called ‘operations’.

If you’re a software developer or system administrator and are employed in a DevOps-like position, then you can also write SRE on your LinkedIn profile if you have spent a few evenings studying the subject to understand the small but not insignificant differences.

Site Reliability Engineering is a set of tasks and competencies that you already engage with in your daily work.

As Google itself writes, the ideal SRE candidate is a skilled system administrator with a flair for and interest in scripting and programming.

Here is One of the Small Differences

One of the small differences between DevOps and Site Reliability Engineering lies in what is being measured.

Both DevOps and SRE measure everything, yet each discipline is more interested in certain metrics than its counterpart.

For example, a good metric in a team could be the lead time for error correction – how long does it take from the time I check a bugfix into the source control until the error is corrected in production?

Not in a development environment, a test environment or an almost-production environment, but in the actual production environment. Is the lead time 10 minutes? 10 hours? Three weeks? More than six weeks?

When discussing why things take a long time, i.e., why there is waste in the value chain, the finger will end up pointing towards the organization.
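
To make the lead-time metric above concrete, here is a minimal sketch in Python. The record layout, the field names (committed, in_production) and the sample timestamps are all hypothetical and invented for illustration; in practice the two timestamps would come from your source control and your deployment tooling.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical sample data: when a bugfix was checked in,
# and when it was actually live in production.
FIXES = [
    {"id": "fix-1", "committed": "2020-10-01T09:00:00", "in_production": "2020-10-01T09:10:00"},
    {"id": "fix-2", "committed": "2020-10-02T14:00:00", "in_production": "2020-10-03T00:00:00"},
    {"id": "fix-3", "committed": "2020-09-01T08:00:00", "in_production": "2020-09-22T08:00:00"},
]


def lead_time(fix):
    """Time from check-in until the fix is live in production."""
    committed = datetime.fromisoformat(fix["committed"])
    deployed = datetime.fromisoformat(fix["in_production"])
    return deployed - committed


def median_lead_time(fixes):
    """Median lead time; less sensitive to a single slow fix than the mean."""
    return median(lead_time(fix) for fix in fixes)


if __name__ == "__main__":
    for fix in FIXES:
        print(fix["id"], lead_time(fix))
    print("median:", median_lead_time(FIXES))
```

The median is used here as one reasonable choice, not the only one; tracking it over time is what reveals whether the curve breaks or flatlines.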

This will happen at a time when the team no longer has the influence or mandate to make the decisions that can minimize potential waste.

When the curve for lead time on an activity breaks or flatlines, is the team then in a position where they can do something about the problem themselves or will they hit the glass ceiling due to obstacles they can’t overcome themselves?

More Targeted Operation

SRE and DevOps are similar, but SRE metrics are more targeted at the operation of an application, whereas DevOps metrics also cover the organization and organizational friction.

According to Google’s own book on SRE, there are four golden signals: latency (response times), traffic, errors, and saturation of resources.

None of these point directly outwards towards the organization, but a review of them at a retrospective meeting will, of course, quickly point to the external framework as an obstacle that the team can’t overcome itself.

This could be the case if the team has persistent problems with response times because the organization, for various reasons, doesn’t see itself as able to allocate the necessary hardware to the amount of traffic received.
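
As a rough illustration of the four measurements from Google’s book, the sketch below derives them from a window of request records. Everything in it is an assumption made for the example: the Request type, the choice of the 95th percentile for response time, counting HTTP 5xx responses as errors, and using CPU as the saturated resource. A real setup would read these values from a monitoring system rather than compute them by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float  # how long the request took
    status: int        # HTTP status code returned


def golden_signals(requests, window_seconds, cpu_used, cpu_capacity):
    """Summarize one observation window as latency, traffic, errors, saturation."""
    count = len(requests)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile; 0.0 when the window is empty.
    p95 = latencies[max(0, math.ceil(0.95 * count) - 1)] if count else 0.0
    # Assumption: only server-side failures (5xx) count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": p95,
        "traffic_rps": count / window_seconds,
        "error_rate": errors / count if count else 0.0,
        "saturation": cpu_used / cpu_capacity,
    }


if __name__ == "__main__":
    window = [Request(100.0, 200)] * 18 + [Request(900.0, 500), Request(2000.0, 503)]
    print(golden_signals(window, window_seconds=10, cpu_used=3.0, cpu_capacity=4.0))
```

If saturation stays high while response times degrade, the numbers themselves say nothing about the organization, but they give the team something concrete to bring to the retrospective.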

A Larger Part of the Value Chain

I will argue that the small difference between Site Reliability Engineering and DevOps also lies in the fact that DevOps measures a larger part of the value chain, whereas SRE has a narrower focus, closer to deploying the code to production.

In a pragmatist’s view, and in mine, DevOps and SRE are two sides of the same coin; which one you are doing on a given job depends on the context.

If you’re fixing a bug in the infrastructure or automating a manual process, I would call it SRE.

If you’re working to convince the organization that it should invest in, e.g., automated regression testing, then I would call it DevOps.

If you do a little bit of both and have to choose, then I would call it DevOps, if nothing else, because DevOps covers the whole spectrum and it’s what most people can relate to.

The fundamental difference is that DevOps has a strategic, tactical and operational focus, whereas the SRE discipline’s focus is mainly operational. SRE then feeds its monitoring input into the feedback loop, which is an integral part of a healthy DevOps culture and organization.

After all, the most important thing for the individual developer is to know that there’s a difference and to understand a little about what each is about.

Further Reading

If you’re curious and can spare five minutes, I recommend What’s the Difference Between DevOps and SRE on YouTube.

If you’re looking for texts and an in-depth explanation of SRE, Google’s own book on the subject, the “SRE Book”, is available for free online.

If you want to know more about DevOps, then hurry up and buy “Accelerate: The Science of Lean Software and DevOps”, which summarizes years of research into DevOps and provides evidence of how a healthy agile culture and continuous deployment disciplines correlate with high quality and end-user satisfaction in what the development organization delivers.

DevOps Engineer, Agile Coach

DevOps consultant, Agile coach and full-stack software developer with 15 years of experience in the software industry.

Passion for holistic thinking, automation and implementing data-driven decision processes.

Core competencies

DevOps and Agile practices
Scalability
Security
Software development
Software architecture

Past experience
Energinet, LEGO, DGI, Just-Eat
