The difference between DevOps and SRE

2022-08-18 15:42:00
ZenTao PM
Summary: In most companies, the responsibilities and capabilities of the development and operations teams overlap. But what do DevOps and SRE each mean, and how do they differ? Let's find out.

DevOps and SRE seem to be two sides of the same coin. They both aim to bridge the gap between development and operations teams and seek to improve the efficiency of software deployment and the reliability of software operation.

Image Source: Harness

Development, Operations, and Reliability

Before the implementation of DevOps, the development and operations teams were two separate teams, each with its own goals. The differences and lack of communication between these teams usually affected the product, which ultimately affected the user's experience and the company's effectiveness.


DevOps has since become one of the most important functions in many companies, helping teams communicate better and build better products.


DevOps is defined as "a software engineering culture and practice that aims to unify development and operations". The term was coined by Andrew Shafer and Patrick Debois in 2008, and although it took a few years to become a common concept, DevOps is now widely adopted across the industry.


The concept of Site Reliability Engineering (SRE) has been in existence since 2003 and is even older than DevOps. It was created by Ben Treynor, who founded the SRE team at Google. According to Treynor, SRE is essentially what happens when a software engineer is asked to take on the tasks of operations personnel.


Like DevOps, SRE integrates development and operations teams, helping them become familiar with each other's work and tasks while providing visibility across the application lifecycle.


Both DevOps and SRE advocate automation and monitoring to minimize the time from development to production deployment without compromising the quality of the code or the product.


Google notes that SRE and DevOps are not so different from each other: "They are not in competition when it comes to software development and operations, but are close friends aiming to break down organizational barriers and make it possible to deliver software better and faster."

The difference between DevOps and SRE

As mentioned earlier, DevOps is about bringing development and operations together, defining the system's behavior, and learning what needs to be done to bridge the "gap" between the development and operations teams.


DevOps is about what needs to be done, while SRE is about how it can be done. SRE turns the theory into an effective workflow using the right methods and tools. It is also about sharing responsibility and making sure that everyone has the same goals and vision.


To further illustrate the differences, Google has released a series of videos and posts on DevOps and SRE. In one of the posts, two Google employees (Seth Vargo and Liz Fong-Jones) explained:

"SRE embodies the idea of DevOps, with a greater focus on measuring and achieving reliability through the work of software engineers and operations staff."

Seth Vargo and Liz Fong-Jones have explained the similarities and differences between DevOps and SRE through the following five areas.

1. Reducing organizational silos

Large companies with complex organizational structures often involve many teams that work independently. Each team takes the product in a different direction and does not communicate with the rest of the company, so they do not get a holistic view of the product in its entirety. This can cause problems during deployment.

The work of DevOps is to reduce silos and ensure that the different teams end up with the same goals. Teams are organized through a common vision.

SRE is less concerned with how many separate projects exist in the company and more concerned with how to get everyone involved. It does this by using the same tools and technologies across the company, which in turn helps teams share ownership.

2. Accepting failures

Although DevOps aims to prevent failures before they occur, they cannot be avoided entirely. DevOps therefore treats failure as something that is bound to happen.

SRE goes a step further and quantifies how much failure is acceptable, so that errors stay within an agreed limit instead of merely being expected. Two key measures make this possible: Service Level Indicators (SLIs) and Service Level Objectives (SLOs).

SLIs measure how a service behaves for its requests, for example request latency, throughput in requests per second, and the number of failed requests. SLOs are the targets set on those SLIs: how often an SLI must succeed over a given period of time.
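
As a rough illustration of how SLIs and SLOs relate, the Python sketch below computes latency, throughput, and failure indicators from a batch of request records and checks them against a target. The names Request, compute_slis, and meets_slo, the choice of the 95th percentile, and the thresholds are assumptions made for this example; they do not come from Google's posts or from any particular monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Request:
    latency_ms: float  # how long the request took
    ok: bool           # whether the request succeeded


def compute_slis(requests, window_seconds):
    """Return latency, throughput, and failure indicators for one time window."""
    total = len(requests)
    failures = sum(1 for r in requests if not r.ok)
    latencies = sorted(r.latency_ms for r in requests)
    # Nearest-rank 95th percentile: rank = ceil(0.95 * total), computed with integers.
    p95 = latencies[(95 * total + 99) // 100 - 1] if total else 0.0
    return {
        "p95_latency_ms": p95,
        "throughput_rps": total / window_seconds,
        "failures": failures,
        "success_ratio": (total - failures) / total if total else 1.0,
    }


def meets_slo(slis, min_success_ratio, max_p95_latency_ms):
    """An SLO here is simply a pair of targets that the SLIs must satisfy."""
    return (slis["success_ratio"] >= min_success_ratio
            and slis["p95_latency_ms"] <= max_p95_latency_ms)
```

In practice these numbers would come from a monitoring system, not an in-memory list, but the relationship is the same: the SLI is what you measure, and the SLO is the target you compare it against.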

3. Implementing a gradual change

More and more companies want to release frequently, update and iterate their products constantly, and keep team members up to date with new and relevant technologies.

DevOps aims to do the same, but in an incremental and manageable way. Both DevOps and SRE want to move quickly; SRE additionally emphasizes reducing the cost of failure while doing so.
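
One common way to keep changes gradual and failures cheap is a staged rollout: expose a release to a small share of traffic first and widen it only while it stays healthy. The sketch below is a minimal illustration under assumed names. gradual_rollout and observe_error_rate are hypothetical, and a real pipeline would query its own monitoring and deployment tooling.

```python
def gradual_rollout(stages, observe_error_rate, max_error_rate):
    """Widen a release stage by stage and stop at the first unhealthy stage.

    stages: increasing percentages of traffic to expose, e.g. [1, 10, 50, 100].
    observe_error_rate: hypothetical callable that returns the error rate
        observed while the release serves the given percentage of traffic.
    Returns (completed, percent), where percent is the last healthy stage
    reached (0 if even the first stage was unhealthy).
    """
    last_healthy = 0
    for percent in stages:
        if observe_error_rate(percent) > max_error_rate:
            # Stop here: only `percent` of traffic ever saw the bad release.
            return False, last_healthy
        last_healthy = percent
    return True, last_healthy
```

Because a problem is detected while only a fraction of users are affected, the cost of a bad release is limited to that fraction.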

4. Instrumentation and automation

As mentioned earlier, automation is one of the main focuses of both DevOps and SRE. Both encourage adding tools and automation wherever possible, reducing the error rate for developers and operations staff by eliminating manual steps.

5. Measuring everything

DevOps and SRE teams need to ensure they are progressing in the right direction by measuring everything.

The main difference here is that SRE is based on the idea that "operations are a software problem", which has led SRE teams to define explicit availability metrics.

SRE also ensures that everyone in the company knows how to measure reliability and what to do in the event of a failure.
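
To make an availability metric concrete, the sketch below converts an availability target into the downtime it permits over a period, and converts observed downtime back into an availability figure. The function names and the 30-day default period are assumptions chosen for this example.

```python
def allowed_downtime_minutes(availability_target, period_days=30):
    """Minutes of downtime a target such as 0.999 permits over the period."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_target)


def measured_availability(downtime_minutes, period_days=30):
    """Fraction of the period during which the service was available."""
    total_minutes = period_days * 24 * 60
    return 1 - downtime_minutes / total_minutes
```

For example, a 30-day period has 43,200 minutes, so a 99.9% target leaves roughly 43 minutes of permitted downtime. Stating the target this way gives everyone the same yardstick for deciding whether a failure is within tolerance.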

What does reliability mean?

We have discussed the division of responsibility, the acceptance of failure, and the measurement of everything. Now we need a way to ensure everything is working properly and reliably. In other words, there should be a consistent way to measure each level of reliability.


SRE teams measure reliability with SLIs and SLOs, while DevOps teams measure failure and success rates over time; the two often use different tools and methods. Reliability relates not only to infrastructure but also to application quality, performance, and security.


Problems can occur in different aspects of the application, and when a failure occurs, we need to have reliable data to know why the problem occurred. If we divide the data into multiple parts, this could include the following:

  • Stack traces
  • Variable state
  • JVM state: threads, environment variables
  • Related log statements (including DEBUG and TRACE level in production)
  • Event analysis (frequency, failure rate, deployment, application)

Since this data is vital, we must ensure that it is reliable and actionable.
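
The sketch below shows one way such data could be gathered at the moment of failure. It is written in Python as an analogy to the JVM data listed above, so the threads and environment variables it reports are those of the Python process. The function name capture_failure_context and the shape of the returned dictionary are assumptions for this example.

```python
import os
import threading
import traceback


def capture_failure_context(exc, env_keys=("PATH",)):
    """Collect diagnostic data for an exception that has just been caught."""
    tb = exc.__traceback__
    # Walk to the innermost frame, which is where the exception was raised.
    while tb is not None and tb.tb_next is not None:
        tb = tb.tb_next
    local_vars = {}
    if tb is not None:
        # repr() keeps the snapshot serializable even for complex objects.
        local_vars = {k: repr(v) for k, v in tb.tb_frame.f_locals.items()}
    return {
        "stack": traceback.format_exception(type(exc), exc, exc.__traceback__),
        "variables": local_vars,
        "threads": [t.name for t in threading.enumerate()],
        "environment": {k: os.environ.get(k) for k in env_keys},
    }
```

Only the environment variables named in env_keys are recorded, since a full dump of the environment could leak secrets into logs.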

Conclusion

Image Source: Container-Solutions

SRE has a clear definition and a set of clear expectations. However, DevOps is more of a 'free spirit', with definitions and perspectives varying from one organization to another.


In practice, though, there is not much difference between DevOps and SRE teams. Both help integrate development and operations teams while taking on similar responsibilities and focusing on automation and reliability.


Most importantly, it's all about the data. You need data to measure success and failure and to work out how to achieve continuous reliability throughout the application.
