Site reliability engineer

Site reliability engineer (SRE) is a job description given to software engineers focused on reliability, scalability, and the development of cloud computing infrastructure. SREs develop, maintain and operate software that automates the traditional roles of the system administrator at large scale, such as configuration and cluster management systems, and that support reliability and scalability goals, such as container virtualization and the systems architecture of microservices.[1][2]

Companies employing the SRE title or involved in the SRE profession include Google, Dropbox, Mozilla, Airbnb, Netflix,[2] eBay, LinkedIn, Facebook, Adobe, Microsoft, Uber, IBM and Twitter.[3] At Google, SREs have been responsible for developing the Google File System (GFS), its successor Colossus (CFS), and the Borg cluster management system.[4]

See also

References

  1. Site Reliability Engineering: How Google Runs Production Systems. O'Reilly Media. 2016.
  2. 1 2 Donald FIscher (2016-03-02). "Are site reliability engineers the next data scientists?". TechCrunch.
  3. USENIX. "SREcon".
  4. Charles Babcock (2015-11-11). "Kubernetes Yields 'Operations Dividend,' Still Working On Scalability". InformationWeek.
This article is issued from Wikipedia - version of the 10/28/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.