Site Reliability Engineering

GPTKB entity

Statements (49)
Predicate Object
gptkbp:instanceOf gptkb:academic
gptkbp:abbreviation SRE
gptkbp:appliesTo software systems
gptkbp:coinedBy gptkb:Ben_Treynor_Sloss
gptkbp:emphasizes blameless postmortems
collaboration between development and operations
proactive engineering
gptkbp:focusesOn automation
reliability
scalability
gptkbp:goal maximize system availability
minimize downtime
reduce toil
gptkbp:hasConcept service level agreements
service level indicators
service level objectives
gptkbp:hasRole Site Reliability Engineer
https://www.w3.org/2000/01/rdf-schema#label Site Reliability Engineering
gptkbp:originatedIn gptkb:Google
gptkbp:principle incident response
monitoring
capacity planning
accept failure as normal
automate wherever possible
automation of operations
balance change and stability
eliminate toil
embrace risk
error budgets
implement gradual change
learning from failure
leverage tooling and automation
measure everything
measure service health
monitoring distributed systems
postmortems
prioritize reliability
reduce organizational silos
share ownership
gptkbp:publishedBy gptkb:O'Reilly_Media
gptkbp:publishedIn gptkb:Site_Reliability_Engineering:_How_Google_Runs_Production_Systems
gptkbp:relatedTo gptkb:DevOps
software engineering
IT operations
gptkbp:uses monitoring tools
automation tools
incident management tools
gptkbp:bfsParent gptkb:DevOps
gptkbp:bfsLayer 5
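
Illustrative note: the concepts listed above (service level indicators, service level objectives, error budgets) are connected by simple arithmetic. An error budget is commonly taken to be the complement of the SLO target, so a 99.9% objective leaves 0.1% of requests or time that may fail. The sketch below is a minimal illustration under that assumption. The function names, the 30-day window, and the request counts are hypothetical and do not come from this entity's statements.

```python
def availability_sli(good_events: int, total_events: int) -> float:
    """Service level indicator: fraction of events that were good."""
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    return good_events / total_events


def error_budget(slo_target: float) -> float:
    """Fraction of events (or time) allowed to fail under the SLO target."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def allowed_downtime_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime that fits inside the budget over a window (assumed 30 days)."""
    return error_budget(slo_target) * window_days * 24 * 60


def budget_remaining(slo_target: float, total_events: int, failed_events: int) -> float:
    """Share of the error budget still unspent; negative means overspent."""
    allowed_failures = error_budget(slo_target) * total_events
    return 1.0 - failed_events / allowed_failures


if __name__ == "__main__":
    slo = 0.999  # hypothetical 99.9% availability objective
    print(f"SLI: {availability_sli(999_750, 1_000_000):.5f}")
    print(f"Allowed downtime per 30 days: {allowed_downtime_minutes(slo):.1f} min")
    print(f"Budget remaining: {budget_remaining(slo, 1_000_000, 250):.0%}")
```

In this sketch, a service that has spent its whole budget (remaining at or below zero) would be a candidate for slowing releases, which is one way the "balance change and stability" principle is often put into practice.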