Our SRE team is experiencing serious on-call fatigue. We've tried reducing alert noise and improving runbooks, but the pager still goes off too much. What strategies have actually worked to reduce burnout?
We restructured on-call into "primary" and "secondary" rotations, with a dedicated "on-call engineer of the week" who does nothing but handle incidents and improve observability. Burnout dropped significantly because everyone else knows they have a full week of uninterrupted focus work ahead.
3/8/2026
Two things that helped us: (1) blameless postmortems with actual follow-through on action items, and (2) an "error budget" policy, where we freeze feature work once the budget is burned and keep it frozen until reliability improves. Leadership buy-in was key.
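For anyone unfamiliar with the arithmetic behind an error budget, here is a minimal sketch of how a freeze decision could be computed. It assumes a request-based SLO; the `ErrorBudget` class, its field names, and the example numbers are hypothetical illustrations, not the policy or tooling described in the answer above.

```python
from dataclasses import dataclass


@dataclass
class ErrorBudget:
    """Hypothetical request-based error budget for one SLO window."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int   # requests served during the SLO window
    failed_requests: int  # requests that violated the SLO in that window

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def remaining_fraction(self) -> float:
        # 1.0 means the budget is untouched; 0.0 or below means it is spent.
        if self.allowed_failures == 0:
            return 0.0 if self.failed_requests else 1.0
        return 1.0 - self.failed_requests / self.allowed_failures

    def should_freeze_features(self) -> bool:
        # Assumed policy: freeze feature work once the budget is exhausted.
        return self.remaining_fraction <= 0.0


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"Budget remaining: {budget.remaining_fraction:.0%}")
    print(f"Freeze feature work: {budget.should_freeze_features()}")
```

In practice the inputs would come from your monitoring system, and many teams add a warning threshold well before the budget reaches zero so the freeze does not arrive as a surprise.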
3/8/2026
Controversial take: if your SREs are burned out, you might have too few SREs or too many unreliable services. We hired 2 more SREs and also started requiring service owners to be on secondary on-call for their own services. Game changer.
3/8/2026