incident management

Charity Majors - On database outages, journey as a co-founder, thriving under pressure and growing as an engineer - #7

Charity Majors is the co-founder and CTO of Honeycomb. We had a lot of fun speaking with Charity in this lively conversation! We learned about her journey from being an engineer to co-founding Honeycomb, what it was like being on call when she was only 17, and how to stay calm during production incidents. We talked about various production outages throughout the episode. Charity also shares what it takes to build an awesome engineering culture, the engineer/manager pendulum, and the qualities she looks for when hiring senior engineers.

Oliver Leaver-Smith - On how "just a monitoring change" took down the entire site and resilience engineering - #5

Ols is a Senior DevOps Engineer at Sky Betting and Gaming. In this episode, we discuss how a seemingly simple monitoring change ended up taking down the entire site. We also talk about chaos and resilience engineering, including how the team at Sky Betting and Gaming conducts fire drills (chaos engineering exercises) that test the resilience not only of their software systems but also of their people systems. We walk through a recent example of a fire drill, how the drills have evolved over the past few years, and the lessons learned in the process.