Tammy Bryant Butow - On failure injection, chaos engineering, extreme sports and being curious - #6
The easiest way to learn about a really complex system is to inject failures in it and see how it behaves - Tammy Bryant Butow
Tammy Bryant Butow is a Principal SRE at Gremlin where she works on Chaos Engineering. She previously led SRE teams at Dropbox responsible for Databases and Storage systems used by over 500 million customers. Prior to this, Tammy worked at DigitalOcean and one of Australia’s largest banks in Security Engineering, Product Engineering and Infrastructure Engineering.
We had a great time speaking with Tammy! We discuss how her curiosity led her to the world of infrastructure engineering, an outage from her early days in which a core switch took down half the datacenter, and her experience running a disaster recovery test, which taught her the importance of injecting failures into a system to make it more resilient. We also touch on advanced failure injection techniques, how chaos engineering is evolving, and how extreme sports help Tammy keep calm under pressure. Lastly, Tammy has some great advice for teams looking to get started with chaos engineering.
Please enjoy this super educational and highly entertaining conversation with Tammy Bryant Butow!
You can read the episode transcript here.
- Tammy on Twitter
- Tammy on LinkedIn
- Tammy on GitHub
- Blogs by Tammy
- Kubernetes failure stories
- Slides from Tammy’s talks
Music Credits
Vlad Gluschenko - Forest
License: Creative Commons Attribution 3.0
- Oliver Leaver-Smith - On how "just a monitoring change" took down the entire site and resilience engineering - #5
- Cory Watson - Leading observability teams at Twitter & Stripe, how to succeed in a new org, effective ways to advocate for your team and more - #16
- Bruno Connelly - Building and leading the global SRE org at LinkedIn - #14
- Lorin Hochstein - On how Netflix learns from incidents, software as socio-technical systems, writing persuasively and more - #13
- Todd Underwood - On lessons from running ML systems at Google for a decade, what it takes to be a ML SRE, challenges with generalized ML platforms and much more - #10