Charity Majors - On database outages, journey as a co-founder, thriving under pressure and growing as an engineer - #7 | Transcript
You can see the show notes for this episode here.
This transcript was generated by an automated transcription service and hasn’t been fully proofread by a human. So expect some inaccuracies in the text.
Ronak: Alright Charity, super excited to talk to you today. Welcome to the show. One of the things we wanted to start with: we’d heard somewhere that you started going on call at the age of 17. So I have two questions there. What was this first job and how did you get it? And what was it like to be on call at that age?
Charity: It was fine. I mean, when you’re young, you literally don’t know any better. They can do anything to you, and you’re like, yeah, this is just how the world works. Okay. No, I was just sysadmin-ing for my university. I don’t know if they still do this, but back in those days they just gave students root. I think my very first job was running the math department computers. From there I moved up to running all of the university’s computers, and later I ran the CS department computers. In between, I worked for a local web development firm running their computers, all before I ever owned a computer of my own.
Ronak: And I think you were studying music, and then you also studied electrical engineering. So...
Charity: Oh dude, I also studied ancient Greek and Latin and, you know, literature. I was diagnosed last year with ADHD. So, you know, it’s a belated acknowledgement of my life story.
Ronak: When you were on this job, managing these computers, did you pick things up along the way or first of all, did they interview you for the job? I’m curious.
Charity: No, the honest truth is a dear friend of mine had been the math/stat department’s computer admin first. He was graduating. They asked him who he recommended, and he told them that if they hired me, he would back me up: any time I didn’t know something, he would help me out. So I owe a huge debt of gratitude to this friend, who just kind of randomly saw me struggling and was like, she could use a hand.
Ronak: That is very sweet. So when you were on this first job, as you mentioned, you’re managing all these computers and you had root, and you said every other student...
Charity: I knew nothing. Let’s be clear. I knew jack shit.
Ronak: Do you remember something that happened that you wish hadn’t happened?
Charity: Oh, goodness. Yes. I don’t know, man, every day is a new horror when you work with computers. You never know what you’re doing, you know? I think what most of these stories have in common is, you know... so yeah, like the hard drive story. I had never used hardware. I had never owned a computer, right? From my perspective, they were things that I SSH’d into from the terminal in my dorm basement, and then I ended up faced with hardware. And then there’s the time, at night, when I’m working at the little web development shop and their custom software goes down. It’s called, like, Info Arc or something. And like the old Testament had gone and nobody knew where it was deployed. And all of these stories end with people just looking at me, expecting me to help, and me being at least as lost as they are and just kind of going, well, somebody’s gotta do it. And I guess the thing that I learned from all of them is that a great key to success in life is just to be willing to be the person that they can rely on to figure it out.
Ronak: Oh yeah, that is well said for sure. So moving on to the writing part: I came across your blog and there are some really good articles; I learned a lot from them. One of the things I also found interesting was the domain of your personal website. It’s charity.wtf.
Charity: Yes, it is. It certainly is.
Ronak: I think the first time I saw that TLD was on your website and I was like, oh, that’s really interesting. I’m sure the content is also super interesting, which it is. I’m curious what prompted that TLD? Was it in response...
Charity: I just saw it, and I knew it had to be mine.
Ronak: Makes sense. You’re the co-founder and CTO of Honeycomb. And I read somewhere that you always liked to be more on the side of “let’s do stuff.” You mentioned somewhere that you’ve always been the person where, if someone has an idea, it’s “I’ll make it work for you and we’ll run with it.” Yeah.
Charity: I’ve never been an ideas person. I’ve always been a person who, if it’s worth doing... I guess I’m an implementer, right? That’s where my heart is. I love optimizing, performance tuning. I like figuring it out. I don’t even like writing software. I like understanding it. I like making it better. Which is why, you know, I’ve never been one of those kids who’s like, “I’m just going to start a company someday.” I really kind of loathe the whole founder industrial complex, and just like, “Oh my God, you started something, you’re so much better than...” No. Honestly, whenever I see a founder with a C-level title now, I internally roll my eyes and go, huh, you didn’t deserve that shit either. You just gave it to yourself. Nobody thought you deserved that.
Ronak: All right. Where were we? Oh, no, no, no. That’s okay. What I was trying to ask is something that I wondered: what does the role of a CTO look like? What does your typical...
Charity: Whatever the hell you want it to be. Literally. I mean, this is not true for every C-level role, and it is more true for the CTO role than any other C-level role that I’ve ever seen. I will say that there is kind of a broad differentiation between CTOs who grow up with the company, who started it, versus the ones who are hired in. The ones who are hired in, I think, have a bit more of a template. But I think it’s a reflection of the fact that every company is a technology company these days, but what that looks like, and what the needs of the company are for the person in that role, are so different. For some companies, it’s the person who writes all the hardest code and all this stuff. And I think that’s losing favor, because it kind of inevitably has to become more of an organizational and visionary and people role, a little bit higher up the stack. But still, it can be the person who is intimately involved in what the next generation of this company’s needs is, and figuring out all the things that will actually make it work and filed times three weeks. Or it can be a largely ceremonial role, like where I’m just giving talks. It’s just bullshit; I can’t even remember the last time I was on a machine, which makes me very sad. It’s almost a marketing role there, right? Where it’s education. And honestly, the reason that my role as CTO is so outward-focused, to brag a little, is because we have our shit together. There’s literally nothing for me to do in our engineering org, because they’re so good and they’re so tight.
And this has been a process, like everything, but I think that your role as a leader is always to do whatever needs to be done. Your team’s job is to execute on the incredibly full plate that they have in front of them, and your job is to be looking at what’s next. Like, how do we get more customers? And for Honeycomb, so much of our success is going to be tied to: can the world get better at writing software? Because most teams can’t really make effective use of Honeycomb fully, because they aren’t doing continuous delivery. So I am totally focused right now on trying to help everyone else get, like, a decade farther along in their journey to writing better software. So, we’re talking about CTOs and I’m just off in all different directions. I don’t think there’s any one way to do it, but that’s not the same thing as saying that there’s no wrong way to do it. There are many wrong ways to do it, probably way more wrong ways than right ways. It’s just that there’s no general answer.
Guang: I am curious what that progression looked like, right? Because I imagine in the early days, you and your co-founder were both technical, pretty much head down and just building it out.
Charity: I was CEO for the first three and a half years.
Guang: Interesting.
Charity: And Christine and I just swapped places about a year, year and a half ago.
Guang: And what has that been like?
Charity: Oh, way better. Oh my God. I really won that trade. CEO is the worst fucking job. I hated every second of it. I think I cried every day for almost two years.
Guang: What made it so tough? Is it just like so many things that you never knew how to do and you had to learn on the job or.
Charity: The answer is many things, but two really stand out for me personally. First of all, it was really, really emotionally challenging to give up or sacrifice some of my cherished identity as a technical person, as the person in the lead technical role, and to watch my team being tasked with solving those problems while it was my job to go deal with lawyers and rent and hiring salespeople. It just really fucked with me. I had nightmares about being unemployable, about how nobody would ever hire me again to do anything but PM work, about how there’d be this gap on the resume and I would fall behind the rest of the world. It wasn’t rational, but it was all bubbling in my subconscious and it made me very unhappy. And secondly... wow, let me do three things. Secondly, there is this ultimate stress of: if the company fails, it’s your fault. You can’t share that burden with anyone, and no one really understands it. You’re kind of getting the shit kicked out of you all the time, because all the problems come to you. You’re never able to spend time on things that are working well. It is your job to spend a hundred percent of your time on the most fucked-up shit, and you can really lose perspective about overall health. Honeycomb is starting to look like a success story now, but I will remind you that for the first three and a half to four years, we were surviving by the skin of our teeth, just beating the odds every year. Every year I was just like, we haven’t failed yet; this’ll be the year. I know this is the year when we fail. I just believed that from the get-go, and most other people did too, I think. And so that was on me. That was my fault. I had led these beautiful people, who I love so much, off to do this crazy thing with me, and I was responsible. They could have been making like three times as much money.
They could have had all these things, and instead they were following me, and I was leading them off this cliff. All right. The third thing that was difficult was that I had to be traveling 50% of the time, and so I was never doing a good job. And honestly, the other thing (that’s five things, I realize), the biggest thing, was that I’m just not temperamentally suited for that role. This has been kind of a painful journey of self-discovery and self-knowledge. I don’t know if you are familiar with the four tendencies, which is about what motivates you deeply. There are external expectations, what everyone else expects of you, and there are the internal expectations and goals you set for yourself. And a lot of what defines you is how you respond to those motivations. Christine is the upholder type: if someone else has a goal for her, she will fucking hit it and feel great about herself, and if she has a goal for herself, she will fucking check that thing off and feel great about herself. She is the ultimate checklist maker, a structured person who loves all this stuff. I am the literal opposite of that. I am the type that rejects external expectations of me; I want to do exactly the opposite of them. And also, which is psychologically tricky, I reject and resist my own expectations of myself. As soon as I set a goal for myself, it’s the last thing I’m going to do. I think it is very challenging for my personality type to be in the CEO role, because I think companies, like small children, thrive on structure and predictability and showing up at the same time at the same place every day. I was doing constant, deep psychological warfare with myself for three and a half years to try and fit into that box.
Guang: Interesting. And was there like an aha moment where you were like, shit, you know, it’s time to try something else and then,
Charity: Yeah, I got so deep in it that the choice was made for me. And it was the right choice, because I was so unhappy. But I get so fucking stubborn: the harder I’m failing, the more I dig in. I do not know how to quit. And I do not say that to brag; ultimately, it’s not a good thing if you don’t know how to quit. But, you know, it worked out the way it worked out. Things have stabilized since and been better. I don’t think we would have survived if I hadn’t been CEO for the first three and a half years, and I don’t think we would have survived if we hadn’t done this switch when we did.
Ronak: Yeah. Thank you for sharing that and being so open about it. I don’t think it’s easy to talk about this.
Charity: It’s a pathology of my personality, I think. Actually, I was raised not talking about myself, not sharing feelings and all this stuff, and part of my self-work as an adult has been overcoming that. And some would say that I have overcompensated.
Ronak: Well, there’s so much to unpack in what you shared, and I want to come back to some of those things. But one thing you touched on just now is that you identified that you are very stubborn. I’m curious, in which situations has it served you really well? You mentioned that you wish you didn’t have it in some cases, but in what cases are you like, yes, this characteristic of mine helps so much?
Charity: Oh, I’m ultimately unstoppable if I want to do something, because I will do it, damn the consequences and the side effects and everything. It’s not necessarily a great personality trait, but I’m a woman in tech, and that just kept me here. In fact, the aspect of my personality that thrives is fueled almost solely by “fuck you.” That’s not a bad thing for me. The more people tell me to get out of tech, the more stubbornly I am here, right? I feed off that shit. It’s crazy. But yeah, tell me I’m going to fail. Do it.
Ronak: Well, at this point Honeycomb is succeeding. And congratulations on the recent Series B that you just did. I’m curious, how involved are you in fundraising as a CTO?
Charity: Okay. That’s the amazing thing about being CTO. You get to pick and choose what the fuck you want to do. It sounds like freedom. It really is.
Ronak: So, changing the subject a little bit. I was reading one of your interviews, and I want to read a small part of an answer you gave and follow that up with a question. What you said was: I’m really good at firefighting and staying cool in the middle of a crisis. I never panic or freak out and make bad decisions under extreme pressure. That is an incredible skill. It doesn’t come naturally to a lot of people. Well, not to me, for sure. So I’m curious, how did you develop that skill?
Charity: So honestly, last year I was diagnosed with ADHD. I know, the world’s least surprising diagnosis. It was to me, though; I had never considered the fact that I might have an attention disorder. But apparently one of the side effects is that your brain is just buzzing all around, but when you give it adrenaline or crystal meth, or, yeah, Adderall, it slows down and it can focus. So I can’t really take any credit at all. It’s just kind of a byproduct of my psychology. But yeah, the moments where I remember feeling the most alive are the ones where the site’s down, and if I don’t fix it, the company will go under, and there’s no one else who can do it. I just go to my happy place and I’m just like, cool. I can focus. You can’t stop me. It’s wonderful. I love those moments. I know that’s terrible, but they’re wonderful.
Ronak: It’s amazing that the entire team can rely on you to be so calm and stable in those moments. So for people who are...
Charity: It’d be great if they could rely on me to be calm and stable in all the other moments.
Ronak: What I was going to ask is, for folks who are kind of dipping their toes into the operational journey, who have been on call but haven’t seen many outages yet, and who are still learning to maintain their composure in these kinds of moments where there is too much pressure: how do you recommend they develop it?
Charity: Oh, that’s such a great question. First of all, although I do think this is to some extent biologically determined, I also think it is entirely a learnable skill. I’m sure that I’ve improved at it just by doing it so many times, and I’ve seen other people improve dramatically. My friend Jeff, who you work with, although you’ve never met, has these great stories about being an intern and just fucking freezing. So yes, it is a learnable skill, and I think it’s a great skill to learn, because when your adrenaline is pumping, it generally makes people make way worse decisions. And the last thing you want to do in the middle of a crisis is have two crises, or make it worse, right? So I think it’s about training yourself not to react as the first step. Adrenaline spikes and our little lizard brain is like, ah, I must jump, I must react, I must do something. Train yourself to stop, take a breath. Take two. The chances of you making the situation worse by not doing something for five seconds are very small; the chances that you can make it way, way worse by reacting are way higher. So just take a few seconds and breathe deep, deep breaths until you feel your pulse kind of return. And also take control of the environment around you. If the reason you’re freaking out is because everyone around you is freaking out, well, we’re kind of herd animals. It’s going to be incredibly hard to keep your head if people are just hovering over your shoulder. If people are causing other people to become tense, take control of the situation. Speak in a slow, calm voice: let’s sit down, let’s take a breath.
Let’s think about what’s going on. Can we have a moment of silence, two or three deep breaths? And then you regulate your voice. We respond to feedback loops, right? So when you’re in the middle of a feedback loop that is winding people up, your job is to consciously take control of the situation and start a feedback loop that winds people back down, gets them back to their normal selves. Anything you do to slow, dampen, modulate is really going to have an impact well beyond you. And the thing is, it works even if you fake it. It works 1000% as well if you’re faking it and your heart is beating the entire time.
Ronak: Well, humans are great at tricking themselves. And this is such great advice. I’ve actually seen a lot of seasoned engineers who, when there is an outage and everyone is kind of like, oh, this might be the... everyone is just trying to help and share hypotheses...
Charity: It might be the end of phase one if we can’t get the data back. And they’re just like, okay. And you hear some of them do this total voice, like they’re at a kindergarten, right? Just like, okay, let’s take a look at this. This is an interesting question. What can we... you know, just very slow. And I imagine, like if you cosplay or whatever, slipping into the role of a fireman who just arrived at the scene and is like, okay, what’s going on here? Is today a good day to die? Today’s a great day to die, you know?
Ronak: I remember when I had my first outage. I was still shadowing someone as an on-call, and the person I was shadowing was a seasoned engineer with lots of experience dealing with incidents. When things broke, it was Saturday, 6:00 AM. I freaked out, jumped out of bed saying, what just happened? When we got on Slack, we got on a call, and this person is like, okay, let’s look at these logs, let’s do X, Y, and Z. And I was like, how can you be so calm right now? I’m just freaking out thinking, it’s not working, it’s not working, we need to get this back up. But over time, just taking that breath or two is an incredibly helpful thing. Talking about crises, we love discussing war stories or production outages on the show, and I want to lead with one. I was listening to one of the podcasts that you were on, and you mentioned that at some point in your career, you had to drive to a colo and flip a DB switch. What happened?
Charity: Oh, this is very routine. This is just, you know, another Saturday night. I did this for years. I worked at a remote hosting company, but this was in the days before we had remote-hands software. During the day there was a guy sitting there who would do this, but if it happened after hours, that was on us, to call a cab, go to the colo, do this. Or when I was at Linden Lab for four years, we had our own colo space with racks of machines. If something went down, we scheduled a monthly trip to the colo; we didn’t go down there every time there was a hardware problem. But we had some points of failure, like the primary MySQL server, for which, yeah, if it went down, day or night, you just had to go down there and flip the switch. I will never again touch a server. And sometimes when I think the world’s just going to shit, I just think about that and go, you know, it’s not all bad.
Ronak: There were days when you had to drive to a colo. It’s much better with the cloud these days; someone else has to do that. So, talking about MySQL and some of the outages: Honeycomb as a company has been really open about sharing incident reports. I think you wrote the first major outage incident report on your blog, which was amazing. When I read that, I was like, oh, that is so cool that you’re willing to share and talk about that.
Charity: My entire career, I’ve been so... like, I’ve been instructed on what to say publicly about our outages. And I just remember having to be vetted by the CTO and the CEO, and they’d go over it and wordsmith, and, oh, we can’t say this, we can’t mention the vendor’s name, and other shit. So when we started Honeycomb, I was like, huh, finally I get to fuck this up my own way. I’ve never proofread a single postmortem. I trust my team. They do stellar work. And I’m always like, more detail. I have learned so much from AWS’s postmortems; they’re fucking phenomenal. And the thing is, I think everyone who’s editing, doing the micromanaging thing, is doing their own teams a disservice, because nothing builds confidence with other engineers, assuming you’re not doing dumb shit on the regular, like just being transparent and telling them exactly where we’re at.
Ronak: Oh yeah. Yeah. It builds trust with people so much. How has it helped you in hiring? I know I’m digressing a little bit, but...
Charity: No, it’s been tremendously helpful in hiring. It’s hard for me to tease that out from... you know, Honeycomb, I’m going to brag for a minute here too: we have never had a problem with hiring. And this is a moment in the industry when our VCs ask us, every time we have a board meeting, are you guys able to hire, keep up? We’re always like, yeah, not a problem, man. We’ve never had any recruiters. The only time we engaged a recruiter, I think, was for our sales VP. But I think that people are drawn to Honeycomb because we try to practice, I’d say, humility. There are a lot of things we’re just angry about in the tech industry, and so we don’t do them; we do them differently. And then we talk about them, and we don’t claim to be perfect. We’re not perfect. And I think that sometimes people come to Honeycomb and they’re disappointed because it’s not perfect. But I can guarantee that we will be transparent and we will fail differently.
Ronak: Well, it’s amazing. So coming back to some of the crises again... well, I should stop saying crisis.
Charity: Shitstorms. Dumpster fires.
Ronak: Oh, those are much better than just the word crisis. Are there any other outages or war stories that you could share with us?
Charity: Ah, yes. Absolutely. Would you like me to open my mental file marked MongoDB for a while? So when Parse started out, it was the early days of MongoDB. There was just one lock per replica set, and we were basically a multi-tenant system, right? We had 60,000 mobile apps after a couple of years, over a million by the time I left. One lock they’re all, you know, contending for. And the sharding stuff didn’t work for us, because sharding works really well when you have big datasets you can stripe across shards, but we had lots and lots of little ones. And so the hotspot problems were manifest. That said, Parse would never have existed as a company if it wasn’t for MongoDB, so I will give them some credit. They were definitely onto something. Side note: product marketing, I believe, is the reason that MongoDB is still alive as a company. The marketing, the community building, it bought them time for the technology to grow up, to fulfill its promise. I still have the t-shirt that says “MongoDB is web scale.” So let’s not forget their catastrophic first decade or so. Yeah, so the kinds of outages that we had: someone’s app would hit the iTunes top five and it would take us down immediately, and then we’d have to go figure out what was wrong. God, it’s almost hard to come up with one. There are the ones where we hit the replication bug... oh, sorry, I’m just like, oh God, past traumas. Here is a story about MySQL; I think maybe the Mongo stuff is still a little too fresh to process. So at Linden Lab, we used MySQL for all of our user data, and we tried upgrading from 4.1 to 5.0. We’d been running on 5.0 for all the secondaries for an entire year.
And we had done all of the benchmarking and all that stuff. All of the benchmarks showed that MySQL 5.0 was definitely going to be way faster than 4.1. And so we flipped the switch and the entire world went down. We kind of got it back up; it was limping along. And then finally we realized that, for whatever reason, it was not faster for our workload. Because you couldn’t do a backwards migration on the data, we had to roll the world back for a day or two, bring it up on the old 4.1 primary, and rebuild all the secondaries. Oh my God, it was so painful. And so my job after this catastrophe was to figure out why and make it safe. I spent almost a year on it. I wrote some software to do capture-replay, so I could capture 24 hours of traffic and then replay it at various speeds, using a bunch of clients and all this stuff. So I validated that, for our workload, it was actually 1.4 times slower, not 20% faster. And this was around the time that Mark Callaghan and the MySQL team at Google published the InnoDB patches that let you use more than one core and a bunch of other stuff. So I upgraded to Percona MySQL and I tweaked those things, and got it up to about parity. I messed with the disks and whatnot, whatever, and finally got it to a point where I was like, yes. Also, the query planner had some bugs, and we actually had six queries that were underperforming, which we rewrote. And a year later I was like, yes, I guarantee you, this will work. It will be faster. And so we flipped, and it did. Only a year’s worth of work.
And of course the reward is nothing happened, and everybody else is like, what? And then the part of the story that puts the cherry on top is that six weeks later we got SSDs, and all of my work would have been totally pointless if we had just waited and gotten the SSDs. So: ops work.
Ronak: Yeah. And what you mention about migrations where nothing happened, it’s amazing, right? Migrations that don’t make any noise are the best migrations, but they get talked about so rarely because, well, no one knows a migration happened.
Charity: Well, you know, the difference is that this was like a decade ago, right? At the time I wrote this big, long blog post, and it was great. I wish I could find it; it’s not even in the Wayback Machine. But there was no community then. There wasn’t really the Twitter community. And I feel like we’re getting better at this, at sharing more widely the stories of the things that went well, so we can learn from each other, and at reusing some of that work. Because all that year of work that I put into MySQL also went, like, poof. I tried to get someone interested in taking over the tools; I couldn’t maintain them, since I moved jobs shortly after that. But yeah, I feel like this is obviously a trainable skill, because we have all learned to be ecstatic when it’s quiet, right? We’re like, yeah, this is amazing. So, like so many things, it’s just a question of being conscious and aware of it, and then choosing what you celebrate and choosing what you value. Your body’s internal reward systems, like dopamine and all this stuff, catch up as long as you do that. So I kind of love that we’re kind of the contrary engineers: we celebrate when everyone else is quiet and we’re quiet when everyone else is celebrating. I like that about us.
Ronak: Yeah. Like the red deaths, not the green ones. So I was reading one of the incident reports published on the Honeycomb blog. In one of the paragraphs, it was mentioned that Honeycomb does burn-rate-based alerts, or SLO burn alerts. So for our listeners who might not be completely aware of burn rate, can you briefly describe what a burn rate is?
Charity: Yeah, it’s that thing that makes you only get paged in the middle of the night if it’s really important. It’s that thing that makes you able to do most of your engineering work during the day instead of at 2:00 AM. So I think SLO- and burn-rate-based alerts are kind of the next thing; we’re seeing a lot of interest. It is one of those things where you kind of do have to be this tall to ride this ride. If you have an org where, you know, everything’s just a mess, it’s not going to do any good, it’s not going to help. It’s not where you should spend your time. But if things are working pretty well, you know, but you just have a large and noisy system, like many of us do, it’s a really impactful way to invest engineering effort and get back a lot lower frustration, you know, a higher ratio of value to time invested. So what it is is just the concept that you don’t page about symptoms like, oh, this just fired, you know? Oh, this CPU, like, fuck a CPU alert. No one should ever, ever have to think about a CPU alert, ever. Turn them off. They’re worse than useless. They’re literally worse than useless. They’re burning people out and not doing any good. But there’s also the next generation of that, like individual nodes that go down in a system behind a load balancer that doesn’t give a shit about that at 2:00 AM. There are two categories of things that can go wrong. There’s the things where it’s perfectly fine for it to wait till the morning. And then there’s the things where either users are currently being impacted or they will be soon, right? Or, users are currently being impacted. Let’s stick with that. Right. Because you shouldn’t actually page anyone about users who are going to be impacted soon; you should page when they’re impacted now.
But when your systems are well architected enough that you’re kind of in this zone, you’re just building toward a state where nothing pages anyone out of hours unless, you know, you’re in a state where you’re burning your budget. Right. You’ve got your SLO for your company, which is, we think it’s pretty much fine if 99.9% of the time people are getting an OK. Right. Well, if that’s 99.8, I don’t think I want to know about that until morning, right? But if it’s 60%, yeah, I’d like to know about that now. And, you know, if it’s 98, well, I don’t know, that’s up to you. How long can you go with an elevated error rate before it’s investigated? But it’s a way of stepping back from the front lines of the firefighting and being like, all right, we’ve grown up a little bit. We’re now dealing with thresholds, not up or down. We’re now dealing with gray areas. Right?
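To make the thresholds in that answer concrete, here is a minimal sketch of a burn-rate check in Python. It is not Honeycomb's implementation. The names `Window`, `burn_rate`, and `should_page` are hypothetical, and the paging threshold of 14.4 is an assumed policy choice (a sustained burn rate of 14.4 for one hour uses up about 2% of a 30-day error budget), not something stated in the conversation. Burn rate here means the observed error rate divided by the error rate the SLO allows.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Request counts observed over one alerting window (hypothetical shape)."""
    total: int
    failed: int


def burn_rate(window: Window, slo_target: float) -> float:
    """Observed error rate divided by the error rate the SLO allows.

    1.0 means the error budget is being spent exactly as fast as the SLO
    permits; higher values mean it will run out early.
    """
    if window.total == 0:
        return 0.0  # no traffic, nothing is burning
    error_rate = window.failed / window.total
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_rate / error_budget


def should_page(window: Window, slo_target: float, page_threshold: float = 14.4) -> bool:
    """Page a human only when the budget is burning fast (threshold is an assumption)."""
    return burn_rate(window, slo_target) >= page_threshold


if __name__ == "__main__":
    slo = 0.999  # "99.9% of the time people are getting an OK"
    slightly_degraded = Window(total=1000, failed=2)   # 99.8% success
    badly_degraded = Window(total=1000, failed=400)    # 60% success
    print(should_page(slightly_degraded, slo))  # can wait until morning
    print(should_page(badly_degraded, slo))     # wake someone up
```

With a 99.9% target, the 99.8% case burns budget at roughly twice the allowed rate and stays below the paging threshold, while the 60% case burns it hundreds of times too fast. Where the 98% case lands depends entirely on the threshold you pick, which matches the "that's up to you" point above.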
Ronak: Yeah. Makes sense. It reminds me of one of the things that you said: nines don’t matter if users are not happy. Yeah. There was so much truth in that. What prompted this, by the way, this particular statement?
Charity: It’s just a random thing that came out of my mouth one day and I wrote it down and I put it on my slide at, which conference was that, and the woman that’s in like. Well, red state one flood, not sorry, strangely. Yeah. And, you know, it’s just quotable. One of the proudest moments of my life was that year at SREcon: every single presenter had that quote on their slides. Like every single one of them had that quote on a slide. I was just like, I don’t think anything is ever going to top this career moment for me.
Ronak: Yeah. I mean, I certainly think it should be a banner in front of, or in, every office.
Charity: I made a bunch of like stickers with like rainbows and hearts and unicorns and stuff. If you guys send me your addresses, I will send you a sticker. Like look back.
Ronak: We love that. That goes
Charity: to the host, not to everyone listening to this podcast. Yes,
Ronak: yes, yes. One quick bit on the error budgets actually. So you mentioned that on the burn rate, you’re seeing, how soon are you going to Eileen theater budget on bad days? We use one word while in their budget. So in those cases, like what are some of the practices you’ve seen work? Well, when you’re trying to protect the next cycle and say, okay, these are some of the things we’ll do so that we don’t violate it again in the next cycle.
Charity: I mean, this is just your reliability work, right? It’s like, how can we perform better in a slightly degraded state? Right? Like if you’re going down every time, well, that’s obviously where you start. Right. But then if you’re staying up but, you know, you’re losing 30% of all requests, well, how can you not do that? Right. How can you, you know, do a failover more rapidly? Or how can you offload those failing queries, or retry, or something, right? It’s just the nuts and bolts of what we do. Resiliency is not about, you know, having less errors. It’s really not. It’s about being able to absorb more error-impacting events and recover from them more gracefully and with less human intervention
Ronak: in, in distributed systems where something is almost always breaking. It’s just that sometimes
Charity: you get a weird layers of fuck. That is the problem. More layers of it mean it’s just like, this is the world that we live in and the endless, this is the ground upon which we stand and there are many holes in it and it’s fine because together we’re strong.
Guang: Changing gears a little bit. I think a question that every engineer asks themselves at some point is, do I stay an IC or do I try management? That’s actually how I first came across your blog a few years ago. You described this engineer/manager pendulum. Can you tell us more about that?
Charity: Yeah. That’s the most popular blog post that I’ve ever written. I like it. I love that people are still coming up to me years later, like, oh, really? And that makes me really happy, because I think we’re really ripe for sort of a re-imagining of the relationship between engineering and management. Yeah. I firmly believe that, you know, the best engineering leaders that I’ve ever worked with or known or learned from are all people who’ve spent some time in management, but don’t go there and stay, right? They go back and forth a few times over the course of their career. And I started thinking about it, and I was like, you know, the best managers that I’ve ever worked with, the best line managers in the world, I think are never really more than three, four, five years removed from doing real hands-on work themselves. There’s an intimacy, a familiarity with the subject matter, that you just can’t fake. You know, and as much as some people love to proclaim that it’s great when they have a non-technical manager who just gives them the keys to the kingdom and never criticizes or has an opinion about their technical work, to me, that’s not a great thing. It’s not a thing to brag about. It’s a great thing to have engineering managers who, you know, don’t own the decisions, but who have good taste and good judgment and can help and suggest and help you grow. And I think that if you want to go into management, you can do one of two things. You should either go in and decide to climb the ranks, or you should go into management and then go back and forth a couple of times, because otherwise you lose your edge and you’re not as effective for the people that you support and serve.
And likewise, on the other side, the best senior engineering leaders, the ones with the most empathy, the best sense of how to motivate people and inspire them, the best sense of how to take this massive project and break it down into sub-projects that somehow give everyone on the team something that challenges them but doesn’t overwhelm them, you know, something that will push their boundaries, but not too much, that is an art form, a human art form, just as much as it is a technical thing. And all of those engineering leaders have inevitably spent some time in management, right? They spent some time where it was their job to do nothing but think about the human interaction parts and how to connect our engineering work to the business side. And I also think it builds so much empathy for management among engineers who have spent a little bit of time there. It demystifies it. Right. When you haven’t done it, it’s kind of like you’re waiting to be tapped for the promotion, or the opportunity to move up, be better than your peers, and all that shit. Actually doing it makes you go, oh, this is just a different pile of shit. You don’t actually have that much power, you know, it’s just different. And the glow vanishes from it. And you have so much more empathy for your own manager, who’s stuck, pressed between the little shits like you below and their manager above, and they’re just trying to do the best they can. And you’re just like, oh yeah, poor guy. So I’m a big believer in, you know, try it if you can. It’s also nothing but a collection of skills and practices. It can be broken down and done through many different configurations. Right?
And so even if you don’t try to be a manager with a title, I think you should learn some of those managerial skills. Go to your manager and ask, what can you delegate to me of your managerial work? Can I run meetings for a quarter? Something like meeting running, I shit you not, is one of the most underappreciated, undervalued skills. So much time is wasted by people sitting in meetings that are not run by somebody who knows what the fuck they’re doing. And it frustrates everyone. And nobody really understands why, but it’s just like those grains of sand that kind of gum up the works, you know, over the course of your workday. So yeah, learn some of those skills.
Guang: Yeah, what I really took away from that, I think, is the empathy aspect that you talked about. I think you really can tell, as an engineer, when the manager is empathetic, like they’ve gone through it and they understand how these things work, versus when they’re just kind of indifferent, where they don’t know what the hell is going on, so they’re just relying on, you know, whatever you come up with. And that’s like a
Charity: different, like, I believe that managers should be on call if they can. If they can’t, they should regularly pinch-hit, including overnights and weekends, or they don’t deserve to be called a manager. Because if you’re asking someone else to wake up on behalf of the company, you’d better be the first to jump off that cliff.
Guang: I’m going to need to quote you on that at some point. Very nice. So, I also really like this series on your blog called Questionable Advice, where it’s like an advice column. You show the emails that you’ve gotten asking for advice, and you write about what you would actually tell them. So, first of all, what makes a good question? You must get a ton of emails, and I really want to know how I can make my email one day stand out and get featured on your blog.
Charity: It’s just something that makes my mind sort of start spinning. And I usually write back to them first. I respond to almost all questions that I get, so seriously, if you want to ask me a question, just DM me and I will respond to pretty much all of them. It’s mostly the ones where I was just like, oh, that was kind of a good answer, or something. Or if it’s just a really common one and I’m like, ah, I can put this on my blog and then fewer people will ask me this. But yeah, I encourage all questions. But honestly, another thing that does it is when the person is clearly in some angst or some, you know, crisis. This is why I also have a Calendly link for people to sign up if they want to just have a phone call and talk about their career trajectory or something. I have benefited so much from so many people in this industry just giving me their time and their advice, and I’ve been so fortunate. This is something that’s been kind of weird for the past couple of years: someone asks me a question, shit starts coming out of my mouth, and then I stop, thinking, oh shit, I didn’t know I knew that. That kind of sounds like more people should benefit from all this fucking knowledge. And so, you know, to the extent that I can: sign up, I’ll have a call with you, and I will just talk it through with you, because people don’t usually reach out unless they’re at some kind of turning point in their career, or they’re not happy and they aren’t sure what to do next. And I’m happy to share at least what I know.
Guang: That must be nice. Mine usually comes out the other way, but
Charity: that happens to you. I just, I just immediately ignore it, forget about it.
Guang: So, a recent post in that series is titled The Trap of the Premature Senior. I was like, oh my gosh, that’s me at my first job. Can you tell us more about the situation?
Charity: Yeah, totally. I was talking to this kid, and, you know, he’s at this job, his first job, he’s been there for like two or three years. And he was at the top of the fucking mountain. You know, he was the most senior person there. He knows everything about how the system works. He gets pulled into all the high-level planning meetings. He’s able to do less IC work of his own, but, you know, it’s very validating to be needed and to be wanted. And he’s like, I feel like this might not be the best thing for me, but I feel like maybe I should switch jobs, but then how can I be sure to get the same comp? How can I be sure to get the same stature? I don’t want to just end up at the bottom of the totem pole, you know, doing errand-boy stuff, low-level stuff, when I’ve gotten used to having this very high level of influence and say in what I do. And my advice to him was like, get the fuck out of there. It’s too soon for you to feel that way, honey. You haven’t earned it yet. You know, and it’s not going to be good for you. You’ll stunt your growth, right? For the first five, six, seven years of your career, at least, you need to be optimizing for, you know... your career is the single biggest, greatest, it is a multimillion-dollar appreciating asset, and you should manage it for the long run. Right? If you get hooked on those feelings of knowing everything and having high status in the first couple of years, it’s not going to be good for you in a couple of years. So don’t get used to it. Get out of there.
Guang: I think, yeah, like what really struck me was like the familiarity versus like,
Charity: we’ve all been there, man.
Guang: So one thing that you touched on is, you know, having worked at different places and on different sorts of problems gives you perspective, and that’s sort of what makes you more senior, which I think is absolutely spot on. What other aspects or skill sets do you expect from a senior engineer, say, if you were hiring for one?
Charity: No. And I want to be clear that in that article, I was like, quit your job. And I’m not saying that everybody has to quit their job every two or three years. There are ways to maintain that growth within a single company, by moving teams, you know, by experiencing different things. But the point is just to be mindful of it, right, and optimize for that. Yeah, for me, a senior engineer first and foremost, and this is my backend bias, which I think is partly fair but not entirely, is rooted in production. Right. It’s about their instincts. I want to be able to trust their instincts, which means that their little data corpus needs to be trained on the reality of production. Right. I don’t feel anyone can count as a senior engineer if they don’t know what happens to their code after they hit merge. Right. You need to know how it gets out there to users, and be able to look at it in production and say, you know, what’s happening? Is it doing what I expected it to do? Is anything else weird? And do some, you know, at least basic debugging. I think that’s table stakes for anyone. I think some knowledge of or exposure to data modeling, too. And this is all about, what do you do beyond the code itself? Right? Because there’s the data structures and algorithms, but, you know, that’s just step one. You also need some sense of what data is backing this, and what the code that you’re shipping is going to do to it, and where those edges are. I think for sure that’s also table stakes. That’s mostly it, you know. I think beyond that,
it’s just, personally, I like to see that someone has been at at least two jobs, you know. And they sometimes call this T-shaped, you know, that you’ve shown that you can go and have gone really deep in one area and that you have a broad understanding. I like that. It certainly benefited me. I think that for my T, I went really deep on, you know, MySQL debugging. And then later on, I was able to very quickly and efficiently reproduce that on MongoDB and Cassandra, because I had that depth of context and stuff.
Guang: What about the soft skills side? Like
Charity: what does that mean? Yeah, I mean, you can’t be an asshole. You know, people can’t dread having to talk to you, because that’s like having, you know, a node or a service that regularly turns requests away. You wouldn’t accept it in your system. You can’t have people that regularly just warp and distort the flow of communication by causing people to avoid them. Right. I do think that there are many archetypes for senior engineers. Some people will get very dogmatic about, you must do more mentoring than writing code, and, you know, I really don’t think so. I do think that, here’s the you-must-be-this-tall: you must be able to explain what you’re doing. You must be able to explain it to people who aren’t experts in it. You should be able to talk through your code. Here’s the interesting thing. We select very heavily for communication skills, I would say above and beyond technical skills, right? If somebody can clearly, you know, talk us through their solution and why they chose it and what the trade-offs were and what else they might’ve tried, and came up with the wrong conclusion, I would rather hire that person than a person who came to the right conclusion and couldn’t really explain why or how they got there, a hundred percent of the time. And we’ve found that it has excluded a lot of people who write great code. But we won’t compromise on that, because it’s such a part of our team’s culture to be able to explain. And I think that leads to a greater understanding, and it certainly leads to greater shared understanding for the team.
So I think that’s a bare minimum for senior engineers: just to be able to clearly explain yourself and to, you know, be willing and eager and friendly to help those around you. But then, you know, some ICs get more and more and more powerful, but they just grow at being, you know, masters of writing tons of code very fast. Or there’s an archetype you find mostly at large companies where they may only write five lines of code in the quarter, but they saved the company $10 million and no one else could have found them and written them, you know? So there’s that archetype. Then there’s the archetype that is much less about, you know, writing product features and much more about understanding and, you know, being the master of the migration and the rearchitecture, like, getting more performance out of the system on the second or third rewrite. And then there’s the type that is very much about coaching and mentoring and about spreading those skills and being the glue in the team. They may not write much code at all, but they, you know, bring up all of the engineers around them. And I think all of those archetypes are fine, no shame in being any of them. Be who you are, who you want to be, but just be sure to find a place that needs that type from you.
Guang: That’s well said. And I want to go back to the first post of your advice column, where someone asked you, after being a manager, can I be happy as a cog? But I want to change it up a little bit and ask instead, and this is something that we touched on a little bit earlier, after being a technical co-founder or a CTO, can you be happy as a cog?
Charity: I can’t wait. When I started this company, my serious intention was to go in a corner and write Go code for two or three years. Because I had gotten so burned out, and I was starting to feel so out of touch with my technical skills. I was just like, I just want to put my headphones on and write software for a couple of years. I got to do that for three months. So my hopes and dreams are on hold. I want to be a cog. You know, some of the easiest people to manage are people who’ve been managers. Some of the most wonderful people to have on your team are people who have been co-founders, because they’re so chill and so happy to not be that anymore. They’re just like, dude, whatever you need from me. And then they clock out at 5:00 PM and go home, you know? And they’re just happy as a clam with their wife and their daughters. I’m just like, eh, hate. You know, it’s just like being a manager, right? Once you’ve been there, the shine is off. These last five years have been the hardest of my life, and I sacrificed my marriage, and it’s just like, I’m going to be so stoked to be somebody’s cog someday. You have no idea.
Guang: So the flip side of that, which I was also interested in knowing about, is that I feel like a lot of the friends that I talk to who are engineers who want to start a company kind of have that same fear that I think you mentioned a little bit in the beginning. It’s like, okay, say I end up doing more of, you know, maybe the CEO role as a founder, to kind of help out with the business side, where I need to learn how to do sales. Then it’s going to be really tough. Like, you know, three, four years in, maybe it doesn’t work out, and then I come back, and, you know, how hireable am I after all that? Do you have any advice? Because to me, having gone through that, having very quickly tried my luck at starting a startup, it definitely gave me a lot of perspective in terms of what matters and how to think about problems at a high level such that it benefits the business. But I wanted to kind of get your take: do you have any advice for people who are kind of in that position
Charity: You don’t really have to worry about it. I really think this is mostly a pathology that affects women, because they’re seen as so dubiously technical to start with. I’ve never seen a dude get dinged for it, because we’re just like, oh wow, you tried to do a startup, right? And that’s so cool. Like, I don’t think you have to worry about it if you’re male. I’ve never seen men get dinged for it. You’re just like, oh yeah, you’re a little bit, you know, you can brush up on your skills, but I don’t think you need to worry about it. You know, starting a company is hard. There’s not a lot of glamor to it. I don’t recommend it to anyone. It’s really a stupid thing to do. But, you know, people get the bug and they want to try it. They have an idea that they believe in. Only do it as long as you believe in what you’re doing. Right. But if you believe in what you’re doing, then you have to be willing to do whatever it takes. And yeah, that means sales. That means marketing. And that means not being snobby about this shit. Right. I honestly think that one of the things that we did right at Honeycomb was being very vocal about how much we respect what business people bring. Like, any expressed snobbery about sales or marketing is an instant out in our interview process. We don’t hire people who don’t look at their business counterparts as peers. And this is incredibly rare. And I think that it just rolls off of us in fumes, that this is how we feel. So I think there’s a retraining that you need to do if you’re starting a company: retrain yourself to really see the value and the necessity and the glory in these things that we have been kind of taught to shit on our entire careers. And be part of the narrative that’s pushing back against, you know, sales being lesser.
Guang: Yeah. That’s really well said. Okay, Charity, we’re done with all the easy questions. Now I have some very hard questions for you. So we know that you’re a very experienced podcaster. You have your own show, the observability podcast. We will then kidding.
Charity: No, you’re very experienced at whiskey.
Guang: We haven’t figured out the shipping yet, so you have to wait for that. But Ronak and I, we just started, you know, we are complete newbies. So, from an experienced veteran podcaster to us toddler podcasters, podcaster to podcaster, how would you rate our podcast on a scale of one to 10? There’s no wrong answers.
Charity: This has been one of my two or three favorite podcasts of all time. So you guys are on a roll.
Guang: That was you know, to our listeners that was not paid for, you
Charity: know, completely spontaneous. And yeah, I, I, I will say you had the edge on pretty much any other topic because I’m such a sucker for horror stories and production. So I showed up already pretty like, yeah, this is going to be a good way to, my sweet spot.
Guang: I was hoping you’d say, oh, I’ll give it a six, so then I could do the whole line about, oh, my mom, you know, kind of gave it a slightly better score. But we’ll save that joke for later. Before we wrap up, is there anything else you’d like to share with our listeners?
Charity: We didn’t talk about observability at all, which is really refreshing. But, you know, if people have production problems, and who doesn’t, you should check out Honeycomb, because if you have large-systems problems, you probably need observability. So, honeycomb.io. Oh, and we have a really generous free tier, which is cool. You can run actual, real production workloads on it and not pay for anything. So that’s pretty cool. Right. And, topically, SLOs: we have the only implementation of SLOs that I have yet seen in the wild that does it correctly, like according to the Google SRE book, which is cool, because it means you can go from very high level, you know, the number burning down, directly to which events are failing and what is different about them from the baseline events. And just at a glance, you go, oh, all of these errors are because of this one node in this one, or put aside to do and blah, blah, blah. It’s just super dope. It lets you drill really quickly from high level to low level and get back to bed.
Ronak: Yeah, that sounds really awesome. One thing which I would just say is we purposefully skipped the observability topic because we know you have a podcast on it. You’ve written about the SLOs. We wanted to keep this a little interesting for you.
Charity: Yeah, this is great, man. I love it. I’m so glad you did. It’s refreshing.
Guang: Thanks. Awesome. Thanks so much, Charity. Really appreciate your time.
Ronak: Yeah. Thank you so much.
Charity: Thank you. It’s really great.