Yours, sincerely embarrassed
July 11, 2013 Leave a comment
Last week a ghost of christmas coding past came to haunt me. Some code that I wrote two years ago was found to be broken. I’m not surprised. When I first heard about the issue, despite the fact that I no longer work in that team (or indeed on that code base, language or even CPU instruction set) one of the first things I did was pull up the code to take a look.
This particular area of code is quite neglected and I knew for a fact that other people had only touched it in passing, without needing to make the modifications needed for real understanding. I’m not saying that other people couldn’t understand it, but as a release was due to happen the following day and that, despite the bug being two years old, when the symptoms are described, it sounds quite scary and speed was going to be important for fixing it. About 1 in 20 times our security handshake would fail. The legitimate user would be temporarily unable to configure the product. Because of this, an impending release (due to be finalised the following day) would be put at risk. So my experience in this area meant that I could immediately narrow the bug to two or three files in the whole repository and I knew more or less how they fitted together. Speed boost, achieved.
So one of the guys who currently looks after this code came and sat next to me while we looked into the problem together. He even talked me out of an incorrect fix once, but I talked him into a different incorrect fix. And then I got (un-)lucky when I was testing it. So when we were called into a meeting the following morning (the day the build was expected to be done) to explain the bug, the risks surrounding the fix and what impact this might have on the previous testing done to at least 5 managerial types and a couple of hangers-on, we were feeling pretty good about it. Yes, there is a problem, but hey, here’s the solution. But as I mentioned before, it actually wasn’t.
The same guy came over to my desk again and wanted further explanation of what we changed so he could help the QA staff get a handle on how to test it. Great! I love to see the developers and test analysts requiring that level of detail for working with each other. But in the process of showing him some ways I thought it could be exercised, we tripped the bug again. I triple checked that I was running the “fixed” version, but alas, I was.
I didn’t want those 5 managers to all stress out unnecessarily, so I decided that alerting them at this point wasn’t a smart move and would likely only garner urgent phone calls, emails and hovering managers, which would distract me from fixing the problem at hand. So we got back to where we were and began to see why our fix wasn’t working. A third engineer came over and started working through the problem with us. He’s a senior, but it doesn’t feel like it, because he’s younger than me (remember that I’m only 28). It therefore didn’t feel like an adult being sent in to change the diapers we had so eagerly filled. Though looking at it now, I’m guessing it wasn’t a coincidence he was passing by my desk (as there is a road separating the buildings in which the two of us work). When we showed him the failing test, he sent the notifying email. The dangers of smartphones, I guess. He then stayed with us and then very competently worked through the problem on the whiteboard and even with the 3 of us there, it still took us a couple of hours to find the fault. Fixing it was trivial, thankfully. We only managed to have one manager on our backs, and we had more or less fixed the problem when he came to hover.
We weren’t too far from being right the first time. We had correctly diagnosed the problem, but our original solution was at the wrong end of the problem. This is not something that comforts me, though. I had sat in a meeting with the PHBs and told them that we had fixed it. When I had to front to a subset of them and explain the current understanding, fix and, more thoroughly, the testing surrounding it I felt like a total fraud. Two years ago, shortly before writing this code, I had won an award in part because of my work in this area of the code base. And here I was, with two extra years of experience, a slightly fancier title and I still making rookie mistakes. Everyone gets code wrong, but announcing fixes inside of code that isn’t exercised as part of the scenario, is not a good look. I was so distraught by this breakage, that I slipped into work on the weekend and wrote a unit test to nail down the shape of that broken function. I’m pleased to announce that I didn’t find any other problems with it.
And this is all well and good, until I noticed something almost identical in my current team’s work. Except that this one was only from about 2 weeks prior. Both pieces of code were written to fix tangential issues, which ended up breaking the original functionality. But the thing that sticks in my mind, is that after I was explaining this to my current PHBs (as the two projects were related and the current team’s project was also going to be released, thus requiring the extra scrutiny on the fix) as I walked away, I heard one of them say something about professionalism. That was the only word in the conversation I heard, so it might have been a conversation about something entirely unrelated, but goodness did it sting.
Once again, some code I wrote didn’t have a unit test around it and had broken in the most pedestrian of ways. And to make matters worse, he is a casual reader of this blog, where I harp on about my thoughts on the usefulness of testing one’s work. So I think that’s the take-home message of this post, each time I fixed something and got tunnel-vision. Both times (I think; definitely at least one of them) the code was reviewed. Writing a test to capture the bug can not only help you prove that the bug has been fixed (thus avoiding the embarrassment of fronting to people who previously thought your work was of a high standard), but can help you without much extra effort, capture the current state of the code. Test-first bug fixing is probably more important than test-first coding (which I often find cumbersome, incidentally, but that’s ammunition for a different post).