Friday, February 17, 2017

Cache Killer

Incoming email.

We'd hired a new programmer right out of college. He'd just graduated with a CS degree from one of the "elite" universities. He was assigned to my team, where he made himself comfortable criticizing the codebase and everyone else's code, and telling us how no one else knew what they were doing.

To be fair, our codebase was ugly. The company had experienced sudden growth over the previous two years and we hadn't made the time to get to a lot of our technical debt. And it wasn't necessary to. At that time the crap in the codebase had not gotten in the way of making money, and had not fouled the user experience. All of that came with a lot of "hot" solutions we'd implemented for features. For example, we had three layers of cache.

It looked like a madman had designed this architecture, but it was due to a constant push to do the minimum required to get product out the door for a panic-charged startup. Now that we weren't a startup, the architecture looked horrible - but it worked.

Mister Know-It-All declared that this was "stupid" and offered to "clean it all up" by deleting two of the three cache layers.  

Everyone on the team freaked out. We told him in no uncertain terms not to do it - there was a reason for all three caches. We walked him through the history of the codebase, and the product, and the whys, and advised him not to change it at this time. It wasn't worth the time of our group to do work that didn't add to the growing income of the company.

Five people, including our CTO, told him that although we agreed with him that it was ugly, he should not do it, but if we ever got to changing the cache system, we'd include him in the project. That wasn't enough for the master complainer, who bitched and moaned about how dumb we all were and how he could improve everything in a few hours of work. He'd done harder problems in Uni.

"No," was the answer.

A day later, every person on every eng team got a series of system alerts on their phones. Response time, database load, queue length, all of it had gone to shit.

To make the story of a long and horrible night short: the junior developer had gone ahead and removed two of the three cache layers and deleted the corresponding code, all without looking for code dependencies, API use, you name it. Slash and burn. And he pushed it to production (we'd had an honor system up until that point -- that changed the day after this debacle).

"I think I fucked up," he said. We didn't address that, and simply reversed his "work".

When the CTO came around the next day, the first thing out of elite-school-graduate's mouth was "Your system sucks. We need to rewrite the whole thing, this is insane and stupid."

Clearly we'd mis-hired. He was let go that day. Not for making a mistake, but for not heeding everyone's rational calls NOT to do it and for fighting with us every step of the way.

It made my life crazy for a few days, repairing things in the aftermath of the screw-up. But the office was calmer and less stressful without someone constantly reminding us how stupid we all were.
