You have two people who know how the system actually works. Call them Alex and Jordan. When something breaks at 2am, the first question isn’t “who’s on-call” — it’s “is Alex available?” When a new engineer gets stuck, they don’t check a wiki, they Slack Jordan. When someone asks how the authentication system works, the answer is “ask Alex.”

Alex is tired. Jordan is already looking at job listings.

This is not a documentation problem. It’s a concentration-of-knowledge problem, and it has real business risk attached to it. You’re two resignations away from an engineering team that can’t operate. You already know this. What you don’t know is how to fix it without stopping everything else.

Why Documentation Projects Fail

You’ve probably tried this already. You announced that the team needed to write documentation. Maybe you blocked out a documentation sprint. Maybe you added it to every ticket: “definition of done includes updating docs.” And then six months later, the wiki has twelve pages, eight of them outdated, and nobody looks at it.

Documentation projects fail because they treat documentation as a separate activity from engineering work. They require engineers to stop doing things and then write about the things they just did. Engineers hate this. It feels like overhead, it interrupts flow, and the results are almost always written for the author rather than the reader — full of assumed context, missing the things a newcomer would actually need to know.

The fix is to change where documentation happens, not to demand more of it.

What Actually Works: Documentation as Residue of Real Work

The best engineering documentation I’ve seen is created as a byproduct of work that needed to happen anyway.

Onboarding as documentation creation. When a new engineer joins, their job for the first two weeks is not just to learn the system but to document what they learn, in real time, as they learn it. They’re the perfect author because they have no assumed context. Everything that confused them, every question they had to ask Alex, everything that wasn’t written down: they write it down. At the end of week two, they produce an onboarding guide that becomes the starting point for the next hire.

This does two things: it creates documentation written by the person who most needed it, and it creates a cultural expectation that documentation is part of how you join the team.

Architecture Decision Records for new decisions. Going forward, any significant technical decision gets a one-page ADR: what the decision was, what options were considered, why this option was chosen, and what tradeoffs were accepted. This isn’t a history of how you got here — it’s how you capture the “why” going forward so the next person doesn’t have to guess.
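To make the one-page format concrete, here is a minimal sketch of an ADR as a small Python structure with the four parts listed above. The class name, field names, and the sample decision are all hypothetical, chosen only for illustration; a plain text file with the same four headings works just as well.

```python
from dataclasses import dataclass


@dataclass
class ADR:
    """One-page Architecture Decision Record (hypothetical structure)."""
    title: str
    decision: str                  # what the decision was
    options_considered: list[str]  # what options were considered
    rationale: str                 # why this option was chosen
    tradeoffs: list[str]           # what tradeoffs were accepted

    def render(self) -> str:
        """Render the record as plain text suitable for a repo or wiki."""
        lines = [self.title, "", "Decision", self.decision, "", "Options considered"]
        lines += [f"- {option}" for option in self.options_considered]
        lines += ["", "Why this option", self.rationale, "", "Tradeoffs accepted"]
        lines += [f"- {tradeoff}" for tradeoff in self.tradeoffs]
        return "\n".join(lines)


# Sample content is invented for illustration only.
adr = ADR(
    title="ADR 1: Session storage for the authentication system",
    decision="Store sessions in the primary database.",
    options_considered=["Primary database", "Separate in-memory cache"],
    rationale="One less system for a small team to operate.",
    tradeoffs=["Slower session lookups under heavy load"],
)
text = adr.render()
print(text)
```

The point of keeping it this small is that filling it in should take minutes, not an afternoon, so it actually gets written at the moment the decision is made.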

Runbooks for incidents. Every time there’s an incident or a production issue, the resolution process generates a runbook: what symptoms appeared, what was checked, what the fix was. After three incidents, you have three runbooks. After a year, you have a meaningful operational knowledge base — and you got it for free, because the alternative was having Alex on-call forever.

The Triage Problem: What Do You Document First?

You can’t document everything at once. If you try, you’ll get low-quality documentation of everything instead of high-quality documentation of the things that matter.

Prioritize by bus factor and by consequence. For every piece of knowledge, note how many people hold it, then ask two questions:

  1. If the person who knows this left tomorrow, how bad would it be?
  2. How often does this knowledge get needed?

High consequence, frequently needed, held by one person: document this first. Deployment process, production database access, third-party API credentials and configurations, the business logic for your core billing or data processing workflows. These are the things that, if Alex leaves on a Friday, will cause a crisis on Monday.

Low consequence, rarely needed, held by multiple people: document this last or not at all. Not everything needs to be written down.
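If it helps to make the triage explicit, the two questions plus the number of people who hold the knowledge can be turned into a rough ranking. The sketch below is one possible heuristic, not a standard formula: the 1 to 5 scales, the scoring rule, and the sample items are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeItem:
    name: str
    consequence: int  # 1 (minor) to 5 (crisis) if the holder left tomorrow
    frequency: int    # 1 (rarely needed) to 5 (needed constantly)
    holders: int      # how many people currently know this


def priority(item: KnowledgeItem) -> float:
    """Assumed heuristic: higher consequence and frequency raise the score,
    more holders lower it."""
    return item.consequence * item.frequency / max(item.holders, 1)


def triage(items: list[KnowledgeItem]) -> list[KnowledgeItem]:
    """Return items ordered from 'document first' to 'document last'."""
    return sorted(items, key=priority, reverse=True)


# Sample scores are invented; fill in your own from a team conversation.
items = [
    KnowledgeItem("deployment process", consequence=5, frequency=5, holders=1),
    KnowledgeItem("legacy report export", consequence=2, frequency=1, holders=3),
    KnowledgeItem("billing business logic", consequence=5, frequency=3, holders=1),
]
ranked = [item.name for item in triage(items)]
print(ranked)
```

The exact numbers matter less than the conversation that produces them. Anything that lands at the bottom of the list is a candidate for not documenting at all.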

The Conversation You Need to Have

There is a version of this problem that is also a management conversation you’re avoiding. If Alex and Jordan are the only people who know how things work, it’s partly because knowledge concentration benefits Alex and Jordan — it makes them indispensable, and indispensable people feel secure. They may not be actively hoarding knowledge, but they’re also not actively distributing it.

Have the conversation directly: “I need you to help transfer this knowledge to the rest of the team. Not because I’m worried about losing you — because I’m worried about what happens to you if you’re the only one who can answer these questions. The on-call burden, the interruptions, the inability to go on vacation — those get better when more people know what you know.”

Most of the time, this lands well. Alex doesn’t want to be the person who gets paged at 2am forever. Jordan doesn’t want to be the only one who knows how deployments work. They just haven’t had anyone make the case clearly.

A Realistic 90-Day Plan

Month one: identify the three most dangerous knowledge concentrations and schedule pair programming or structured knowledge transfer sessions for each. Record them if the team is amenable.

Month two: have your next two hires, or the next engineers to start on a new project, write onboarding documentation as they ramp up. Start the ADR practice for any new architectural decisions.

Month three: review what you have, fill the gaps on the highest-risk items, and establish who owns ongoing documentation maintenance.

You’re not going to have perfect documentation in 90 days. You’re going to have less risk and a team with a better habit.



