“We inherited a messy system and we don’t know what’s safe to touch.” I hear this in discovery calls at least twice a month. The story varies — an acquisition, a founder who left, an offshore agency that delivered code and disappeared — but the anxiety is always the same. There’s a running system that makes money, nobody fully understands how it works, and the team is terrified of breaking it.
The instinct is to rewrite. That instinct is almost always wrong.
The First 48 Hours: Observe, Don’t Touch
Before you change a single line of code, you need to understand what the system is actually doing. Not what the documentation says (if documentation exists). Not what the previous developer told you. What the system is actually doing right now in production.
Set up application performance monitoring if it doesn’t exist. At minimum, you need: request logs showing which endpoints are being hit and how often, error rates by endpoint, database query patterns (what’s being read, what’s being written, how often), and basic infrastructure metrics (CPU, memory, disk, network).
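Even before proper APM tooling is in place, the "which endpoints, how often, what error rates" questions can often be answered from raw access logs. A minimal sketch in Python — the regex assumes a common-log-style line and will need adjusting to whatever format your server actually writes:

```python
import re
from collections import Counter

# Minimal access-log triage: per-endpoint hit counts and 5xx error rates.
# The pattern assumes lines like: ... "GET /api/report HTTP/1.1" 500 ...
LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})')

def endpoint_stats(log_lines):
    hits, errors = Counter(), Counter()
    for line in log_lines:
        m = LINE.search(line)
        if not m:
            continue  # skip malformed lines rather than crash mid-triage
        key = f"{m['method']} {m['path']}"
        hits[key] += 1
        if m['status'].startswith('5'):
            errors[key] += 1
    return {ep: {"hits": n, "error_rate": errors[ep] / n}
            for ep, n in hits.items()}
```

Feed it `open("access.log")` and sort the result by hits: the top of that list is your first map of what the system actually does.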
I’ve had clients discover entire features they didn’t know existed just from reading access logs. One company found that 40% of their database load came from a reporting feature that three people used. Another discovered an undocumented API that a partner had been using for two years.
Map the Money Paths
Not every part of the codebase is equally important. Your second step is identifying the 3-5 workflows that directly generate or protect revenue. For a SaaS product, that’s typically: user signup and onboarding, the core value-delivery feature, billing and subscription management, and whatever integration the biggest customer depends on.
These are your critical paths. Everything else can wait.
Walk each critical path manually. Use the application as a customer would. Note every step, every interaction, every place where it feels fragile or wrong. Then trace each path through the code. You don’t need to understand every function — you need to understand the flow: what calls what, what data goes where, and where the external dependencies live.
Test Before You Touch
The most dangerous thing you can do with an inherited codebase is start making changes without tests. Not because the code is bad — it might be fine — but because you don’t yet understand the implicit assumptions the original developer baked in.
Write integration tests for each critical path. Not unit tests (you don’t understand the units yet), not comprehensive tests (that would take months), just enough tests to tell you whether the critical paths still work after you make a change.
I call these “guardrail tests.” They’re not testing that the code is correct — they’re testing that the code still does what it was doing before. That’s a much lower bar and a much faster win.
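In practice a guardrail test is a characterization test: it pins current behavior, quirks included, without judging whether that behavior is right. A minimal sketch — `legacy_price` is a hypothetical stand-in for whatever your critical path actually calls:

```python
# A guardrail (characterization) test: assert the code still does what it
# did yesterday, not that what it did yesterday was correct.

def legacy_price(quantity, unit_cents):
    # Stand-in for inherited code; note it silently floors fractional
    # quantities -- we pin that behavior rather than "fix" it yet.
    return int(quantity) * unit_cents

def test_pricing_guardrail():
    # Expected values captured from current production output, not a spec.
    assert legacy_price(3, 499) == 1497
    assert legacy_price(2.9, 499) == 998  # yes, it floors -- pin it anyway
```

If a later refactor "fixes" the flooring, this test fails and forces a conversation — which is exactly the point: behavior changes should be deliberate, not accidental.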
The Distinction That Matters: Messy vs. Fragile
Every inherited codebase is messy. Not every inherited codebase is fragile. The distinction matters enormously for prioritization.
Messy code is ugly but functional. Variable names are terrible, there’s copy-paste duplication everywhere, the architecture doesn’t match any textbook pattern. But it works. It’s been working. It will keep working. Messy code offends your engineering sensibilities. It does not threaten your business.
Fragile code breaks when you look at it sideways. It has hidden dependencies — changing a function in one module breaks something in an unrelated module because they share mutable global state. It has hard-coded assumptions about data formats, time zones, or environment configurations. It has no error handling, so any unexpected input causes cascading failures.
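The shared-mutable-state failure mode is easy to demonstrate in miniature. A contrived sketch — both functions and the `settings` dict are hypothetical stand-ins for real modules:

```python
# Fragility in miniature: two unrelated features coupled through shared
# mutable state.
settings = {"currency": "USD"}

def format_price(cents):
    # Used by checkout -- reads the shared dict on every call.
    return f"{cents / 100:.2f} {settings['currency']}"

def run_eu_report():
    # A "harmless" reporting tweak that leaks: it mutates the shared dict,
    # so checkout now formats prices in EUR too.
    settings["currency"] = "EUR"
    return format_price(1999)

checkout_before = format_price(1999)   # prices in USD
run_eu_report()
checkout_after = format_price(1999)    # checkout silently switched to EUR
```

Nothing in the checkout code changed, yet its output did — that action-at-a-distance is what makes fragile code dangerous to touch.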
Fix fragile first. Clean up messy later. Or never — sometimes messy code that works for years is the best code in your system.
Incremental Improvement, Not Revolution
Once you have observability, critical path maps, and guardrail tests, you can start improving. The approach I use is what I think of as tightening the screws rather than rebuilding the engine.
Start with the fragile parts of the critical paths. Add error handling. Extract hardcoded configuration into environment variables. Replace in-memory sessions with something persistent. Add database connection pooling. Each change is small, testable, and immediately reduces risk.
Resist the temptation to modernize the stack while you’re at it. “We’re already in there, might as well upgrade to the latest framework version” is how small improvements become six-month migrations that break production.
When Rewriting Actually Makes Sense
It’s rare, but sometimes a rewrite is the right call. The signals: the technology platform is end-of-life (no security patches, no community support), the system literally cannot scale to handle current traffic, let alone growth, the codebase has security vulnerabilities that are architectural (not just missing input validation), or you need to add capabilities that the current architecture fundamentally cannot support.
Even then, I recommend the strangler fig pattern over a big-bang rewrite. Build the new system alongside the old one, migrate traffic gradually, and decommission the old system piece by piece. This lets you ship value continuously while migrating, rather than going dark for six months and praying the rewrite works.
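The gradual-migration step is usually just a routing decision. A minimal sketch of sticky percentage rollout — `legacy_checkout` and `new_checkout` are hypothetical stand-ins for the old code path and the new system:

```python
import hashlib

# Strangler-fig routing: send a fixed, sticky fraction of traffic to the
# new implementation. Hashing the user id (rather than random.random())
# keeps each user on one side, so a bug affects a stable, known cohort.
ROLLOUT_PERCENT = 10  # dial up as confidence grows; 100 = old path retired

def use_new_system(user_id: str) -> bool:
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return bucket < ROLLOUT_PERCENT

def legacy_checkout(order):   # stand-in for the inherited code path
    return ("legacy", order)

def new_checkout(order):      # stand-in for the new system
    return ("new", order)

def handle_checkout(user_id, order):
    if use_new_system(user_id):
        return new_checkout(order)
    return legacy_checkout(order)
```

Because routing is deterministic per user, you can reproduce any reported bug by checking which side that user landed on — something a coin-flip rollout can't give you.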
Related: The Prototype-to-Production Gap | Tech Debt Translation: Making Your CFO Care | What a Fractional CTO Actually Does
