Your load test just came back and the news is bad. The system falls over at 10x expected launch traffic. Engineering says they need six months to fix the architecture. Marketing has already started the campaign. Launch is eight weeks away.

This is not the time to panic. It is the time to get very clear about your options, because you have more of them than it feels like right now.

First: Understand What Actually Failed

“The system doesn’t scale” is not a diagnosis. Before you make any decisions about launch timing or engineering priorities, you need to know exactly where the bottleneck is.

Scaling failures almost always concentrate in one or two places. A database that serializes writes under load. A service that makes synchronous calls to a dependency that can’t keep up. A caching layer that was designed for 1x traffic and provides no relief at 10x. An autoscaling group that takes four minutes to add capacity when your traffic spike lasts three.

Have your engineering team run the load test again, this time with observability turned all the way up. Where does latency first spike? What’s the CPU and memory doing on each service? What’s the database wait time at the moment things start degrading? What errors are you seeing and where are they originating?
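The “where does latency first spike” question can be made concrete with a concurrency ramp. Here is a minimal sketch in Python; the `simulated_request` function is a stand-in for a real HTTP call, with a made-up saturation point at 8 concurrent requests. In practice you would point this at a staging endpoint and let your observability stack capture the server side at each level.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def simulated_request(concurrency: int) -> float:
    """Stand-in for a real HTTP call; latency degrades past a
    hypothetical capacity knee at 8 concurrent requests."""
    base = 0.001
    penalty = max(0, concurrency - 8) * 0.005
    time.sleep(base + penalty)
    return base + penalty

def p95(samples):
    """95th-percentile latency of a list of samples."""
    ordered = sorted(samples)
    return ordered[int(0.95 * (len(ordered) - 1))]

def ramp(levels=(2, 4, 8, 16, 32), requests_per_level=50):
    """Step concurrency upward and record p95 latency at each level,
    so the knee in the curve is visible."""
    results = {}
    for level in levels:
        with ThreadPoolExecutor(max_workers=level) as pool:
            samples = list(pool.map(lambda _: simulated_request(level),
                                    range(requests_per_level)))
        results[level] = p95(samples)
    return results

if __name__ == "__main__":
    for level, latency in ramp().items():
        print(f"concurrency={level:>3}  p95={latency * 1000:.1f} ms")
```

The point of the ramp is the shape of the curve, not any single number: the level where p95 stops being flat is where you aim your profiling.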

Once you have that, you’ll often find that “the system doesn’t scale” means “one service doesn’t scale, and it’s taking everything else down with it.” That’s a different problem — and a much faster one to fix.

The Options You Actually Have

Option 1: Launch with guardrails. If the bottleneck is concentrated and you can enforce a meaningful traffic limit, you launch — but you control the traffic. Invite-only. Waitlist. Geographic rollout. Capacity caps that prevent you from exceeding the load your system can handle. This isn’t a failure; it’s a responsible launch strategy that many successful products have used.
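A capacity cap can be as simple as a counting semaphore at the front door. This is a sketch of the idea, not a production admission controller; the `CapacityGate` name and the waitlist message are illustrative.

```python
import threading

class CapacityGate:
    """Admit at most `limit` concurrent requests; shed the rest.

    Sketch of the 'capacity cap' guardrail: callers above the cap get
    a fast rejection (e.g. a waitlist page) instead of a request that
    degrades latency for everyone already inside.
    """
    def __init__(self, limit: int):
        self._slots = threading.Semaphore(limit)

    def try_enter(self) -> bool:
        # Non-blocking acquire: fail fast when the system is full.
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        self._slots.release()

def handle(gate: CapacityGate, do_work) -> str:
    """Wrap a request handler with the gate."""
    if not gate.try_enter():
        return "503: at capacity, join the waitlist"
    try:
        return do_work()
    finally:
        gate.leave()
```

The design choice that matters is shedding load explicitly rather than queueing it invisibly: a fast, honest rejection keeps the admitted traffic inside the envelope your load test validated.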

The key question here is whether you can be honest with your marketing team and investors about what “launch” means. A controlled rollout that goes well is better for your company than an uncontrolled launch that falls over on day one. The internet does not forget.

Option 2: Targeted architectural intervention. If the bottleneck is identifiable and addressable, eight weeks may be enough time to fix it — not to rewrite the architecture, but to eliminate the specific bottleneck that’s causing the failure.

I’ve seen companies add a read replica and a caching layer to a database-bottlenecked system in two weeks and go from “falls over at 5x” to “handles 50x” without touching application code. I’ve seen a queue dropped in front of an overloaded synchronous API solve the same problem in three weeks. These aren’t architectural rewrites — they’re surgical interventions.
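The caching-layer intervention is usually a cache-aside read path: serve hot reads from memory and touch the database only on a miss or after a TTL expires. A minimal sketch, assuming a `fetch_from_db` callable and a 30-second TTL, both of which are illustrative:

```python
import time

class ReadThroughCache:
    """Cache-aside sketch: hot reads come from memory; the database is
    hit only on a miss or after the TTL expires. `fetch_from_db` is a
    placeholder for whatever query is the bottleneck."""
    def __init__(self, fetch_from_db, ttl_seconds: float = 30.0):
        self._fetch = fetch_from_db
        self._ttl = ttl_seconds
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit: zero database load
        value = self._fetch(key)     # miss: one trip to the database
        self._store[key] = (value, now + self._ttl)
        return value
```

If the traffic is read-heavy and tolerates slightly stale data, even a short TTL turns thousands of identical queries per second into one.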

This option requires you to actually know what the bottleneck is. Which is why the first step matters.
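The queue-in-front intervention mentioned above follows the same logic: accept work fast, shed load when the buffer fills, and drain into the slow API at a rate it can sustain. In this sketch, `slow_api_call`, the buffer capacity, and the worker count are all placeholders you would tune to the real dependency.

```python
import queue
import threading

def buffered_worker(slow_api_call, capacity=1000, workers=4):
    """Queue-in-front sketch: a bounded buffer decouples request
    acceptance from an overloaded synchronous downstream call."""
    jobs = queue.Queue(maxsize=capacity)

    def drain():
        while True:
            item = jobs.get()
            if item is None:        # shutdown sentinel
                break
            slow_api_call(item)     # the rate-limited downstream call
            jobs.task_done()

    threads = [threading.Thread(target=drain, daemon=True)
               for _ in range(workers)]
    for t in threads:
        t.start()

    def submit(item) -> bool:
        """Returns False (shed load) when the buffer is full."""
        try:
            jobs.put_nowait(item)
            return True
        except queue.Full:
            return False

    def shutdown():
        jobs.join()                 # wait for in-flight work
        for _ in threads:
            jobs.put(None)
        for t in threads:
            t.join()

    return submit, shutdown
```

The bounded queue is the point: an unbounded one just moves the outage from the API to your memory, while a bounded one gives you an explicit, observable shed point.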

Option 3: Defer launch. This is the option nobody wants to say out loud, but it’s a legitimate choice. If the bottleneck is genuinely architectural — if fixing it requires restructuring how multiple systems communicate — eight weeks is not enough time to do it safely.

A launch that fails publicly is worse than a launch that’s delayed. You can message a delay. You cannot un-ring the bell of a high-profile failure.

The Conversation Engineering Is Not Having With You

When engineering says “we need six months,” they usually mean one of two things. Either they’ve correctly identified that the full architectural fix is a six-month project, or they’re scared and have quoted the maximum safe number.

Neither of these is useful for your decision-making. What you need is: what is the minimum viable fix that gets us to a load level we can manage through launch? That’s a different question, and it often has a different — shorter — answer.

If engineering can’t answer that question, it may mean the problem isn’t well enough understood yet. It may also mean they need someone with more experience in production scaling to come in and frame the options.

What I’d Do in the First Week

Get the actual bottleneck identified. Not estimated — measured. Run the load test with full observability and know exactly which service, which query, or which dependency is the constraint.

Separate the six-month architectural wish list from the eight-week survival list. You don’t need a perfect architecture. You need an architecture that handles your launch load with a margin of safety.

Have an honest conversation with your marketing team. They may have more flexibility than you think, especially if you can frame a controlled rollout as a strategic choice rather than a technical failure.

And bring in outside perspective if your team is stuck. This is the kind of situation where someone who has stared down scaling crises before — and knows which interventions actually work under time pressure — is worth the cost many times over.

If you’re eight weeks out and the load test just failed, a fractional CTO can help you triage what’s real versus what’s fear, and build a plan your whole company can get behind. Book 15 minutes here.


Related: The Prototype-to-Production Gap | Database Scaling Strategies | Observability and Monitoring for Growing Teams