“We need to break up the monolith.” It’s one of the most common architectural proposals I hear in my fractional CTO engagements. And in about 70% of cases, my answer is: not yet.
Not because microservices are bad. They solve real problems at real scale. But because most teams propose the move based on feelings — “the codebase feels unwieldy,” “deploys feel slow,” “we feel like we should be more modern” — rather than measurable constraints that a monolith structurally cannot address.
The Wrong Reasons
The code is messy. Messy code in a monolith becomes messy code in microservices, except now it’s distributed messy code with network calls between the messes. If your problem is code quality, the fix is refactoring, code review standards, and architectural boundaries within the monolith — not a new deployment topology.
Engineers want to use different languages. This is a preference, not a requirement. Polyglot architectures sound appealing in theory but multiply your hiring complexity, tooling investment, and on-call burden. Unless there’s a genuine technical reason one component must use a different runtime — ML workloads in Python, for example — keep the stack unified.
A new CTO wants to modernize. I’ve been this CTO, and I’ve coached others through this temptation. Your job is to deliver business outcomes, not to build a résumé-worthy architecture. If the monolith is delivering features and the business is growing, the monolith is doing its job.
Conference talks made microservices sound better. Those talks are from companies with 500-2,000+ engineers and dedicated platform teams. They’re solving problems you don’t have yet.
The Right Reasons
These are the signals I look for when a client asks whether it’s time to decompose. Each one is measurable, not subjective.
Deployment contention is blocking delivery. When 3-4 teams are regularly waiting on each other to merge and deploy, and you can measure the delay — features sitting in a queue for days because another team’s changes aren’t ready — you have a genuine coordination bottleneck. First try feature flags and trunk-based development. If contention persists, targeted extraction can help.
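To make "try feature flags first" concrete, here is a minimal sketch of flag-gated merging: both code paths live in the monolith, code ships to production dark, and the flag rather than the deploy decides which path runs. The flag store and flag name here are illustrative; in practice the flags would come from a config service.

```python
# Minimal feature-flag gate (sketch). In production the flag store
# would be a config service, not a module-level dict.
FLAGS = {"new_checkout_flow": False}  # merged code ships "dark"

def checkout(order_id, flags=FLAGS):
    # Both code paths coexist in the monolith; teams can merge
    # freely because enabling the new path is a config change,
    # not a coordinated deploy.
    if flags.get("new_checkout_flow"):
        return ("new", order_id)
    return ("old", order_id)
```

This removes most merge-and-deploy queuing without any service extraction, which is why it's worth trying before decomposition.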
One component has radically different scaling needs. Your API handles 100 requests per second and it’s fine. But your image processing pipeline needs to burst to handle 10,000 images when a customer uploads a batch. Running 50x the compute for the whole monolith to scale one function is wasteful. Extracting the pipeline into its own service lets you scale it independently with spot instances or serverless functions.
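One way to see why the extracted pipeline scales independently: put a queue between the API and the workers, so worker count can burst without touching the API's capacity. This sketch uses an in-memory queue standing in for a real broker (SQS, Pub/Sub); the function names are illustrative.

```python
from queue import Queue

image_jobs = Queue()  # stands in for a real broker (SQS, Pub/Sub)

def handle_upload(image_ids):
    # API path: constant-time per image -- it only enqueues work,
    # never processes it, so the API needs no extra compute.
    for image_id in image_ids:
        image_jobs.put(image_id)
    return len(image_ids)

def drain(process):
    # Worker path: scaled separately (spot instances, serverless),
    # bursting to match the queue depth.
    done = []
    while not image_jobs.empty():
        done.append(process(image_jobs.get()))
    return done
```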
Fault isolation is a business requirement. If a bug in your reporting module crashes the entire application — including the core transaction processing that generates revenue — and you’ve tried process-level isolation within the monolith without success, extracting the reporting module provides genuine fault isolation.
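For illustration, process-level isolation means the non-critical module's crash surfaces as a child-process failure rather than taking down the caller. A sketch using a child interpreter, with a code string standing in for the reporting module:

```python
import subprocess
import sys

def run_report_isolated(report_code: str, timeout: int = 30) -> bool:
    # The reporting work runs in a separate process; a crash there
    # becomes a nonzero exit code in the parent, not a parent crash.
    result = subprocess.run(
        [sys.executable, "-c", report_code],
        timeout=timeout,
        capture_output=True,
    )
    return result.returncode == 0
```

If even this boundary can't contain the failures (shared database locks, shared memory pressure), that's the signal that extraction into a separate service is warranted.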
Compliance or security boundaries. Some regulated environments require that certain data processing happens in isolated environments. If your PCI-compliant payment processing must run in a separate security boundary from your general application, extraction isn’t optional — it’s a compliance requirement.
Team size exceeds 25-30 engineers. This is roughly the threshold where the coordination cost of a shared codebase begins to outweigh its simplicity benefits. Below this number, a well-structured modular monolith almost always wins. Above it, the organizational pressure for independent deployment becomes real.
How to Do It: The Strangler Fig Pattern
When the reasons are genuine, the method matters enormously. Big-bang rewrites fail more often than they succeed. The strangler fig pattern — named after a tree that gradually grows around and replaces its host — is the approach I recommend and have used across multiple client engagements.
Step 1: Identify the extraction candidate. Pick the module with the clearest boundary and the strongest standalone reason. Usually it’s something with distinct scaling needs or a different operational profile — a background job processor, a notification service, a data pipeline.
Step 2: Define the interface. Before writing any code, define the API contract between the service-to-be and the rest of the monolith. This contract must be stable enough that both sides can develop independently.
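A contract can be pinned down in code before any network is involved. A sketch using a hypothetical notification service: the Protocol is the contract, and both today's monolith-internal implementation and tomorrow's remote client must satisfy it. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass(frozen=True)
class NotifyRequest:
    user_id: str
    template: str
    body: str

class Notifier(Protocol):
    # The stable contract: both sides code against this interface,
    # so the monolith and the new service can evolve independently.
    def send(self, req: NotifyRequest) -> bool: ...

class InProcessNotifier:
    # Today's implementation, still living inside the monolith.
    def send(self, req: NotifyRequest) -> bool:
        return bool(req.user_id and req.template)
```

When the service is extracted, a second implementation of `Notifier` wraps an HTTP or RPC call; callers never change.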
Step 3: Build the service alongside the monolith. The new service runs in parallel. Traffic is gradually shifted from the monolith’s internal module to the new service. Both paths work simultaneously.
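Gradual traffic shifting is often done with a deterministic percentage rollout, so a given user always hits the same implementation while the ramp is in progress. A sketch (hash-based bucketing is one common approach, not the only one):

```python
import hashlib

def route_to_new_service(key: str, rollout_pct: int) -> bool:
    # Hash the routing key into one of 100 buckets. The same key
    # always lands in the same bucket, so routing stays sticky as
    # rollout_pct ramps from 0 to 100.
    digest = hashlib.sha256(key.encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_pct
```

Start at 1%, watch error rates and latency, then ramp; dropping `rollout_pct` back to 0 is the rollback.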
Step 4: Verify and cut over. Once the new service handles 100% of traffic reliably, remove the old code from the monolith. You’ve extracted one service with zero downtime and the ability to roll back at any point.
Step 5: Evaluate before continuing. Don’t extract the next service immediately. Live with the new architecture for a few weeks. Did it actually solve the problem? Did it create new problems (data consistency, debugging complexity, deployment coordination)? Let the team absorb the operational overhead of one new service before adding another.
The Platform Tax
Every service you extract adds operational cost: its own CI/CD pipeline, its own monitoring, its own alerting, its own on-call considerations, its own scaling configuration. If you extract 10 services, you need the infrastructure to manage 10 services. At Google Cloud, we had dedicated platform teams with hundreds of engineers supporting this infrastructure. Your team of 20 does not.
Before you extract your first service, make sure you have: centralized logging, distributed tracing, a service discovery mechanism, health checks and automated restarts, and a deployment pipeline that can handle multiple services. If you don’t have these, build them first. Extracting services without observability is like driving at night without headlights — you’ll crash, and you won’t know why until it’s too late.
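Of that prerequisite list, health checks are the cheapest place to start. A sketch of the body of an aggregated health endpoint; the dependency names are illustrative:

```python
def health_check(checks):
    # 'checks' maps a dependency name to a zero-argument callable
    # that returns truthy when healthy. A failing or raising check
    # marks the service degraded instead of crashing the endpoint.
    results = {}
    for name, fn in checks.items():
        try:
            results[name] = bool(fn())
        except Exception:
            results[name] = False
    status = "ok" if all(results.values()) else "degraded"
    return {"status": status, "checks": results}
```

An orchestrator polls this payload to decide on automated restarts; the same structure repeats for every service you extract, which is exactly the platform tax.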
