“We’re moving to Kubernetes” is something I hear from engineering teams the way I hear “we’re moving to microservices” — as a statement of progress rather than a response to a specific problem. And like microservices, Kubernetes is a powerful tool that solves real problems, but creates real complexity when adopted without those problems.
What Kubernetes Actually Does
At its core, Kubernetes (K8s) is a container orchestration platform. Here’s what that means in plain terms:
Containers package your application and its dependencies into a standardized unit that runs the same way everywhere — your laptop, a staging server, production. Docker is the most common container runtime, but the concept is what matters: your application runs in an isolated, portable environment.
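To make "standardized unit" concrete, here is a minimal sketch of a Dockerfile for a hypothetical Node.js service — the base image, port, and start command are illustrative assumptions, not a recommendation:

```dockerfile
# Sketch: containerizing a hypothetical Node.js service.
# Image tag, port, and entrypoint are illustrative.
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev   # install only production dependencies
COPY . .
EXPOSE 3000
CMD ["node", "server.js"]
```

The same image built from this file runs identically on a laptop, in CI, and in production — that portability is the whole point.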
Orchestration is what happens when you need to run containers at scale. You have 15 services, each needs multiple copies for redundancy, they need to find each other on the network, and when one crashes, something needs to restart it. Kubernetes handles all of this: it decides which machines run which containers, routes traffic between them, restarts failed containers, scales up when demand increases, and scales down when it drops.
Kubernetes also provides: service discovery (services find each other by name, not IP address), configuration management (secrets and config maps injected at runtime), rolling deployments (update containers without downtime), and health monitoring (automatic restart of unhealthy containers).
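The pieces above can be sketched in a single manifest for a hypothetical service named `api` (image, port, and probe path are illustrative assumptions):

```yaml
# Sketch: a Deployment (redundant copies + self-healing) and a Service
# (name-based discovery) for a hypothetical "api" service.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3                # multiple copies for redundancy
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: registry.example.com/api:1.0.0
          ports:
            - containerPort: 8080
          livenessProbe:     # failing checks trigger an automatic restart
            httpGet:
              path: /healthz
              port: 8080
---
apiVersion: v1
kind: Service                # other pods reach this by the DNS name "api"
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
```

Other services never need the pod IPs: they call `http://api` and Kubernetes routes the traffic to whichever healthy replicas exist at that moment.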
Originally built by Google based on their internal container orchestration system (Borg), Kubernetes is now the industry standard for running containerized workloads. Every major cloud provider offers a managed Kubernetes service: GKE (Google), EKS (Amazon), AKS (Azure).
When You Actually Need It
Kubernetes solves specific problems. If you have these problems, it’s worth the investment.
You’re running 10+ services that scale independently. If your application is a collection of services with different resource needs and scaling profiles — your API gateway handles 1,000 requests/second while your background job processor runs at 10 jobs/second — Kubernetes lets you scale each independently and efficiently. Running all of these on individual VMs or managed services gets unwieldy fast.
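Independent scaling is usually expressed as one HorizontalPodAutoscaler per service. A minimal sketch, with hypothetical service names and limits:

```yaml
# Sketch: the high-traffic gateway gets its own scaling policy; the
# low-volume job processor would get a separate HPA with, say,
# minReplicas: 1 and maxReplicas: 5. All numbers are illustrative.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-gateway
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-gateway
  minReplicas: 4             # busy service keeps a larger floor
  maxReplicas: 40
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Each service scales on its own curve without anyone resizing VMs by hand.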
You need self-healing infrastructure. In production, containers crash, machines fail, and deployments go wrong. Kubernetes automatically restarts failed containers, replaces unresponsive nodes, and rolls back bad deployments. This automation is valuable when your system is complex enough that manual intervention can’t keep up.
You deploy across multiple environments with strict consistency requirements. If your compliance framework requires that staging and production are structurally identical, Kubernetes manifests provide that guarantee — the same YAML files define both environments, with only configuration values changing.
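One common way to realize "same structure, different values" is Kustomize: a shared base defines the manifests, and per-environment overlays patch only the values. A sketch, assuming a service named `api` (directory names and numbers are illustrative):

```yaml
# Layout (illustrative):
#   base/                    shared manifests for every environment
#   overlays/staging/        patches only environment-specific values
#   overlays/production/
#
# overlays/production/kustomization.yaml:
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: api
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 6
```

An auditor can diff the overlays and see that only values differ, because the structure lives in one place.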
Your team needs to deploy multiple times per day without coordination. Kubernetes’ rolling deployment model lets different teams deploy their services independently, with automatic traffic shifting and rollback capabilities. This becomes essential when 4-5 teams are shipping daily.
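The zero-downtime behavior comes from a Deployment's update strategy. A fragment of the spec, with illustrative values:

```yaml
# Fragment of a Deployment spec: zero-downtime rolling update.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1            # bring up one new pod at a time
      maxUnavailable: 0      # keep full capacity throughout the rollout
```

If a release goes bad, `kubectl rollout undo deployment/api` returns to the previous revision — no cross-team coordination required.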
When You Don’t Need It
Your application is a monolith or 2-3 services. Kubernetes’ value scales with the number of services you’re managing. For a small number of services, the overhead of learning, configuring, and maintaining a Kubernetes cluster far exceeds the operational benefit. Use a PaaS instead.
Your team is under 15 engineers. Kubernetes requires at least one engineer who understands it deeply — networking, storage, RBAC, monitoring, upgrades. In a small team, that’s a significant percentage of your engineering capacity dedicated to infrastructure rather than product.
A managed PaaS meets your needs. Heroku, Railway, Render, Fly.io, Google Cloud Run, AWS App Runner — these platforms handle deployment, scaling, and operations for you. They’re more expensive per unit of compute, but far less expensive in engineering time. If your scaling needs are straightforward (scale horizontally based on traffic), a PaaS is almost certainly the right choice until you outgrow it.
You’re adopting it because it’s on the job description. Engineers love Kubernetes because it’s interesting technology and it looks good on a résumé. That’s a fine reason to learn it in a side project. It’s not a reason to introduce it into your production infrastructure.
The Real Cost
The cloud provider bills for managed Kubernetes are only part of the cost. The bigger expenses:
Learning curve. Kubernetes has a notoriously steep learning curve. Networking (services, ingresses, network policies), storage (persistent volumes, storage classes), security (RBAC, Pod Security Standards — the replacement for the now-removed pod security policies), and observability (Prometheus, Grafana, or cloud-native alternatives) are all essential knowledge. Budget 2-3 months for your team to become productive.

Operational overhead. Cluster upgrades, node pool management, resource quota tuning, and debugging pod scheduling issues are ongoing tasks. Even with a managed service like GKE or EKS, you own the cluster configuration and everything running inside it.
YAML engineering. Kubernetes configuration is verbose. A simple deployment might require 50-100 lines of YAML across deployment, service, ingress, and config map files. Multiply that across 15 services and you have thousands of lines of infrastructure configuration to maintain. Tools like Helm and Kustomize help, but add their own complexity.
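To make the verbosity concrete, here is roughly what the remaining two files look like for a hypothetical `api` service — and this is on top of the deployment and service definitions (host name and values are illustrative):

```yaml
# Sketch: the ingress and config map files for a hypothetical "api" service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api
                port:
                  number: 80
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: api-config
data:
  LOG_LEVEL: info
```

None of this is hard individually; the cost is that every service carries this weight, and every change touches several files.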
The Middle Path
If you’re not ready for full Kubernetes but need more than a basic PaaS, consider managed container services: Google Cloud Run, AWS Fargate, or Azure Container Instances. These run containers without requiring you to manage a cluster. You get the portability benefits of containerization — consistent environments, Docker-based workflows — without the orchestration overhead. When you genuinely outgrow these services, migrating to Kubernetes is straightforward because your applications are already containerized.
Related: AWS vs. GCP vs. Azure | Cloud Cost Optimization | The Prototype-to-Production Gap
