When I do infrastructure reviews for clients, the cloud bill is almost always the first place I find money. Not because teams are being wasteful on purpose — because cloud providers are designed to make spending easy and optimization hard.
A $15M revenue SaaS company I assessed last year was spending $38K/month on AWS. After a two-week optimization effort, we brought it to $24K/month with zero performance impact. That $168K annual savings funded two additional engineering hires.
Here's where the money usually hides.
The Big Three Cost Leaks
Oversized compute. Your production instances are probably bigger than they need to be. Engineers size instances based on peak load estimates, then add a safety margin, and the instances run at that size 24/7 even though peak load happens for 4 hours per day. Auto-scaling groups, right-sizing based on actual utilization data (not estimates), and reserved instances or savings plans for baseline capacity typically reduce compute costs by 40-60%.
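To see why scaling down off-peak matters so much, here is a rough sketch of the arithmetic. The fleet sizes are hypothetical, and it ignores scaling lag and any reserved-instance or savings-plan discount on the baseline, so treat the result as a ballpark rather than a forecast.

```python
def daily_instance_hours(peak_instances, baseline_instances, peak_hours_per_day):
    """Return (fixed, scaled) instance-hours per day.

    fixed:  the fleet stays at peak size around the clock.
    scaled: the fleet runs at peak size only during peak hours and
            drops to the baseline size for the rest of the day.
    """
    if not 0 <= peak_hours_per_day <= 24:
        raise ValueError("peak_hours_per_day must be between 0 and 24")
    fixed = peak_instances * 24
    scaled = (peak_instances * peak_hours_per_day
              + baseline_instances * (24 - peak_hours_per_day))
    return fixed, scaled


def scaling_savings(peak_instances, baseline_instances, peak_hours_per_day):
    """Fraction of instance-hours saved by scaling down off-peak."""
    fixed, scaled = daily_instance_hours(
        peak_instances, baseline_instances, peak_hours_per_day)
    return 1 - scaled / fixed


if __name__ == "__main__":
    # Hypothetical fleet: 10 instances at peak, 4 off-peak, 4 peak hours a day.
    print(daily_instance_hours(10, 4, 4))  # (240, 120)
    print(scaling_savings(10, 4, 4))       # 0.5
```

With those assumed numbers, half of the instance-hours disappear before any pricing discount is applied.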
The easiest win I consistently find: development and staging environments running the same instance sizes as production, 24 hours a day, 7 days a week. A staging environment that matches production configuration is good practice. A staging environment that costs as much as production is waste. Schedule non-production environments to shut down evenings and weekends, and right-size them to the minimum needed for testing.
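As a minimal sketch of the scheduling logic, assume non-production environments only need to run on weekdays from 08:00 to 20:00. That window, and the function names, are illustrative assumptions; a real scheduler would also need to handle time zones and holidays.

```python
from datetime import datetime

HOURS_PER_WEEK = 24 * 7


def should_be_running(now, start_hour=8, stop_hour=20):
    """True on weekdays from start_hour (inclusive) to stop_hour (exclusive)."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= now.hour < stop_hour


def weekly_running_hours(start_hour=8, stop_hour=20, days_per_week=5):
    """Hours per week the environment stays up under the schedule."""
    return (stop_hour - start_hour) * days_per_week


def schedule_savings(start_hour=8, stop_hour=20, days_per_week=5):
    """Fraction of always-on hours avoided by following the schedule."""
    running = weekly_running_hours(start_hour, stop_hour, days_per_week)
    return 1 - running / HOURS_PER_WEEK


if __name__ == "__main__":
    print(weekly_running_hours())                      # 60
    print(f"{schedule_savings():.0%}")                 # 64%
    print(should_be_running(datetime(2024, 1, 6, 10)))  # False (a Saturday)
```

Under that assumed window, the environment runs 60 of 168 hours a week, which removes roughly two thirds of its compute hours before any right-sizing.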
Forgotten storage. Cloud storage accumulates like clutter in a garage. EBS snapshots from instances that were terminated months ago. S3 buckets with log files that nobody will ever read. Database backups retained far longer than your retention policy requires. RDS instances for applications that were decommissioned.
One client had $4,200/month in EBS snapshots alone, all taken from instances that hadn't existed for over a year. Another was paying $1,800/month for an RDS Multi-AZ instance that served an internal dashboard used by three people.
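One way to hunt for snapshots like these is to compare each snapshot's source volume against the volumes that still exist. The sketch below works on plain dictionaries shaped like the entries in an EC2 snapshot listing (`SnapshotId`, `VolumeId`, `VolumeSize`, `StartTime`); the 90-day age threshold and the price you pass in are assumptions to adjust. Because snapshots are billed on stored data rather than on source volume size, the cost figure is only an upper bound.

```python
from datetime import datetime, timedelta, timezone


def find_orphaned_snapshots(snapshots, existing_volume_ids, now, min_age_days=90):
    """Snapshots older than min_age_days whose source volume no longer exists."""
    cutoff = now - timedelta(days=min_age_days)
    return [
        snap for snap in snapshots
        if snap.get("VolumeId") not in existing_volume_ids
        and snap["StartTime"] < cutoff
    ]


def monthly_cost_upper_bound(snapshots, price_per_gb_month):
    """Upper bound on monthly cost, using source volume size in GiB."""
    return sum(snap["VolumeSize"] for snap in snapshots) * price_per_gb_month


if __name__ == "__main__":
    now = datetime(2024, 6, 1, tzinfo=timezone.utc)
    snapshots = [  # hypothetical inventory
        {"SnapshotId": "snap-a", "VolumeId": "vol-gone", "VolumeSize": 100,
         "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
        {"SnapshotId": "snap-b", "VolumeId": "vol-live", "VolumeSize": 50,
         "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
    ]
    orphans = find_orphaned_snapshots(snapshots, {"vol-live"}, now)
    print([snap["SnapshotId"] for snap in orphans])  # ['snap-a']
```

Review the flagged list by hand before deleting anything, since a snapshot of a deleted volume may still be a deliberate archive.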
Data transfer. This is the sneaky one. Cloud providers charge for data moving between availability zones, between regions, and out to the internet. Applications that aren't architecturally aware of data transfer costs — for example, services that make cross-AZ calls for every request — can accumulate surprisingly large transfer bills.
I've seen data transfer costs that were 25% of the total cloud bill, simply because the application architecture put services in different availability zones without considering the communication patterns.
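A back-of-the-envelope estimate shows how quickly chatty cross-AZ traffic adds up. The sketch below assumes a per-GB price charged in each direction; the default of $0.01 and the traffic figures are illustrative assumptions, so check current pricing for your provider and region before relying on the output.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, for a rough estimate


def cross_az_monthly_cost(requests_per_second, kb_per_request,
                          price_per_gb_each_direction=0.01):
    """Estimate the monthly cost of traffic that crosses an AZ boundary.

    Assumes every request crosses zones and that the per-GB price is
    charged twice: once on the sending side, once on the receiving side.
    """
    gb_per_month = requests_per_second * kb_per_request * SECONDS_PER_MONTH / 1_000_000
    return gb_per_month * price_per_gb_each_direction * 2


if __name__ == "__main__":
    # Hypothetical service: 500 requests/second, 50 KB moved per request.
    print(f"${cross_az_monthly_cost(500, 50):,.0f} per month")  # $1,296 per month
```

That is one service-to-service hop. An architecture with several such hops per request multiplies the figure accordingly.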
The Cultural Fix
A one-time cost optimization is a temporary win. Costs creep back within 6 months unless you build cost awareness into your engineering culture.
Tag everything. Every resource should be tagged with the team that owns it, the environment (prod, staging, dev), and the project or service it supports. Without tags, your cloud bill is an undifferentiated blob that nobody can take responsibility for.
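A tagging rule only sticks if something checks it. Here is a minimal sketch of such a check, assuming tag keys named `team`, `environment`, and `service`; those key names are my own choice for illustration, so substitute whatever convention your organization uses.

```python
REQUIRED_TAGS = ("team", "environment", "service")  # hypothetical tag keys
ALLOWED_ENVIRONMENTS = {"prod", "staging", "dev"}


def tag_problems(tags):
    """Return a list of human-readable problems with one resource's tags."""
    problems = [f"missing tag: {key}" for key in REQUIRED_TAGS if not tags.get(key)]
    environment = tags.get("environment")
    if environment and environment not in ALLOWED_ENVIRONMENTS:
        problems.append(f"unknown environment: {environment}")
    return problems


def untagged_resources(resources):
    """Map resource id to its tag problems, skipping compliant resources."""
    report = {}
    for resource_id, tags in resources.items():
        problems = tag_problems(tags)
        if problems:
            report[resource_id] = problems
    return report


if __name__ == "__main__":
    inventory = {  # hypothetical resources
        "i-001": {"team": "payments", "environment": "prod", "service": "api"},
        "i-002": {"team": "payments", "environment": "qa"},
    }
    print(untagged_resources(inventory))
    # {'i-002': ['missing tag: service', 'unknown environment: qa']}
```

Run as a scheduled report, a check like this turns "tag everything" from a request into something each team can see and fix.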
Dashboard it. Each engineering team should see their cloud spend on a weekly basis, with trend data. When engineers can see that their service costs $X per month and it went up 15% after last week's deploy, they make different architecture decisions. Cost visibility changes behavior more effectively than cost policies.
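The trend signal behind such a dashboard can be very simple. The following sketch flags services whose spend rose by at least a threshold week over week; the input shape (a mapping of service name to weekly totals, oldest first) and the 15% default are assumptions for illustration.

```python
def flag_cost_increases(weekly_spend, threshold=0.15):
    """Return (service, fractional_change) pairs at or above the threshold.

    weekly_spend maps a service name to a list of weekly totals, oldest
    first. Services with fewer than two weeks of data are skipped.
    """
    flagged = []
    for service, history in weekly_spend.items():
        if len(history) < 2 or history[-2] <= 0:
            continue
        previous, current = history[-2], history[-1]
        change = (current - previous) / previous
        if change >= threshold:
            flagged.append((service, change))
    # Largest increase first, so the worst offender tops the report.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    spend = {"api": [1000, 1200], "web": [500, 510], "batch": [300]}  # hypothetical
    for service, change in flag_cost_increases(spend):
        print(f"{service}: +{change:.0%}")  # api: +20%
```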
Include cost in architecture reviews. When your team proposes a new service or a significant architectural change, "what will this cost to run?" should be a standard question alongside "will this scale?" and "is this secure?" Not as a gate — as a design consideration.
Quick Wins Checklist
These are the first things I check in any cloud cost review, because they almost always yield savings with minimal effort: unused Elastic IPs (billed even when not attached to a running instance), idle load balancers, unattached EBS volumes, previous-generation instance types that could be swapped for current-generation ones (often cheaper and faster), S3 buckets without lifecycle policies, and CloudWatch log groups with no retention policy set (which default to keeping logs forever).
Running through this list takes a few hours and typically saves 10-15% of the monthly bill immediately.
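Several of these checks reduce to simple filters over an inventory export. The sketch below assumes dictionaries shaped like the entries AWS returns when listing volumes, addresses, and log groups (`State`, `AssociationId`, `retentionInDays`); it deliberately leaves out the API calls themselves, and the function names are my own.

```python
def unattached_volumes(volumes):
    """EBS volumes in the 'available' state are not attached to anything."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def unassociated_addresses(addresses):
    """Elastic IPs with no AssociationId are allocated but unused."""
    return [
        a.get("AllocationId", a.get("PublicIp"))
        for a in addresses
        if "AssociationId" not in a
    ]


def log_groups_without_retention(log_groups):
    """Log groups lacking retentionInDays keep their logs indefinitely."""
    return [g["logGroupName"] for g in log_groups if "retentionInDays" not in g]


if __name__ == "__main__":
    volumes = [{"VolumeId": "vol-1", "State": "in-use"},
               {"VolumeId": "vol-2", "State": "available"}]
    print(unattached_volumes(volumes))  # ['vol-2']
```

Treat the output as a list of candidates to review with the owning team, not as a delete list.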