AWS Cost Optimization: What Actually Moves the Needle

AWS bills grow faster than expected. Experiments become permanent, dev environments live on through staff turnover, and the monthly statement creeps up while everyone stays focused on shipping. Most “AWS cost optimization” guides treat every line item as equally worth your attention. Most line items aren’t. Here are the three optimizations we reach for first on every engagement, and four things we don’t bother with.

#1. Right-Size First — Everything Else Is a Distraction

The single biggest gap on most bills is instances that are larger than they need to be. AWS offers dozens of types and sizes; the friction of picking the right one means teams default to over-provisioning and never come back to it.

Pull 30 days of CloudWatch CPU and network utilization, plus memory if the CloudWatch agent is installed (EC2 does not publish memory metrics by default). Anything sitting below 40% sustained is a candidate to drop a size. AWS Compute Optimizer surfaces the same recommendations. Its “downsize” suggestions are usually right; its “burstable” suggestions less so, and we second-guess those on production workloads.
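A minimal sketch of the 40% filter, assuming the CPU samples have already been exported from CloudWatch into a plain dict. Reading “sustained” as “the 95th percentile stays under the threshold” is our assumption for illustration; the function name and the instance ids are hypothetical.

```python
from statistics import quantiles


def downsize_candidates(cpu_by_instance, threshold_pct=40.0, percentile=95):
    """Return ids of instances whose high-percentile CPU stays below the threshold.

    cpu_by_instance maps an instance id to CPU utilization samples (percent)
    covering the review window, e.g. 30 days of hourly averages.
    """
    candidates = []
    for instance_id, samples in cpu_by_instance.items():
        if len(samples) < 2:
            continue  # too little data to call anything "sustained"
        # quantiles(n=100) returns the 1st..99th percentile cut points.
        high_water = quantiles(samples, n=100, method="inclusive")[percentile - 1]
        if high_water < threshold_pct:
            candidates.append(instance_id)
    return sorted(candidates)


if __name__ == "__main__":
    window = {
        "i-web-1": [10, 12, 15, 20],         # quiet for the whole window
        "i-batch-1": [10] * 20 + [90] * 10,  # mostly idle, but busy a third of the time
    }
    print(downsize_candidates(window))
```

Using a high percentile rather than the mean keeps an instance that is idle most of the time but genuinely busy at peak off the candidate list.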

Right-sizing is not a one-time project. Workloads grow, shrink, and change shape; the right size today is the wrong size in six months. We schedule it quarterly as a calendar item.

#2. Savings Plans, Capped at the Baseline

If you have any steady-state workload, and almost everyone does, Savings Plans and Reserved Instances offer up to 72% off on-demand rates in exchange for a one- or three-year commitment. This is real money.

The trap is buying coverage above your baseline. We cap committed coverage at roughly 80% of the workload’s true steady-state, leaving 20% headroom for on-demand and Spot. The math is asymmetric: a Savings Plan that goes unused for half its term wipes out the savings it would have produced; an uncovered hour of on-demand costs maybe 20% more than a covered one. Optimize for downside protection, not theoretical maximum coverage.
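The asymmetry can be put in numbers. This is a sketch under simplifying assumptions: costs are normalized so one on-demand hour costs 1.0, the commitment is billed for every hour of the term whether used or not, and the 25% discount in the example is illustrative rather than a quoted AWS rate. The function names are ours.

```python
def break_even_utilization(discount):
    """Fraction of the term a commitment must be used to match pure on-demand."""
    return 1.0 - discount


def net_savings(discount, utilization):
    """Savings per committed hour, as a fraction of the on-demand rate.

    You pay (1 - discount) for every hour of the term; on-demand would only
    have billed the fraction of hours actually used.
    """
    return utilization - (1.0 - discount)


def on_demand_premium(discount):
    """How much more an uncovered hour costs than a covered one."""
    return 1.0 / (1.0 - discount) - 1.0


if __name__ == "__main__":
    d = 0.25  # illustrative discount, not an AWS price
    print("break-even utilization:", break_even_utilization(d))
    print("net savings, fully used:", net_savings(d, 1.0))
    print("net savings, half used:", net_savings(d, 0.5))
    print("premium for an uncovered hour:", round(on_demand_premium(d), 3))
```

At that assumed discount, a fully used commitment saves a quarter of the on-demand rate and a half-used one loses the same amount, while running uncovered only costs the premium on the hours you actually use.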

We start conservatively — Compute Savings Plans, one-year term, ~60% coverage of the obvious baseline — and expand only after a few months of stable usage. Three-year terms and EC2 Instance Savings Plans (which lock you to a family) come later, if at all.

#3. Storage Tiers, Starting With gp2 → gp3

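The gp2-to-gp3 EBS migration is the easiest cost win on the board. gp3 costs up to 20% less per GB than gp2, includes a baseline of 3,000 IOPS and 125 MiB/s regardless of volume size, and has independent IOPS and throughput knobs (so you stop overpaying for capacity just to get IOPS). One thing to check first: gp2 volumes larger than 1 TiB have a baseline above 3,000 IOPS, and gp2 volumes above 334 GiB can deliver up to 250 MiB/s, so provision gp3 IOPS and throughput to match where the workload needs it. The conversion is live: no detach, no downtime, no snapshot restore. Run aws ec2 modify-volume --volume-id <id> --volume-type gp3 and the volume re-tiers in place while it’s still mounted.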

We’ve yet to find a workload that doesn’t benefit. Run the migration on a Friday afternoon if you’re nervous; you’ll see savings on the next bill.
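A rough way to estimate the saving per volume before converting. The prices below are assumptions (approximate us-east-1 list prices; check current pricing for your Region), the helper names are hypothetical, and the sketch matches IOPS only, so add the throughput charge separately if you provision above the gp3 baseline.

```python
# Assumed list prices in USD per month; verify against current pricing.
GP2_PER_GIB = 0.10
GP3_PER_GIB = 0.08
GP3_PER_EXTRA_IOPS = 0.005  # per provisioned IOPS above the included baseline
GP3_BASELINE_IOPS = 3000


def gp2_baseline_iops(size_gib):
    """gp2 scales at 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return max(100, min(16000, 3 * size_gib))


def gp3_iops_to_match(size_gib):
    """IOPS to provision on gp3 so the volume is no slower than its gp2 baseline."""
    return max(GP3_BASELINE_IOPS, gp2_baseline_iops(size_gib))


def monthly_cost_gp2(size_gib):
    return size_gib * GP2_PER_GIB


def monthly_cost_gp3(size_gib, iops=None):
    """Monthly gp3 cost; defaults to matching the gp2 baseline IOPS."""
    iops = gp3_iops_to_match(size_gib) if iops is None else iops
    extra_iops = max(0, iops - GP3_BASELINE_IOPS)
    return size_gib * GP3_PER_GIB + extra_iops * GP3_PER_EXTRA_IOPS


if __name__ == "__main__":
    for size in (500, 2000):
        print(size, "GiB:", monthly_cost_gp2(size), "->", monthly_cost_gp3(size))
```

Under these assumed prices, the large volume still comes out cheaper on gp3 even after paying for the extra IOPS needed to match its gp2 baseline.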

S3 lifecycle policies are the next layer. Lifecycle rules act on object age, not on last access, so the thresholds below are days since creation. These defaults are reasonable for most accounts:

S3 Standard for the active tier.
S3 Standard-IA for objects older than 30 days.
S3 Glacier Instant Retrieval for objects older than 90 days when you still need millisecond access.
S3 Glacier Deep Archive for compliance retention.
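The tiers above could be expressed as a lifecycle configuration along these lines. This is a sketch: it only builds the dict in the shape boto3's put_bucket_lifecycle_configuration accepts, the rule ID and function name are hypothetical, and the Deep Archive threshold is left as a parameter because the right number depends on your retention policy.

```python
def lifecycle_configuration(prefix="", deep_archive_after_days=None):
    """Build an age-based tiering rule: Standard-IA at 30 days, Glacier IR at 90.

    deep_archive_after_days is optional; when given it must come after the
    90-day Glacier Instant Retrieval transition.
    """
    transitions = [
        {"Days": 30, "StorageClass": "STANDARD_IA"},
        {"Days": 90, "StorageClass": "GLACIER_IR"},
    ]
    if deep_archive_after_days is not None:
        if deep_archive_after_days <= 90:
            raise ValueError("Deep Archive transition must come after day 90")
        transitions.append(
            {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"}
        )
    return {
        "Rules": [
            {
                "ID": "tier-by-age",           # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},  # empty prefix applies to the whole bucket
                "Transitions": transitions,
            }
        ]
    }


# Applying it (requires boto3 and credentials; the bucket name is a placeholder):
#   import boto3
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket",
#       LifecycleConfiguration=lifecycle_configuration(),
#   )
```

Keeping the builder separate from the API call makes it easy to review or diff the rule before it touches a bucket.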

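We don’t reach for Intelligent-Tiering except on buckets with truly unpredictable access patterns. The per-object monitoring fee adds up on buckets with very large object counts, and objects under 128 KB are never auto-tiered, so small-object workloads see little benefit.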

#What We Don’t Bother With

Spot Instances for anything stateful. The savings are real and the operational headache is also real. We use Spot for batch processing, build farms, and stateless web workers we can drain. We do not use Spot for databases, queue workers with at-most-once semantics, or anything whose interruption blocks the rest of the platform.
Aggressive Savings Plan coverage past ~80% of baseline. See #2 above. The marginal savings from going from 80% to 100% coverage are smaller than the expected loss from a single workload retirement during the term.
Auto-scaling as a headline cost strategy. Auto-scaling is a reliability tool that happens to save money on edge cases. Right-sizing the floor and capping the ceiling does most of the work. We turn it on by default and stop talking about it.
Resource cleanup as transformation. Untagged EBS volumes and orphaned snapshots add up to a few percent of the bill at most. Hygiene matters — we run weekly cleanups — but no one ever cut 20% off the bill by deleting unattached volumes.

#What’s Left Is Cadence

Set up Cost Explorer dashboards, configure a billing alert at 110% of expected spend, and put right-sizing and Savings Plan coverage reviews on a quarterly calendar. The compounding wins come from never letting the bill get away from you, not from one big project.
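One way to set the 110% alert is an AWS Budgets notification. The sketch below only builds the request in the shape boto3's budgets create_budget call accepts; choosing Budgets over a CloudWatch billing alarm is our assumption, and the budget name, account id, and email address are placeholders.

```python
def monthly_budget_request(account_id, expected_spend_usd, email, alert_pct=110.0):
    """Build a create_budget request that emails when actual spend passes alert_pct."""
    return {
        "AccountId": account_id,
        "Budget": {
            "BudgetName": "monthly-cost",  # hypothetical name
            "BudgetLimit": {"Amount": f"{expected_spend_usd:.2f}", "Unit": "USD"},
            "TimeUnit": "MONTHLY",
            "BudgetType": "COST",
        },
        "NotificationsWithSubscribers": [
            {
                "Notification": {
                    "NotificationType": "ACTUAL",  # alert on real spend, not forecast
                    "ComparisonOperator": "GREATER_THAN",
                    "Threshold": alert_pct,
                    "ThresholdType": "PERCENTAGE",
                },
                "Subscribers": [{"SubscriptionType": "EMAIL", "Address": email}],
            }
        ],
    }


# Applying it (requires boto3 and credentials):
#   import boto3
#   request = monthly_budget_request("123456789012", 20000, "finops@example.com")
#   boto3.client("budgets").create_budget(**request)
```

Setting the budget limit to expected spend and the threshold to 110% keeps the alert tied to the number you review each quarter, so updating the expectation updates the alert.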