The problem hiding in plain sight
When I started working with one client on their AWS cost optimisation, the monthly bill was significantly higher than it should have been — and nobody had a clear picture of where the money was going. After enabling Cost Explorer with proper tag filtering, the answer was immediately obvious: non-production EC2 instances were running 24 hours a day, 7 days a week. Instances that developers used during business hours were sitting idle every night, every weekend, every public holiday — fully running, fully charged.
The fix wasn't complicated. But the savings were significant — over 60% reduction in EC2 costs for non-production environments, with zero impact on any production workload or developer workflow.
Why this happens at most organisations
Non-production environments get created quickly — someone needs a dev or staging environment, it gets spun up, work gets done. What rarely happens is someone sitting down to configure lifecycle management for that environment. The instance runs, the work continues, and nobody notices the idle time accumulating on the bill.
At scale, this adds up fast. A t3.large running 24/7 costs roughly $60/month. The same instance running 10 hours a day on weekdays costs about $18/month. Across 10 non-production instances, that's $420/month in waste — $5,040/year — for compute that nobody is using.
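The arithmetic behind those numbers can be sketched in a few lines. The hourly rate below is an approximate us-east-1 on-demand price for a t3.large; the exact figure varies by region:

```python
HOURLY = 0.0832          # approx us-east-1 on-demand t3.large rate (assumption)
HOURS_PER_MONTH = 730    # average hours in a month

always_on = HOURLY * HOURS_PER_MONTH          # ~ $60.7/month, running 24/7
weekday_hours = 10 * 5 * 52 / 12              # ~ 217 hours/month at 10h weekdays
scheduled = HOURLY * weekday_hours            # ~ $18/month on the schedule
fleet_savings = (always_on - scheduled) * 10  # across 10 instances

print(round(always_on, 2), round(scheduled, 2), round(fleet_savings, 2))
```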
The solution: Lambda + EventBridge scheduler
The architecture is straightforward: EventBridge triggers a Lambda function on a schedule. Lambda checks for instances tagged with AutoStop: true and stops or starts them based on the time. No third-party tools, no agents on the instances, no changes to how developers use them.
# Tag your non-production instances
aws ec2 create-tags --resources i-xxxxxxxxxxxxxxxxx --tags Key=AutoStop,Value=true Key=Environment,Value=dev
The Lambda function
import boto3
import os

def handler(event, context):
    ec2 = boto3.client('ec2', region_name=os.environ['AWS_REGION'])
    action = event.get('action', 'stop')  # 'stop' or 'start'

    # Find instances tagged AutoStop=true, in the state the action applies to
    response = ec2.describe_instances(
        Filters=[
            {'Name': 'tag:AutoStop', 'Values': ['true']},
            {'Name': 'instance-state-name',
             'Values': ['running'] if action == 'stop' else ['stopped']},
        ]
    )

    instance_ids = [
        i['InstanceId']
        for r in response['Reservations']
        for i in r['Instances']
    ]

    if not instance_ids:
        return {'message': 'No instances to act on', 'action': action}

    if action == 'stop':
        ec2.stop_instances(InstanceIds=instance_ids)
    else:
        ec2.start_instances(InstanceIds=instance_ids)

    print(f"{action}: {instance_ids}")
    return {'actioned': instance_ids, 'action': action}
EventBridge rules — the schedule
# Terraform: stop at 8pm, start at 8am — weekdays only
resource "aws_cloudwatch_event_rule" "stop_instances" {
  name                = "stop-dev-instances"
  schedule_expression = "cron(0 20 ? * MON-FRI *)" # 8pm UTC weekdays
  description         = "Stop non-prod instances after business hours"
}

resource "aws_cloudwatch_event_rule" "start_instances" {
  name                = "start-dev-instances"
  schedule_expression = "cron(0 8 ? * MON-FRI *)" # 8am UTC weekdays
  description         = "Start non-prod instances for business hours"
}

resource "aws_cloudwatch_event_target" "stop_target" {
  rule      = aws_cloudwatch_event_rule.stop_instances.name
  target_id = "StopInstances"
  arn       = aws_lambda_function.scheduler.arn
  input     = jsonencode({ action = "stop" })
}

resource "aws_cloudwatch_event_target" "start_target" {
  rule      = aws_cloudwatch_event_rule.start_instances.name
  target_id = "StartInstances"
  arn       = aws_lambda_function.scheduler.arn
  input     = jsonencode({ action = "start" })
}

# EventBridge needs explicit permission to invoke the Lambda
resource "aws_lambda_permission" "allow_stop_rule" {
  statement_id  = "AllowStopRule"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.scheduler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.stop_instances.arn
}

resource "aws_lambda_permission" "allow_start_rule" {
  statement_id  = "AllowStartRule"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.scheduler.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.start_instances.arn
}
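One subtlety worth checking: EventBridge cron expressions run in UTC, so "8pm" drifts by an hour whenever daylight saving starts or ends. A quick sketch with Python's standard zoneinfo module shows the effect ("Europe/London" here is an illustrative timezone, not something from the setup above):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def utc_hour_for(local_hour: int, tz: str, month: int) -> int:
    """UTC hour that a given local wall-clock hour falls on, mid-month."""
    local = datetime(2024, month, 15, local_hour, tzinfo=ZoneInfo(tz))
    return local.astimezone(ZoneInfo("UTC")).hour

# 8pm in London maps to different UTC hours in winter vs summer (DST)
print(utc_hour_for(20, "Europe/London", 1))  # 20 (GMT, winter)
print(utc_hour_for(20, "Europe/London", 7))  # 19 (BST, summer)
```

If an hour of drift matters for your team, pick a UTC hour that is acceptable year-round, or use EventBridge Scheduler, which supports timezone-aware schedules.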
Manual override for critical testing
The first thing developers ask when you implement this: "What if I'm doing a late deployment or need the instance outside business hours?" The answer is a simple tag override:
# Exclude a specific instance from auto-stop (it stays excluded until you flip the tag back)
aws ec2 create-tags --resources i-xxxxxxxxxxxxxxxxx --tags Key=AutoStop,Value=false
# Re-enable it when done
aws ec2 create-tags --resources i-xxxxxxxxxxxxxxxxx --tags Key=AutoStop,Value=true
The Lambda function only acts on instances where AutoStop=true, so flipping the tag is all it takes. No pipeline changes, no exceptions list to maintain.
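The selection rule is easy to reason about in isolation. Here is a minimal in-memory sketch of it — the instance dicts are illustrative, not the boto3 response shape, but the logic mirrors the server-side filters the Lambda uses:

```python
def eligible(instances, action):
    """Select instances tagged AutoStop='true' that are in the state
    the action applies to (running for stop, stopped for start)."""
    want_state = "running" if action == "stop" else "stopped"
    return [
        i["id"] for i in instances
        if i["tags"].get("AutoStop") == "true" and i["state"] == want_state
    ]

fleet = [
    {"id": "i-aaa", "tags": {"AutoStop": "true"},  "state": "running"},
    {"id": "i-bbb", "tags": {"AutoStop": "false"}, "state": "running"},  # overridden
    {"id": "i-ccc", "tags": {},                    "state": "running"},  # untagged
]
print(eligible(fleet, "stop"))  # ['i-aaa']
```

Both the overridden instance and the untagged one fall through untouched, which is exactly the behaviour the override relies on.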
What else I found while investigating
The scheduler was the main fix, but the cost investigation surfaced three other silent budget drainers that are common across AWS accounts:
- Orphaned EBS volumes — volumes that persist after their EC2 instance was terminated, charged at full price for storing nothing useful. Find them with aws ec2 describe-volumes --filters Name=status,Values=available.
- Unused Elastic IPs — AWS charges for EIPs not associated with a running instance. Find them with aws ec2 describe-addresses --filters Name=domain,Values=vpc and check the AssociationId field.
- Oversized RDS instances — dev databases running on db.r5.2xlarge because that's what production uses. A db.t3.medium handles development workloads at a fraction of the cost.
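For a rough sense of what orphaned volumes cost, the arithmetic is just size times the per-GB storage rate. The rate and volume sizes below are illustrative assumptions (approximate us-east-1 gp3 pricing; your region and volume types will differ):

```python
GP3_PER_GB_MONTH = 0.08  # approx us-east-1 gp3 rate (assumption; varies by region/type)

orphaned_gib = [100, 500, 50]  # example: sizes of 'available' volumes found above
monthly_waste = sum(orphaned_gib) * GP3_PER_GB_MONTH
print(f"${monthly_waste:.2f}/month")  # $52.00/month
```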
The required IAM policy for the Lambda
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ec2:DescribeInstances",
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:StartInstances",
        "ec2:StopInstances"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "ec2:ResourceTag/Environment": ["dev", "staging"]
        }
      }
    }
  ]
}
Scope the start and stop actions with a condition on the Environment tag. This ensures the Lambda can only start and stop non-production instances — it cannot accidentally touch production even if the tag logic has a bug. Note that ec2:DescribeInstances does not support resource-level conditions, which is why it sits in its own unconditioned statement; putting the tag condition on it would make the describe call fail.