Auto Scaling AWS Instances Based on RabbitMQ Custom Metrics
In this article, we will go over how to scale AWS instances in and out based on RabbitMQ metrics, using Terraform.
Prerequisites:
- An AWS user with programmatic access.
- Terraform installed on your system.
- RabbitMQ configured as an event source for Lambda, sending metrics to CloudWatch. To do that, follow my previous blog, Ingesting and monitoring custom metrics in CloudWatch with AWS Lambda.
In a Domain Driven Design architecture, we use RabbitMQ to publish messages. Each subscriber is a queue that binds to events from different domains. RabbitMQ and the workers are deployed across multiple EC2 instances. We therefore wanted auto-scaling that adds machines to burn down queues with many messages; this is critical for the business to achieve fast eventual consistency and to reduce the operational load on RabbitMQ.
Problem:
Handling an event can range from quick and light to a CPU-intensive operation that takes several minutes. AWS Auto Scaling groups make it easy to scale based on CloudWatch metrics, but we could not find a built-in metric that indicates when additional worker nodes are needed to scale the worker fleet in or out.
Solution:
To scale our worker machines in and out, we created custom CloudWatch metrics and alarms based on them. Please follow this article for that part: Ingesting and monitoring custom metrics in CloudWatch with AWS Lambda.
The final step is to set up the Auto Scaling group to scale in and out based on this metric using Terraform.
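The Terraform snippets below reference a handful of input variables. A minimal variables.tf covering them might look like this (kpi and avg_processing_time are explained at the end of the article):

variable "ami_id"              { type = string }
variable "instance_type"       { type = string }
variable "key_name"            { type = string }
variable "min_size"            { type = number }
variable "max_size"            { type = number }
variable "desired_capacity"    { type = number }
variable "private_subnets"     { type = list(string) }
variable "kpi"                 { type = number } # longest acceptable latency, in seconds
variable "avg_processing_time" { type = number } # average seconds a worker needs per message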
The launch template for the worker nodes looks like this:
resource "aws_launch_template" "launch_template" {
name_prefix = "worker-node"
image_id = var.ami_id
iam_instance_profile {
arn = aws_iam_instance_profile.instance_profile.arn
}
monitoring {
enabled = true
}
instance_type = var.instance_type
instance_initiated_shutdown_behavior = "terminate"
key_name = var.key_name
vpc_security_group_ids = [aws_security_group.instance.id]
tag_specifications {
resource_type = "instance"
tags = {
Name = "Worker-server"
}
}
lifecycle {
create_before_destroy = true
}
}
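The launch template references an IAM instance profile and a security group that are defined elsewhere in the configuration. A minimal sketch of those resources, with placeholder policies and rules that you should adapt to your own setup, might look like this:

resource "aws_iam_role" "instance_role" {
  name_prefix = "worker-node"

  # Allow EC2 instances to assume this role; attach worker-specific
  # policies (for example CloudWatch access) as needed.
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_instance_profile" "instance_profile" {
  name_prefix = "worker-node"
  role        = aws_iam_role.instance_role.name
}

resource "aws_security_group" "instance" {
  name_prefix = "worker-node"
  # vpc_id is omitted here for brevity; in practice set it to the VPC
  # that contains var.private_subnets.

  # Outbound access so workers can reach RabbitMQ and AWS APIs.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}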
Create the Auto Scaling group as follows:
resource "aws_autoscaling_group" "asg" {
name_prefix = "worker-node"
min_size = var.min_size
max_size = var.max_size
desired_capacity = var.desired_capacity
launch_template {
id = aws_launch_template.launch_template.id
version = aws_launch_template.launch_template.latest_version
}
vpc_zone_identifier = var.private_subnets
instance_refresh {
strategy = "Rolling"
preferences {
min_healthy_percentage = 50
}
}
lifecycle {
create_before_destroy = true
}
tag {
key = "Name"
value = "Worker-server"
propagate_at_launch = true
}
force_delete = true
}
Create the scale-up policy and CloudWatch alarm:
resource "aws_autoscaling_policy" "scale_up_using_q" {
name = "worker-node-scale_up_using_q"
autoscaling_group_name = aws_autoscaling_group.asg.name
adjustment_type = "ChangeInCapacity"
scaling_adjustment = 1
cooldown = 300
}
resource "aws_cloudwatch_metric_alarm" "scale_up_using_q" {
alarm_name = "worker-node-scale_up_using_q"
alarm_description = "Monitors RabbitMQ queue size for server ASG"
alarm_actions = [aws_autoscaling_policy.scale_up_using_q.arn]
comparison_operator = "GreaterThanOrEqualToThreshold"
namespace = "QueueMetrics"
metric_name = "TotalMessages"
threshold = var.KPI/var.Avgprocessing-time
evaluation_periods = "1"
period = "300"
statistic = "Average"
}
Create the scale-down policy and CloudWatch alarm:
# scale down using queue size
resource "aws_autoscaling_policy" "scale_down_using_q" {
  name                   = "worker-node-scale_down_using_q"
  autoscaling_group_name = aws_autoscaling_group.asg.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = -1
  cooldown               = 300
}

resource "aws_cloudwatch_metric_alarm" "scale_down_using_q" {
  alarm_name          = "worker-node-scale_down_using_q"
  alarm_description   = "Monitors RabbitMQ queue size for server ASG"
  alarm_actions       = [aws_autoscaling_policy.scale_down_using_q.arn]
  comparison_operator = "LessThanThreshold"
  namespace           = "QueueMetrics"
  metric_name         = "TotalMessages"
  threshold           = var.kpi / var.avg_processing_time
  evaluation_periods  = "1"
  period              = "300"
  statistic           = "Average"
}
You will have to provide values for kpi, the longest acceptable latency for processing a queued message, and avg_processing_time, the average time a worker needs to process one message. Their ratio, kpi / avg_processing_time, is the number of queued messages a single worker can clear within the acceptable latency; when TotalMessages reaches this threshold, the alarm scales the worker nodes out, and when it drops below, it scales them in.
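For example, with hypothetical values in a terraform.tfvars file, the math works out like this:

# terraform.tfvars (hypothetical values)
kpi                 = 600 # longest acceptable latency: 10 minutes
avg_processing_time = 30  # a worker needs about 30 seconds per message

# threshold = kpi / avg_processing_time = 600 / 30 = 20 messages:
# at 20 or more queued messages the scale-up alarm fires and adds a worker;
# below 20 the scale-down alarm removes one.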
Give the IAM user the necessary permissions and roles to run this Terraform code, then run terraform init and terraform apply. You will now have instances that scale in and out based on your RabbitMQ metrics.