
In this article, we will go over how to auto-scale AWS instances in and out based on RabbitMQ metrics, using Terraform.
Prerequisites:
- An AWS user with programmatic access.
- Terraform installed on your system.
- RabbitMQ configured as an event source for Lambda, sending metrics to CloudWatch. To set this up, follow my previous blog, Ingesting and Monitoring Custom Metrics in CloudWatch with AWS Lambda.
In a Domain-Driven Design architecture, we use RabbitMQ to publish messages. Each subscriber is a queue bound to events from different domains.
Problem:
Event handlers range from quick, lightweight operations to CPU-intensive jobs that take several minutes. AWS Auto Scaling groups are simple to configure against CloudWatch metrics; the problem is that we couldn't find a built-in metric that indicated when additional worker nodes were needed to scale the fleet in or out.
Solution:
To scale our worker machines in and out, we created custom CloudWatch metrics and alarms based on them.
Please follow this article for the details: Ingesting and Monitoring Custom Metrics in CloudWatch with AWS Lambda.
The final step is to set up the auto-scaling group to scale in and out based on this metric using Terraform.
The launch template looks like this:
```hcl
resource "aws_launch_template" "launch_template" {
  name_prefix                          = "worker-node"
  image_id                             = var.ami_id
  instance_type                        = var.instance_type
  instance_initiated_shutdown_behavior = "terminate"
  key_name                             = var.key_name
  vpc_security_group_ids               = [aws_security_group.instance.id]

  iam_instance_profile {
    arn = aws_iam_instance_profile.instance_profile.arn
  }

  monitoring {
    enabled = true
  }

  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "Worker-server"
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}
```
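The launch template references an instance profile and a security group that are not defined in this article. A minimal sketch of those supporting resources might look like the following; the role name, the egress rule, and the `var.vpc_id` input are assumptions for illustration:

```hcl
# Hypothetical IAM role assumed by the worker EC2 instances.
resource "aws_iam_role" "worker_role" {
  name_prefix = "worker-node"
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_instance_profile" "instance_profile" {
  name_prefix = "worker-node"
  role        = aws_iam_role.worker_role.name
}

# Security group for the workers; var.vpc_id is an assumed input.
resource "aws_security_group" "instance" {
  name_prefix = "worker-node"
  vpc_id      = var.vpc_id

  # Allow all outbound traffic so workers can reach RabbitMQ and AWS APIs.
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```

Attach any policies your workers need (for example, CloudWatch or queue access) to the role rather than baking credentials into the AMI.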
Create the Auto Scaling group:
```hcl
resource "aws_autoscaling_group" "asg" {
  name_prefix         = "worker-node"
  min_size            = var.min_size
  max_size            = var.max_size
  desired_capacity    = var.desired_capacity
  vpc_zone_identifier = var.private_subnets
  force_delete        = true

  launch_template {
    id      = aws_launch_template.launch_template.id
    version = aws_launch_template.launch_template.latest_version
  }

  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }

  tag {
    key                 = "Name"
    value               = "Worker-server"
    propagate_at_launch = true
  }

  lifecycle {
    create_before_destroy = true
  }
}
```
Create the scale-up policy and CloudWatch alarm:
```hcl
resource "aws_autoscaling_policy" "scale_up_using_q" {
  name                   = "worker-node-scale_up_using_q"
  autoscaling_group_name = aws_autoscaling_group.asg.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300
}

resource "aws_cloudwatch_metric_alarm" "scale_up_using_q" {
  alarm_name          = "worker-node-scale_up_using_q"
  alarm_description   = "Monitors RabbitMQ queue size for server ASG"
  alarm_actions       = [aws_autoscaling_policy.scale_up_using_q.arn]
  comparison_operator = "GreaterThanOrEqualToThreshold"
  namespace           = "QueueMetrics"
  metric_name         = "TotalMessages"
  threshold           = var.kpi / var.avg_processing_time
  evaluation_periods  = 1
  period              = 300
  statistic           = "Average"
}
```
Create the scale-down policy and CloudWatch alarm:
```hcl
# Scale down using queue size
resource "aws_autoscaling_policy" "scale_down_using_q" {
  name                   = "worker-node-scale_down_using_q"
  autoscaling_group_name = aws_autoscaling_group.asg.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = -1
  cooldown               = 300
}

resource "aws_cloudwatch_metric_alarm" "scale_down_using_q" {
  alarm_name          = "worker-node-scale_down_using_q"
  alarm_description   = "Monitors RabbitMQ queue size for server ASG"
  alarm_actions       = [aws_autoscaling_policy.scale_down_using_q.arn]
  comparison_operator = "LessThanThreshold"
  namespace           = "QueueMetrics"
  metric_name         = "TotalMessages"
  threshold           = var.kpi / var.avg_processing_time
  evaluation_periods  = 1
  period              = 300
  statistic           = "Average"
}
```
You will have to supply values for two inputs: kpi, the longest acceptable latency for a message waiting in the queue, and avg_processing_time, the average time it takes to process one message. Their ratio gives the queue-depth threshold: once the backlog exceeds the number of messages one worker can clear within the acceptable latency, the group scales out; when it drops below, it scales in.
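Assuming the alarm thresholds use inputs named `kpi` and `avg_processing_time` (normalized here from the mixed casing in the original snippets), the variable definitions might look like:

```hcl
# Longest acceptable latency, in seconds, for a message in the queue.
variable "kpi" {
  type        = number
  description = "Maximum acceptable queue latency in seconds"
}

# Average time, in seconds, to process a single message.
variable "avg_processing_time" {
  type        = number
  description = "Average processing time per message in seconds"
}
```

For example, with kpi = 600 seconds and avg_processing_time = 2 seconds, the alarm threshold works out to 300 messages: a deeper backlog than that means one worker can no longer clear the queue within the acceptable latency.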
Grant the necessary IAM permissions, then run terraform init and terraform apply. You'll now have instances that scale in and out based on your RabbitMQ metrics.
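As a rough illustration, the identity running Terraform needs permissions along these lines. This is a hypothetical, non-exhaustive sketch; the exact actions and resource ARNs depend on your account setup, and you should scope them down from the wildcards shown here:

```hcl
# Hypothetical policy sketch for the user or role running terraform apply.
data "aws_iam_policy_document" "terraform_runner" {
  statement {
    effect = "Allow"
    actions = [
      "ec2:*",                        # launch templates, security groups
      "autoscaling:*",                # ASG and scaling policies
      "cloudwatch:PutMetricAlarm",    # create/update the alarms
      "cloudwatch:DeleteAlarms",
      "cloudwatch:DescribeAlarms",
      "iam:PassRole",                 # pass the worker instance profile role
    ]
    resources = ["*"]
  }
}
```

In production, replace the service-level wildcards with the specific actions Terraform's plan output shows it needs, and restrict `iam:PassRole` to the worker role's ARN.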