
Auto-Scaling AWS Instances Based on RabbitMQ Custom Metrics


In this article, we go over how to use Terraform to scale AWS EC2 instances in and out based on RabbitMQ metrics.

Background:

In our Domain-Driven Design architecture, we use RabbitMQ to publish messages.
Each subscriber is a RabbitMQ queue bound to various events from different domains.

Our RabbitMQ broker and its workers are deployed across multiple EC2 instances.
We wanted auto-scaling that adds machines to burn down queues that have accumulated many messages. This is critical for the business: it lets us reach eventual consistency quickly and reduces the operational load on RabbitMQ.

Problem:

Event handlers range from quick, lightweight operations to CPU-intensive ones that take several minutes. An AWS Auto Scaling group makes it simple to scale on CloudWatch metrics.

However, we couldn't find an existing metric that indicated when worker nodes should be added or removed.

Solution:

To scale our worker machines in and out, we publish custom CloudWatch metrics and create alarms based on them.

To set up the metrics, follow this article: Ingesting and monitoring custom metrics in CloudWatch with AWS Lambda.
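The linked article covers the Lambda function that publishes the metric. As a minimal sketch, assuming that function is also managed in Terraform under the hypothetical name aws_lambda_function.queue_metrics, it can be invoked on a schedule so that QueueMetrics/TotalMessages receives a fresh data point every minute. The rule, target, and permission names below are illustrative assumptions, not taken from the original setup.

```hcl
# Hypothetical: assumes a Lambda function that reads RabbitMQ queue depths
# and publishes QueueMetrics/TotalMessages is defined elsewhere as
# aws_lambda_function.queue_metrics.
resource "aws_cloudwatch_event_rule" "queue_metrics_schedule" {
  name                = "queue-metrics-schedule"
  schedule_expression = "rate(1 minute)"
}

# Invoke the metric-publishing Lambda each time the rule fires.
resource "aws_cloudwatch_event_target" "queue_metrics_lambda" {
  rule = aws_cloudwatch_event_rule.queue_metrics_schedule.name
  arn  = aws_lambda_function.queue_metrics.arn
}

# Allow EventBridge to invoke the function.
resource "aws_lambda_permission" "allow_eventbridge" {
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.queue_metrics.function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.queue_metrics_schedule.arn
}
```

The alarms in this article evaluate a 300-second period with the Average statistic, so publishing more often than every five minutes gives each evaluation several data points to average.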

The final step is to use Terraform to configure the Auto Scaling group to scale in and out based on this metric.

The launch template looks like this:

resource "aws_launch_template" "launch_template" {
  name_prefix = "worker-node"
  image_id    = var.ami_id
  iam_instance_profile {
    arn = aws_iam_instance_profile.instance_profile.arn
  }
  monitoring {
    enabled = true
  }
  instance_type                        = var.instance_type
  instance_initiated_shutdown_behavior = "terminate"
  key_name                             = var.key_name
  vpc_security_group_ids               = [aws_security_group.instance.id]
  tag_specifications {
    resource_type = "instance"
    tags = {
      Name = "Worker-server"
    }
  }
  lifecycle {
    create_before_destroy = true
  }
}
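The launch template references an instance profile (aws_iam_instance_profile.instance_profile) and a security group (aws_security_group.instance) that are not shown. A minimal sketch of both follows, assuming the workers only need outbound network access (for example, to reach RabbitMQ). The role name prefix and var.vpc_id are hypothetical; you would declare var.vpc_id yourself and attach whatever IAM policies your workers actually need.

```hcl
# Trust policy that lets EC2 instances assume the worker role.
data "aws_iam_policy_document" "ec2_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]
    principals {
      type        = "Service"
      identifiers = ["ec2.amazonaws.com"]
    }
  }
}

# Hypothetical worker role; attach the policies your handlers require.
resource "aws_iam_role" "worker" {
  name_prefix        = "worker-node-"
  assume_role_policy = data.aws_iam_policy_document.ec2_assume_role.json
}

resource "aws_iam_instance_profile" "instance_profile" {
  name_prefix = "worker-node-"
  role        = aws_iam_role.worker.name
}

# Assumption: workers need no inbound rules, only outbound access.
resource "aws_security_group" "instance" {
  name_prefix = "worker-node-"
  vpc_id      = var.vpc_id # hypothetical variable

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```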

Create the Auto Scaling group:

resource "aws_autoscaling_group" "asg" {
  name_prefix      = "worker-node"
  min_size         = var.min_size
  max_size         = var.max_size
  desired_capacity = var.desired_capacity
  launch_template {
    id      = aws_launch_template.launch_template.id
    version = aws_launch_template.launch_template.latest_version
  }
  vpc_zone_identifier = var.private_subnets
  instance_refresh {
    strategy = "Rolling"
    preferences {
      min_healthy_percentage = 50
    }
  }
  lifecycle {
    create_before_destroy = true
    # The scaling policies change desired capacity at runtime; without this,
    # every terraform apply would reset it to var.desired_capacity.
    ignore_changes = [desired_capacity]
  }
  tag {
    key                 = "Name"
    value               = "Worker-server"
    propagate_at_launch = true
  }
  force_delete = true
}
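The launch template and Auto Scaling group read several input variables. A minimal sketch of their declarations follows; the types and descriptions are assumptions inferred from how each variable is used above.

```hcl
# Inputs referenced by the launch template and the Auto Scaling group.
variable "ami_id" {
  description = "AMI ID for the worker nodes"
  type        = string
}

variable "instance_type" {
  description = "EC2 instance type for the worker nodes"
  type        = string
}

variable "key_name" {
  description = "Name of the EC2 key pair used for SSH access"
  type        = string
}

variable "private_subnets" {
  description = "Private subnet IDs the Auto Scaling group launches into"
  type        = list(string)
}

variable "min_size" {
  description = "Minimum number of worker nodes"
  type        = number
}

variable "max_size" {
  description = "Maximum number of worker nodes"
  type        = number
}

variable "desired_capacity" {
  description = "Initial number of worker nodes"
  type        = number
}
```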

Create the scale-up policy and its CloudWatch alarm:

resource "aws_autoscaling_policy" "scale_up_using_q" {
  name                   = "worker-node-scale_up_using_q"
  autoscaling_group_name = aws_autoscaling_group.asg.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = 1
  cooldown               = 300
}
resource "aws_cloudwatch_metric_alarm" "scale_up_using_q" {
  alarm_name          = "worker-node-scale_up_using_q"
  alarm_description   = "Monitors RabbitMQ queue size for server ASG"
  alarm_actions       = [aws_autoscaling_policy.scale_up_using_q.arn]
  comparison_operator = "GreaterThanOrEqualToThreshold"
  namespace           = "QueueMetrics"
  metric_name         = "TotalMessages"
  # Variable names are case-sensitive: use the same inputs as the scale-down alarm
  threshold           = var.kpi / var.avgProcessing-time
  evaluation_periods  = 1
  period              = 300
  statistic           = "Average"
}

Create the scale-down policy and its CloudWatch alarm:

# Scale down using queue size
resource "aws_autoscaling_policy" "scale_down_using_q" {
  name                   = "worker-node-scale_down_using_q"
  autoscaling_group_name = aws_autoscaling_group.asg.name
  adjustment_type        = "ChangeInCapacity"
  scaling_adjustment     = -1
  cooldown               = 300
}
resource "aws_cloudwatch_metric_alarm" "scale_down_using_q" {
  alarm_name          = "worker-node-scale_down_using_q"
  alarm_description   = "Monitors RabbitMQ queue size for server ASG"
  alarm_actions       = [aws_autoscaling_policy.scale_down_using_q.arn]
  comparison_operator = "LessThanThreshold"
  namespace           = "QueueMetrics"
  metric_name         = "TotalMessages"
  threshold           = var.kpi / var.avgProcessing-time
  evaluation_periods  = 1
  period              = 300
  statistic           = "Average"
}

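You have to supply two inputs manually: kpi, the longest acceptable latency for a message, and avgProcessing-time, the average time a worker takes to process one message. Their ratio is the queue-size threshold: when TotalMessages reaches kpi / avgProcessing-time, the group scales out by one worker node, and when it drops below that value, the group scales in by one.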
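A minimal sketch of how these two inputs might be declared follows. Seconds as the unit, the descriptions, and the validation rule are assumptions; the only hard requirement is that both values use the same unit so the ratio is a message count. Hyphens are legal in Terraform identifiers, so var.avgProcessing-time refers to a single variable, provided the expression keeps a space around any minus sign.

```hcl
# Both values must use the same time unit (seconds here) so that
# kpi / avgProcessing-time yields a number of messages.
variable "kpi" {
  description = "Longest acceptable latency for a queued message, in seconds"
  type        = number
}

variable "avgProcessing-time" {
  description = "Average time a worker needs to process one message, in seconds"
  type        = number

  # Guard against division by zero in the alarm thresholds.
  validation {
    condition     = var.avgProcessing-time > 0
    error_message = "avgProcessing-time must be greater than zero."
  }
}
```

For example, with kpi = 600 and avgProcessing-time = 2, both alarms use a threshold of 600 / 2 = 300 messages. Because the two alarms share one threshold, the group may alternate between scaling out and in while the queue hovers near it; a lower value for the scale-down alarm is one way to dampen that.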


Grant the necessary permissions and IAM roles to run this Terraform code, then run terraform init and terraform apply. You'll then have instances that scale in and out based on your RabbitMQ metrics.
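terraform init needs to know which provider to download before terraform apply can create the resources above. A minimal sketch, assuming the HashiCorp AWS provider; the version constraint and region are placeholder assumptions to adjust for your environment.

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # assumed constraint
    }
  }
}

# Assumed region; change it to wherever your RabbitMQ workers run.
provider "aws" {
  region = "us-east-1"
}
```

Running terraform init in the directory containing these files downloads the provider; terraform apply then shows a plan and creates the resources once you confirm it.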
