Setting up autoscaling for Cloud 66 and AWS


Cloud 66 is an infrastructure orchestration tool that abstracts away the common features of Cloud Service Providers such as Amazon Web Services (AWS) and Google Cloud Platform (GCP), and provides a few-clicks solution for provisioning servers for web applications. It’s an alternative to Heroku, one of the best-known Platform-as-a-Service (PaaS) offerings.

The issue

One of our clients had a project deployed on Cloud66 that looked like it might benefit from autoscaled servers, so we went ahead and researched how to do it.

Unfortunately, Cloud66 does not natively support autoscaling the way Heroku does. The Cloud Service Provider in use in this case was AWS.

A solution

We found that Cloud66 exposes APIs authenticated with OAuth2, so we decided to use them for the job.

Two of the endpoints that seemed well suited for the task at hand were:

  1. Scale-up Server group and
  2. Scale-down Server group

The application’s infrastructure included one load balancer with two servers added manually. One approach to adding or removing servers dynamically would be to monitor the load balancer’s load and set up an Alarm on AWS CloudWatch. Once the Alarm’s state changes, we could run scripts on AWS Lambda to add a server (scale up) or remove a server (scale down), depending on the Alarm’s state change.

We used the TargetResponseTime metric under the AWS/ApplicationELB namespace to gauge the load balancer’s response time, with a Threshold of TargetResponseTime > 1 for 4 datapoints within 5 minutes, using the p99 Statistic for the Alarm. Using p99 means the Alarm tracks the response time that 99 percent of requests fall at or below, so it only fires when even that percentile exceeds 1 second for most of a 5-minute window.
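
The Alarm can be created from the CloudWatch console, but for reference, here’s a minimal sketch of the same configuration using the aws-sdk-cloudwatch gem. The alarm name, region and LoadBalancer dimension value are placeholders for your own resources.

require 'aws-sdk-cloudwatch'

cloudwatch = Aws::CloudWatch::Client.new(region: 'us-east-1')

# TargetResponseTime p99 > 1 second for 4 out of 5 one-minute datapoints
cloudwatch.put_metric_alarm(
  alarm_name: 'web-target-response-time-p99',   # placeholder name
  namespace: 'AWS/ApplicationELB',
  metric_name: 'TargetResponseTime',
  dimensions: [
    { name: 'LoadBalancer', value: 'app/<load balancer name>/<id>' } # from the ALB ARN
  ],
  extended_statistic: 'p99',
  period: 60,                        # one-minute datapoints
  evaluation_periods: 5,             # look at the last 5 minutes
  datapoints_to_alarm: 4,            # 4 breaching datapoints trigger the Alarm
  threshold: 1.0,
  comparison_operator: 'GreaterThanThreshold',
  treat_missing_data: 'notBreaching' # quiet periods with no traffic don't keep the Alarm stuck
)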

After the Alarm is created using the Metric, we move on to creating a few Lambda functions, which will either add or remove servers from the Cloud66 Application Stack’s Web Server Group.

We used the net/http and uri libraries to make the requests to the Cloud66 APIs, authenticating with the OAuth2 Access Tokens created from Cloud66’s Access Tokens dashboard.

Lambda function to Scale Up

require 'json'
require 'net/https'
require 'uri'

TOKEN = '<oauth token>'
STACK_ID = '<stack id>'
SERVER_GROUP_ID = '<server group id>'
END_POINT = "https://app.cloud66.com/api/3/stacks/#{STACK_ID}/server_groups/#{SERVER_GROUP_ID}"
PARAMS = {
  subtype: 'web',
  server_size: 't3.small'
}

def make_scale_up_request
  uri = URI.parse(END_POINT)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER

  request = Net::HTTP::Post.new(uri.request_uri)
  request.body = PARAMS.to_json

  # Set the request headers; without an explicit Content-Type the body defaults to application/x-www-form-urlencoded
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Bearer #{TOKEN}"

  http.request(request)
end

def lambda_handler(event:, context:)
  response = make_scale_up_request
  body = JSON.parse(response.body)
  p body

  {
    statusCode: response.code.to_i,
    body: body
  }
end

Lambda function to Scale Down

require 'json'
require 'net/https'
require 'uri'

TOKEN = '<oauth token>'
STACK_ID = '<stack id>'
SERVER_GROUP_ID = '<server group id>'
END_POINT = "https://app.cloud66.com/api/3/stacks/#{STACK_ID}/server_groups/#{SERVER_GROUP_ID}/scale_down"
PARAMS = {
  subtype: 'web',
  server_count: 3
}

def make_scale_down_request
  uri = URI.parse(END_POINT)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER

  request = Net::HTTP::Post.new(uri.request_uri)
  request.body = PARAMS.to_json

  # Set the request headers; without an explicit Content-Type the body defaults to application/x-www-form-urlencoded
  request['Content-Type'] = 'application/json'
  request['Authorization'] = "Bearer #{TOKEN}"

  http.request(request)
end

def lambda_handler(event:, context:)
  response = make_scale_down_request
  body = JSON.parse(response.body)
  p body

  {
    statusCode: response.code.to_i,
    body: body
  }
end

A few things to note:

  • Cloud66’s Access Tokens UI isn’t linked from their Dashboard, so you’ll have to use the link provided in their API Docs.
  • You will need to get the Stack ID and the Server Group ID using their Stack List and Server Group List endpoints, and identify which particular Stack and Server Group you need to scale up/down. These lookups can be run locally via a separate script (sketched below) to simplify things.
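
For reference, a minimal sketch of such a lookup script. It assumes the list endpoints live at /api/3/stacks and /api/3/stacks/{stack id}/server_groups (consistent with the scale endpoints above, but double-check the API docs) and simply prints the raw responses so you can pick out the IDs.

require 'json'
require 'net/https'
require 'uri'

TOKEN = '<oauth token>'

def cloud66_get(path)
  uri = URI.parse("https://app.cloud66.com/api/3/#{path}")
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = true
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER

  request = Net::HTTP::Get.new(uri.request_uri)
  request['Authorization'] = "Bearer #{TOKEN}"

  JSON.parse(http.request(request).body)
end

# List all stacks; note the uid of the stack you want to scale
puts JSON.pretty_generate(cloud66_get('stacks'))

# Then list that stack's server groups to find the web group's id
# puts JSON.pretty_generate(cloud66_get('stacks/<stack id>/server_groups'))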

To put everything together, we now need to use AWS EventBridge to create two separate Rules which listen to the AWS CloudWatch Alarm State Change event and run the appropriate AWS Lambda function.

Rule Event pattern for Scale Up:

{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "resources": ["<AWS CloudWatch Alarm ARN>"],
  "detail": {
    "previousState": {
      "value": ["OK"]
    },
    "state": {
      "value": ["ALARM"]
    }
  }
}

Rule Event pattern for Scale Down:

{
  "source": ["aws.cloudwatch"],
  "detail-type": ["CloudWatch Alarm State Change"],
  "resources": ["<AWS CloudWatch Alarm ARN>"],
  "detail": {
    "previousState": {
      "value": ["ALARM"]
    },
    "state": {
      "value": ["OK"]
    }
  }
}

Finally, we add the appropriate AWS Lambda functions as the Rules’ Targets, and we should be set up for scaling our servers dynamically.
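
If you prefer to wire this up from code rather than the console, here’s a minimal sketch for the scale-up side using the aws-sdk-eventbridge and aws-sdk-lambda gems. The rule name, region, Alarm ARN and function ARN are placeholders; note that the console adds the Lambda invoke permission for you, but when creating the Rule programmatically you have to grant it yourself.

require 'aws-sdk-eventbridge'
require 'aws-sdk-lambda'
require 'json'

events = Aws::EventBridge::Client.new(region: 'us-east-1')
lambda_client = Aws::Lambda::Client.new(region: 'us-east-1')

SCALE_UP_FUNCTION_ARN = '<scale up lambda ARN>'
ALARM_ARN = '<AWS CloudWatch Alarm ARN>'

# Rule that matches the OK -> ALARM transition of our Alarm
rule_arn = events.put_rule(
  name: 'cloud66-scale-up',
  event_pattern: {
    source: ['aws.cloudwatch'],
    'detail-type': ['CloudWatch Alarm State Change'],
    resources: [ALARM_ARN],
    detail: { previousState: { value: ['OK'] }, state: { value: ['ALARM'] } }
  }.to_json
).rule_arn

# Point the Rule at the scale-up Lambda function
events.put_targets(
  rule: 'cloud66-scale-up',
  targets: [{ id: 'scale-up-lambda', arn: SCALE_UP_FUNCTION_ARN }]
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
  function_name: SCALE_UP_FUNCTION_ARN,
  statement_id: 'allow-eventbridge-scale-up',
  action: 'lambda:InvokeFunction',
  principal: 'events.amazonaws.com',
  source_arn: rule_arn
)

The scale-down Rule is identical except for the name, the reversed state values in the event pattern, and the scale-down function ARN as the Target.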

Caveats

Ideally, we’d use AWS EC2’s Auto Scaling groups feature to tackle this, but we were relying on configurations managed by Cloud66 on the servers, and we also wanted the new servers to show up in Cloud66.

But this solution works as well. Your mileage may vary.