Failed deployments using ASGs and elb scripts reduce ASG capacity #70

mmerkes · 2017-05-24T17:04:15Z

During the deregister scripts, if the host is part of an ASG and the min size matches the desired size, the ASG min size gets decremented so that the ASG does not spin up a new instance. When the host goes back into service, the ASG min size gets incremented again. However, if the deployment fails, the script never increments the min size, leaving it one lower than it was. Since putting a host into standby reduces the desired capacity, an ASG could have its min size reduced once per host during a deployment, and it will never be increased back to where it was.

When the min size gets decremented, a flag gets set in a temporary file that's around for the life of the deployment, but it is not visible to any subsequent deployments. One option might be to set the flag in a permanent location to track the state mutations.
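A rough sketch of what the permanent flag could look like. This is only the bookkeeping, written in Python for brevity rather than in the bash of the actual scripts, and every name in it (`mark_min_decremented`, `consume_min_decrement`, the `state_dir` layout) is hypothetical. It assumes the ASG name is safe to use in a file name and that the caller does the actual min size change.

```python
import os


def _flag_path(state_dir, asg_name):
    # Assumption: asg_name contains no path separators.
    return os.path.join(state_dir, asg_name + ".min-decremented")


def mark_min_decremented(state_dir, asg_name):
    """Record, in a location that outlives the deployment, that this host
    lowered the ASG min size by one."""
    os.makedirs(state_dir, exist_ok=True)
    with open(_flag_path(state_dir, asg_name), "w") as f:
        f.write("1\n")


def consume_min_decrement(state_dir, asg_name):
    """Return True if a decrement is still outstanding, and clear the flag.

    The caller is expected to raise the ASG min size by one when this
    returns True, whether it runs in the register step of the same
    deployment or at the start of a later one.
    """
    try:
        os.remove(_flag_path(state_dir, asg_name))
    except FileNotFoundError:
        return False
    return True
```

With something like this, a deployment that follows a failed one could call `consume_min_decrement` before doing anything else and restore the min size if it returns True.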

Issue #57 is an example of this.

Supported Solution

In case CodeDeploy customers are not aware, CodeDeploy has direct support for some ELB situations. As of 5/1/17, CodeDeploy started supporting classic ELB via the service, which handles all of the registering and deregistering from the load balancer and allows additional lifecycle events, and it solves many of the limitations in these scripts. If your use case is not currently supported (e.g. you use application ELB), check in occasionally here to see if CodeDeploy has added support.

If you onboard with CodeDeploy ELB support, you should no longer need to use these load balancer scripts.

msilvestre · 2017-06-02T17:05:25Z

It has happened to me also.
The instance gets stuck in the while loop:

[stderr]Instance is currently in state: EnteringStandby
[stderr]Instance failed to reach state, Standby within 180 seconds
[stderr]Instance i-XXXXXXXXXXXXX did not make it to standby after 180 seconds
[stderr][FATAL] Failed to move instance into standby

And then it calls exit 1, leaving the ASG with no service at all.

feverLu · 2017-06-06T00:11:12Z

Native support of ELB in CodeDeploy is now available and very easy to use, feel free to try the new feature!

msilvestre · 2017-06-06T12:17:43Z

What do you mean by native support of ELB in CodeDeploy?
On this page http://docs.aws.amazon.com/codedeploy/latest/userguide/integrations-aws-elastic-load-balancing.html#integrations-aws-elastic-load-balancing-in-place

It says we should use these scripts!

mmerkes · 2017-06-06T15:32:38Z

@feverLu I opened this issue because this is still an issue for application ELB. If customers are using classic ELB, they can use the native support in CodeDeploy, but it won't work with application ELB.

mmerkes · 2017-06-06T15:38:20Z

@msilvestre If you're not using classic ELB, you'll need to use the scripts. However, the issue you're seeing is related to another issue that was fixed here.

Essentially, AutoScaling made a change where, if you're using application ELB with an ASG, it started honoring the connection draining timeout, which is 5 minutes by default. The timeout the script currently waits for an instance to go into standby is only 3 minutes, causing you and other customers to see failed deployments.

To solve that issue, you can either reduce the connection draining timeout in your ALB or you can increase the timeout in your script like it's done here.
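To illustrate the second option, here is a minimal sketch of a wait loop whose timeout is derived from the draining timeout instead of being fixed at 3 minutes. It is Python rather than the bash of the real scripts, and `wait_for_state`, `standby_timeout`, and the 60 second margin are made up for illustration. `get_state` stands in for whatever call returns the instance's lifecycle state.

```python
import time


def standby_timeout(drain_timeout_s, margin_s=60):
    """Wait at least as long as connection draining can take, plus a margin."""
    return drain_timeout_s + margin_s


def wait_for_state(get_state, target, timeout_s, interval_s=5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until it returns target or timeout_s has elapsed.

    Returns True if the target state was reached, False on timeout.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == target:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

With the default 5 minute draining timeout, `standby_timeout(300)` gives 360 seconds, so the loop outlasts draining instead of giving up at 180 seconds while the instance is still in EnteringStandby.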

As for this particular issue, I reopened it because it is still unresolved.

eddca · 2017-07-21T17:41:05Z

Guys, ELB scripts were perfect and a bunch of things got screwed up with ELB v2 scripts.

It's named as V2 but it's missing old features. Can we at least match with whatever we had before?

I really depend on these scripts and I don't have the expertise to write them on my own.
I really want to switch to ALB but I can't.... due to these issues

mmerkes added the bug and enhancement labels May 24, 2017

mmerkes mentioned this issue May 24, 2017

Why does this change ASG desired and min? (question) #57

Closed

feverLu closed this as completed Jun 6, 2017

mmerkes reopened this Jun 6, 2017

kgorskowski mentioned this issue Jul 25, 2017

ELBv2 examples don't include suspending or resuming ASG processes #53

Closed
