Failed deployments using ASGs and elb scripts reduce ASG capacity #70

Open
mmerkes opened this issue May 24, 2017 · 6 comments

Comments

@mmerkes

mmerkes commented May 24, 2017

During the deregister scripts, if the host is part of an ASG and the ASG's min size equals its desired capacity, the script decrements the min size so that the ASG does not launch a replacement instance; when the host goes back into service, the script increments the min size again. However, if the deployment fails, the script never restores the min size, leaving the ASG with a min size one lower than before. Because putting a host into standby also reduces the desired capacity, min size and desired capacity stay equal after each host, so the min size can be decremented once per host during a single deployment, and it is never increased back to where it was.
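To make the failure mode concrete, here is a minimal Python simulation of the bookkeeping described above. It is not the scripts' code (the actual scripts are shell scripts that call the AWS CLI); the `Asg` class and the `deregister`, `register`, and `deploy` helpers are hypothetical, the model tracks only min size and desired capacity, and it assumes every host is deregistered before any host is re-registered.

```python
from dataclasses import dataclass


@dataclass
class Asg:
    """Hypothetical model of an ASG's min size and desired capacity."""
    min_size: int
    desired: int


def deregister(asg: Asg) -> bool:
    """Put one host into standby; return True if min size was lowered."""
    lowered = asg.min_size == asg.desired
    if lowered:
        # Lower min size so the ASG does not launch a replacement instance.
        asg.min_size -= 1
    asg.desired -= 1  # entering standby also decrements desired capacity
    return lowered


def register(asg: Asg, lowered: bool) -> None:
    """Return the host to service and undo this host's min-size change."""
    asg.desired += 1
    if lowered:
        asg.min_size += 1


def deploy(asg: Asg, hosts: int, fail: bool) -> None:
    # The "lowered" flags live only for this run, like the temp-file flag.
    flags = [deregister(asg) for _ in range(hosts)]
    if fail:
        return  # deployment fails: nothing re-registers, flags are discarded
    for lowered in flags:
        register(asg, lowered)


ok = Asg(min_size=3, desired=3)
deploy(ok, hosts=2, fail=False)
print(ok)   # Asg(min_size=3, desired=3)

bad = Asg(min_size=3, desired=3)
deploy(bad, hosts=2, fail=True)
print(bad)  # Asg(min_size=1, desired=1)
```

Starting from min size = desired capacity = 3, a failed deployment that has deregistered two hosts leaves min size at 1. In this model, even if the standby instances are later returned to service by hand (restoring desired capacity), nothing raises min size back to 3.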

When the min size is decremented, a flag is written to a temporary file that exists only for the life of the deployment, so no subsequent deployment ever sees it. One option would be to store the flag in a persistent location, so that later deployments can detect these state changes and undo them.
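The persistent-flag option could look roughly like the following Python sketch. The function names, the JSON flag format, and the `get_min`/`set_min` callables (stand-ins for whatever actually reads and updates the ASG's min size, such as the AWS CLI) are all hypothetical. The assumption is that the flag lives in a directory that outlives a single deployment, that the deregister step calls `restore_if_needed` first so a decrement left over from a failed deployment is undone before any new one, and that the register step uses it in place of its current increment on success.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def mark_min_size_lowered(flag: Path, asg_name: str) -> None:
    """Persist the fact that this host decremented the ASG's min size."""
    flag.parent.mkdir(parents=True, exist_ok=True)
    flag.write_text(json.dumps({"asg": asg_name}))


def restore_if_needed(flag: Path,
                      get_min: Callable[[str], int],
                      set_min: Callable[[str, int], None]) -> bool:
    """Undo a decrement left behind by an earlier, possibly failed, deployment."""
    if not flag.exists():
        return False
    asg_name = json.loads(flag.read_text())["asg"]
    set_min(asg_name, get_min(asg_name) + 1)
    flag.unlink()  # remove the flag only after the restore call returned
    return True


# Demo with an in-memory stand-in for the ASG API (hypothetical names).
min_sizes = {"web-asg": 2}
flag_path = Path(tempfile.mkdtemp()) / "min-size-flag.json"
mark_min_size_lowered(flag_path, "web-asg")  # deployment 1 lowers min, then fails
print(restore_if_needed(flag_path, min_sizes.__getitem__, min_sizes.__setitem__))  # True
print(min_sizes)  # {'web-asg': 3}
print(restore_if_needed(flag_path, min_sizes.__getitem__, min_sizes.__setitem__))  # False
```

Deleting the flag only after `set_min` returns means a crash in between could restore twice, but the record of a decrement is never silently lost. A flag on the instance's local disk also disappears if the instance is terminated, so a location outside the instance may be preferable; that choice is left open here.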

Issue #57 is an example of this.

Supported Solution

In case CodeDeploy customers are not aware, CodeDeploy now supports some ELB scenarios directly. As of 5/1/17, CodeDeploy supports classic ELB natively: the service handles registering and deregistering instances with the load balancer, adds lifecycle events, and removes many of the limitations of these scripts. If your use case is not yet supported (e.g. you use application ELB), check back here occasionally to see whether CodeDeploy has added support.

If you onboard with CodeDeploy ELB support, you should no longer need to use these load balancer scripts.

@msilvestre

This has happened to me as well.
The instance gets stuck in the wait loop:

[stderr]Instance is currently in state: EnteringStandby
[stderr]Instance failed to reach state, Standby within 180 seconds
[stderr]Instance i-XXXXXXXXXXXXX did not make it to standby after 180 seconds
[stderr][FATAL] Failed to move instance into standby

The script then calls exit 1, leaving the ASG with no service at all.

@feverLu
Contributor

feverLu commented Jun 6, 2017

Native ELB support in CodeDeploy is now available and very easy to use; feel free to try the new feature!

@feverLu feverLu closed this as completed Jun 6, 2017
@msilvestre

msilvestre commented Jun 6, 2017

What do you mean by native support of ELB in CodeDeploy?
The page at http://docs.aws.amazon.com/codedeploy/latest/userguide/integrations-aws-elastic-load-balancing.html#integrations-aws-elastic-load-balancing-in-place says we should use these scripts!

@mmerkes
Author

mmerkes commented Jun 6, 2017

@feverLu I opened this issue because this is still a problem for application ELB. Customers using classic ELB can use the native support in CodeDeploy, but it does not work with application ELB.

@mmerkes mmerkes reopened this Jun 6, 2017
@mmerkes
Author

mmerkes commented Jun 6, 2017

@msilvestre If you're not using classic ELB, you'll need to use the scripts. However, the issue you're seeing is related to another issue that was fixed here.

Essentially, Auto Scaling made a change: if you use application ELB with an ASG, it now honors the connection draining timeout (the target group's deregistration delay), which defaults to 5 minutes (300 seconds). The scripts currently wait only 3 minutes (180 seconds) for an instance to enter standby, so the wait gives up before draining finishes, causing you and other customers to see failed deployments.

To solve that issue, either reduce the connection draining timeout on your ALB so that it is shorter than the script's wait, or increase the timeout in your script, as was done here.
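The timeout arithmetic can be illustrated with a small Python polling loop modeled on the log output above. The scripts themselves are shell; `wait_for_state`, the 10-second poll interval, and the simulated clock are assumptions for illustration, and in practice `get_state` would query the instance's lifecycle state (for example via `aws autoscaling describe-auto-scaling-instances`). With a 300-second draining period, a 180-second wait always gives up while the instance is still `EnteringStandby`.

```python
import time
from typing import Callable


def wait_for_state(get_state: Callable[[], str], target: str,
                   timeout_s: float, interval_s: float = 10.0,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll get_state() until it returns target; give up after timeout_s."""
    waited = 0.0
    while True:
        if get_state() == target:
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s


# Simulated instance whose standby transition waits out a 300 s draining period.
DRAIN_S = 300
clock = {"t": 0.0}


def fake_sleep(seconds: float) -> None:
    clock["t"] += seconds  # advance simulated time instead of really sleeping


def lifecycle_state() -> str:
    return "Standby" if clock["t"] >= DRAIN_S else "EnteringStandby"


print(wait_for_state(lifecycle_state, "Standby", timeout_s=180, sleep=fake_sleep))  # False
clock["t"] = 0.0
print(wait_for_state(lifecycle_state, "Standby", timeout_s=DRAIN_S + 60, sleep=fake_sleep))  # True
```

Either fix restores the same inequality: the script's wait timeout must be longer than the draining timeout, whether by lowering the draining timeout or by raising the wait.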

As for this particular issue, I reopened it because it is still unresolved.

@eddca

eddca commented Jul 21, 2017

Guys, the ELB scripts were perfect, and a bunch of things broke in the ELB v2 scripts.

It's named V2, but it's missing features the old scripts had. Can we at least match what we had before?

I really depend on these scripts, and I don't have the expertise to write them myself.
I really want to switch to ALB, but I can't because of these issues.
