[ECS] [request/bug?]: Add scale in protection on hosts running a task via RunTask #1207

Open
eriko-de opened this issue Dec 28, 2020 · 4 comments
Labels
Proposed Community submitted issue


@eriko-de

eriko-de commented Dec 28, 2020

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about your request
We use an ECS cluster with an EC2 Auto Scaling group as its capacity provider.
When tasks are started via the ECS RunTask action, the EC2 instance a task is placed on should be protected from scale in. Those instances are marked as protected from scale in, but they still get stopped when the Auto Scaling group scales in.

Which service(s) is this request for?
ECS with an EC2 Auto Scaling group (ASG) as the capacity provider

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We use ECS to manage our background processors (a Rails app with Resque workers running inside a task).
We have two kinds of jobs: short-running jobs that can be interrupted or repeated (image processing), and long-running jobs that cannot be interrupted (video processing and streaming). The short-running jobs are managed by an ECS service.

Because we can't tell an ECS service which tasks to stop when it scales in, we had to implement our own scaling logic for the long-running jobs. We use RunTask to start new workers, and when scaling down, the tasks stop themselves.

Bug or unexpected behavior:
When we start a new task via the RunTask action, we expect the instance it is placed on to be marked as protected from scale in, but it isn't.

Are you currently working around this issue?
We manually monitor the task count and start additional tasks if any task was stopped because its underlying EC2 host was terminated.
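A minimal sketch of that "monitor and restart" workaround, using boto3-style ECS calls. Assumptions not taken from the issue: all workers share one task definition family, `ecs` is `boto3.client("ecs")` or any object with the same `list_tasks`/`run_task` methods, and `desired` is the worker count your own scaling logic wants. The function names are hypothetical. Counting tasks with `desiredStatus="RUNNING"` also includes tasks that are still PENDING, so recently launched workers are not started twice.

```python
# Hypothetical helpers; `ecs` is assumed to be boto3.client("ecs")
# or any object exposing the same list_tasks/run_task methods.

RUN_TASK_MAX_COUNT = 10  # RunTask starts at most 10 tasks per call


def count_running_tasks(ecs, cluster, family):
    """Count tasks of `family` whose desired status is RUNNING (includes PENDING)."""
    total = 0
    params = {"cluster": cluster, "family": family, "desiredStatus": "RUNNING"}
    while True:
        page = ecs.list_tasks(**params)
        total += len(page.get("taskArns", []))
        token = page.get("nextToken")
        if not token:
            return total
        params["nextToken"] = token


def top_up_workers(ecs, cluster, family, task_definition, capacity_provider, desired):
    """Start tasks via RunTask until `desired` workers are requested again.

    Returns how many tasks RunTask reported as started. Placement failures
    are not retried here; the next periodic run will see the gap again.
    """
    missing = desired - count_running_tasks(ecs, cluster, family)
    started = 0
    while missing > 0:
        batch = min(missing, RUN_TASK_MAX_COUNT)
        response = ecs.run_task(
            cluster=cluster,
            taskDefinition=task_definition,
            count=batch,
            capacityProviderStrategy=[
                {"capacityProvider": capacity_provider, "weight": 1}
            ],
        )
        started += len(response.get("tasks", []))
        missing -= batch
    return started
```

Running `top_up_workers` periodically (for example from a scheduled job) approximates what an ECS service would do, while your own logic still decides which tasks stop on scale-down.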

Additional context
We would not need our own scaling logic via RunTask if scale-in for services were controllable, see:
#125

@eriko-de eriko-de added the Proposed Community submitted issue label Dec 28, 2020
@coultn

coultn commented Jan 1, 2021

Have you enabled managed termination protection? https://docs.aws.amazon.com/AmazonECS/latest/APIReference/API_AutoScalingGroupProvider.html

@eriko-de
Author

eriko-de commented Jan 4, 2021

Yes, we have. This is our Terraform template (not sure if it helps):

```hcl
resource "aws_ecs_capacity_provider" "capacity_provider" {
  name = "${var.prefix}-CapacityProvider"

  auto_scaling_group_provider {
    auto_scaling_group_arn         = aws_autoscaling_group.resque_worker_auto_scaling_group.arn
    managed_termination_protection = "ENABLED"

    managed_scaling {
      maximum_scaling_step_size = 4
      minimum_scaling_step_size = 1
      status                    = "ENABLED"
      target_capacity           = 70
    }
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

The web console also shows:
Managed Instance Protection: Yes

... but tasks started via the RunTask action are still sometimes placed on instances that are not protected from scale in.
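To narrow down which instances are affected, a small diagnostic sketch can list the InService instances in the Auto Scaling group that currently lack scale-in protection. It also reports whether the group protects new instances from scale in. As far as I understand the ECS documentation, managed termination protection additionally requires that group-level setting (`protect_from_scale_in = true` on the Terraform `aws_autoscaling_group` resource). The template above only shows the capacity provider, so that setting is worth checking. The function name is hypothetical. `autoscaling` is assumed to be `boto3.client("autoscaling")` or an object with the same `describe_auto_scaling_groups` method.

```python
# Hypothetical diagnostic; `autoscaling` is assumed to be
# boto3.client("autoscaling") or an object with the same method.

def scale_in_protection_report(autoscaling, asg_name):
    """Return (new_instances_protected, unprotected_in_service_instance_ids)."""
    response = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    groups = response.get("AutoScalingGroups", [])
    if not groups:
        raise ValueError(f"Auto Scaling group not found: {asg_name}")
    group = groups[0]
    # Only InService instances matter here; Pending ones may not be protected yet.
    unprotected = sorted(
        inst["InstanceId"]
        for inst in group.get("Instances", [])
        if inst.get("LifecycleState") == "InService"
        and not inst.get("ProtectedFromScaleIn", False)
    )
    return group.get("NewInstancesProtectedFromScaleIn", False), unprotected
```

Comparing the unprotected IDs with the container instances that host RunTask tasks (via `describe_container_instances`) would show whether protection is missing exactly where tasks are running.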

@eriko-de
Author

eriko-de commented Jan 4, 2021

Maybe this is the wrong place to discuss it, but scale-in protection works somewhat differently than I would expect from the documentation.

The instances launched from the Auto Scaling group's launch template register themselves with the cluster, and it seems that only some of them are marked as protected from scale in, or only after some time.

I would have expected the "protected from scale in" flag to be set as soon as the first task starts on the instance, and removed as soon as the last task on the host stops.
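The expected rule can be written as a small pure function. It is shown only to pin down the semantics being asked for and is not how ECS implements managed termination protection. The inputs are hypothetical: one map from instance ID to the number of running tasks, and one from instance ID to the current `ProtectedFromScaleIn` flag. Instances missing from the task-count map are left unchanged.

```python
def protection_changes(task_counts, protected):
    """Apply the rule: protect while >= 1 task runs, unprotect when none run.

    task_counts: instance ID -> number of running tasks on that instance
    protected:   instance ID -> current ProtectedFromScaleIn flag
    Returns (to_protect, to_unprotect) as sorted lists of instance IDs.
    """
    to_protect = sorted(
        iid for iid, n in task_counts.items() if n > 0 and not protected.get(iid, False)
    )
    to_unprotect = sorted(
        iid for iid, n in task_counts.items() if n == 0 and protected.get(iid, False)
    )
    return to_protect, to_unprotect


# protection_changes({"i-a": 2, "i-b": 0, "i-c": 1}, {"i-a": True, "i-b": True})
# returns (["i-c"], ["i-b"])
```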

Currently it seems that ECS removes scale-in protection only when it tries to scale down the cluster.

@mdomsch-seczetta

I'd prefer to have ECS manage the "protect from scale-in" flag on an EC2 instance. Until then, I added a call in my code's startup handler chain to set "protect from scale-in", and another in the shutdown handler chain to remove that protection, along with setting the corresponding ECS Container Instance state to DRAINING.
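A minimal sketch of that workaround, assuming one long-running worker task per instance (setting a container instance to DRAINING affects every task on it). The hook names and parameters are hypothetical. `autoscaling` and `ecs` are assumed to be boto3 clients (`boto3.client("autoscaling")`, `boto3.client("ecs")`). Given its own task ARN (for example from the ECS task metadata endpoint), a task can look up its host with `describe_tasks` and `describe_container_instances`.

```python
# Hypothetical worker hooks; `autoscaling` and `ecs` are assumed to be
# boto3 clients (or objects exposing the same methods).

def lookup_host(ecs, cluster, task_arn):
    """Return (container_instance_arn, ec2_instance_id) for a running task."""
    task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
    ci_arn = task["containerInstanceArn"]
    ci = ecs.describe_container_instances(
        cluster=cluster, containerInstances=[ci_arn]
    )["containerInstances"][0]
    return ci_arn, ci["ec2InstanceId"]


def on_worker_start(autoscaling, asg_name, instance_id):
    """Startup hook: protect the worker's EC2 instance from scale-in."""
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=True,
    )


def on_worker_stop(autoscaling, ecs, asg_name, cluster, container_instance_arn, instance_id):
    """Shutdown hook: drain the container instance, then drop scale-in protection."""
    # Drain first so ECS stops placing new tasks on this host.
    ecs.update_container_instances_state(
        cluster=cluster,
        containerInstances=[container_instance_arn],
        status="DRAINING",
    )
    autoscaling.set_instance_protection(
        InstanceIds=[instance_id],
        AutoScalingGroupName=asg_name,
        ProtectedFromScaleIn=False,
    )
```

Draining before removing protection is a design choice in this sketch, not something stated in the comment above. It keeps the host from receiving new RunTask placements while it is still protected, so that once protection is dropped, the instance is empty and safe for the Auto Scaling group to terminate.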

@AbhishekNautiyal AbhishekNautiyal self-assigned this Feb 17, 2023