Refine spot instance startup #824

Merged
merged 5 commits into from
Sep 27, 2022

Conversation

ansoncfit
Member

@ansoncfit ansoncfit commented Sep 2, 2022

When a regional analysis requests spot instance workers and granting that request would push the total number of workers past our configured limit (about 6x higher on prod than on staging), the request is ignored entirely.

This PR changes that behavior and reduces the likelihood that requests will be ignored:

  • Reduces the number of instances requested for higher zoom and non-transit analyses
  • Adds a backoff function (h/t Zeno's paradox): if we are halfway to the limit on the total number of workers, a request can claim no more than half of the remaining slots
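
To illustrate the backoff bullet, here is a minimal Python sketch, not the code in this PR. The function name `capped_request` and its parameters are hypothetical. The linear rule is an assumption: the description only states the halfway case, and this sketch extends it by letting the share of remaining slots a request may claim equal the fraction of the pool that is still free.

```python
def capped_request(requested: int, current_workers: int, max_workers: int) -> int:
    """Return how many workers to actually request (hypothetical helper).

    Assumed rule: a single request may claim at most
    remaining * (remaining / max_workers) slots, so the allowance shrinks
    as the pool fills. At the halfway point this gives half of the
    remaining slots, matching the case stated in the PR description.
    """
    if max_workers <= 0:
        return 0
    # Slots still free under the configured limit (never negative).
    remaining = max(max_workers - current_workers, 0)
    # Integer arithmetic keeps the result a whole number of instances.
    allowed = remaining * remaining // max_workers
    # Shrink the request instead of ignoring it outright.
    return max(0, min(requested, allowed))


if __name__ == "__main__":
    # Halfway to a limit of 100: 50 slots remain, at most 25 may be requested.
    print(capped_request(40, 50, 100))
```

Under this assumed rule an oversized request is reduced rather than dropped, and a small request that fits within the allowance passes through unchanged.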

Three further refinements could be considered: per-access-group limits; further reducing the number of instances requested when analyses have no frequency-based routes (though we don't yet expose hasFrequency() in a convenient place); and sizing the spot instance request based on observed tasks per minute.

@ansoncfit ansoncfit requested a review from abyrd September 20, 2022 01:35
abyrd
abyrd previously approved these changes Sep 20, 2022
Member

@abyrd abyrd left a comment

Looks good to avoid the problem we've seen where way too many workers are created for a simple walk search, flooding the backend with responses from a single job. Longer term we'll want to more actively manage the worker pool using the components that were already drafted for that purpose.

@ansoncfit ansoncfit merged commit 7765f0b into dev Sep 27, 2022
@ansoncfit ansoncfit deleted the spot-instance-startup branch September 27, 2022 01:37
3 participants