Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS: Allow using a list of subnets #127

Merged
merged 2 commits into from
Apr 27, 2023

Conversation

julienduchesne
Copy link
Contributor

When using more specialized instance types, it may happen that they aren't available in the given AZ
By giving multiple subnets, it allows the autoscaler to retry in a different AZ when that happens

I saw a TODO about using a small struct to track attempts. By adding that, I could extend the current sizeAlt behavior
The autoscaler will now try the regular size in all subnets, then the alternate size in all subnets

This is what the output looks like when combined with a size alt (2 invalid subnets + 1 invalid machine type):

{"attempt":1,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"a","time":"2023-01-30T16:50:54Z"}
{"attempt":1,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 24a9131c-b46a-4afe-87e0-adea1b509232","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"a","time":"2023-01-30T16:50:54Z"}
{"attempt":2,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"b","time":"2023-01-30T16:50:54Z"}
{"attempt":2,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 34d7a00a-dd9f-4689-93fe-3fc5238404df","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"b","time":"2023-01-30T16:50:55Z"}
{"attempt":3,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"<redacted>","time":"2023-01-30T16:50:55Z"}
{"attempt":3,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 2a3d3e92-6fc3-4343-adff-b62f9d31b390","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"<redacted>","time":"2023-01-30T16:50:55Z"}
{"attempt":4,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"a","time":"2023-01-30T16:50:55Z"}
{"attempt":4,"error":"InvalidSubnetID.NotFound: The subnet ID 'a' does not exist\n\tstatus code: 400, request id: 1e4e7e07-56c6-489e-b01a-a0ba1fcca1c5","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"a","time":"2023-01-30T16:50:56Z"}
{"attempt":5,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"b","time":"2023-01-30T16:50:56Z"}
{"attempt":5,"error":"InvalidSubnetID.NotFound: The subnet ID 'b' does not exist\n\tstatus code: 400, request id: 7974f101-a5cd-4ed6-831d-7731dda2f172","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"b","time":"2023-01-30T16:50:56Z"}
{"attempt":6,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:56Z"}
{"attempt":6,"image":"ami-00149760ce42c967b","level":"info","msg":"instance create success","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:57Z"}
{"attempt":6,"image":"ami-00149760ce42c967b","level":"debug","msg":"check instance network","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:57Z"}

@julienduchesne julienduchesne marked this pull request as ready for review January 30, 2023 17:03
When using more specialized instance types, it may happen that they aren't available in the given AZ
By giving multiple subnets, it allows the autoscaler to retry in a different AZ when that happens

I saw a TODO about using a small struct to track attempts. By adding that, I could extend the current sizeAlt behavior
The autoscaler will now try the regular size in all subnets, then the alternate size in all subnets

This is what the output looks like when combined with a size alt (2 invalid subnets + 1 invalid machine type):

```
{"attempt":1,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"a","time":"2023-01-30T16:50:54Z"}
{"attempt":1,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 24a9131c-b46a-4afe-87e0-adea1b509232","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"a","time":"2023-01-30T16:50:54Z"}
{"attempt":2,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"b","time":"2023-01-30T16:50:54Z"}
{"attempt":2,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 34d7a00a-dd9f-4689-93fe-3fc5238404df","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"b","time":"2023-01-30T16:50:55Z"}
{"attempt":3,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"<redacted>","time":"2023-01-30T16:50:55Z"}
{"attempt":3,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 2a3d3e92-6fc3-4343-adff-b62f9d31b390","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"<redacted>","time":"2023-01-30T16:50:55Z"}
{"attempt":4,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"a","time":"2023-01-30T16:50:55Z"}
{"attempt":4,"error":"InvalidSubnetID.NotFound: The subnet ID 'a' does not exist\n\tstatus code: 400, request id: 1e4e7e07-56c6-489e-b01a-a0ba1fcca1c5","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"a","time":"2023-01-30T16:50:56Z"}
{"attempt":5,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"b","time":"2023-01-30T16:50:56Z"}
{"attempt":5,"error":"InvalidSubnetID.NotFound: The subnet ID 'b' does not exist\n\tstatus code: 400, request id: 7974f101-a5cd-4ed6-831d-7731dda2f172","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"b","time":"2023-01-30T16:50:56Z"}
{"attempt":6,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:56Z"}
{"attempt":6,"image":"ami-00149760ce42c967b","level":"info","msg":"instance create success","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:57Z"}
{"attempt":6,"image":"ami-00149760ce42c967b","level":"debug","msg":"check instance network","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:57Z"}
```
Copy link
Contributor

@tphoney tphoney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Programmatically this change looks good, and will help mitigate the issue. It would require the end user to understand what is happening to be able to correctly configure this. This does limit the number of users that would need this. Which leads on to my main point.
This change as is, would be a backwards breaking change, as any person using a string for subnet will be break, as the variable has changed.
My suggestion would be to add an another field called subnets, which takes an array of strings. Then check to see which field is being used, IF both are used, throw an error.
That way we get the new feature, and the old functionality.
Hope this helps.

This maintains API retrocompatibility
@julienduchesne
Copy link
Contributor Author

Programmatically this change looks good, and will help mitigate the issue. It would require the end user to understand what is happening to be able to correctly configure this. This does limit the number of users that would need this. Which leads on to my main point. This change as is, would be a backwards breaking change, as any person using a string for subnet will be break, as the variable has changed. My suggestion would be to add an another field called subnets, which takes an array of strings. Then check to see which field is being used, IF both are used, throw an error. That way we get the new feature, and the old functionality. Hope this helps.

Did you mean at the envconfig level or lower down the line?

I created a new commit that leaves the old SUBNET_ID envvar identical but adds a new one SUBNET_IDS_ALT. The behavior is unchanged in the driver itself though. Is that OK?

Copy link
Contributor

@tphoney tphoney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is great, took a little time to read through this. So this keeps the original way to configure subnet, and provides alternatives. This is much better.

@tphoney tphoney merged commit bd3623b into drone:master Apr 27, 2023
@cblakkan
Copy link

We've been running into capacity problems recently and this would be a great improvement, any chance this will get versioned and released soon?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants