-
Notifications
You must be signed in to change notification settings - Fork 90
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AWS: Allow using a list of subnets #127
Conversation
296c80e
to
67c77ff
Compare
When using more specialized instance types, it may happen that they aren't available in the given AZ By giving multiple subnets, it allows the autoscaler to retry in a different AZ when that happens I saw a TODO about using a small struct to track attempts. By adding that, I could extend the current sizeAlt behavior The autoscaler will now try the regular size in all subnets, then the alternate size in all subnets This is what the output looks like when combined with a size alt (2 invalid subnets + 1 invalid machine type): ``` {"attempt":1,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"a","time":"2023-01-30T16:50:54Z"} {"attempt":1,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 24a9131c-b46a-4afe-87e0-adea1b509232","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"a","time":"2023-01-30T16:50:54Z"} {"attempt":2,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"b","time":"2023-01-30T16:50:54Z"} {"attempt":2,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 34d7a00a-dd9f-4689-93fe-3fc5238404df","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"b","time":"2023-01-30T16:50:55Z"} {"attempt":3,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"<redacted>","time":"2023-01-30T16:50:55Z"} {"attempt":3,"error":"InvalidParameterValue: Invalid value 'blabla' for InstanceType.\n\tstatus code: 400, request id: 2a3d3e92-6fc3-4343-adff-b62f9d31b390","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"blabla","subnet":"<redacted>","time":"2023-01-30T16:50:55Z"} {"attempt":4,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"a","time":"2023-01-30T16:50:55Z"} {"attempt":4,"error":"InvalidSubnetID.NotFound: The subnet ID 'a' does not exist\n\tstatus code: 400, request id: 1e4e7e07-56c6-489e-b01a-a0ba1fcca1c5","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"a","time":"2023-01-30T16:50:56Z"} {"attempt":5,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"b","time":"2023-01-30T16:50:56Z"} {"attempt":5,"error":"InvalidSubnetID.NotFound: The subnet ID 'b' does not exist\n\tstatus code: 400, request id: 7974f101-a5cd-4ed6-831d-7731dda2f172","image":"ami-00149760ce42c967b","level":"error","msg":"instance create failed","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"b","time":"2023-01-30T16:50:56Z"} {"attempt":6,"image":"ami-00149760ce42c967b","level":"debug","msg":"instance create","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:56Z"} {"attempt":6,"image":"ami-00149760ce42c967b","level":"info","msg":"instance create success","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:57Z"} {"attempt":6,"image":"ami-00149760ce42c967b","level":"debug","msg":"check instance network","name":"drone-linux-aws-amd64-ZQumcEFd","region":"us-east-2","size":"m6i.large","subnet":"<redacted>","time":"2023-01-30T16:50:57Z"} ```
67c77ff
to
cef66b1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Programmatically this change looks good, and will help mitigate the issue. It would require the end user to understand what is happening to be able to correctly configure this. This does limit the number of users that would need this. Which leads on to my main point.
This change as is, would be a backwards breaking change, as any person using a string for subnet will be break, as the variable has changed.
My suggestion would be to add an another field called subnets, which takes an array of strings. Then check to see which field is being used, IF both are used, throw an error.
That way we get the new feature, and the old functionality.
Hope this helps.
This maintains API retrocompatibility
Did you mean at the envconfig level or lower down the line? I created a new commit that leaves the old SUBNET_ID envvar identical but adds a new one |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is great, took a little time to read through this. So this keeps the original way to configure subnet, and provides alternatives. This is much better.
We've been running into capacity problems recently and this would be a great improvement, any chance this will get versioned and released soon? |
When using more specialized instance types, it may happen that they aren't available in the given AZ
By giving multiple subnets, it allows the autoscaler to retry in a different AZ when that happens
I saw a TODO about using a small struct to track attempts. By adding that, I could extend the current sizeAlt behavior
The autoscaler will now try the regular size in all subnets, then the alternate size in all subnets
This is what the output looks like when combined with a size alt (2 invalid subnets + 1 invalid machine type):