Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS region to the AWS Config Cache key #6134

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

maxbog
Copy link

@maxbog maxbog commented Sep 4, 2024

The PR adds the AWS region to the key used by the AWS config cache

Checklist

Fixes #6128

@maxbog maxbog requested a review from a team as a code owner September 4, 2024 09:37
@maxbog maxbog force-pushed the region_in_aws_config_cache branch 4 times, most recently from e025102 to a67350c Compare September 4, 2024 13:49
Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The build is broken

Signed-off-by: Maksymilian Boguń <[email protected]>
@maxbog
Copy link
Author

maxbog commented Sep 18, 2024

@zroubalik I fixed the build

@zroubalik
Copy link
Member

zroubalik commented Sep 18, 2024

/run-e2e aws
Update: You can check the progress here

@zroubalik
Copy link
Member

zroubalik commented Sep 19, 2024

/run-e2e aws
Update: You can check the progress here

@maxbog
Copy link
Author

maxbog commented Sep 24, 2024

@zroubalik Is something wrong with the e2e tests? I see they failed twice now :(

@zroubalik
Copy link
Member

@zroubalik Is something wrong with the e2e tests? I see they failed twice now :(

yeah, issue on our side, will rerun once it is fixed

@ndlanier
Copy link

ndlanier commented Oct 4, 2024

@zroubalik any updates on the e2e testing?

@JorTurFer
Copy link
Member

JorTurFer commented Oct 6, 2024

/run-e2e aws
Update: You can check the progress here

@maxbog
Copy link
Author

maxbog commented Oct 9, 2024

@JorTurFer I rebased the PR
Now the tests failed, but it looks like some random crash, can you please rerun the failing job and the e2e tests?

@JorTurFer
Copy link
Member

JorTurFer commented Oct 16, 2024

/run-e2e aws
Update: You can check the progress here

Copy link
Member

@JorTurFer JorTurFer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL @kedacore/keda-core-contributors

@JorTurFer JorTurFer enabled auto-merge (squash) October 16, 2024 18:00
@ndlanier
Copy link

ndlanier commented Oct 28, 2024

Any insight into why it paniced/failed on the amd64 validation?

@ndlanier
Copy link

@JorTurFer is the validate - amd64 stage failure an issue with the action or an issue with the code changes in this pr?

@JorTurFer
Copy link
Member

@JorTurFer is the validate - amd64 stage failure an issue with the action or an issue with the code changes in this pr?

I've triggered it again because probably it was a transient error

@JorTurFer
Copy link
Member

@zroubalik @wozniakjan PTAL

@JorTurFer
Copy link
Member

i would suggest to add Region in aws authorization metadata, rather than passing it all around.

5ea1eac/pkg/scalers/aws/aws_authorization.go#L18C1-L35

wdyt @JorTurFer ?

That's a good idea tbh. Could you update the PR @maxbog ?

@maxbog
Copy link
Author

maxbog commented Nov 4, 2024

@JorTurFer @ThaSami
Thanks for the review.
At first glance, I also thought this might be a good idea. However, after digging into the code, I'm not that convinced anymore. Here are the reasons:

  1. Although the AuthorizationMetadata struct has Authorization in the name, what it really represents is an AWS identity: a role + credentials coming from pod identity or the AccessKey/SecretAccessKey pair. Identities are global in AWS, so IMO, region does not belong in this struct. Region is actually only used when creating AWS clients, to specify, where the queried services are, not where the identity is located.
  2. We would get rid of the awsRegion parameter in the getCacheKey function, but then, if we still want to add it to the AuthorizationMetadata, we need to add a awsRegion parameter to the GetAwsAuthorization function, which, in turn, means we have to pass it from the different types of scaler metadata (for example, here, here, or here, or 5 different places)
  3. Having the awsRegion parameter in the getCacheKey actually makes sense when you look at the AuthorizationMetadata as an identity. The aws.Config struct represents a AWS service client configuration, so IMO it makes perfect sense to cache it for a specific identity and for a specific region. We lose expressiveness by hiding the region in the awsAuthorization parameter.

Please let me know, if, after taking the above into account, you still want to update the PR and add the region to the AuthorizationMetadata.

@ThaSami
Copy link
Contributor

ThaSami commented Nov 5, 2024

Although the AuthorizationMetadata struct has Authorization in the name, what it really represents is an AWS identity: a role + credentials coming from pod identity or the AccessKey/SecretAccessKey pair.

Exactly - that's what sparked this idea for me. Since credentials are part of identity, obtaining them in the AWS SDK requires specifying a region as STS endpoints are regional.
that's why i believe including AwsRegion within the AuthorizationMetadata struct makes architectural sense because:

  1. keeps all authentication-related data in one cohesive unit
  2. Ensures the region information is always available when making credential requests
  3. maintains a clear relationship between authorization data and its associated AWS region

would like to hear from @JorTurFer

Copy link
Member

@zroubalik zroubalik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the proposal from @ThaSami

@JorTurFer
Copy link
Member

JorTurFer commented Nov 5, 2024

I like the proposal from @ThaSami

Me too :). About passing the value, I'd say that Metadata config.TriggerMetadata always has the parameter awsRegion, so we can parse it from the map directly like we do with others

@maxbog
Copy link
Author

maxbog commented Nov 6, 2024

Ok then, I'll make the change

@maxbog
Copy link
Author

maxbog commented Nov 6, 2024

One more question:
Now that the region will be embedded in the AuthorizationMetadata, I think I should get rid of awsConfigMetadata, as it's now redundant. Do you agree?

type awsConfigMetadata struct {
awsRegion string
awsAuthorization AuthorizationMetadata
}

auto-merge was automatically disabled November 6, 2024 23:44

Head branch was pushed to by a user without write access

@JorTurFer
Copy link
Member

Do you agree?

yeah, makes sense!

@maxbog
Copy link
Author

maxbog commented Nov 6, 2024

@JorTurFer @ThaSami please take another look

pkg/scalers/aws/aws_common.go Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
@JorTurFer
Copy link
Member

Unit test are failing, could you take a look?

@maxbog
Copy link
Author

maxbog commented Nov 7, 2024

Unit test are failing, could you take a look?

The failing tests are for prometheus scaler, and the error message is: "HTTP transport must be Google OAuth2"
TBH, I'm not sure how this is related to the changes in the PR.

@JorTurFer
Copy link
Member

JorTurFer commented Nov 7, 2024

Prometheus scaler uses multiple auth providers, maybe the region change of the latest commit has had a side effect here:

// could be the case of azure managed prometheus. Try and get the round-tripper.
// If it's not the case of azure managed prometheus, we will get both transport and err as nil and proceed assuming no auth.
azureTransport, err := azure.TryAndGetAzureManagedPrometheusHTTPRoundTripper(logger, config.PodIdentity, config.TriggerMetadata)
if err != nil {
logger.V(1).Error(err, "error while init Azure Managed Prometheus client http transport")
return nil, err
}
// transport should not be nil if its a case of azure managed prometheus
if azureTransport != nil {
httpClient.Transport = azureTransport
}
gcpTransport, err := gcp.GetGCPOAuth2HTTPTransport(config, httpClient.Transport, gcp.GcpScopeMonitoringRead)
if err != nil && !errors.Is(err, gcp.ErrGoogleApplicationCrendentialsNotFound) {
logger.V(1).Error(err, "failed to get GCP client HTTP transport (either using Google application credentials or workload identity)")
return nil, err
}
if err == nil && gcpTransport != nil {
httpClient.Transport = gcpTransport
}
awsTransport, err := aws.NewSigV4RoundTripper(config)
if err != nil {
logger.V(1).Error(err, "failed to get AWS client HTTP transport ")
return nil, err
}
if err == nil && awsTransport != nil {
httpClient.Transport = awsTransport
}
}

In any case, I've triggered the tests again

@JorTurFer
Copy link
Member

Most probably it's returning an error here:

awsTransport, err := aws.NewSigV4RoundTripper(config)
if err != nil {
logger.V(1).Error(err, "failed to get AWS client HTTP transport ")
return nil, err
}

An error in that line will make fail the test although is GCP because it returns error and not the GCP transport

@maxbog
Copy link
Author

maxbog commented Nov 7, 2024

Unit tests are passing now

CHANGELOG.md Outdated Show resolved Hide resolved
@zroubalik
Copy link
Member

zroubalik commented Nov 11, 2024

/run-e2e aws
Update: You can check the progress here

@maxbog
Copy link
Author

maxbog commented Nov 12, 2024

Can we merge the PR? I'd rather not be stuck in the conflict -> merge -> e2e test -> conflict death spiral ;)

@votzest
Copy link

votzest commented Nov 19, 2024

it seems e2e tests got stuck again? otherwise can you merge this pr @zroubalik, please?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AWS credentials cache key needs to include the region
6 participants