Contributor

@aitorarjona aitorarjona commented Jun 20, 2023

Fix for #1107

This pull request adds functionality to retrieve AWS SDK config and credentials from the standard config files (~/.aws/config and ~/.aws/credentials) or env vars (more info).

Consequently, it deprecates using aws_access_key_id and aws_secret_access_key in the aws section of the Lithops config.

This approach is more secure, since we avoid sending secrets to the runtime via the payload, and it also supports users with SSO-based accounts, who need to configure a profile in their ~/.aws/config file and retrieve their session credentials dynamically. E.g.:

[profile my-sso-profile]
sso_start_url = https://XXXXXXXX.awsapps.com/start
sso_region = us-east-1
sso_account_id = XXXXXXXXXXX
sso_role_name = XXXXXXXXXXXXXXXXX
region = us-east-1
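
For illustration, the way a profile is looked up in a file like this can be approximated with the standard library. This is only a sketch: `find_profile` is a hypothetical helper (not part of Lithops or boto3), and the `[default]` section in the sample text is made up for the example. It relies on the AWS convention that `~/.aws/config` names non-default profiles `[profile NAME]`, while the default profile is plain `[default]`.

```python
import configparser

# Sample ~/.aws/config content; the [default] section is invented for this example.
SAMPLE_CONFIG = """
[default]
region = eu-west-1

[profile my-sso-profile]
sso_start_url = https://XXXXXXXX.awsapps.com/start
sso_region = us-east-1
sso_account_id = XXXXXXXXXXX
sso_role_name = XXXXXXXXXXXXXXXXX
region = us-east-1
"""


def find_profile(config_text: str, profile_name: str = "default") -> dict:
    """Return the key/value pairs of one profile from ~/.aws/config-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Named profiles live under "[profile NAME]"; the default one under "[default]".
    section = "default" if profile_name == "default" else f"profile {profile_name}"
    if not parser.has_section(section):
        raise KeyError(f"profile {profile_name!r} not found")
    return dict(parser.items(section))


sso_profile = find_profile(SAMPLE_CONFIG, "my-sso-profile")
print(sso_profile["sso_region"], sso_profile["region"])
```

In practice the lookup is done by the AWS SDK itself once it is given a profile name; the sketch only shows which section a given profile name maps to.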

Summary:

  • Added new parameter in AWS config: config_profile.

Developer's Certificate of Origin 1.1

   By making a contribution to this project, I certify that:

   (a) The contribution was created in whole or in part by me and I
       have the right to submit it under the Apache License 2.0; or

   (b) The contribution is based upon previous work that, to the best
       of my knowledge, is covered under an appropriate open source
       license and I have the right under that license to submit that
       work with modifications, whether created in whole or in part
       by me, under the same open source license (unless I am
       permitted to submit under a different license), as indicated
       in the file; or

   (c) The contribution was provided directly to me by some other
       person who certified (a), (b) or (c) and I have not modified
       it.

   (d) I understand and agree that this project and the contribution
       are public and that a record of the contribution (including all
       personal information I submit with it, including my sign-off) is
       maintained indefinitely and may be redistributed consistent with
       this project or the open source license(s) involved.

@aitorarjona
Contributor Author

@JosepSampe please don't merge yet

@aitorarjona aitorarjona marked this pull request as draft June 21, 2023 09:31
@aitorarjona aitorarjona marked this pull request as ready for review July 12, 2023 12:22
@aitorarjona
Contributor Author

@JosepSampe ready for review and merge

@aitorarjona aitorarjona changed the title from "[AWS] Support for SSO credentials" to "[AWS] Use credentials and config from AWS SDK file" Jul 13, 2023
Comment on lines 76 to 79
temp = copy.deepcopy(config_data['aws_lambda'])
config_data['aws_lambda'].update(config_data['aws'])
config_data['aws_lambda'].update(temp)
Member

@JosepSampe JosepSampe Jul 13, 2023

Is there any particular reason to create another level of configuration inside aws_lambda and put the aws section inside the aws_lambda section, instead of copying the aws keys into aws_lambda? I think this new approach will break a functionality explained below.

Contributor Author

@aitorarjona aitorarjona Jul 24, 2023

Yes, the idea was to remove the "aws" section altogether when clients are created. Reverted

Comment on lines 89 to 93
if "secret_access_key" in config_data["aws_lambda"]["aws"] or "access_key_id" in config_data["aws_lambda"]["aws"]:
    logger.warning('Using "secret_access_key" and "access_key_id" in lithops configuration is deprecated and '
                   'it will be removed in future releases '
                   '- Use boto3 configuration with environment variables or config file in ~/.aws instead '
                   '(https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html)')
Member

Is there any particular reason to deprecate access_key_id and secret_access_key? IMO we should keep them as an option.

Contributor Author

Removed the deprecation notice but kept the warning

else:
    logger.debug("Creating default boto3 client")
    client_config = Config(
        signature_version=UNSIGNED,
Member

If I remember correctly, this line is necessary for accessing public buckets when no s3 config is provided in the lithops config

Contributor Author

For now I will keep the default client as is, so that it can get credentials from the AWS role in the lambda runtime. If any user needs unsigned requests, they should explicitly create an s3 client or specify it in the s3 config.

Member

@JosepSampe JosepSampe Jul 26, 2023

We have an app that analyzes a public AWS S3 bucket, where the client is created automatically by Lithops because we put a bucket address in iterdata, like in this example but for AWS S3. So we should find a way to keep this unsigned flag for the case where a user has no credentials at all on their machine. If the user has credentials on their machine, the unsigned flag is not necessary.
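
A possible shape for that fallback, sketched under assumptions: `has_credentials` and `make_s3_client` are hypothetical names, not Lithops functions, and this is not necessarily what the PR ended up doing. It uses `Session.get_credentials()` from boto3, which returns `None` when no credentials can be resolved from any source.

```python
def has_credentials(session) -> bool:
    """True if the session can resolve credentials from any source."""
    return session.get_credentials() is not None


def make_s3_client(session):
    """Create a signed S3 client when possible, otherwise an anonymous one."""
    # Imported lazily so the helper above carries no botocore dependency.
    from botocore import UNSIGNED
    from botocore.config import Config

    if has_credentials(session):
        # Credentials found (env vars, ~/.aws files, SSO, IAM role...): sign as usual.
        return session.client("s3")
    # No credentials anywhere: unsigned requests are enough to read public buckets.
    return session.client("s3", config=Config(signature_version=UNSIGNED))
```

It would be called as `make_s3_client(boto3.Session())`. Inside a Lambda runtime the execution role provides credentials, so the signed branch would be taken there.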

Comment on lines 145 to 146
elif 'region' not in config_data['aws_lambda']['aws']:
    config_data['aws_lambda']['aws']['region'] = config_data['aws_lambda']['region']
Member

Previously, these lines were used to propagate the region from the aws_lambda section to the aws section, and later to aws_s3. In the case where someone puts a region in aws_lambda but not in aws and aws_s3, the s3 backend would use the same region as the lambda backend. I think this new approach will break this propagation.

Contributor Author

@aitorarjona aitorarjona Jul 24, 2023

Hi Josep, now I understand... I'll add some comments to the code to explain this. What I was thinking is that all AWS configuration should be inherited through the standard AWS SDK mechanisms, but I understand that it is also important to make sure users deploy lambdas and buckets in the same region to avoid data transfer costs. In this case, would the safest option be to require the "region" parameter in the "aws" section, regardless of the AWS SDK config, and not have it repeated in "aws_s3" or "aws_lambda"?
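
The propagation described in this thread (aws_lambda to aws, then aws to aws_s3) can be sketched as follows. `propagate_region` is a hypothetical helper written for this example; the actual Lithops code spreads this logic across the backend config modules.

```python
def propagate_region(config_data: dict) -> dict:
    """Fill in a missing region following aws_lambda -> aws -> aws_s3."""
    lambda_cfg = config_data.setdefault("aws_lambda", {})
    aws_cfg = config_data.setdefault("aws", {})
    s3_cfg = config_data.setdefault("aws_s3", {})

    # setdefault never overwrites: a region set explicitly in a section is kept.
    if "region" in lambda_cfg:
        aws_cfg.setdefault("region", lambda_cfg["region"])
    if "region" in aws_cfg:
        s3_cfg.setdefault("region", aws_cfg["region"])
    return config_data


config = propagate_region({"aws_lambda": {"region": "eu-west-1"}})
print(config["aws_s3"]["region"])
```

With only aws_lambda.region set, the storage backend ends up in the same region as the compute backend, which is the behavior the review asks to preserve.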

Comment on lines 73 to 76
sts_client = self.aws_session.client('sts', region_name=self.region_name)
caller_id = sts_client.get_caller_identity()

self.user_key = caller_id["UserId"].split(":")[1]
Member

I tested these particular lines, and in my case the caller_id["UserId"] does not contain :, so it fails. My UserID looks like: {'UserId': 'XIPOZEKLFKWLQSXXX7587', ...}

Contributor Author

Fixed
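
The thread does not show the fix that was applied. One way to guard against the failure reported above, as a sketch only: `user_key_from_caller_id` is a hypothetical helper, and it assumes that the `UserId` returned by STS has the form `ROLE_ID:session-name` for assumed roles (including SSO sessions) and is a plain ID for IAM users.

```python
def user_key_from_caller_id(caller_id: dict) -> str:
    """Derive a per-user key from an STS get_caller_identity() response."""
    user_id = caller_id["UserId"]
    # partition() never raises: sep is "" when there is no ":" in the string.
    _, sep, session_name = user_id.partition(":")
    # Assumed roles: use the part after the colon. IAM users: use the whole ID.
    return session_name if sep else user_id


print(user_key_from_caller_id({"UserId": "XIPOZEKLFKWLQSXXX7587"}))
```

Unlike `split(":")[1]`, this does not raise an `IndexError` when the separator is missing.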

Comment on lines 28 to 46
if "secret_access_key" in config_data["aws"] or "access_key_id" in config_data["aws"]:
    logger.warning(
        'using "secret_access_key" and "access_key_id" in the lithops configuration is deprecated and '
        'they will be removed in future releases '
        '- Use boto3 configuration with environment variables or config file in ~/.aws instead '
        '(https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html)')

# Put "aws" section inside AWS backends, so we can access credentials at the backend class
# Remove from config_data to avoid storing secrets
if "aws" in config_data:
    if "aws_lambda" in config_data:
        config_data["aws_lambda"]["aws"] = config_data["aws"]
    if "aws_s3" in config_data:
        config_data["aws_s3"]["aws"] = config_data["aws"]
    if "aws_batch" in config_data:
        config_data["aws_batch"]["aws"] = config_data["aws"]
    if "aws_ec2" in config_data:
        config_data["aws_ec2"]["aws"] = config_data["aws"]
    del config_data["aws"]
Member

Same comments as in the lambda backend

Comment on lines 31 to 33
temp = copy.deepcopy(config_data['aws_s3'])
config_data['aws_s3'].update(config_data['aws'])
config_data['aws_s3'].update(temp)
Member

Same comments as in the lambda backend

@aitorarjona aitorarjona requested a review from JosepSampe July 24, 2023 14:08
Comment on lines 78 to 80
# temp = copy.deepcopy(config_data["aws_lambda"])
config_data["aws_lambda"].update(config_data["aws"])
# config_data["aws_lambda"].update(temp)
Member

The reason for this round trip is to make sure that the most specific config is always applied. For example, if you set region in both aws and aws_lambda, the aws_lambda region must be applied. I don't know if there is a better way to make sure the most specific config is applied and not overwritten by the .update(), but with those 2 lines commented out, if you have a region set in the aws section of your config, for example us-east-1, and then you do this in your code: fexec = lithops.FunctionExecutor(region="eu-west-2"), the us-east-1 region will always overwrite the region you explicitly set in the function executor.
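
The same "most specific section wins" behavior can be had without the deepcopy round trip by merging into a fresh dict. A minimal sketch, where `merge_sections` is a hypothetical helper and not the code in the PR:

```python
def merge_sections(general: dict, specific: dict) -> dict:
    """Merge two config sections; keys present in `specific` take precedence."""
    merged = dict(general)   # start from the general section (e.g. "aws")
    merged.update(specific)  # then let the backend section override it
    return merged


config_data = {
    "aws": {"region": "us-east-1", "config_profile": "my-sso-profile"},
    "aws_lambda": {"region": "eu-west-2"},
}
config_data["aws_lambda"] = merge_sections(config_data["aws"], config_data["aws_lambda"])
print(config_data["aws_lambda"])
```

Here aws_lambda keeps its own region and only inherits the keys it did not define, which is what the temp/update/update sequence achieves in place.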

@aitorarjona aitorarjona requested a review from JosepSampe August 22, 2023 15:12
@aitorarjona
Contributor Author

@JosepSampe Hi Josep, all requested changes have been implemented. We need this merged ASAP: we switched to an SSO-based account and the current implementation in main does not work well with it (and it should also be ready for the next release, #1137). Thanks!

Comment on lines +137 to +140
if "region" not in config_data["aws_lambda"]:
    raise Exception("\"region\" is mandatory under the \"aws_lambda\" or \"aws\" section of the configuration")
elif "region" not in config_data["aws"]:
    config_data["aws"]["region"] = config_data["aws_lambda"]["region"]
Member

What if region is set using options 1 (~/.aws/config) or 2 (env var)? Or is region mandatory in any case in the lithops config like in the documentation?

lithops:
    backend: aws_lambda

aws_lambda:
    execution_role: <EXECUTION_ROLE_ARN>
    region: <REGION_NAME>

Contributor Author

Yes, I think we still need the region name for this or this. In any case, the region from the Lithops config would override the region stated in the aws config.
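
One possible resolution order consistent with that answer, as a sketch rather than what the PR implements: `resolve_region` is a hypothetical helper, and falling back to the `AWS_DEFAULT_REGION` environment variable is an assumption added for the example, since the thread leaves open whether region stays mandatory in the Lithops config.

```python
import os


def resolve_region(config_data: dict, backend: str = "aws_lambda", environ=None) -> str:
    """Pick the region, letting the Lithops config override the AWS SDK settings."""
    environ = os.environ if environ is None else environ
    candidates = (
        config_data.get(backend, {}).get("region"),  # most specific: backend section
        config_data.get("aws", {}).get("region"),    # shared "aws" section
        environ.get("AWS_DEFAULT_REGION"),           # AWS SDK environment variable
    )
    for region in candidates:
        if region:
            return region
    raise ValueError(f'"region" is mandatory under the "{backend}" or "aws" section of the configuration')


print(resolve_region({"aws": {"region": "us-east-1"}}, environ={"AWS_DEFAULT_REGION": "eu-west-1"}))
```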

Comment on lines 27 to 31
if "secret_access_key" in config_data["aws"] or "access_key_id" in config_data["aws"]:
    logger.warning("Using 'secret_access_key' and 'access_key_id' in lithops configuration is not recommended "
                   "- Use boto3 configuration file in ~/.aws or environment variables instead "
                   "(https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html)")

Member

No need for any warning for now. We still have to decide whether we want to deprecate it or not

Member

I can't add comments in unmodified code, but:
In lines 42-43: if region is not mandatory in options 1 and 2, this should be fixed.
Lithops now supports the automatic creation of the storage bucket if it is not provided in the config, so lines 45-48 should be fixed.

Contributor Author

Now lithops supports the automatic creation of the storage bucket if it is not provided in the config. So lines 45-48 should be fixed.

Not sure how to proceed with this... With the SSO approach, we don't have a "fixed" key or ID to read from, contrary to the key pair approach.

Member

@JosepSampe JosepSampe Sep 13, 2023

Is the S3 instance shared between all the users in the SSO approach?
Or does each SSO user have their own instance?
Or is it the same instance, but each user can only see their own buckets?

I wonder if we can simply use the config_profile name (or a hash of it) for the bucket name

Contributor Author

Multiple SSO users can access the same account and share buckets, but each one will have a different profile name. Yes, using the config profile is viable 👍


Lithops needs at least `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION` environment variables set.

3. Provide the credentials in the `aws` section of the Lithops config file **This option is not ideal and will be removed in future Lithops releases!**:
Member

No need for any warning for now. We still have to decide whether we want to deprecate it or not

Comment on lines 71 to 74
if "secret_access_key" in config_data["aws"] or "access_key_id" in config_data["aws"]:
    logger.warning("Using 'secret_access_key' and 'access_key_id' in lithops configuration is not recommended "
                   "- Use boto3 configuration file in ~/.aws or environment variables instead "
                   "(https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html)")
Member

No need for any warning for now. We still have to decide whether we want to deprecate it or not

Comment on lines 54 to 56
aws_lambda:
    execution_role: <EXECUTION_ROLE_ARN>
    region: <REGION_NAME>
Member

Is the execution_role mandatory in aws_lambda? If yes, I would update this .md file and include it everywhere you put a lithops config example, to make it clearer. Is region mandatory in all cases? I think this lithops config example is confusing here. I would remove it and put the config example in the next section, where necessary, with all the necessary parameters.

Contributor Author

Yes, execution_role is mandatory. The user must specify which services the lambda can access. We could automate this, but the user would need IAM permissions such as create role... We can leave it like this for now.

Comment on lines 642 to 644
if "access_key_id" in payload["config"]["aws"] and "secret_access_key" in payload["config"]["aws"]:
    del payload["config"]["aws"]["access_key_id"]
    del payload["config"]["aws"]["secret_access_key"]
Member

Is it necessary/convenient here to remove the session_token too?
I think here you can simply pop the keys instead of checking one by one whether they exist

payload["config"]["aws"].pop("access_key_id", None)
payload["config"]["aws"].pop("secret_access_key", None)
payload["config"]["aws"].pop("session_token", None)
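
The suggestion above could be wrapped in a small helper that also tolerates a payload without an "aws" section. A sketch only: `strip_aws_secrets` and `SECRET_KEYS` are hypothetical names introduced for this example.

```python
SECRET_KEYS = ("access_key_id", "secret_access_key", "session_token")


def strip_aws_secrets(payload: dict) -> dict:
    """Remove AWS secrets from the payload before it is sent to the runtime."""
    aws_cfg = payload.get("config", {}).get("aws", {})
    for key in SECRET_KEYS:
        # pop() with a default never raises, so no membership check is needed.
        aws_cfg.pop(key, None)
    return payload


payload = {"config": {"aws": {"access_key_id": "AKIAEXAMPLE", "region": "us-east-1"}}}
print(strip_aws_secrets(payload))
```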

@JosepSampe
Member

My last comments are about the 2 other AWS backends (Batch & EC2).

  • Do the changes made here to the way aws is configured affect those backends?
  • Would it be convenient to copy the changes made in the Lambda docs to the docs of Batch & EC2?
  • To adapt the Batch & EC2 backends, is it as simple as copying the relevant code in the __init__ of the Lambda backend to the other 2 backends?

Comment on lines +41 to +46
if 'access_key_id' in config_data['aws']:
    key = config_data['aws_s3']['access_key_id']
elif 'config_profile' in config_data['aws']:
    key = hashlib.md5(config_data['aws']['config_profile'].encode("utf-8"), usedforsecurity=False).hexdigest()
else:
    raise Exception("'access_key_id' or 'config_profile' is mandatory in 'aws' section of the configuration")
Member

Is the option of AWS_ACCESS_KEY_ID in the env missing here?
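
A sketch of how the environment variable case might be folded into the key selection. `storage_key` is a hypothetical helper, and the order of the fallbacks (config key, then `AWS_ACCESS_KEY_ID`, then a hash of the profile name) is a guess, not something decided in this thread. The `usedforsecurity` argument requires Python 3.9 or later.

```python
import hashlib
import os


def storage_key(config_data: dict, environ=None) -> str:
    """Pick a stable per-user identifier, e.g. for naming the storage bucket."""
    environ = os.environ if environ is None else environ
    aws_cfg = config_data.get("aws", {})

    if "access_key_id" in aws_cfg:
        return aws_cfg["access_key_id"]
    if environ.get("AWS_ACCESS_KEY_ID"):
        return environ["AWS_ACCESS_KEY_ID"]
    if "config_profile" in aws_cfg:
        # The profile name is hashed only to obtain a fixed-length identifier.
        profile = aws_cfg["config_profile"].encode("utf-8")
        return hashlib.md5(profile, usedforsecurity=False).hexdigest()
    raise ValueError("'access_key_id' or 'config_profile' is mandatory in 'aws' section of the configuration")


print(storage_key({"aws": {"config_profile": "my-sso-profile"}}, environ={}))
```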

Comment on lines +52 to +54
1. Provide credentials via the `~/.aws/config` file. **This is the preferred option to configure AWS credentials for use with Lithops**:

You can run `aws configure` command if the AWS CLI is installed to setup the credentials.
Member

@JosepSampe JosepSampe Sep 13, 2023

What does ~/.aws/config look like in this case? Do the keys go into a default profile by default, or are they set in the file without a profile?

I mean, after calling aws configure, do you get this?

aws_access_key_id=XXXXXXXXX
aws_secret_access_key=XXXXXX

or something like this?

[profile default]
aws_access_key_id=XXXXXXXXXXXXXXX
aws_secret_access_key=XXXXXXXXXXXXXXXX

Member

@JosepSampe JosepSampe Sep 13, 2023

I wonder if in this case it makes sense to force the user to provide a profile_name with aws configure --profile my-unique-profile-name and then configure lithops like in the SSO approach, with:

lithops:
    backend: aws_lambda

aws:
    config_profile: my-unique-profile-name

aws_lambda:
    execution_role: <EXECUTION_ROLE_ARN>
    region: <REGION_NAME>

Comment on lines +56 to +58
2. Provide credentials via environment variables:

Lithops needs at least `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION` environment variables set.
Member

@JosepSampe JosepSampe Sep 13, 2023

Maybe in this option you can put a config example (and maybe remove AWS_DEFAULT_REGION?):

lithops:
    backend: aws_lambda

aws_lambda:
    execution_role: <EXECUTION_ROLE_ARN>
    region: <REGION_NAME>

@aitorarjona
Contributor Author

Closing for now, #1164 partially solves the issue described
