aws s3 sync with include/exclude does wrongly report objects do not exist and reupload them #8932

Open
mapk-amazon opened this issue Sep 19, 2024 · 3 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue s3

Comments

@mapk-amazon

mapk-amazon commented Sep 19, 2024

Describe the bug

The s3 sync command wrongly identifies a file as not yet uploaded when --include/--exclude filters are provided.

Expected Behavior

I expect sync to behave the same in these two situations:

Syncing a directory containing a single file vs. syncing a directory containing a single file with include/exclude conditions satisfied by that file

Current Behavior

When --exclude/--include filters are given, sync does not recognize already uploaded files and uploads them again, overwriting the existing objects.

Reproduction Steps

I have the following directory structure

.
├── testfile
│   └── ec2-user.txt

I then try to upload the file with sync, which should recognize that the object is already on S3 and not upload it again. It works when I run

aws s3 sync  testfile/  s3://$bucketname/ 

Running the same command again after the first run does not upload anything (unless the file has changed).

But once I specify exclude and include (with $abspathfile being the absolute path of the single file)

aws s3 sync  testfile/  s3://$bucketname/  --exclude "*" --include "$abspathfile" 

it always uploads the file. When run with --debug, it makes ListObjectsV2 calls and then (wrongly) reports

2024-09-19 18:11:49,138 - MainThread - awscli.customizations.s3.syncstrategy.base - DEBUG - syncing: XXXXX/testfile/ec2-user.txt -> $bucketname/ec2-user.txt, file does not exist at destination

Possible Solution

AWS CLI should recognize already uploaded files

Additional Information/Context

No response

CLI version used

aws-cli/2.17.54 Python/3.12.6 Linux/6.1.79-99.167.amzn2023.x86_64 exe/x86_64.amzn.2023

Environment details (OS name and version, etc.)

Amazon Linux 2023

@mapk-amazon mapk-amazon added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Sep 19, 2024
@mapk-amazon
Author

I believe the issue is here

def call(self, file_infos):

It applies the filters written for the local filesystem to the S3 listing as well. The object is excluded because it matches the * pattern, but it is never "re-included" because its path on S3 differs from my local absolute path. This leads the CLI to believe that the file does not exist in my S3 bucket, so it uploads the file again.
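
The hypothesis above can be illustrated with a small model. This is only a sketch, not the AWS CLI's actual implementation: the functions full_patterns and is_included, the local path, and the bucket name are all hypothetical. It assumes that each pattern is joined to the root directory of the side being filtered and that the last matching rule wins. Whether this is the real root cause is not confirmed in this thread.

```python
import fnmatch
import posixpath


def full_patterns(rootdir, rules):
    """Anchor each (kind, pattern) rule at rootdir (hypothetical model)."""
    # posixpath.join discards rootdir when the pattern is an absolute path.
    return [(kind, posixpath.join(rootdir, pattern)) for kind, pattern in rules]


def is_included(path, rootdir, rules):
    """Apply the rules in order; the last matching rule decides."""
    included = True  # nothing is filtered out by default
    for kind, pattern in full_patterns(rootdir, rules):
        if fnmatch.fnmatchcase(path, pattern):
            included = (kind == "include")
    return included


# Hypothetical values standing in for the elided path and $bucketname.
local_root = "/home/ec2-user/testfile"
s3_root = "my-bucket"
abs_file = "/home/ec2-user/testfile/ec2-user.txt"
s3_path = "my-bucket/ec2-user.txt"

# --exclude "*" --include "$abspathfile"
abs_rules = [("exclude", "*"), ("include", abs_file)]
local_ok = is_included(abs_file, local_root, abs_rules)
s3_ok = is_included(s3_path, s3_root, abs_rules)

# Same rules, but with a pattern relative to the sync root.
rel_rules = [("exclude", "*"), ("include", "ec2-user.txt")]
rel_local_ok = is_included(abs_file, local_root, rel_rules)
rel_s3_ok = is_included(s3_path, s3_root, rel_rules)

print(local_ok, s3_ok, rel_local_ok, rel_s3_ok)
```

In this model the absolute include pattern re-includes the local file but not the S3 object, so the destination listing would look empty, while a pattern relative to the sync root matches on both sides.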

@RyanFitzSimmonsAK RyanFitzSimmonsAK self-assigned this Sep 23, 2024
@RyanFitzSimmonsAK RyanFitzSimmonsAK added investigating This issue is being investigated and/or work is in progress to resolve the issue. s3 p2 This is a standard priority issue needs-review This issue or pull request needs review from a core team member. and removed needs-triage This issue or PR still needs to be triaged. investigating This issue is being investigated and/or work is in progress to resolve the issue. needs-review This issue or pull request needs review from a core team member. labels Sep 23, 2024
@RyanFitzSimmonsAK
Contributor

Hi, thanks for reporting this issue. I was able to reproduce the behavior, and am working on root causing and fixing it. I'll put any updates in this issue. Thanks!

@RyanFitzSimmonsAK
Contributor

I don't think this is an issue with the filters; those appear to be consistent between using absolute path and relative path. Additionally, the logic for determining if a file exists in an S3 bucket is separate from the filters. The ListObjectsV2 response is also correct, so it's possibly an issue with the CLI interpreting that response. Regardless, thank you for reporting this bug.
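
The observation that the ListObjectsV2 response is correct can be checked independently of the CLI. The following is a minimal sketch, assuming boto3 is installed and credentials are configured; the helper name key_is_listed is hypothetical and not part of the AWS CLI.

```python
def key_is_listed(s3_client, bucket, key):
    """Return True if ListObjectsV2 lists an object whose key equals `key`."""
    # Keys are returned in lexicographic order, so an exact match for the
    # prefix, if it exists, appears on the first page of results.
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key)
    # "Contents" is absent from the response when nothing matches the prefix.
    return any(obj["Key"] == key for obj in response.get("Contents", []))


# Example call (needs network access and credentials):
#   import boto3
#   key_is_listed(boto3.client("s3"), "my-bucket", "ec2-user.txt")
```

If this returns True while sync still logs "file does not exist at destination", the object is present in the listing and the discrepancy arises somewhere between the listing and the comparison step.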

@RyanFitzSimmonsAK RyanFitzSimmonsAK removed their assignment Nov 5, 2024