Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

S3Path does not set object mime-type when uploading a file on S3 #222

Closed
davidatsamp opened this issue Apr 19, 2022 · 5 comments · Fixed by #226
Closed

S3Path does not set object mime-type when uploading a file on S3 #222

davidatsamp opened this issue Apr 19, 2022 · 5 comments · Fixed by #226

Comments

@davidatsamp
Copy link

In order to serve correctly S3 object from cloudfront origin, we need to set correctly S3 object mime-type.
Unfortunately the S3 client does not provide any mime-type when calling upload_file.

It could be possible to use at least the mimetypes module for the most common mime-types based on file extension in s3client.py:

def _upload_file(self, local_path: Union[str, os.PathLike], cloud_path: S3Path) -> S3Path:
    obj = self.s3.Object(cloud_path.bucket, cloud_path.key)

    extra_args = {}
    content_type, content_encoding = mimetypes.guess_type(str(local_path))
    if content_type is not None:
        extra_args["ContentType"] = content_type
    if content_encoding is not None:
        extra_args["ContentEncoding"] = content_encoding

    obj.upload_file(str(local_path), Config=self.boto3_transfer_config, ExtraArgs=extra_args)
    return cloud_path
@pjbull
Copy link
Member

pjbull commented Apr 19, 2022

I agree it would be good to support content types—especially for web publishing flows. Maybe we can use an implementation like the above and have a guess_mimetypes flag that is set in the client __init__ so people can disable this behavior if it is doing unexpected things.

Two notes:

  • Default on or off may depend on mimetypes.guess_type performance.
  • We may want to think about a better way to let users explicitly set this per-file, (e.g., setters for certain properties), but I like this as a low-lift immediate implementation.

@davidatsamp
Copy link
Author

An alternate solution could be to set globally a mime type / content encoding function provider. So the default implementation could be mimetypes.guess_type and it would be easy to provide an alternate implementation if needed.
Something like that:
S3Path.mimetype_guess = my_own_mimetype_guess_function
I think about a global setup because it should be a real pain (and error prone) to have to set this configuration on a per instance basis.

@pjbull
Copy link
Member

pjbull commented Apr 27, 2022

I think about a global setup because it should be a real pain (and error prone) to have to set this configuration on a per instance basis.

Agreed this should not need to be set per CloudPath object. If you wanted to use something other than the defaults, you would either create a specific client and then use that client's factory methods to create all the path instances, or you would create the client and then call set_as_default_client so that it was automatically used for all future instances.

You can see examples of those two patterns in the docs here:
https://cloudpathlib.drivendata.org/stable/authentication/#advanced-use

Either way, though, you'd only have to set it in one place.

@pjbull
Copy link
Member

pjbull commented May 19, 2022

@davidatsamp this should be fixed in 0.8.0 which is on pypi now. could you test your use case and confirm? guess_type is the default and instructions for customizing are here: https://cloudpathlib.drivendata.org/stable/other_client_settings/#content-type-guessing-content_type_method

@davidatsamp
Copy link
Author

I have tested it and integrate it. Thanks a lot 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants