bug: Can't read GCS files due to Decompressive Transcoding #5070

amos-osmos · 2024-08-29T18:27:19Z

Describe the bug

We have some files that are stored gzip-encoded in GCS. Because of decompressive transcoding, fetching the metadata for such a file returns the compressed byte count, but retrieving the file itself returns the decompressed content, so the number of bytes received is higher than the reported size.

This interacts poorly with PR #4690, which compares the two values mentioned above, so we keep hitting the error "reader got too much data".
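To make the mismatch concrete, here is a minimal sketch of the kind of comparison described above. It is not OpenDAL's actual code: `check_length` and `LengthError` are hypothetical names, and the only assumption is that the reader compares an expected size (taken from metadata) against the bytes it actually received.

```rust
/// Hypothetical error type for illustration; not OpenDAL's actual error.
#[derive(Debug, PartialEq)]
enum LengthError {
    TooMuchData { expect: u64, actual: u64 },
    TooLittleData { expect: u64, actual: u64 },
}

/// Compare the size reported by metadata (`expect`) with the number of
/// bytes actually read (`actual`). `None` means no check is performed.
fn check_length(expect: Option<u64>, actual: u64) -> Result<(), LengthError> {
    match expect {
        Some(expect) if actual > expect => Err(LengthError::TooMuchData { expect, actual }),
        Some(expect) if actual < expect => Err(LengthError::TooLittleData { expect, actual }),
        _ => Ok(()),
    }
}
```

With a gzip-encoded object, `expect` would be the compressed size and `actual` the decompressed size, so a check of this shape fails with the "too much data" variant whenever the content actually compresses.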

Steps to Reproduce

I'm working on providing an actual repro but running into difficulties with permissions.

1. Have a gzip-encoded file on GCS, with the object metadata Content-Encoding set to gzip.
2. Build an operator of type opendal::types::operator::operator::Operator and call operator.read().
3. The returned Result is currently an opendal::Error that prints as:

Unexpected (permanent) at  => reader got too much data

Context:

expect: <compressed byte count>

actual: <uncompressed byte count>

Expected Behavior

An Ok Result with the contents of the file.

Additional Context

No response

Are you willing to submit a PR to fix this bug?

Yes, I would like to submit a PR.


Xuanwo · 2024-08-30T01:52:21Z

We bypass the content length check if the response includes a content-encoding header or doesn't have a content-length header:

opendal/core/src/raw/http_util/client.rs

Lines 126 to 131 in 309d3eb

    
           let content_length = if is_head || parse_content_encoding(resp.headers())?.is_some() { 
        
               None 
        
           } else { 
        
               parse_content_length(resp.headers())? 
        
           };

I will try to reproduce this.
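For readers without the source at hand, the bypass quoted above can be sketched in isolation. This is a simplified stand-in, not the real implementation: `expected_length` is a hypothetical function, headers are modeled as a plain `HashMap` with lowercase names, and an unparsable content-length is treated as absent, whereas the quoted code propagates parse errors with `?`.

```rust
use std::collections::HashMap;

/// Decide which length, if any, the body should be checked against.
/// Hypothetical helper: header names are assumed to be lowercase keys.
fn expected_length(is_head: bool, headers: &HashMap<String, String>) -> Option<u64> {
    if is_head || headers.contains_key("content-encoding") {
        // HEAD responses carry no body, and an encoded body may not match
        // the advertised length, so no check is performed.
        None
    } else {
        // A missing or unparsable header also disables the check here.
        headers
            .get("content-length")
            .and_then(|v| v.parse::<u64>().ok())
    }
}
```

Under these assumptions, the check only applies when the response carries a content-length header and no content-encoding header.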

amos-osmos added the bug label on Aug 29, 2024
