bug: Can't read GCS files due to Decompressive Transcoding #5070

Open
1 task
amos-osmos opened this issue Aug 29, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@amos-osmos

Describe the bug

We have some files that are stored gzip-encoded in GCS. Because of decompressive transcoding, fetching metadata for such a file returns the compressed byte count, but retrieving the file itself returns the decompressed content, so the number of bytes actually received is higher.

This interacts poorly with PR #4690, which compares the two values mentioned above, so we keep hitting the "reader got too much data" error.
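The mismatch can be illustrated without any network access. The following is a minimal sketch of a strict length check, not OpenDAL's actual implementation: the names `LengthCheck` and `check_length` are hypothetical, and the byte counts in the usage note are made up for illustration.

```rust
/// Outcome of comparing the size promised by metadata with the bytes actually read.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum LengthCheck {
    Ok,
    TooMuch { expect: u64, actual: u64 },
    TooLittle { expect: u64, actual: u64 },
}

/// Simplified model of a strict length check.
/// `expect` is the size taken from object metadata (None means "skip the check");
/// `actual` is the number of body bytes the reader received.
fn check_length(expect: Option<u64>, actual: u64) -> LengthCheck {
    match expect {
        // No expected size known: nothing to compare against.
        None => LengthCheck::Ok,
        Some(e) if actual > e => LengthCheck::TooMuch { expect: e, actual },
        Some(e) if actual < e => LengthCheck::TooLittle { expect: e, actual },
        Some(_) => LengthCheck::Ok,
    }
}
```

With a gzip-encoded object whose stored size is, say, 120 bytes and whose decompressed body is 1000 bytes, `check_length(Some(120), 1000)` yields `TooMuch { expect: 120, actual: 1000 }`, which corresponds to the error reported in this issue. Passing `None` as the expected size skips the comparison entirely.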

Steps to Reproduce

I'm working on providing an actual repro but am running into difficulties with permissions.

  1. Store a gzip-encoded file on GCS, with the object metadata Content-Encoding set to gzip.
  2. Build an operator of type opendal::types::operator::operator::Operator and call operator.read() on that file.
  3. The call currently returns an opendal::Error as the Result, which prints as:

Unexpected (permanent) at  => reader got too much data

Context:
   expect: <compressed byte count>
   actual: <uncompressed byte count>

Expected Behavior

An Ok Result with the contents of the file.

Additional Context

No response

Are you willing to submit a PR to fix this bug?

  • Yes, I would like to submit a PR.
amos-osmos added the bug label Aug 29, 2024
@Xuanwo
Member

Xuanwo commented Aug 30, 2024

We bypass the content-length check if the response includes a Content-Encoding header or has no Content-Length header:

let content_length = if is_head || parse_content_encoding(resp.headers())?.is_some() {
    None
} else {
    parse_content_length(resp.headers())?
};

I will try to reproduce this.
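The rule in the snippet quoted above can be modelled in a self-contained way. This is only a sketch under stated assumptions: headers are represented as plain (name, value) string pairs rather than a real HTTP header map, `header` and `expected_body_length` are hypothetical names, and an unparsable Content-Length is treated as absent here instead of being returned as an error as the `?` in the original snippet would do.

```rust
/// Look up a header by name, ignoring ASCII case.
/// (Hypothetical helper; headers are modelled as (name, value) pairs.)
fn header<'a>(headers: &[(&'a str, &'a str)], name: &str) -> Option<&'a str> {
    headers
        .iter()
        .find(|(k, _)| k.eq_ignore_ascii_case(name))
        .map(|(_, v)| *v)
}

/// Decide which body length, if any, the reader should enforce.
/// Returns None (no check) for HEAD responses, for responses that carry
/// a Content-Encoding header, and for responses without a usable Content-Length.
fn expected_body_length(headers: &[(&str, &str)], is_head: bool) -> Option<u64> {
    if is_head || header(headers, "content-encoding").is_some() {
        None
    } else {
        // Assumption: a malformed value is ignored rather than reported as an error.
        header(headers, "content-length").and_then(|v| v.trim().parse::<u64>().ok())
    }
}
```

For example, `expected_body_length(&[("Content-Encoding", "gzip"), ("Content-Length", "120")], false)` returns `None`, so no length is enforced, while the same call without the Content-Encoding pair returns `Some(120)`.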
