[Bug]: Regression after upgrading to fastavro 1.8.4 #720

Closed
riteshghorse opened this issue Oct 6, 2023 · 8 comments

Comments

@riteshghorse

We found a regression in the latest release, version 1.8.4, where some of our tests fail with EOFError. Below is the stack trace, showing the fastavro functions involved:

apache_beam/io/gcp/bigquery.py:1312: in __next__
    return fastavro.schemaless_reader(self.bytes_reader, self.avro_schema)
fastavro/_read.pyx:1126: in fastavro._read.schemaless_reader
    ???
fastavro/_read.pyx:1153: in fastavro._read.schemaless_reader
    ???
fastavro/_read.pyx:743: in fastavro._read._read_data
    ???
fastavro/_read.pyx:616: in fastavro._read.read_record
    ???
fastavro/_read.pyx:735: in fastavro._read._read_data
    ???
fastavro/_read.pyx:526: in fastavro._read.read_union
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   ???
E   EOFError

fastavro/_read.pyx:176: EOFError
@tvalentyn
Contributor

@riteshghorse could it be because we have a newer version of fastavro at job submission, when we generate test data, but an older version of fastavro in the Dataflow containers at runtime?

@tvalentyn
Contributor

There might not be enough info for the fastavro folks to investigate; perhaps this can be reproduced in a simple example?

@riteshghorse
Author

Agreed, it could be a version mismatch issue.

@scottbelden
Collaborator

Thanks for the heads up, but yes, we will definitely need some additional info. The best would be the binary data that was being decoded. If that's not possible, a reproducible script would be great. Of course, I understand it's not always possible to share the schema or data.

Do you know if the encoded binary was created by fastavro or another avro library?

@scottbelden
Collaborator

@riteshghorse @tvalentyn I think I have an idea of what's going on. One of the changes between 1.8.3 and 1.8.4 was to raise EOFError instead of StopIteration when we reach the end of a buffer, as it's more descriptive of what's actually happening.

Looking at the part that is failing in Beam, I see it calls schemaless_reader and catches the StopIteration. I'm willing to bet that all you have to do is change the try/except to catch an EOFError instead of a StopIteration.
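A minimal sketch of the suggested change, written so that it keeps working on both sides of the 1.8.4 change by catching EOFError and StopIteration together. This is a generic read-until-exhausted loop, not Beam's actual code: `iter_records` and the one-byte toy readers are hypothetical stand-ins for illustration. With real data, `read_one` would be something like `lambda f: fastavro.schemaless_reader(f, schema)`.

```python
import io
from typing import Any, BinaryIO, Callable, Iterator

# Per this issue: fastavro < 1.8.4 signalled "no more data" with StopIteration,
# and 1.8.4 switched to EOFError. Catching both works with either version.
END_OF_BUFFER = (EOFError, StopIteration)


def iter_records(fo: BinaryIO, read_one: Callable[[BinaryIO], Any]) -> Iterator[Any]:
    """Yield records decoded by read_one until the buffer is exhausted.

    Hypothetical helper; read_one decodes exactly one record from fo.
    """
    while True:
        try:
            record = read_one(fo)
        except END_OF_BUFFER:
            # Caught inside the generator, so a StopIteration never escapes
            # (an escaping one would become RuntimeError under PEP 479).
            return
        yield record


# Hypothetical toy decoders: one record = one byte.
def read_byte_new(fo: BinaryIO) -> int:
    chunk = fo.read(1)
    if not chunk:
        raise EOFError  # end-of-buffer behavior described for 1.8.4+
    return chunk[0]


def read_byte_old(fo: BinaryIO) -> int:
    chunk = fo.read(1)
    if not chunk:
        raise StopIteration  # end-of-buffer behavior described for < 1.8.4
    return chunk[0]


if __name__ == "__main__":
    for reader in (read_byte_new, read_byte_old):
        print(list(iter_records(io.BytesIO(b"\x01\x02\x03"), reader)))
```

Catching both exceptions avoids having to pin fastavro, but it also treats a record truncated partway through the buffer as a clean end; if that distinction matters, compare `fo.tell()` with the buffer length after the loop. In an iterator class's `__next__`, the equivalent is to catch the same tuple and raise StopIteration yourself.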

@riteshghorse
Author

Oh, thanks for this investigation, Scott! I'll try this change.

@antoni-szych-rtbhouse

Hint for anyone who finds this issue while using google-bigquery-storage: it was fixed a few days ago (googleapis/python-bigquery-storage#687), but as of 9 Oct 2023 the fix has not been released yet. Workaround for now: pin fastavro<1.8.4.
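To test the version-mismatch theory raised in this thread, or to confirm that a fastavro<1.8.4 pin actually took effect in a given environment (for example at job submission versus inside the workers), one option is to log the installed fastavro version in each place. The sketch below uses the standard-library `importlib.metadata`; the helper names and the simplified version parsing are assumptions for illustration, not part of fastavro or Beam.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Per this issue, 1.8.4 is the first fastavro release that raises EOFError
# (instead of StopIteration) at the end of a buffer.
EOF_BEHAVIOR_SINCE = (1, 8, 4)


def parse_release(v: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '1.8.4' -> (1, 8, 4).

    Simplified parser (an assumption, not PEP 440 compliant): it stops after
    the first component with a non-digit suffix, so '1.9.0rc1' -> (1, 9, 0).
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:
            break
    return tuple(parts)


def raises_eof_at_end(v: str) -> bool:
    """True if fastavro version v is on the EOFError side of the change."""
    return parse_release(v) >= EOF_BEHAVIOR_SINCE


def installed_fastavro() -> Optional[str]:
    """Installed fastavro version string, or None if it is not installed."""
    try:
        return version("fastavro")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_fastavro()
    if v is None:
        print("fastavro is not installed")
    else:
        print(f"fastavro {v}; raises EOFError at end of buffer: {raises_eof_at_end(v)}")
```

Running the same check at submission time and inside a worker would show directly whether the two environments disagree.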

@riteshghorse
Author

Thank you, Scott! That solved the issue for us :)
