Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make recursive iteration to get dictionaries more defensive for interop #375

Closed
okartal opened this issue Jan 11, 2023 · 2 comments · Fixed by #382
Closed

Make recursive iteration to get dictionaries more defensive for interop #375

okartal opened this issue Jan 11, 2023 · 2 comments · Fixed by #382

Comments

@okartal
Copy link
Contributor

okartal commented Jan 11, 2023

The problem I have encountered is due to this for loop:

for child in field.children

It assumes that field.children is an iterable and, to my understanding, this assumption is usually warranted as for the last elements, when the leaves of the tree are reached, field.children is Arrow.Flatbuf.Field[]. However, we have observed that when Arrow files are written in C# field.children === nothing and trying to iterate throws an error. I can share an example file.

I could fix this in my deved, local Arrow.jl by simply conditioning the for loop:

if field.children !== nothing
    for child in field.children
        getdictionaries!(dictencoded, child)
    end
end

I am not sure if this is the right fix but I am opening this issue here because I am more familiar with the Arrow implementation in Julia and I think it is reasonable to be more defensive. But, as reading the same data works once the file is imported-exported with Python/pyarrow, my guess is that it is the C# implementation of Arrow that is doing something fishy here. Maybe this can also be fixed at the site where Julia produces the nothing when translating the Arrow types?

@quinnj
Copy link
Member

quinnj commented Jan 19, 2023

that looks like a good fix to me; if you make a PR, that'd be great. If you have an example arrow file that exhibits the problem that we can use to test it with, even better.

@okartal
Copy link
Contributor Author

okartal commented Jan 20, 2023

@quinnj I opened a PR now and I can definitely share a file beginning of next week. Thanks a lot for this great project!

quinnj pushed a commit that referenced this issue May 25, 2023
Closes #375 

This PR fixes a problem in reading an Arrow IPC file generated by C#
Apache.Arrow 10.0.1 where the last recursive iteration to get the
dictionaries returns `nothing` instead of an empty iterable.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants