Nested column branch had multiple children #23078

Closed
asfimport opened this issue Sep 30, 2019 · 6 comments

Comments

@asfimport
Collaborator

asfimport commented Sep 30, 2019

from pyarrow import json
import pyarrow.parquet as pq

r = json.read_json('example.jl')
pq.write_table(r, 'example.parquet')

Running the code above results in ArrowInvalid: Nested column branch had multiple children

Posting it here as per the request from #4045 (comment)

The sample schema looks like this:

package_version: string
source_version: string
uuid: string
_type: string
position: struct<ais_type: string, course: double, draught: double, draught_raw: null, heading: double, lat: double, lon: double, nav_state: int64, received_time: timestamp[s], speed: double>
 child 0, ais_type: string
 child 1, course: double
 child 2, draught: double
 child 3, draught_raw: null
 child 4, heading: double
 child 5, lat: double
 child 6, lon: double
 child 7, nav_state: int64
 child 8, received_time: timestamp[s]
 child 9, speed: double
provider_name: string
vessel: struct<beam: null, build_year: null, call_sign: string, dead_weight: null, dwt: null, flag_code: null, flag_name: string, gross_tonnage: null, imo: string, length: null, mmsi: string, name: string, type: null, vessel_type: string>
 child 0, beam: null
 child 1, build_year: null
 child 2, call_sign: string
 child 3, dead_weight: null
 child 4, dwt: null
 child 5, flag_code: null
 child 6, flag_name: string
 child 7, gross_tonnage: null
 child 8, imo: string
 child 9, length: null
 child 10, mmsi: string
 child 11, name: string
 child 12, type: null
 child 13, vessel_type: string
source_provider: string

Reporter: harikrishnan

Related issues:

Original Issue Attachments:

Note: This issue was originally created as ARROW-6737. Please see the migration documentation for further details.

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
[~harish1792] Would you be able to provide a reproducible example, so we can run the code and investigate the issue?
For example, a small JSON file that shows the problem.

@asfimport
Collaborator Author

Wes McKinney / @wesm:
Is this ARROW-1644?

@asfimport
Collaborator Author

harikrishnan:
@jorisvandenbossche Please use the attached file for testing. The attached file does not match the schema I posted above, but it produces the same error. Let me know if you need any more info!

@asfimport
Collaborator Author

harikrishnan:
Not sure whether https://jira.apache.org/jira/browse/ARROW-6760 is also related to this, but I am facing it when trying a similar operation with a slightly different JSON format.

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
Thanks for providing the sample file. This is indeed a duplicate of ARROW-1644. Nested lists/structs are not yet supported in the Arrow Parquet IO implementation.

@asfimport
Collaborator Author

Joris Van den Bossche / @jorisvandenbossche:
I noticed that reading this file fails on master, while it works on 0.14.1, so I opened ARROW-6762 for that.
