List/Map is not compatible with AWS Athena/Hive/PrestoDB #30

Open
dg3feiko opened this issue Dec 12, 2017 · 10 comments

Comments

@dg3feiko

I generated a Parquet file with parquet.js from data containing lists and maps, but the nested fields are not readable by AWS Athena, which is based on PrestoDB. I checked other implementations, and it seems this is the reason: apache/parquet-java#411

Thank you for the great job all the same.

@dg3feiko
Author

dg3feiko commented Dec 13, 2017

This is the schema that parquet.js generates for a list of elements. Given this input:

{
  mylist:[{"foo":"abc", "bar":"abc"}, {"foo":"abc", "bar":"abc"} ]
}

it writes:

message root {
  repeated group mylist {
    required binary foo (UTF8);
    required binary bar (UTF8);
  }
}

The schema that PrestoDB/Hive expects is:

message root {
  required group mylist (LIST) {
    repeated group list {
      required group element {
        required binary foo (UTF8);
        required binary bar (UTF8);
      }
    }
  }
}
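
The difference between the two schemas can be pictured as a row-reshaping step: each array is wrapped in a `list` level, and each item in an `element` level. The following is a minimal sketch in plain JavaScript, not a fix for parquet.js. `toThreeLevelList` and `wrapListColumns` are hypothetical helper names that do not exist in the library, and reshaping rows alone does not add the `(LIST)` annotation to the file's schema; that still has to come from the writer.

```javascript
// Hypothetical helper (not part of parquet.js): wraps a plain array of
// structs in the list -> element nesting shown in the expected schema.
function toThreeLevelList(items) {
  return { list: items.map((item) => ({ element: item })) };
}

// Hypothetical helper: applies the wrapping to the named list columns of a
// row and returns a new row, leaving the input row untouched.
function wrapListColumns(row, listColumns) {
  const out = { ...row };
  for (const name of listColumns) {
    if (Array.isArray(row[name])) {
      out[name] = toThreeLevelList(row[name]);
    }
  }
  return out;
}

// Same sample data as in the comment above.
const row = {
  mylist: [
    { foo: 'abc', bar: 'abc' },
    { foo: 'abc', bar: 'abc' },
  ],
};

const wrapped = wrapListColumns(row, ['mylist']);
console.log(JSON.stringify(wrapped, null, 2));
```

The wrapped row mirrors the `mylist` / `list` / `element` groups one-to-one, which makes it easier to compare against what a reader such as Presto or Hive expects to find.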

@shyim

shyim commented Aug 21, 2018

Hey @dg3feiko,
have you found a working solution for that problem?

@ZJONSSON
Contributor

@shyim @dg3feiko Did you check out #67? It might be related.

@shyim

shyim commented Aug 21, 2018

I installed your version as mentioned in the comment, with:

npm install zjonsson/parquetjs#07fb2fd8fc03bf2b57243531eaf91f2d60f5e460

I generated new files and copied them to the S3 bucket, but the Athena query still fails.

@ZJONSSON
Contributor

There is also #43. You could try installing a fork that has all my outstanding PRs here merged to master (including #43):

npm install zjonsson/parquetjs

@shyim

shyim commented Aug 27, 2018

With your latest fork I can select simple top-level fields, but when I select a struct, Athena fails with the message: HIVE_CURSOR_ERROR: Can not read value at 0 in block 0

@bwisitero

I used 0.8.0 to convert a flat JSON file to Parquet and verified that I can write it and read it back. I uploaded it to S3 and used Glue to create the Athena table. For some reason I am unable to query the data, though; I get GENERIC_INTERNAL_ERROR: 0
Is anybody else using this converter for Athena?

@justinsoliz

I gave this a try recently in AWS with Athena + Presto using the latest from zjonsson/parquetjs.

Root-level primitives worked, but nested lists failed:

Expected LIST column column to only have one field, but has x fields
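
The message suggests the reader found a LIST-annotated group with several direct children, where the three-level layout has exactly one repeated child that in turn holds a single element field. The following is a rough sketch of that structural check in plain JavaScript, assuming a made-up schema-node shape (`name`, `repetition`, `annotation`, `children`). `checkListGroup` is a hypothetical helper, not the check Presto or Hive actually run, and it ignores the legacy layouts some readers also accept.

```javascript
// Hypothetical schema-node shape, for illustration only:
// { name, repetition: 'required' | 'optional' | 'repeated', annotation?, children? }
function checkListGroup(node) {
  const problems = [];
  if (node.annotation !== 'LIST') {
    problems.push(`${node.name}: group is not annotated with LIST`);
    return problems;
  }
  if (node.repetition === 'repeated') {
    problems.push(`${node.name}: LIST group must not itself be repeated`);
  }
  const children = node.children || [];
  if (children.length !== 1) {
    problems.push(`${node.name}: LIST group must have exactly one child, found ${children.length}`);
    return problems;
  }
  const inner = children[0];
  if (inner.repetition !== 'repeated') {
    problems.push(`${node.name}.${inner.name}: inner group must be repeated`);
  }
  const innerChildren = inner.children || [];
  if (innerChildren.length !== 1) {
    problems.push(`${node.name}.${inner.name}: inner group must have exactly one child, found ${innerChildren.length}`);
  }
  return problems;
}

const leaf = (name) => ({ name, repetition: 'required' });

// A LIST group whose struct fields sit directly under it (two children).
const flat = {
  name: 'mylist', repetition: 'required', annotation: 'LIST',
  children: [leaf('foo'), leaf('bar')],
};

// The three-level layout: mylist (LIST) -> repeated list -> element.
const threeLevel = {
  name: 'mylist', repetition: 'required', annotation: 'LIST',
  children: [{
    name: 'list', repetition: 'repeated',
    children: [{ name: 'element', repetition: 'required', children: [leaf('foo'), leaf('bar')] }],
  }],
};

console.log(checkListGroup(flat));
console.log(checkListGroup(threeLevel));
```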

@gbassan-br

gbassan-br commented Apr 30, 2019

I gave this a try recently in AWS with Athena + Presto using the latest from zjonsson/parquetjs.

Root level primitives worked but nested lists failed:

Expected LIST column column to only have one field, but has x fields

+1
Does anyone have an answer?

@ZJONSSON
Contributor

I encountered the same issue and spent some time getting it to work. Here is a solution that seems to work, at least for my case of lists with structs: ZJONSSON#34
Test case from parquetjs to Athena can be found here: https://github.com/ZJONSSON/parquetjs/blob/9cee1592ce41e8dbca088fa2330b48ceb2d1de1a/test/list.js

jeffbski-rga pushed a commit to jeffbski/parquetjs that referenced this issue Mar 2, 2020