pyarrow latest parquet map column type isn't supported #2262

den-rifiniti · 2018-07-13T16:58:10Z

Hello,
When I try to read a Parquet file with a column of type map, this exception is thrown: pyarrow.lib.ArrowNotImplementedError: lists with structs are not supported.

It seems this was fixed in #1530, so is only a release required?
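For context on the error message: the Parquet format stores a map column as a repeated group of key/value pairs, so a reader sees it as a list of structs, which is presumably why the exception mentions "lists with structs". The sketch below only illustrates that shape in plain Python. It does not use pyarrow, and the helper names (map_to_entries, entries_to_map) are hypothetical, made up for this example.

```python
# Illustration only: plain Python, not pyarrow's API.
# A map value is modelled as a list of {"key": ..., "value": ...} structs,
# mirroring how a Parquet map is laid out as repeated key/value pairs.

def map_to_entries(m):
    """Flatten one map value into a list of key/value structs."""
    return [{"key": k, "value": v} for k, v in m.items()]


def entries_to_map(entries):
    """Rebuild the map from its list-of-structs form."""
    return {e["key"]: e["value"] for e in entries}


# Three rows of a hypothetical map<string, int> column.
rows = [{"a": 1, "b": 2}, {}, {"c": 3}]

# The column as a reader would see it: one list of structs per row.
column = [map_to_entries(r) for r in rows]
print(column)

# Converting back recovers the original maps.
restored = [entries_to_map(entries) for entries in column]
print(restored == rows)
```

A reader that cannot yet handle lists whose elements are structs therefore cannot handle map columns either, regardless of the key and value types.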


xhochy · 2018-07-14T17:54:36Z

No, this would also require apache/parquet-cpp#462

wesm · 2018-07-19T13:53:14Z

Assistance with this would be much appreciated. Unfortunately, we haven't been able to get this done in time for 0.10, so it will have to wait until later this year.

damache · 2018-12-07T23:24:57Z

Was this fixed? I installed the following:

!conda install -c conda-forge pyarrow

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /opt/conda/envs/DSX-Python35:

The following NEW packages will be INSTALLED:

    arrow-cpp:   0.10.0-py35h70250a7_0 conda-forge
    boost-cpp:   1.67.0-h3a22d5f_0     conda-forge
    parquet-cpp: 1.5.0.pre-h83d4a3d_0  conda-forge
    pyarrow:     0.10.0-py35hfc679d8_0 conda-forge

boost-cpp-1.67 100% |################################| Time: 0:00:00  90.87 MB/s
arrow-cpp-0.10 100% |################################| Time: 0:00:00  47.85 MB/s
parquet-cpp-1. 100% |################################| Time: 0:00:00  68.96 MB/s
pyarrow-0.10.0 100% |################################| Time: 0:00:00  62.08 MB/s

Then I tried this code:

import io
import pandas as pd
import pyarrow.parquet as pq

# Read the parquet file
buffer = io.BytesIO()
object = cos.Object('*********','*****************')
object.download_fileobj(buffer)
table = pq.read_table(buffer)
df = table.to_pandas()
print(df.head())

but I get this error:

ArrowNotImplementedError                  Traceback (most recent call last)
<ipython-input-11-a1e8748910ba> in <module>()
      7 object = cos.Object('********','*************')
      8 object.download_fileobj(buffer)
----> 9 table = pq.read_table(buffer)
     10 df = table.to_pandas()
     11 print(df.head())

/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/pyarrow/parquet.py in read_table(source, columns, nthreads, metadata, use_pandas_metadata)
   1048     pf = ParquetFile(source, metadata=metadata)
   1049     return pf.read(columns=columns, nthreads=nthreads,
-> 1050                    use_pandas_metadata=use_pandas_metadata)
   1051 
   1052 

/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/pyarrow/parquet.py in read(self, columns, nthreads, use_pandas_metadata)
    150             columns, use_pandas_metadata=use_pandas_metadata)
    151         return self.reader.read_all(column_indices=column_indices,
--> 152                                     nthreads=nthreads)
    153 
    154     def scan_contents(self, columns=None, batch_size=65536):

/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/pyarrow/_parquet.pyx in pyarrow._parquet.ParquetReader.read_all()

/opt/conda/envs/DSX-Python35/lib/python3.5/site-packages/pyarrow/error.pxi in pyarrow.lib.check_status()

ArrowNotImplementedError: lists with structs are not supported.

wesm · 2018-12-08T01:37:36Z

No, it has not been implemented yet.

wesm · 2018-12-09T17:43:49Z

@damache would anyone from IBM like to get involved with Parquet development? We could really use the help.

sujayramaiah · 2020-01-24T16:45:05Z

Most of the data files in our data lake have map columns. Not being able to read Parquet files with map columns using pyarrow creates a dependency on Spark. Is there a plan to support map columns?

wesm · 2020-01-28T23:06:11Z

Yes, but someone has to do the implementation work. See ARROW-1644 and related issues.

wesm closed this as completed Jul 19, 2018
