[BUG] Fix reading of logical types from Parquet files in s3 (#3026)
* The inferred schema from a Parquet file includes logical types.
* However, when reading Series from the Parquet file, we read the "arrow types" from the Parquet schema.
* This causes Table construction to fail because the schemas don't match: we pass in the inferred schema with logical types, but it doesn't match the Series types, which are inferred from the arrow types in the Parquet file.

Co-authored-by: Jay Chia <[email protected]@users.noreply.github.com>
jaychia and Jay Chia authored Oct 10, 2024
1 parent 73ff3f3 commit 9ae8122
Showing 1 changed file with 1 addition and 2 deletions.
3 changes: 1 addition & 2 deletions src/daft-parquet/src/file.rs
@@ -733,10 +733,9 @@ impl ParquetFileReader {
             })?
             .into_iter()
             .collect::<DaftResult<Vec<_>>>()?;
-        let daft_schema = daft_core::prelude::Schema::try_from(self.arrow_schema.as_ref())?;
 
         Table::new_with_size(
-            daft_schema,
+            Schema::new(all_series.iter().map(|s| s.field().clone()).collect())?,
             all_series,
             self.row_ranges.as_ref().iter().map(|rr| rr.num_rows).sum(),
         )
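
The idea behind the change can be illustrated with a minimal, self-contained sketch. It is not Daft's code: `DataType`, `Field`, `Series`, `Schema`, `Table`, and `schema_from_series` below are hypothetical, simplified stand-ins, and the validation inside `new_with_size` is an assumption about how a schema mismatch would be detected. The sketch only shows why building the schema from the fields of the Series that were actually read cannot disagree with those Series, whereas a schema derived separately can.

```rust
// Hypothetical, simplified stand-ins for illustration only; not the daft-core API.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,     // type as it might appear when read from the arrow schema
    Timestamp, // a logical type that a separately inferred schema might report
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    dtype: DataType,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Series {
    field: Field,
    values: Vec<i64>,
}

impl Series {
    fn field(&self) -> &Field {
        &self.field
    }
}

#[derive(Debug, Clone, PartialEq)]
struct Schema {
    fields: Vec<Field>,
}

#[allow(dead_code)]
#[derive(Debug)]
struct Table {
    schema: Schema,
    columns: Vec<Series>,
    num_rows: usize,
}

impl Table {
    // Assumed behaviour: reject any schema whose fields differ from the columns' fields.
    fn new_with_size(schema: Schema, columns: Vec<Series>, num_rows: usize) -> Result<Table, String> {
        if schema.fields.len() != columns.len() {
            return Err(format!(
                "schema has {} fields but {} columns were given",
                schema.fields.len(),
                columns.len()
            ));
        }
        for (expected, column) in schema.fields.iter().zip(columns.iter()) {
            if expected != column.field() {
                return Err(format!(
                    "schema mismatch: expected {:?}, column has {:?}",
                    expected,
                    column.field()
                ));
            }
        }
        Ok(Table { schema, columns, num_rows })
    }
}

// Mirrors the shape of the added line: derive the schema from the Series themselves.
fn schema_from_series(columns: &[Series]) -> Schema {
    Schema {
        fields: columns.iter().map(|s| s.field().clone()).collect(),
    }
}
```

In this sketch, passing a schema that declares a column as `Timestamp` alongside a Series whose field is `Int64` returns an error, while passing `schema_from_series(&columns)` succeeds by construction.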
