
ORC: Support partition values from a constant map #897

@rdblue

Identity partition values should be added to materialized records using values from a file's partition data. The initial implementation for Spark used a JoinedRow to join the partition values to each row read from a data file, but this had a few problems:

  • Only top-level fields could be set this way, not nested fields
  • Values were not added in place and would require a projection in Spark
  • No support for directly reading tables with generics

Avro and Parquet are moving to implementations that pass a map from field ID to constant value when building the reader. This lets the reader add each constant at the right place in the read schema, and lets the implementation be shared across in-memory representations. ORC should also support passing partition values as a map.
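To illustrate the idea, here is a minimal sketch of how a reader builder might consume a field-ID-to-constant map. It is not Iceberg's or ORC's actual API: `Field`, `ValueReader`, `buildReader`, and `idToConstant` are hypothetical names, and a row from the file is modeled as a plain map from leaf field ID to value. The point it shows is that the constant replaces the column reader at the field's own position in the schema tree, so it works for nested fields and needs no projection afterward.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

public class ConstantReaderSketch {
    // Hypothetical minimal schema model: a field has an ID, a name, and optional children.
    record Field(int id, String name, List<Field> children) {
        static Field leaf(int id, String name) { return new Field(id, name, List.of()); }
        static Field struct(int id, String name, Field... kids) {
            return new Field(id, name, Arrays.asList(kids));
        }
        boolean isStruct() { return !children.isEmpty(); }
    }

    // Hypothetical reader: produces one value from a row of file data
    // (modeled here as leaf field ID -> value).
    interface ValueReader { Object read(Map<Integer, Object> fileRow); }

    // Builds a reader tree for the read schema. If a field's ID is in idToConstant,
    // the reader returns that constant instead of reading from the file, so the
    // partition value lands in place, including inside nested structs.
    static ValueReader buildReader(Field field, Map<Integer, ?> idToConstant) {
        if (idToConstant.containsKey(field.id())) {
            Object constant = idToConstant.get(field.id());
            return row -> constant;
        }
        if (field.isStruct()) {
            List<ValueReader> childReaders = new ArrayList<>();
            for (Field child : field.children()) {
                childReaders.add(buildReader(child, idToConstant));
            }
            return row -> {
                Object[] values = new Object[childReaders.size()];
                for (int i = 0; i < values.length; i++) {
                    values[i] = childReaders.get(i).read(row);
                }
                return Arrays.asList(values);
            };
        }
        int id = field.id();
        return row -> row.get(id); // ordinary column read
    }

    public static void main(String[] args) {
        Field schema = Field.struct(0, "root",
            Field.leaf(1, "id"),
            Field.struct(2, "location", Field.leaf(3, "city"), Field.leaf(4, "region")),
            Field.leaf(5, "data"));
        // Assume location.region (ID 4) is an identity partition source, so its
        // value comes from partition data rather than from the file.
        ValueReader reader = buildReader(schema, Map.of(4, "us-west"));
        Map<Integer, Object> fileRow = Map.of(1, 42L, 3, "Seattle", 5, "payload");
        System.out.println(reader.read(fileRow)); // prints [42, [Seattle, us-west], payload]
    }
}
```

Because the constant is resolved once while building the reader tree, the same map could be handed to readers that produce different in-memory representations; only the leaf and struct readers would change.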
