
@bkietz (Member) commented Sep 30, 2020

Parquet row group statistics did not respect dictionary encoding. This PR fixes that and also adds a workaround to support filtering on a dictionary-encoded column.


DCHECK(lhs.is_array());
if (lhs.type()->id() == Type::DICTIONARY && rhs.type()->id() == Type::DICTIONARY) {
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wesm What do you think about adding kernels to scalar_compare.cc that handle this case (both arguments dictionary-encoded) inside compute::?

Member


Yes, this sounds fine. Can you open a JIRA issue about it?
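A minimal sketch of the decode-then-compare fallback for the case where both operands are dictionary-encoded. It uses a hypothetical `DictArray` struct and hypothetical `Decode`/`EqualDictDict` helpers instead of Arrow's real types, and it is not the code from this PR: because the two sides may carry different dictionaries, their indices can't be compared directly, so both are decoded to dense values first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string array (not Arrow's
// DictionaryArray): each row stores an index into `dictionary`.
struct DictArray {
  std::vector<std::string> dictionary;
  std::vector<std::int32_t> indices;
};

// The "decode" step: materialize the dense value for every row.
std::vector<std::string> Decode(const DictArray& arr) {
  std::vector<std::string> out;
  out.reserve(arr.indices.size());
  for (std::int32_t i : arr.indices) {
    assert(i >= 0);
    out.push_back(arr.dictionary.at(static_cast<std::size_t>(i)));
  }
  return out;
}

// Element-wise equality of two dictionary arrays. The two dictionaries may
// differ, so indices are not comparable; decode both sides, then compare.
std::vector<bool> EqualDictDict(const DictArray& lhs, const DictArray& rhs) {
  if (lhs.indices.size() != rhs.indices.size()) {
    throw std::invalid_argument("arrays must have the same length");
  }
  const std::vector<std::string> l = Decode(lhs);
  const std::vector<std::string> r = Decode(rhs);
  std::vector<bool> out(l.size());
  for (std::size_t i = 0; i < l.size(); ++i) {
    out[i] = (l[i] == r[i]);
  }
  return out;
}
```

Decoding copies one value per row on each side, which is the non-performant part; a dedicated kernel in compute:: could avoid that, for example by first mapping both sides onto a shared dictionary.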


@jorisvandenbossche (Member) left a comment


For me, the non-performant approach of decoding is fine for now (especially since the array+scalar case will be more common).

But shouldn't some more tests be added?

You could also add the small reproducer from the issue (from my comment there) as a Python test.
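Why the array+scalar case matters more can be illustrated with one common technique (a swapped-in illustration, not necessarily what this PR or Arrow's compute kernels do): when one side is a scalar, the predicate can be evaluated once per dictionary entry and the results gathered through the indices, so no values need to be decoded. `DictArray` and `EqualDictScalar` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary-encoded string column.
struct DictArray {
  std::vector<std::string> dictionary;
  std::vector<std::int32_t> indices;
};

// Compare a dictionary array against a single string scalar without decoding:
// evaluate the predicate once per distinct dictionary entry, then gather the
// per-entry results through the indices.
std::vector<bool> EqualDictScalar(const DictArray& arr, const std::string& value) {
  std::vector<bool> matches(arr.dictionary.size());
  for (std::size_t d = 0; d < arr.dictionary.size(); ++d) {
    matches[d] = (arr.dictionary[d] == value);
  }
  std::vector<bool> out;
  out.reserve(arr.indices.size());
  for (std::int32_t i : arr.indices) {
    assert(i >= 0 && static_cast<std::size_t>(i) < matches.size());
    out.push_back(matches[static_cast<std::size_t>(i)]);
  }
  return out;
}
```

The cost is one string comparison per distinct dictionary value plus one lookup per row, which is why a slower decode-based path for the less common dictionary-vs-dictionary case is tolerable for now.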


auto maybe_min = min->CastTo(field->type());
auto maybe_max = max->CastTo(field->type());
Member


Does this change behaviour? For a dictionary with string values, is field->type() string or dictionary?

Member Author


StatisticsAsScalars returns scalars whose types are the correct physical type, so even if the column was dictionary(string), min and max would be plain string before this cast.

Member Author


(i.e., it only changes behavior in cases where the physical type wasn't appropriate for the field.)
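To make the exchange about the cast concrete, here is a minimal sketch under stated assumptions: a hypothetical tagged `Scalar` type stands in for Arrow's scalars, and `CastStatToFieldType` is a hypothetical helper that roughly models what casting min/max to `field->type()` achieves for a dictionary(string) field. Statistics arrive as plain strings (the physical type); casting them to the dictionary type wraps each value in a one-entry dictionary, while a statistic that already has the field's type passes through unchanged. The pruning check `RowGroupMayMatch` is also hypothetical and compares decoded values, so the encoding does not affect the result.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical scalar model (not Arrow's API): a plain string, or a
// dictionary scalar holding a dictionary plus this value's index in it.
enum class TypeId { STRING, DICTIONARY };

struct Scalar {
  TypeId type;
  std::vector<std::string> dictionary;  // used only when type == DICTIONARY
  int index;                            // used only when type == DICTIONARY
  std::string value;                    // used only when type == STRING

  // The logical string value, whatever the encoding.
  const std::string& Decoded() const {
    return type == TypeId::DICTIONARY
               ? dictionary.at(static_cast<std::size_t>(index))
               : value;
  }
};

// Hypothetical helper: cast a row group statistic (which arrives as the
// physical type, STRING) to the field's type. A statistic that already has
// the field's type is returned unchanged.
Scalar CastStatToFieldType(const Scalar& stat, TypeId field_type) {
  if (stat.type == field_type) return stat;
  if (stat.type == TypeId::STRING && field_type == TypeId::DICTIONARY) {
    // Wrap the value in a one-entry dictionary so min/max match the field type.
    return Scalar{TypeId::DICTIONARY, {stat.value}, 0, ""};
  }
  throw std::invalid_argument("cast not supported in this sketch");
}

// Hypothetical pruning check for `column == literal`: the row group can be
// skipped only if the literal falls outside [min, max].
bool RowGroupMayMatch(const Scalar& min_stat, const Scalar& max_stat,
                      const Scalar& literal) {
  const std::string& v = literal.Decoded();
  return min_stat.Decoded() <= v && v <= max_stat.Decoded();
}
```

With min "apple" and max "melon" cast to the dictionary type, a dictionary literal decoding to "zucchini" lets the row group be skipped, while one decoding to "kiwi" does not.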
