@jshmchenxi
Contributor

This PR is one of several split out of #2582.
It adds support for reading Parquet bloom filters so they can be used to filter row groups.

I also wrote unit tests for ParquetBloomRowGroupFilter, but they need support for writing bloom filters. I will update the unit tests once #2642 is merged.

@jshmchenxi force-pushed the bloom-filter-read branch from ff3cccc to 4a59ee3 on May 27, 2021 07:45
Comment on lines +246 to +255
    switch (col.getPrimitiveType().getPrimitiveTypeName()) {
      case BINARY:
        return bloom.hash(Binary.fromString(value.toString()));
      case INT32:
      case INT64:
      case FLOAT:
      case DOUBLE:
        return bloom.hash(value);
      default:
        throw new IllegalArgumentException("Cannot hash of type: " + col.getPrimitiveType().getPrimitiveTypeName());
Contributor Author

In the BINARY case, we ran into a "Not supported type" error from BlockSplitBloomFilter.hash() because the value is a String, not a Binary.
The current code is an ugly workaround, and it only handles String values for BINARY columns.
Maybe we need a function that converts an Iceberg value to its Parquet representation, i.e. the reverse of ParquetConversions.converterFromParquet(colType, icebergType)?
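
A minimal sketch of what such a conversion could look like for byte-array-backed values. This is not code from this PR: `BloomValueBytes` and `toBytes` are hypothetical names, and the set of handled Java types (CharSequence, ByteBuffer, byte[]) is an assumption about what can show up for a BINARY column. In the real filter the resulting bytes would presumably be wrapped with something like `Binary.fromConstantByteArray(bytes)` before calling `bloom.hash(...)`; that step is left out here so the sketch compiles without Parquet on the classpath.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Iceberg or Parquet.
final class BloomValueBytes {
  private BloomValueBytes() {
  }

  // Converts a value to the raw bytes that would be hashed for a BINARY column.
  // Assumption: strings are hashed as their UTF-8 encoding.
  static byte[] toBytes(Object value) {
    if (value == null) {
      throw new IllegalArgumentException("Cannot convert null to bytes");
    } else if (value instanceof CharSequence) {
      return value.toString().getBytes(StandardCharsets.UTF_8);
    } else if (value instanceof ByteBuffer) {
      // Read through a duplicate so the caller's position and limit are untouched.
      ByteBuffer copy = ((ByteBuffer) value).duplicate();
      byte[] bytes = new byte[copy.remaining()];
      copy.get(bytes);
      return bytes;
    } else if (value instanceof byte[]) {
      return ((byte[]) value).clone();
    }
    // Other types (decimals, UUIDs, ...) would need their own handling.
    throw new IllegalArgumentException("Cannot convert to bytes: " + value.getClass().getName());
  }

  public static void main(String[] args) {
    System.out.println(toBytes("abc").length);
    System.out.println(toBytes(ByteBuffer.wrap(new byte[] {1, 2})).length);
  }
}
```

With a helper like this, the BINARY case would no longer depend on `value.toString()`, which only happens to work when the value is a String.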

@jshmchenxi
Contributor Author

Closing in favor of #4831.

@jshmchenxi closed this on May 21, 2022