[SPARK-12938][SQL] DataFrame API for Bloom filter #10937

Closed
cloud-fan wants to merge 3 commits into apache:master from cloud-fan:bloom-filter

@cloud-fan
Contributor

This PR integrates the Bloom filter from spark-sketch into the DataFrame API. This version uses RDD.aggregate to build the filter; a more performant UDAF-based version can be built in follow-up PRs.

This PR also adds 2 specific put versions (putBinary and putLong) to BloomFilter, which make it easier to build a Bloom filter over a DataFrame.
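
To illustrate the approach described above, here is a minimal, self-contained sketch of the aggregate pattern: each partition fills its own filter through a type-specific put (the seqOp), and the partial filters are then merged (the combOp). Everything in it is an assumption made for illustration. `ToyBloomFilter`, `AggregateSketch`, `buildFromLongs`, the hash functions, and the in-memory "partitions" are hypothetical stand-ins, not the spark-sketch classes or the code in this PR; only the method names `putLong`, `putBinary`, and `mergeInPlace` are borrowed from the description.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy Bloom filter for illustration only; NOT the spark-sketch implementation.
class ToyBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    ToyBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // 64-bit finalizer (SplitMix64-style) used to scramble keys.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // FNV-1a over the bytes, so byte arrays can reuse the long code path.
    private static long fnv1a(byte[] data) {
        long h = 0xCBF29CE484222325L;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x100000001B3L;
        }
        return h;
    }

    // Double hashing: the i-th bit index is (h1 + i * h2) mod numBits.
    private int index(long key, int i) {
        long h1 = mix(key);
        long h2 = mix(key + 0x9E3779B97F4A7C15L);
        return (int) Math.floorMod(h1 + i * h2, (long) numBits);
    }

    // Specialized variant for integral values: no boxing, no runtime type check.
    void putLong(long item) {
        for (int i = 0; i < numHashes; i++) bits.set(index(item, i));
    }

    // Specialized variant for raw bytes (e.g. the UTF-8 bytes of a string).
    void putBinary(byte[] item) {
        putLong(fnv1a(item));
    }

    boolean mightContainLong(long item) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(item, i))) return false;
        }
        return true;
    }

    boolean mightContainBinary(byte[] item) {
        return mightContainLong(fnv1a(item));
    }

    // Merging two filters of the same shape is a bitwise OR of their bit sets.
    ToyBloomFilter mergeInPlace(ToyBloomFilter other) {
        if (other.numBits != numBits || other.numHashes != numHashes) {
            throw new IllegalArgumentException("incompatible filters");
        }
        bits.or(other.bits);
        return this;
    }
}

class AggregateSketch {
    // Mirrors the shape of aggregate(zero)(seqOp, combOp) over in-memory partitions.
    static ToyBloomFilter buildFromLongs(List<long[]> partitions, int numBits, int numHashes) {
        ToyBloomFilter result = new ToyBloomFilter(numBits, numHashes); // zero value
        for (long[] partition : partitions) {
            ToyBloomFilter partial = new ToyBloomFilter(numBits, numHashes);
            for (long v : partition) partial.putLong(v); // seqOp: add one item
            result.mergeInPlace(partial);                // combOp: merge partials
        }
        return result;
    }

    public static void main(String[] args) {
        List<long[]> partitions = Arrays.asList(new long[]{1, 2, 3}, new long[]{40, 50});
        ToyBloomFilter longs = buildFromLongs(partitions, 1024, 3);
        System.out.println(longs.mightContainLong(50)); // true: inserted items are never missed

        ToyBloomFilter strings = new ToyBloomFilter(1024, 3);
        byte[] key = "spark".getBytes(StandardCharsets.UTF_8);
        strings.putBinary(key);
        System.out.println(strings.mightContainBinary(key)); // true
    }
}
```

Because merging is a bitwise OR, the result does not depend on how the items are split across partitions, which is what makes an aggregate-style build possible. A real implementation would size the filter from an expected item count and target false-positive rate rather than take `numBits` and `numHashes` directly.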

@cloud-fan
Contributor Author

cc @rxin @liancheng

@SparkQA
SparkQA commented Jan 27, 2016

Test build #50156 has finished for PR 10937 at commit a0dcaa8.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class BloomFilterImpl extends BloomFilter implements Serializable

Contributor

Also add a comment to explain the branching here?

Contributor

specific -> specialized

version -> variant

@rxin
Contributor

rxin commented Jan 27, 2016

Since the two (CMS and BF) were implemented by two different people, it'd be great for one of you to go through both to make sure everything is consistent. We can do that in a follow-up pull request.

@cloud-fan
Contributor Author

retest this please

@SparkQA
SparkQA commented Jan 27, 2016

Test build #50208 has finished for PR 10937 at commit bd0671c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • public class Utils

@rxin
Contributor

rxin commented Jan 27, 2016

Thanks - going to merge this.

asfgit closed this in 680afab on Jan 27, 2016