Skip to content

Conversation

@saucam
Copy link

@saucam saucam commented Sep 28, 2014

The In case class is replaced by a InSet class in case all the filters are literals, which uses a hashset instead of Sequence, thereby giving significant performance improvement (earlier the seq was using a worst case linear match (exists method) since expressions were assumed in the filter list) . Maximum improvement should be visible in case small percentage of large data matches the filter list.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@marmbrus
Copy link
Contributor

marmbrus commented Oct 1, 2014

ok to test

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21150/

@SparkQA
Copy link

SparkQA commented Oct 1, 2014

QA tests have started for PR 2561 at commit 430f5d1.

  • This patch merges cleanly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment that this is an optimized version of In for when all values of the inList are static.

@marmbrus
Copy link
Contributor

marmbrus commented Oct 2, 2014

Nice optimization :) A few minor suggestions only.

@SparkQA
Copy link

SparkQA commented Oct 2, 2014

QA tests have finished for PR 2561 at commit 430f5d1.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

…se to Optimizer.scala by adding a rule. Add

appropriate comments
@saucam
Copy link
Author

saucam commented Oct 2, 2014

Updated the branch by incorporating review comments.

  1. Moved the optimized version of In clause to Optimizer.scala
  2. Added appropriate comments

@SparkQA
Copy link

SparkQA commented Oct 2, 2014

QA tests have started for PR 2561 at commit bd84c67.

  • This patch merges cleanly.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21207/

@SparkQA
Copy link

SparkQA commented Oct 2, 2014

QA tests have finished for PR 2561 at commit bd84c67.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21208/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why special handling for UnaryMinus? If the underlying thing is a literal this will get constant folded away.

@marmbrus
Copy link
Contributor

marmbrus commented Oct 2, 2014

Another thing is there should be some tests for this. Probably in ExpressionEvaluationSuite. Also perhaps create a class similar to ConstantFoldingSuite

Yash Datta added 2 commits October 3, 2014 18:24
…Suite

            2. Add class OptimizedInSuite on the lines of ConstantFoldingSuite, for the optimized In clause
@SparkQA
Copy link

SparkQA commented Oct 3, 2014

QA tests have started for PR 2561 at commit afedbcd.

  • This patch merges cleanly.

@saucam
Copy link
Author

saucam commented Oct 3, 2014

  1. removed unnecessary UnaryMinus case as it is constant-folded away
  2. Added unit test in ExpressionEvaluationSuite
  3. Added a class OptimizedInSuite which tests that In is optimized to InSet when filter list consists only of Literals

@SparkQA
Copy link

SparkQA commented Oct 3, 2014

QA tests have finished for PR 2561 at commit afedbcd.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21247/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should actually only be indented 2 spaces (and I'd include a blank line after). Only wrapped arguments are indented 4.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also double check that there isn't a problem if a null is in the in list. I'm not sure if thats actually valid SQL (and it should never change the result), but we shouldn't throw an exception.

@SparkQA
Copy link

SparkQA commented Oct 6, 2014

QA tests have started for PR 2561 at commit 448b650.

  • This patch merges cleanly.

@saucam
Copy link
Author

saucam commented Oct 6, 2014

  1. Fixed code style and import order
  2. Fixed optimization condition. Tested.
  3. Added tests for null in filter list. Also tested by running that null does not cause any problems with optimization included
  4. Added test case that optimization is not triggered in case of attributes in filter list

@SparkQA
Copy link

SparkQA commented Oct 6, 2014

QA tests have finished for PR 2561 at commit 448b650.

  • This patch fails unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21336/Test FAILed.

            2. Fix optimization condition
            3. Add tests for null in filter list
            4. Add test case that optimization is not triggered in case of attributes in filter list
@SparkQA
Copy link

SparkQA commented Oct 7, 2014

QA tests have started for PR 2561 at commit 4bf2d19.

  • This patch merges cleanly.

@SparkQA
Copy link

SparkQA commented Oct 7, 2014

QA tests have finished for PR 2561 at commit 4bf2d19.

  • This patch passes unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins
Copy link

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21356/Test PASSed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be OptimizeIn.

@marmbrus
Copy link
Contributor

marmbrus commented Oct 9, 2014

Thanks for implementing this! :)

I going to make the two final small changes myself before merging. Just wanted to comment for future reference.

asfgit pushed a commit that referenced this pull request Oct 9, 2014
The In case class is replaced by a InSet class in case all the filters are literals, which uses a hashset instead of Sequence, thereby giving significant performance improvement (earlier the seq was using a worst case linear match (exists method) since expressions were assumed in the filter list) . Maximum improvement should be visible in case small percentage of large data matches the filter list.

Author: Yash Datta <[email protected]>

Closes #2561 from saucam/branch-1.1 and squashes the following commits:

4bf2d19 [Yash Datta] SPARK-3711: 1. Fix code style and import order             2. Fix optimization condition             3. Add tests for null in filter list             4. Add test case that optimization is not triggered in case of attributes in filter list
afedbcd [Yash Datta] SPARK-3711: 1. Add test cases for InSet class in ExpressionEvaluationSuite             2. Add class OptimizedInSuite on the lines of ConstantFoldingSuite, for the optimized In clause
0fc902f [Yash Datta] SPARK-3711: UnaryMinus will be handled by constantFolding
bd84c67 [Yash Datta] SPARK-3711: Incorporate review comments. Move optimization of In clause to Optimizer.scala by adding a rule. Add appropriate comments
430f5d1 [Yash Datta] SPARK-3711: Optimize the filter list in case of negative values as well
bee98aa [Yash Datta] SPARK-3711: Optimize where in clause filter queries
@marmbrus
Copy link
Contributor

marmbrus commented Oct 9, 2014

Oh, also, in the future please open pull requests against master not a specific branch. We'll back port them as needed.

@asfgit asfgit closed this in 752e90f Oct 9, 2014
@saucam
Copy link
Author

saucam commented Oct 10, 2014

Thanks a lot for all the help with the merge :) Will surely keep all the pointers in mind for the next time :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants