Skip to content

Conversation

@c21
Copy link
Contributor

@c21 c21 commented Sep 11, 2020

What changes were proposed in this pull request?

Several minor code and documentation improvement for stream-stream join. Specifically:

Why are the changes needed?

Minor optimization to avoid per-row unnecessary work (this probably can be optimized away by compiler, but we can do a better join to avoid it at the first place). And other comment/indentation fix to have better code readability for future developers.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests in StreamingJoinSuite.scala as no new logic is introduced.

@c21
Copy link
Contributor Author

c21 commented Sep 11, 2020

cc @cloud-fan and @sameeragarwal if you have time to take a look, thanks.

@SparkQA
Copy link

SparkQA commented Sep 11, 2020

Test build #128552 has finished for PR 29724 at commit 069ad73.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@c21
Copy link
Contributor Author

c21 commented Sep 11, 2020

retest this please

* If a timestamp column with event time watermark is present in the join keys or in the input
* data, then the it uses the watermark figure out which rows in the buffer will not join with
* and the new data, and therefore can be discarded. Depending on the provided query conditions, we
* data, then it uses the watermark figure out which rows in the buffer will not join with
Copy link
Contributor

@cloud-fan cloud-fan Sep 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uses the watermark to figure out

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - updated.


// Join one side input using the other side's buffered/state rows. Here is how it is done.
//
// - `leftJoiner.joinWith(rightJoiner)` generates all rows from matching new left input with
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not familiar with this part, cc @zsxwing @HeartSaVioR @xuanyuanking

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems to be just modified for replacing leftJoiner.joinWith(rightJoiner) with leftSideJoiner.storeAndJoinWithOtherSide(rightSideJoiner) and vice versa for right side. Other parts aren't modified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan , @HeartSaVioR - yes, this is just updating the comment, because there's no leftJoiner/rightJoiner/joinWith in the file, and the original author (#19271) should mean to refer to leftSideJoiner/rightSideJoiner/storeAndJoinWithOtherSide. I think it would make sense to be consistent between code and comment here. This is anyway a minor change for comment only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original PR just wants to use pseudocode to explain, either way is ok to me.

@SparkQA
Copy link

SparkQA commented Sep 11, 2020

Test build #128556 has finished for PR 29724 at commit 069ad73.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Copy link
Contributor Author

@c21 c21 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @cloud-fan and @HeartSaVioR , the PR is updated and ready for review again, thanks.

* If a timestamp column with event time watermark is present in the join keys or in the input
* data, then the it uses the watermark figure out which rows in the buffer will not join with
* and the new data, and therefore can be discarded. Depending on the provided query conditions, we
* data, then it uses the watermark figure out which rows in the buffer will not join with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan - updated.


// Join one side input using the other side's buffered/state rows. Here is how it is done.
//
// - `leftJoiner.joinWith(rightJoiner)` generates all rows from matching new left input with
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan , @HeartSaVioR - yes, this is just updating the comment, because there's no leftJoiner/rightJoiner/joinWith in the file, and the original author (#19271) should mean to refer to leftSideJoiner/rightSideJoiner/storeAndJoinWithOtherSide. I think it would make sense to be consistent between code and comment here. This is anyway a minor change for comment only.


// Join one side input using the other side's buffered/state rows. Here is how it is done.
//
// - `leftJoiner.joinWith(rightJoiner)` generates all rows from matching new left input with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the original PR just wants to use pseudocode to explain, either way is ok to me.

s"${getClass.getSimpleName} should not take $x as the JoinType")
case LeftOuter => left.outputPartitioning
case RightOuter => right.outputPartitioning
case _ => throwBadJoinTypeException()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xuanyuanking - sorry that I don't get how to change, are you suggesting to have a string val for error message to be used in throwBadJoinTypeException and require(...): val errorMessageForJoinType = s"${getClass.getSimpleName} should not take $joinType as the JoinType"), or something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, have a string val for the same error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xuanyuanking - sure, updated.

@SparkQA
Copy link

SparkQA commented Sep 11, 2020

Test build #128578 has finished for PR 29724 at commit cfd9cc0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Sep 12, 2020

Test build #128580 has finished for PR 29724 at commit ed800ee.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@c21
Copy link
Contributor Author

c21 commented Sep 14, 2020

Addressed all comments and tests are passed. Let me know if anything needs to be changed, thanks, @cloud-fan .

Copy link
Contributor

@HeartSaVioR HeartSaVioR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 978f531 Sep 14, 2020
@c21
Copy link
Contributor Author

c21 commented Sep 14, 2020

Thank you @cloud-fan , @HeartSaVioR and @xuanyuanking for review!

@c21 c21 deleted the streaming branch September 14, 2020 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants