Conversation

@tdas
Contributor

@tdas tdas commented Jan 13, 2018

What changes were proposed in this pull request?

Added documentation for stream-stream joins

How was this patch tested?

N/A

@SparkQA

SparkQA commented Jan 13, 2018

Test build #86073 has finished for PR 20255 at commit 1335a6d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

clickTime >= impressionTime AND
clickTime <= impressionTime + interval 1 hour
"""
))
Member

@felixcheung felixcheung Jan 13, 2018

this should just work for R, like this:

(I added withWatermark in 2.3)

impressions <- read.stream( ...
clicks <- read.stream( ...

# Apply watermarks on event-time columns
impressionsWithWatermark <- withWatermark(impressions, "impressionTime", "2 hours")
clicksWithWatermark <- withWatermark(clicks, "clickTime", "3 hours")

# Join with event-time constraints
join(
   impressionsWithWatermark,
   clicksWithWatermark,
   expr(
     "clickAdId = impressionAdId AND
      clickTime >= impressionTime AND
      clickTime <= impressionTime + interval 1 hour"
))

Contributor Author

Thank you!

Contributor Author

@tdas tdas Jan 16, 2018

Could you add tests for stream-stream joins in R as well? :)
Actually, I would like it to be tested first before I add a code snippet, so that instead of "should work" we can claim for sure that it "works".

Member

sure!
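
For readers following the thread, the join condition in the snippet above can be sketched in plain Python, without Spark. This is only an illustration of what the predicate accepts and rejects. The sample rows, the `matches` helper, and the `max_gap` parameter are hypothetical names introduced here; only the column names and the one-hour bound come from the snippet.

```python
from datetime import datetime, timedelta

# Hypothetical in-memory rows standing in for the two streams.
impressions = [
    {"impressionAdId": 1, "impressionTime": datetime(2018, 1, 13, 10, 0)},
    {"impressionAdId": 2, "impressionTime": datetime(2018, 1, 13, 10, 30)},
]
clicks = [
    # 45 minutes after impression 1: inside the one-hour range.
    {"clickAdId": 1, "clickTime": datetime(2018, 1, 13, 10, 45)},
    # 90 minutes after impression 2: outside the one-hour range.
    {"clickAdId": 2, "clickTime": datetime(2018, 1, 13, 12, 0)},
]


def matches(impression, click, max_gap=timedelta(hours=1)):
    """Plain-Python mirror of the join condition:
    clickAdId = impressionAdId AND
    clickTime >= impressionTime AND
    clickTime <= impressionTime + interval 1 hour
    """
    return (
        click["clickAdId"] == impression["impressionAdId"]
        and click["clickTime"] >= impression["impressionTime"]
        and click["clickTime"] <= impression["impressionTime"] + max_gap
    )


# Naive nested-loop inner join over the sample rows.
joined = [(i, c) for i in impressions for c in clicks if matches(i, c)]
matched_ad_ids = [i["impressionAdId"] for i, _ in joined]
```

With these sample rows only ad 1 is joined; the click on ad 2 arrives 90 minutes after its impression and falls outside the range. A real streaming engine evaluates the same predicate incrementally over buffered state rather than over complete lists.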


However, note that the outer NULL results will be generated with a delay (depends on the specified
watermark delay and the time range condition) because the engine has to wait for that long to ensure
there were no matches and there will be no more matches in future.
Member

extra space?
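
The delay described in the excerpt above can be illustrated with some back-of-the-envelope arithmetic. The sketch below is a simplified model, not the engine's actual logic: it assumes only the click-side watermark matters for an unmatched impression and ignores trigger timing and how the engine combines watermarks across inputs. The function name and parameters are hypothetical; the 1-hour range and 3-hour click watermark are taken from the R snippet earlier in this conversation.

```python
from datetime import datetime, timedelta


def click_time_needed_for_null_result(impression_time, max_gap, click_watermark_delay):
    """Simplified model (an assumption, not Spark's implementation).

    An impression can still match clicks up to impression_time + max_gap.
    The click watermark trails the largest click event time seen so far by
    click_watermark_delay, so the watermark only passes the end of the
    matching range once a click with at least this event time has arrived.
    """
    return impression_time + max_gap + click_watermark_delay


impression_time = datetime(2018, 1, 13, 10, 0)
needed = click_time_needed_for_null_result(
    impression_time,
    max_gap=timedelta(hours=1),                # clickTime <= impressionTime + 1 hour
    click_watermark_delay=timedelta(hours=3),  # withWatermark(clicks, "clickTime", "3 hours")
)
event_time_lag = needed - impression_time
```

Under these assumptions, an unmatched impression at 10:00 could not be reported with NULLs until click event time has advanced to about 14:00, a lag of four hours in event time. The real delay may be longer.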


- Spark Summit 2017 Talk - [Easy, Scalable, Fault-tolerant Stream Processing with Structured Streaming in Apache Spark](https://spark-summit.org/2017/events/easy-scalable-fault-tolerant-stream-processing-with-structured-streaming-in-apache-spark/)
- Spark Summit Europe 2017 Talks -
  - [Easy, Scalable, Fault-tolerant Stream Processing with Structured Streaming in Apache Spark](https://spark-summit.org/2017/events/easy-scalable-fault-tolerant-stream-processing-with-structured-streaming-in-apache-spark/)
Contributor Author

TODO: This link needs to be updated. Blocked on some links not working on the Spark Summit website.

@SparkQA

SparkQA commented Jan 17, 2018

Test build #86214 has finished for PR 20255 at commit b8381ef.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 17, 2018

Test build #86215 has finished for PR 20255 at commit 0af12a3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 17, 2018

Test build #86219 has finished for PR 20255 at commit 68f30d0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@zsxwing zsxwing left a comment

LGTM except some nits


- Cannot use streaming aggregations before joins.

- Cannot use mapGroupsWithState and flatMapGroupsWithState in Update mode cannot before joins.
Member

nit: cannot before joins.

<td style="vertical-align: middle;">Inner</td>
<td style="vertical-align: middle;">
Supported, optionally specify watermark on both sides +
time constraints for state cleanup<
Member

nit: remove <

@SparkQA

SparkQA commented Jan 18, 2018

Test build #86303 has finished for PR 20255 at commit e39b0a6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@zsxwing zsxwing left a comment

LGTM

@zsxwing
Member

zsxwing commented Jan 18, 2018

Merging to master and 2.3.
