[MINOR][DOCS] Fix few typos in structured streaming doc #17037
| see how this model handles event-time based processing and late arriving data. | ||
| ## Handling Event-time and Late Data | ||
| Event-time is the time embedded in the data itself. For many applications, you may want to operate on this event-time. For example, if you want to get the number of events generated by IoT devices every minute, then you probably want to use the time when the data was generated (that is, event-time in the data), rather than the time Spark receives them. This event-time is very naturally expressed in this model -- each event from the devices is a row in the table, and event-time is a column value in the row. This allows window-based aggregations (e.g. number of events every minute) to be just a special type of grouping and aggregation on the even-time column -- each time window is a group and each row can belong to multiple windows/groups. Therefore, such event-time-window-based aggregation queries can be defined consistently on both a static dataset (e.g. from collected device events logs) as well as on a data stream, making the life of the user much easier. |
Can you point out what changed here? GitHub doesn't seem to be showing the difference as clearly as in the other diffs.
and aggregation on the even-time column
is changed to
and aggregation on the event-time column
srowen left a comment
We'd generally ask you to scan the rest of the docs and similar docs for typos too, in order to cut down on the number of typo-fix PRs we need to review.
| ### Window Operations on Event Time | ||
| Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. | ||
| Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand window-based aggregations is very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration. |
This still needs a fix -- I would just say "Window-based aggregations are very similar to grouped aggregations"
agreed.
"The key idea to understand is that window-based aggregations are very similar to grouped aggregations."
Fixed this as well
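For anyone reading the wording discussion above, here is a minimal sketch of why a window-based aggregation is just a grouped aggregation in which each row can fall into several groups. It is plain Python, not Spark code, and it is not how Structured Streaming implements windows. The names `windows_for` and `windowed_counts`, the epoch alignment of window starts, and the sample events are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumption: window starts are aligned to this origin


def windows_for(event_time, window, slide):
    """Return the start times of every sliding window [start, start + window)
    that contains event_time, in ascending order."""
    # Latest slide-aligned start at or before the event.
    start = event_time - ((event_time - EPOCH) % slide)
    starts = []
    # Walk backwards one slide at a time while the window still covers the event.
    while start > event_time - window:
        starts.append(start)
        start -= slide
    return sorted(starts)


def windowed_counts(events, window, slide):
    """Count events per (window start, key). The window start acts as an
    extra grouping column, and one event may contribute to several groups."""
    counts = Counter()
    for event_time, key in events:
        for start in windows_for(event_time, window, slide):
            counts[(start, key)] += 1
    return counts


# Hypothetical sample data: (event-time, word) rows.
events = [
    (datetime(2017, 1, 1, 12, 2), "cat"),
    (datetime(2017, 1, 1, 12, 7), "cat"),
    (datetime(2017, 1, 1, 12, 7), "dog"),
]

# 10-minute windows sliding every 5 minutes.
counts = windowed_counts(events, timedelta(minutes=10), timedelta(minutes=5))
for (start, key), n in sorted(counts.items()):
    print(start.strftime("%H:%M"), key, n)
```

With a 10-minute window and a 5-minute slide, every event lands in two overlapping windows, which is the "each row can belong to multiple windows/groups" point made in the doc text.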
Test build #3583 has finished for PR 17037 at commit
Merged to master.
## What changes were proposed in this pull request?

Minor typo in `even-time`, which is changed to `event-time`, plus fixes for a couple of grammatical errors.

## How was this patch tested?

N/A, since this is a doc fix. I did a Jekyll build locally though.

Author: Ramkumar Venkataraman <[email protected]>

Closes apache#17037 from ramkumarvenkat/doc-fix.