@ramkumarvenkat ramkumarvenkat commented Feb 23, 2017

What changes were proposed in this pull request?

Fixes a minor typo (even-time is changed to event-time) and a couple of grammatical errors.

How was this patch tested?

N/A - since this is a doc fix. I did a jekyll build locally though.

@ramkumarvenkat
Author

@srowen @tdas Can you guys please look into this small doc fix?

see how this model handles event-time based processing and late arriving data.

## Handling Event-time and Late Data
Event-time is the time embedded in the data itself. For many applications, you may want to operate on this event-time. For example, if you want to get the number of events generated by IoT devices every minute, then you probably want to use the time when the data was generated (that is, event-time in the data), rather than the time Spark receives them. This event-time is very naturally expressed in this model -- each event from the devices is a row in the table, and event-time is a column value in the row. This allows window-based aggregations (e.g. number of events every minute) to be just a special type of grouping and aggregation on the even-time column -- each time window is a group and each row can belong to multiple windows/groups. Therefore, such event-time-window-based aggregation queries can be defined consistently on both a static dataset (e.g. from collected device events logs) as well as on a data stream, making the life of the user much easier.

Contributor

Can you point out what changed here? GitHub doesn't seem to be showing the difference as clearly as in the other diffs.

Author

and aggregation on the even-time column

is changed to

and aggregation on the event-time column

Member

@srowen srowen left a comment

We'd generally ask that you scan the rest of the docs and similar docs for typos too, in order to cut down on the number of typo-fix PRs we need to review.


### Window Operations on Event Time
- Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand about window-based aggregations are very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration.
+ Aggregations over a sliding event-time window are straightforward with Structured Streaming. The key idea to understand window-based aggregations is very similar to grouped aggregations. In a grouped aggregation, aggregate values (e.g. counts) are maintained for each unique value in the user-specified grouping column. In case of window-based aggregations, aggregate values are maintained for each window the event-time of a row falls into. Let's understand this with an illustration.
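
For readers following the quoted doc text, the idea that a window-based aggregation is just a grouped aggregation keyed by window can be sketched in plain Python. This is only an illustration and not Spark's implementation: the helper names `windows_for` and `windowed_counts` are hypothetical, and aligning window starts to the Unix epoch is an assumption made for the sketch. It shows how one row can fall into several sliding windows, each of which acts as a group.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed alignment point for window starts


def windows_for(event_time, size, slide):
    """Return the start of every window [start, start + size) containing event_time."""
    # Latest window start that is at or before the event.
    start = event_time - (event_time - EPOCH) % slide
    starts = []
    # Walk backwards while the window still covers the event.
    while start > event_time - size:
        starts.append(start)
        start -= slide
    return sorted(starts)


def windowed_counts(events, size, slide):
    """Count events per (window start, key), i.e. a grouped count keyed by window."""
    counts = Counter()
    for event_time, key in events:
        for start in windows_for(event_time, size, slide):
            counts[(start, key)] += 1
    return counts
```

With 10-minute windows sliding every 5 minutes, an event at 12:07 is counted in both the 12:00 and the 12:05 windows, which is the "each row can belong to multiple windows/groups" behavior the paragraph describes.
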

Member

This still needs a fix -- I would just say "Window-based aggregations are very similar to grouped aggregations"

Contributor

agreed.
"The key idea to understand is that window-based aggregations are very similar to grouped aggregations."

Author

Fixed this as well

@SparkQA

SparkQA commented Feb 25, 2017

Test build #3583 has finished for PR 17037 at commit ac24bd6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Feb 25, 2017

Merged to master

@asfgit asfgit closed this in 1b9ba25 Feb 25, 2017
Yunni pushed a commit to Yunni/spark that referenced this pull request Feb 27, 2017
## What changes were proposed in this pull request?

Fixes a minor typo (`even-time` is changed to `event-time`) and a couple of grammatical errors.

## How was this patch tested?

N/A - since this is a doc fix. I did a jekyll build locally though.

Author: Ramkumar Venkataraman <[email protected]>

Closes apache#17037 from ramkumarvenkat/doc-fix.