[SPARK-31390][SQL][DOCS] Document Window Function in SQL Syntax Section #28220
Changes from 6 commits
5eee4ee
cc7c443
1c7f59f
ea8ee10
116f403
6af2eba
3fb73f0
747cfef
6a3d475
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,7 +1,7 @@ | ||
| --- | ||
| layout: global | ||
| title: Windowing Analytic Functions | ||
| displayTitle: Windowing Analytic Functions | ||
| title: Window Functions | ||
| displayTitle: Window Functions | ||
| license: | | ||
| Licensed to the Apache Software Foundation (ASF) under one or more | ||
| contributor license agreements. See the NOTICE file distributed with | ||
|
|
@@ -19,4 +19,192 @@ license: | | |
| limitations under the License. | ||
| --- | ||
|
|
||
| **This page is under construction** | ||
| ### Description | ||
|
|
||
| Window functions operate on a group of rows, referred to as a window, and calculate a return value for each row based on that group of rows. Window functions are useful for tasks such as calculating a moving average, computing a cumulative statistic, or accessing the value of rows at a given position relative to the current row. | ||
| Spark SQL supports three types of window functions: | ||
|
|
||
| * Ranking Functions | ||
| * Analytic Functions | ||
| * Aggregate Functions | ||
|
|
||
huaxingao marked this conversation as resolved.
|
||
| ### Syntax | ||
|
|
||
| {% highlight sql %} | ||
| window_function OVER | ||
| ( [ { PARTITION | DISTRIBUTE } BY expression [ , ... ] ] | ||
| [ { ORDER | SORT } BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [ , ... ] ] | ||
| [ window_frame ] ) | ||
| {% endhighlight %} | ||
|
|
||
| ### Parameters | ||
|
|
||
| <dl> | ||
| <dt><code><em>window_function</em></code></dt> | ||
| <dd> | ||
| <ul> | ||
| <li>Ranking Functions</li> | ||
| <br> | ||
| <b>Syntax:</b> | ||
| <code> | ||
| RANK | DENSE_RANK | PERCENT_RANK | NTILE | ROW_NUMBER | ||
| </code> | ||
| </ul> | ||
| <ul> | ||
| <li>Analytic Functions</li> | ||
| <br> | ||
| <b>Syntax:</b> | ||
| <code> | ||
| CUME_DIST | LAG | LEAD | ||
| </code> | ||
| </ul> | ||
| <ul> | ||
| <li>Aggregate Functions</li> | ||
| <br> | ||
| <b>Syntax:</b> | ||
| <code> | ||
| MAX | MIN | COUNT | SUM | AVG | ... | ||
| </code> | ||
| <br> | ||
| Please refer to the <a href="api/sql/">Built-in Functions</a> document for a complete list of Spark aggregate functions. | ||
| </ul> | ||
| </dd> | ||
| </dl> | ||
| <dl> | ||
| <dt><code><em>window_frame</em></code></dt> | ||
| <dd> | ||
| Specifies which row to start the window on and where to end it.<br><br> | ||
| <b>Syntax:</b><br> | ||
| <code> | ||
| { RANGE | ROWS } [ BETWEEN ] | ||
| UNBOUNDED { PRECEDING | FOLLOWING } | ||
| | CURRENT ROW | ||
| | offset { PRECEDING | FOLLOWING } | ||
| </code> <br><br> | ||
|
||
| <code>offset</code><br> | ||
| Specifies the offset from the current row: a number of rows for <code>ROWS</code>, or a difference in the ordering value for <code>RANGE</code>. | ||
| </dd> | ||
| </dl> | ||
huaxingao marked this conversation as resolved.
|
||
|
|
||
| ### Examples | ||
|
|
||
| {% highlight sql %} | ||
|
|
||
|
||
| CREATE TABLE employees (name STRING, dept STRING, salary INT, age INT); | ||
|
|
||
| INSERT INTO employees VALUES ("Lisa", "Sales", 10000, 35); | ||
| INSERT INTO employees VALUES ("Evan", "Sales", 32000, 38); | ||
| INSERT INTO employees VALUES ("Fred", "Engineering", 21000, 28); | ||
| INSERT INTO employees VALUES ("Alex", "Sales", 30000, 33); | ||
| INSERT INTO employees VALUES ("Tom", "Engineering", 23000, 33); | ||
| INSERT INTO employees VALUES ("Jane", "Marketing", 29000, 28); | ||
| INSERT INTO employees VALUES ("Jeff", "Marketing", 35000, 38); | ||
| INSERT INTO employees VALUES ("Paul", "Engineering", 29000, 23); | ||
| INSERT INTO employees VALUES ("Chloe", "Engineering", 23000, 25); | ||
| INSERT INTO employees VALUES ("Helen", "Marketing", 29000, 40); | ||
|
|
||
| SELECT * FROM employees; | ||
| +-----+-----------+------+-----+ | ||
| | name| dept|salary| age| | ||
| +-----+-----------+------+-----+ | ||
| |Chloe|Engineering| 23000| 25| | ||
| | Fred|Engineering| 21000| 28| | ||
| | Paul|Engineering| 29000| 23| | ||
| |Helen| Marketing| 29000| 40| | ||
| | Tom|Engineering| 23000| 33| | ||
| | Jane| Marketing| 29000| 28| | ||
| | Jeff| Marketing| 35000| 38| | ||
| | Evan| Sales| 32000| 38| | ||
| | Lisa| Sales| 10000| 35| | ||
| | Alex| Sales| 30000| 33| | ||
| +-----+-----------+------+-----+ | ||
|
|
||
| SELECT name, dept, salary, RANK() OVER (PARTITION BY dept ORDER BY salary) AS rank FROM employees; | ||
| +-----+-----------+------+----+ | ||
| | name| dept|salary|rank| | ||
| +-----+-----------+------+----+ | ||
| | Lisa| Sales| 10000| 1| | ||
| | Alex| Sales| 30000| 2| | ||
| | Evan| Sales| 32000| 3| | ||
| | Fred|Engineering| 21000| 1| | ||
| | Tom|Engineering| 23000| 2| | ||
| |Chloe|Engineering| 23000| 2| | ||
| | Paul|Engineering| 29000| 4| | ||
| |Helen| Marketing| 29000| 1| | ||
| | Jane| Marketing| 29000| 1| | ||
| | Jeff| Marketing| 35000| 3| | ||
| +-----+-----------+------+----+ | ||
|
|
||
| SELECT name, dept, salary, DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary ROWS BETWEEN | ||
| UNBOUNDED PRECEDING AND CURRENT ROW) AS dense_rank FROM employees; | ||
| +-----+-----------+------+----------+ | ||
| | name| dept|salary|dense_rank| | ||
| +-----+-----------+------+----------+ | ||
| | Lisa| Sales| 10000| 1| | ||
| | Alex| Sales| 30000| 2| | ||
| | Evan| Sales| 32000| 3| | ||
| | Fred|Engineering| 21000| 1| | ||
| | Tom|Engineering| 23000| 2| | ||
| |Chloe|Engineering| 23000| 2| | ||
| | Paul|Engineering| 29000| 3| | ||
| |Helen| Marketing| 29000| 1| | ||
| | Jane| Marketing| 29000| 1| | ||
| | Jeff| Marketing| 35000| 2| | ||
| +-----+-----------+------+----------+ | ||
|
|
||
| SELECT name, dept, age, CUME_DIST() OVER (PARTITION BY dept ORDER BY age | ||
| RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as cume_dist FROM employees; | ||
|
||
| +-----+-----------+------+------------------+ | ||
| | name| dept| age| cume_dist| | ||
| +-----+-----------+------+------------------+ | ||
| | Alex| Sales| 33|0.3333333333333333| | ||
| | Lisa| Sales| 35|0.6666666666666666| | ||
| | Evan| Sales| 38| 1.0| | ||
| | Paul|Engineering| 23| 0.25| | ||
| |Chloe|Engineering| 25| 0.5| | ||
| | Fred|Engineering| 28| 0.75| | ||
| | Tom|Engineering| 33| 1.0| | ||
| | Jane| Marketing| 28|0.3333333333333333| | ||
| | Jeff| Marketing| 38|0.6666666666666666| | ||
| |Helen| Marketing| 40| 1.0| | ||
| +-----+-----------+------+------------------+ | ||
|
|
||
| SELECT name, dept, salary, MIN(salary) OVER (PARTITION BY dept ORDER BY salary) AS min | ||
| FROM employees; | ||
| +-----+-----------+------+-----+ | ||
| | name| dept|salary| min| | ||
| +-----+-----------+------+-----+ | ||
| | Lisa| Sales| 10000|10000| | ||
| | Alex| Sales| 30000|10000| | ||
| | Evan| Sales| 32000|10000| | ||
| |Helen| Marketing| 29000|29000| | ||
| | Jane| Marketing| 29000|29000| | ||
| | Jeff| Marketing| 35000|29000| | ||
| | Fred|Engineering| 21000|21000| | ||
| | Tom|Engineering| 23000|21000| | ||
| |Chloe|Engineering| 23000|21000| | ||
| | Paul|Engineering| 29000|21000| | ||
| +-----+-----------+------+-----+ | ||
|
|
||
| SELECT name, dept, salary, | ||
| LAG(salary) OVER (PARTITION BY dept ORDER BY salary) as lag, | ||
|
||
| LEAD(salary, 1, 0) OVER (PARTITION BY dept ORDER BY salary) as lead | ||
| FROM employees; | ||
| +-----+-----------+------+-----+-----+ | ||
| | name| dept|salary| lag| lead| | ||
| +-----+-----------+------+-----+-----+ | ||
| | Lisa| Sales| 10000| NULL|30000| | ||
| | Alex| Sales| 30000|10000|32000| | ||
| | Evan| Sales| 32000|30000| 0| | ||
| | Fred|Engineering| 21000| NULL|23000| | ||
| |Chloe|Engineering| 23000|21000|23000| | ||
| | Tom|Engineering| 23000|23000|29000| | ||
| | Paul|Engineering| 29000|23000| 0| | ||
| |Helen| Marketing| 29000| NULL|29000| | ||
| | Jane| Marketing| 29000|29000|35000| | ||
| | Jeff| Marketing| 35000|29000| 0| | ||
| +-----+-----------+------+-----+-----+ | ||
| {% endhighlight %} | ||
|
|
||
| ### Related Statements | ||
|
|
||
| * [SELECT](sql-ref-syntax-qry-select.html) | ||
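
The ranking and distribution semantics used in the examples can be cross-checked outside Spark. The following is a minimal sketch in plain Python, not Spark code: `window_ranks` is a hypothetical helper introduced here for illustration, and it assumes ascending order with ties treated as peers, which is how `RANK`, `DENSE_RANK`, and `CUME_DIST` are conventionally defined.

```python
from itertools import groupby

# Rows as shown in the SELECT * output of the example: (name, dept, salary, age).
EMPLOYEES = [
    ("Lisa", "Sales", 10000, 35),
    ("Evan", "Sales", 32000, 38),
    ("Fred", "Engineering", 21000, 28),
    ("Alex", "Sales", 30000, 33),
    ("Tom", "Engineering", 23000, 33),
    ("Jane", "Marketing", 29000, 28),
    ("Jeff", "Marketing", 35000, 38),
    ("Paul", "Engineering", 29000, 23),
    ("Chloe", "Engineering", 23000, 25),
    ("Helen", "Marketing", 29000, 40),
]


def window_ranks(rows, partition_key, order_key):
    """Return {name: (rank, dense_rank, cume_dist)}.

    Hypothetical helper: mimics RANK, DENSE_RANK and CUME_DIST over
    (PARTITION BY partition_key ORDER BY order_key ASC).
    """
    result = {}
    ordered = sorted(rows, key=lambda r: (partition_key(r), order_key(r)))
    for _, part in groupby(ordered, key=partition_key):
        part = list(part)
        size = len(part)
        seen = 0   # rows already consumed in this partition
        dense = 0  # number of distinct ordering values seen so far
        for _, peers in groupby(part, key=order_key):
            peers = list(peers)
            rank = seen + 1        # RANK leaves gaps after ties
            dense += 1             # DENSE_RANK does not
            seen += len(peers)
            cume = seen / size     # CUME_DIST counts rows <= current value
            for row in peers:
                result[row[0]] = (rank, dense, cume)
    return result


by_salary = window_ranks(EMPLOYEES, lambda r: r[1], lambda r: r[2])
by_age = window_ranks(EMPLOYEES, lambda r: r[1], lambda r: r[3])

for name in ("Fred", "Tom", "Chloe", "Paul"):
    print(name, by_salary[name][:2], by_age[name][2])
```

Because Tom and Chloe tie on salary, the next Engineering row gets rank 4 but dense rank 3, which is the difference between the two functions that the examples are meant to show.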
Does this look better now? @maropu @viirya
Also cc @srowen
Please feel free to rephrase. Thanks!
computing a cumulative -> computing a cumulative sum (or anything similar: average, statistic)
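
To make the wording concrete, here is a minimal sketch in plain Python (not Spark code) of what a cumulative sum, a moving average, and `LAG`/`LEAD` compute over one ordered partition. It uses the Sales salaries from the example; `trailing_avg` is a hypothetical helper, and the SQL in the comments is the assumed equivalent rather than output from Spark.

```python
from itertools import accumulate

# Sales salaries from the example, already ordered ascending.
salaries = [10000, 30000, 32000]

# SUM(salary) OVER (ORDER BY salary
#                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
running_total = list(accumulate(salaries))


def trailing_avg(values, width):
    """Average over the current row and up to (width - 1) preceding rows."""
    out = []
    for i in range(len(values)):
        frame = values[max(0, i - width + 1): i + 1]
        out.append(sum(frame) / len(frame))
    return out


# AVG(salary) OVER (ORDER BY salary ROWS BETWEEN 1 PRECEDING AND CURRENT ROW)
moving_avg = trailing_avg(salaries, 2)

# LAG(salary): previous row's value, NULL (None) when there is none.
lag = [None] + salaries[:-1]

# LEAD(salary, 1, 0): next row's value, default 0 when there is none.
lead = salaries[1:] + [0]

print(running_total, moving_avg, lag, lead)
```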
Thanks, it looks better. How about putting the last statement on a new line?
We need this list here? The `Syntax` section has the same list.