Merged
Changes from 5 commits
200 changes: 200 additions & 0 deletions docs/source/user-guide/sql/aggregate_functions.md
@@ -618,6 +618,29 @@ regr_avgx(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table daily_sales(day int, total_sales int) as values (1,100), (2,150), (3,200), (4,NULL), (5,250);
> select * from daily_sales;
+-----+-------------+
| day | total_sales |
+-----+-------------+
| 1   | 100         |
| 2   | 150         |
| 3   | 200         |
| 4   | NULL        |
| 5   | 250         |
+-----+-------------+
```

```sql
SELECT regr_avgx(total_sales, day) AS avg_day -- day is the independent variable (x)
FROM daily_sales; -- output = (1+2+3+5)/4 = 2.75
```
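
As a cross-check on the arithmetic above, the following minimal Python sketch reproduces the result by hand. It is not how DataFusion computes the aggregate: the helper name `regr_avgx` and the use of `None` to stand in for SQL `NULL` are assumptions made purely for illustration.

```python
from typing import Optional, Sequence


def regr_avgx(ys: Sequence[Optional[float]],
              xs: Sequence[Optional[float]]) -> Optional[float]:
    """Average of x over the pairs where both y and x are non-null."""
    kept = [x for y, x in zip(ys, xs) if y is not None and x is not None]
    # Model "no qualifying rows" with None (an assumption for this sketch).
    return sum(kept) / len(kept) if kept else None


days = [1, 2, 3, 4, 5]
total_sales = [100, 150, 200, None, 250]  # day 4 has no sales value

print(regr_avgx(total_sales, days))  # 2.75
```

Day 4 is dropped because its paired `total_sales` value is null, even though `day` itself is present.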

### `regr_avgy`

Computes the average of the dependent variable (output) expression_y for the non-null paired data points.
@@ -631,6 +654,27 @@ regr_avgy(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table daily_temperature(day int, temperature int) as values (1,30), (2,32), (3,NULL), (4,35), (5,36);
> select * from daily_temperature;
+-----+-------------+
| day | temperature |
+-----+-------------+
| 1   | 30          |
| 2   | 32          |
| 3   | NULL        |
| 4   | 35          |
| 5   | 36          |
+-----+-------------+
```

```sql
SELECT regr_avgy(temperature, day) AS avg_temperature -- temperature is the dependent variable (y)
FROM daily_temperature; -- output = (30+32+35+36)/4 = 33.25
```

### `regr_count`

Counts the number of non-null paired data points.
@@ -644,6 +688,30 @@ regr_count(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table daily_metrics(day int, user_signups int) as values (1,100), (2,120), (3,NULL), (4,110), (5,NULL);
> select * from daily_metrics;
+-----+--------------+
| day | user_signups |
+-----+--------------+
| 1   | 100          |
| 2   | 120          |
| 3   | NULL         |
| 4   | 110          |
| 5   | NULL         |
+-----+--------------+
```

```sql
SELECT regr_count(user_signups, day) AS valid_pairs
FROM daily_metrics; -- output = 3, from the pairs (1,100), (2,120), (4,110)
```

### `regr_intercept`

Computes the y-intercept of the linear regression line. For the equation (y = kx + b), this function returns b.
@@ -657,6 +725,30 @@ regr_intercept(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table weekly_performance(week int, productivity_score int) as values (1,60), (2,65), (3,70), (4,75), (5,80);
> select * from weekly_performance;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
| 4    | 75                 |
| 5    | 80                 |
+------+--------------------+
```

```sql
SELECT regr_intercept(productivity_score, week) AS intercept -- week is x, productivity_score is y
FROM weekly_performance;
-- The slope is k = 5. Substituting the point (1, 60) into y = kx + b
-- gives 60 = 5*1 + b, so b = 55.
```
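
The slope and intercept quoted in the comments can be verified with a small ordinary least-squares sketch in Python. This is a hand calculation under the usual textbook formulas (k = Sxy / Sxx, b = mean(y) - k * mean(x)), not a description of DataFusion's internals, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = k*x + b; returns (k, b)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations, and sum of squared x deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    k = sxy / sxx
    b = y_bar - k * x_bar
    return k, b


weeks = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]

k, b = fit_line(weeks, scores)
print(k, b)  # 5.0 55.0
```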

### `regr_r2`

Computes the square of the correlation coefficient between the independent and dependent variables.
@@ -670,6 +762,29 @@ regr_r2(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table:

```sql
> create table weekly_performance(week int, productivity_score int) as values (1,60), (2,65), (3,70), (4,75), (5,80);
> select * from weekly_performance;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
| 4    | 75                 |
| 5    | 80                 |
+------+--------------------+
```

```sql
SELECT regr_r2(productivity_score, week) AS r_squared
FROM weekly_performance; -- output = 1.0 because the data is perfectly linear
```
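
To see why perfectly linear data yields 1.0, the squared correlation coefficient can be computed by hand as Sxy² / (Sxx · Syy). The Python sketch below uses that standard definition; the helper name `r_squared` is hypothetical, and degenerate inputs (constant x or constant y) are deliberately not handled.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation: Sxy^2 / (Sxx * Syy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    # Assumes sxx and syy are both non-zero (no constant columns).
    return (sxy * sxy) / (sxx * syy)


weeks = [1, 2, 3, 4, 5]
scores = [60, 65, 70, 75, 80]

r2 = r_squared(weeks, scores)
print(r2)  # 1.0
```

Here Sxy = 50, Sxx = 10 and Syy = 250, so the ratio is 2500 / 2500.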

### `regr_slope`

Returns the slope of the linear regression line for non-null pairs in aggregate columns. Given input column Y and X: regr_slope(Y, X) returns the slope (k in Y = k\*X + b) using minimal RSS fitting.
@@ -683,6 +798,30 @@ regr_slope(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table weekly_performance(week int, productivity_score int) as values (1,60), (2,65), (3,70), (4,75), (5,80);
> select * from weekly_performance;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
| 4    | 75                 |
| 5    | 80                 |
+------+--------------------+
```

```sql
-- in simpler terms, slope = Δy/Δx
SELECT regr_slope(productivity_score, week) AS slope
FROM weekly_performance;
```

**Remember**: the slope tells _how much y changes when x increases by 1._

### `regr_sxx`

Computes the sum of squares of the independent variable.
@@ -696,6 +835,29 @@ regr_sxx(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

Consider the following table `study_hours`, where `hours` is the independent variable (x) and `test_score` is the dependent variable (y):

```sql
> create table study_hours(student_id int, hours int, test_score int) as values (1,2,55), (2,4,65), (3,6,75), (4,8,85), (5,10,95);
> select * from study_hours;
+------------+-------+------------+
| student_id | hours | test_score |
+------------+-------+------------+
| 1          | 2     | 55         |
| 2          | 4     | 65         |
| 3          | 6     | 75         |
| 4          | 8     | 85         |
| 5          | 10    | 95         |
+------------+-------+------------+
```

```sql
SELECT regr_sxx(test_score, hours) AS sxx
FROM study_hours; -- output = 40
```
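
The value 40 is the sum of squared deviations of `hours` from its mean. A short Python sketch of that arithmetic, offered only as a hand check and not as DataFusion's implementation:

```python
hours = [2, 4, 6, 8, 10]

# Mean of the independent variable: (2+4+6+8+10)/5 = 6.0
mean_hours = sum(hours) / len(hours)

# Squared deviations: 16 + 4 + 0 + 4 + 16
sxx = sum((h - mean_hours) ** 2 for h in hours)

print(sxx)  # 40.0
```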

### `regr_sxy`

Computes the sum of products of paired data points.
@@ -709,6 +871,25 @@ regr_sxy(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table employee_productivity(week int, productivity_score int) as values (1,60), (2,65), (3,70);
> select * from employee_productivity;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
+------+--------------------+
```

```sql
SELECT regr_sxy(productivity_score, week) AS sum_product_deviations
FROM employee_productivity;
```
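
The query above does not state its result. Assuming the usual definition of Sxy as the sum of products of deviations from the means, a hand calculation in Python over the three (week, productivity_score) rows inserted by the `create table` statement gives 10. This is an illustrative sketch, not DataFusion's implementation.

```python
weeks = [1, 2, 3]
scores = [60, 65, 70]

mean_week = sum(weeks) / len(weeks)      # 2.0
mean_score = sum(scores) / len(scores)   # 65.0

# Products of paired deviations: (-1)(-5) + (0)(0) + (1)(5)
sxy = sum((w - mean_week) * (s - mean_score) for w, s in zip(weeks, scores))

print(sxy)  # 10.0
```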

### `regr_syy`

Computes the sum of squares of the dependent variable.
@@ -722,6 +903,25 @@ regr_syy(expression_y, expression_x)
- **expression_y**: Dependent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.
- **expression_x**: Independent variable expression to operate on. Can be a constant, column, or function, and any combination of operators.

### Example

```sql
> create table employee_productivity(week int, productivity_score int) as values (1,60), (2,65), (3,70);
> select * from employee_productivity;
+------+--------------------+
| week | productivity_score |
+------+--------------------+
| 1    | 60                 |
| 2    | 65                 |
| 3    | 70                 |
+------+--------------------+
```

```sql
SELECT regr_syy(productivity_score, week) AS sum_squares_y
FROM employee_productivity;
```

### `stddev`

Returns the standard deviation of a set of numbers.