Column values sort #349

Merged: 5 commits, Mar 31, 2021
40 changes: 38 additions & 2 deletions CHANGELOG.md
@@ -1,4 +1,40 @@
# dbt-utils v0.7.0 (unreleased)

## :rotating_light: Breaking changes


### get_column_values
The order of (optional) arguments has changed in the `get_column_values` macro: the new `order_by` argument now comes before `max_records`.

Before:
```jinja
{% macro get_column_values(table, column, max_records=none, default=none) -%}
...
{% endmacro %}
```

After:
```jinja
{% macro get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}
...
{% endmacro %}
```
If you were relying on position to match up your optional arguments, this may be a breaking change. In general, we recommend that you explicitly declare any optional arguments (if not all of your arguments!):
```
-- before: the `50` will now be passed through as the `order_by` argument
{% set payment_methods = dbt_utils.get_column_values(
    ref('stg_payments'),
    'payment_method',
    50
) %}

-- after
{% set payment_methods = dbt_utils.get_column_values(
    ref('stg_payments'),
    'payment_method',
    max_records=50
) %}
```
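
The positional shift described above can be reproduced outside dbt. The following is a minimal Python sketch, not dbt-utils code: both functions are hypothetical stand-ins that only mirror the old and new macro signatures, relying on the fact that Jinja macros, like Python functions, bind positional arguments left to right.

```python
# Hypothetical stand-ins for the macro signatures; they only echo how arguments bind.
def get_column_values_old(table, column, max_records=None, default=None):
    return {"max_records": max_records, "default": default}


def get_column_values_new(table, column, order_by="count(*) desc",
                          max_records=None, default=None):
    return {"order_by": order_by, "max_records": max_records, "default": default}


# Third positional argument: it is max_records in the old signature...
old_call = get_column_values_old("stg_payments", "payment_method", 50)
# ...but it lands in order_by in the new one, leaving max_records unset.
shifted_call = get_column_values_new("stg_payments", "payment_method", 50)
# Naming the argument keeps its meaning regardless of position.
keyword_call = get_column_values_new("stg_payments", "payment_method", max_records=50)

print(old_call)
print(shifted_call)
print(keyword_call)
```

Passing optional arguments by keyword, as in the last call, is what keeps existing call sites stable when a parameter is inserted mid-signature.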

* Added optional `where` clause in `unique_combination_of_columns` test macro [#295](https://github.com/fishtown-analytics/dbt-utils/pull/295) [findinpath](https://github.com/findinpath)

## Features
@@ -7,8 +43,8 @@
* Allow individual columns in star macro to be aliased (code originally in [#230](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@elliottohara](https://github.com/elliottohara), merged via [#245])
* Allow star macro to be case insensitive, and improve docs (code originally in [#281](https://github.com/fishtown-analytics/dbt-utils/pull/230/) via [@mdimercurio](https://github.com/mdimercurio), merged via [#348](https://github.com/fishtown-analytics/dbt-utils/pull/348/))
* Add new schema test, `not_accepted_values` ([#284](https://github.com/fishtown-analytics/dbt-utils/pull/284) [@JavierMonton](https://github.com/JavierMonton))
* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343])

* Add new schema test, `fewer_rows_than` (code originally in [#221](https://github.com/fishtown-analytics/dbt-utils/pull/230/) from [@dmarts](https://github.com/dmarts), merged via [#343](https://github.com/fishtown-analytics/dbt-utils/pull/343/))
* Add new argument, `order_by`, to `get_column_values` (code originally in [#289](https://github.com/fishtown-analytics/dbt-utils/pull/289/) from [@clausherther](https://github.com/clausherther), merged via [#349](https://github.com/fishtown-analytics/dbt-utils/pull/349/))

## Fixes
* Handle booleans gracefully in the unpivot macro ([#305](https://github.com/fishtown-analytics/dbt-utils/pull/305) [@avishalom](https://github.com/avishalom))
39 changes: 32 additions & 7 deletions README.md
@@ -569,24 +569,49 @@ group by 1
```

#### get_column_values ([source](macros/sql/get_column_values.sql))
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation).
It takes an optional `default` argument for compiling when the relation does not already exist.
This macro returns the unique values for a column in a given [relation](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation) as an array.

Arguments:
- `table` (required): a [Relation](https://docs.getdbt.com/reference/dbt-classes#relation) (a `ref` or `source`) that contains the list of columns you wish to select from
- `column` (required): The name of the column you wish to find the column values of
- `order_by` (optional, default=`'count(*) desc'`): How the results should be ordered. The default is to order by `count(*) desc`, i.e. decreasing frequency. Setting this as `'my_column'` will sort alphabetically, while `'min(created_at)'` will sort by when the value was first observed.
- `max_records` (optional, default=`none`): The maximum number of column values you want to return
- `default` (optional, default=`[]`): The results this macro should return if the relation has not yet been created (and therefore has no column values).

Usage:
```
-- Returns a list of the top 50 states in the `users` table
{% set states = dbt_utils.get_column_values(table=ref('users'), column='state', max_records=50, default=[]) %}
```sql
-- Returns a list of the payment_methods in the stg_payments model
{% set payment_methods = dbt_utils.get_column_values(table=ref('stg_payments'), column='payment_method') %}

{% for state in states %}
{% for payment_method in payment_methods %}
...
{% endfor %}

...
```

```sql
-- Returns the list sorted alphabetically
{% set payment_methods = dbt_utils.get_column_values(
    table=ref('stg_payments'),
    column='payment_method',
    order_by='payment_method'
) %}
```

#### get_relations_by_pattern ([source](macros/sql/get_relations_by_pattern.sql))
```sql
-- Returns the list sorted by most recently observed
{% set payment_methods = dbt_utils.get_column_values(
    table=ref('stg_payments'),
    column='payment_method',
    order_by='max(created_at) desc',
    max_records=50,
    default=['bank_transfer', 'coupon', 'credit_card']
) %}
...
```
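
To see how the three `order_by` settings documented above change the result, here is a rough Python sketch that runs a query of the same shape against an in-memory SQLite database. It is an approximation under stated assumptions, not the macro itself: the sample rows and the `column_values` helper are hypothetical, and SQLite stands in for whatever warehouse dbt targets.

```python
import sqlite3

# Hypothetical sample data; table and column names are borrowed from the README examples.
conn = sqlite3.connect(":memory:")
conn.execute("create table stg_payments (payment_method text, created_at text)")
conn.executemany(
    "insert into stg_payments values (?, ?)",
    [
        ("credit_card", "2021-01-01"),
        ("coupon", "2021-01-02"),
        ("credit_card", "2021-01-03"),
        ("credit_card", "2021-01-04"),
        ("bank_transfer", "2021-01-05"),
        ("coupon", "2021-01-06"),
    ],
)


def column_values(column, order_by="count(*) desc", max_records=None):
    """Approximate the macro's query: group by the column, then order and limit."""
    # String formatting mirrors Jinja templating; the inputs here are trusted literals.
    query = f"select {column} as value from stg_payments group by 1 order by {order_by}"
    if max_records is not None:
        query += f" limit {int(max_records)}"
    return [row[0] for row in conn.execute(query)]


by_frequency = column_values("payment_method")
alphabetical = column_values("payment_method", order_by="payment_method")
most_recent = column_values(
    "payment_method", order_by="max(created_at) desc", max_records=2
)

print(by_frequency)   # ['credit_card', 'coupon', 'bank_transfer']
print(alphabetical)   # ['bank_transfer', 'coupon', 'credit_card']
print(most_recent)    # ['coupon', 'bank_transfer']
```

Because `order_by` is interpolated directly into the `order by` clause, any expression that is valid after a `group by` on the column (the column itself or an aggregate) should work.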

#### get_relations_by_prefix
Returns a list of [Relations](https://docs.getdbt.com/docs/writing-code-in-dbt/class-reference/#relation)
that match a given schema- or table-name pattern.

2 changes: 1 addition & 1 deletion integration_tests/models/sql/test_get_column_values.sql
@@ -1,5 +1,5 @@

{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default = []) %}
{% set columns = dbt_utils.get_column_values(ref('data_get_column_values'), 'field', default=[], order_by="field") %}


{% if target.type == 'snowflake' %}
24 changes: 6 additions & 18 deletions macros/sql/get_column_values.sql
@@ -1,26 +1,13 @@
{#
This macro fetches the unique values for `column` in the table `table`

Arguments:
table: A model `ref`, or a schema.table string for the table to query (Required)
column: The column to query for unique values
max_records: If provided, the maximum number of unique records to return (default: none)

Returns:
A list of distinct values for the specified columns
#}

{% macro get_column_values(table, column, max_records=none, default=none) -%}
{{ return(adapter.dispatch('get_column_values', packages = dbt_utils._get_utils_namespaces())(table, column, max_records, default)) }}
{% macro get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}
{{ return(adapter.dispatch('get_column_values', packages = dbt_utils._get_utils_namespaces())(table, column, order_by, max_records, default)) }}
{% endmacro %}

{% macro default__get_column_values(table, column, max_records=none, default=none) -%}
{% macro default__get_column_values(table, column, order_by='count(*) desc', max_records=none, default=none) -%}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
{{ return('') }}
{% endif %}
{#-- #}

{%- set target_relation = adapter.get_relation(database=table.database,
schema=table.schema,
@@ -40,12 +27,13 @@ Returns:

{%- else -%}


select
{{ column }} as value

from {{ target_relation }}
group by 1
order by count(*) desc
order by {{ order_by }}

{% if max_records is not none %}
limit {{ max_records }}