Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output a warning when star finds no columns, not '*' #732

Merged
merged 2 commits into from
Nov 29, 2022

Conversation

joellabes
Copy link
Contributor

resolves #681

This is a:

  • documentation update
  • bug fix with no breaking changes
  • new functionality
  • a breaking change

All pull requests from community contributors should target the main branch (default).

Description & motivation

#561 solved a SQLFluff linting error where the star() macro returned an empty string if there were no columns found (either because the model doesn't exist or because all columns were excluded).

The side effect of this is that if someone excludes all columns, they get '*' (every column) back instead, which is the exact opposite of what you'd expect.

This change means that if no columns are found, it returns 'no columns returned from star() macro' as outcome_msg instead.

This is still valid SQL, so SQLFluff won't complain about it, but it better represents the outcome that got us here than a *.

Checklist

  • This code is associated with an Issue which has been triaged and accepted for development.
  • I have verified that these changes work locally on the following warehouses (Note: it's okay if you do not have access to all warehouses, this helps us understand what has been covered)
    • BigQuery
    • Postgres
    • Redshift
    • Snowflake
  • I followed guidelines to ensure that my changes will work on "non-core" adapters by:
    • dispatching any new macro(s) so non-core adapters can also use them (e.g. the star() source)
    • using the limit_zero() macro in place of the literal string: limit 0
    • using dbt.type_* macros instead of explicit datatypes (e.g. dbt.type_timestamp() instead of TIMESTAMP
  • I have updated the README.md (if applicable)
  • I have added tests & descriptions to my models (and macros if applicable)
  • I have added an entry to CHANGELOG.md

@dbeatty10
Copy link
Contributor

dbeatty10 commented Nov 29, 2022

I buy into * being surprising / astonishing when there are zero selected columns. But in that case, I'm wondering (and hoping!) if we can do better than adding an unrequested outcome_msg column.

Example

  • If we query for a list of English words that rhyme with "dangerous", we'd expect an empty list.
  • We'd be astonished if we got a list containing every word in the dictionary. Likewise, we'd be surprised by a list with one item: "no rhyming words found".

Counter-proposal

Considering the four different scenarios between execute and compile vs run, as-is this PR would yield the following:

dbt compile dbt run/build
execute == False * *
execute == True 'no columns returned from star() macro' as outcome_msg 'no columns returned from star() macro' as outcome_msg

What would stop us from implementing the following instead?

dbt compile dbt run/build
execute == False * *
execute == True * empty string¹

We could do something like here to determine the dbt sub-command and handle the execute == True cases like:

    {%- if cols|length <= 0 and flags.WHICH == 'compile' -%}
      {{- return('*') -}}
    {%- elif cols|length <= 0 -%}
      {{- return('') -}}
    {%- else -%}

Notes

¹An alternative to an empty string would be to return an error as described in #681.

#605 was concerned with the combo of execute == True and dbt compile.

#681 is concerned with the combo of execute == True and dbt run/build.

Copy link
Contributor

@dbeatty10 dbeatty10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter-proposal included in the comments.

I'm not a pro with flags.WHICH, execute, or SQLFluff, so I might be misconstruing or misunderstanding how things work in my counter-proposal. But regardless of the details, I would advocate for an implementation different than what we have in this PR.

@joellabes
Copy link
Contributor Author

I'm answering this assuming that the current summary table's rows and columns are the wrong way around (we currently return '*' in all cases when execute is false)

With that said, I really like this idea! Thanks for digging up the history on different invocation commands.

I think checking against flags.WHICH is a goer.

@joellabes
Copy link
Contributor Author

OK in compile mode I output:

  • a star *
  • a comment explaining that the behaviour is different to what happens during dbt run invocations etc

In all other modes, I output:

  • a comment that says no columns were found.

This was actually what I originally wanted to do yesterday (instead of creating a real column in the output), but I realised that SQLFluff was probably smart enough to realise that a comment wasn't columns either 😬 I hadn't thought of using the WHICH flag.

Why a comment instead of a blank string as you suggested? In the past I have been tripped up by star returning nothing and not being able to work out why - this feels like a more useful pointer while still accurately reflecting the query results that will happen.

@joellabes
Copy link
Contributor Author

Even if there are tweaks to make to the final result, I think this directionally reflects what will ultimately ship. In the interests of getting the RC out I'm going to merge this and we can make any changes in a separate PR if needed (our pals at Fivetran are blocked by this release being delayed)

@joellabes joellabes merged commit 9e58405 into utils-v1 Nov 29, 2022
@dbeatty10
Copy link
Contributor

I'm answering this assuming that the current summary table's rows and columns are the wrong way around (we currently return '*' in all cases when execute is false)

Yeah, the table had some values flip-flopped! 😅 For completeness, I'll update the table in this comment to be:

dbt compile dbt run/build
execute == False * *
execute == True 'no columns returned from star() macro' as outcome_msg 'no columns returned from star() macro' as outcome_msg

instead of the incorrect flip-flopped version:

dbt compile dbt run/build
execute == False * 'no columns returned from star() macro' as outcome_msg
execute == True * 'no columns returned from star() macro' as outcome_msg

@dbeatty10
Copy link
Contributor

Feedback on implementation

This strategy you proposed looks great to me:

dbt compile dbt run/build
execute == False * *
execute == True /* a comment */
*
/* another comment */

But in the implementation, I didn't see the * during dbt compile and execute == True that is meant for SQLFluff. Did I overlook it? Is it covered the way you intended?

@dbeatty10
Copy link
Contributor

Of lesser importance, I am now also curious about cross-database compatibility of single line -- and multi-line /* */ comments.

@joellabes
Copy link
Contributor Author

joellabes commented Nov 30, 2022

I didn't see the * during dbt compile and execute == True that is meant for SQLFluff. Did I overlook it?

I think so! It's here:
https://github.com/dbt-labs/dbt-utils/pull/732/files#diff-b009b0ae1dc2f9c557c2313910e67fa8c90a78490910b5780fcf3caed2087819R19

Of lesser importance, I am now also curious about cross-database compatibility of single line -- and multi-line /* */ comments.

I always use /*bounded comments*/ inside of Jinja macros because I can't guarantee what {{- whitespace management -}} end users are going to do, and don't want to accidentally wind up with --last line of the macro first line of the real code. I also had the same thought - if it becomes a problem, we can add dbt.multiline_comment() for the one adapter that woke up and chose violence 😬

@dbeatty10
Copy link
Contributor

Cool -- I see the * now 👁️ ! Admittedly, I didn't expect it before the comment or fully left-justified, but I can guess some reasons why you put it there (namely prioritizing the indentation of the * over the comment during Jinja rendering).

Speaking of comments... agreed with your dbt.multiline_comment() strategy (if needed).

But might not be necessary; the IBM Informix docs claim that:

  • Double hyphen ( -- ) complies with the ANSI/ISO standard for SQL.
  • C-style slash-and-asterisk ( /* . . . */ ) comply with the SQL-99 standard.

joellabes added a commit that referenced this pull request Dec 1, 2022
* add safe_divide documentation

* add safe_divide macro

* add integration test for safe_divide macro

* Merge changes from main into utils v1 (#699)

* Correct link from README to the CONTRIBUTING guide. (#687)

* fix typo (#688)

Co-authored-by: Alex Malins <[email protected]>

* Change `escape_single_quotes` Reference in Pivot Macro (#692)

* Update pivot.sql

* Changelog Updates

Co-authored-by: Liam O'Boyle <[email protected]>
Co-authored-by: Alex Malins <[email protected]>
Co-authored-by: Alex Malins <[email protected]>
Co-authored-by: zachoj10 <[email protected]>

* Use backwards comaptible versions of timestamp macro

* moved macro and documentation to new SQL generator section

* add tests with expressions

* fix syntax errors  (#705)

* fix syntax errors

* remove whitespace in seed file

* Restore dbt. prefix for all migrated cross-db macros (#701)

* added prefix dbt. on cross db macros

* Also prefix for new macro

* Adding changelog change

* Squashed commit of the following:

commit 5eba82b
Author: Deanna Minnick <[email protected]>
Date:   Wed Oct 12 10:30:42 2022 -0400

    remove whitespace in seed file

commit 7a2a5e3
Author: Deanna Minnick <[email protected]>
Date:   Wed Oct 12 10:22:07 2022 -0400

    fix syntax errors

Co-authored-by: Joel Labes <[email protected]>

* Remove obsolete condition argument from expression_is_true (#700)

* Remove obsolete condition argument from expression_is_true

* Improve docs

* Improve docs

* Update star.sql to allow for non-quote wrapped column names (#706)

* Update star.sql

* Update star.sql

* feat: add testing to star macro 

column encased in quotes functionality

* chore: update schema.yml

* Update star.sql

* chore: update star.sql and schema.yml

* chore: update star.sql to trim blank space

* Update README.md

* Update README.md

adds example usage of star macro's quote_identifiers argument

Co-authored-by: crlough <[email protected]>

* Switch to dbt.escape_single_quotes

* Change deprecation resolution advice

* Wrap xdb warnings in if execute block

* Slugify for snowflake (#707)

* Merge main into utils-v1 (#726)

* Feature/safe divide (#697)

* add safe_divide documentation

* add safe_divide macro

* add integration test for safe_divide macro

* moved macro and documentation to new SQL generator section

Co-authored-by: Grace Goheen <[email protected]>

* Revert "Feature/safe divide (#697)" (#702)

This reverts commit f368cec.

* Quick nitpicks (#718)

I was doing some studying on these and spotted some stuff. One verb conjugation and a consistency in macro description

Co-authored-by: deanna-minnick <[email protected]>
Co-authored-by: Grace Goheen <[email protected]>
Co-authored-by: ian-fahey-dbt <[email protected]>

* Feat: add macro get_query_results_as_single_value (#696)

* feat: add query_results_as_single_value.sql macro

* chore: update the macro definition

Current error to work through: "failed to find conversion function from unknown to text"

* chore: update test

* chore: final edits

* chore: remove extra model reference

* chore: update return() to handle BigQuery

* chore: README.md, macro updates

* feat: factoring in first review changes

* chore: updates to testing

* chore: updates tests

* chore: update test for bigquery

* chore: update cast for bigquery

* Use example with a single record in readme

* Add default value when no record found

* test when no results are found

* Rename test file

* Add test definitions

* Fix incorrect ref

* And another one

* Update test_get_query_results_as_single_value.sql

* cast strings as strings

* Put arg in right place

* Update test_get_query_results_as_single_value.sql

* switch to limit zero for BQ

* Update test_get_query_results_as_single_value.sql

* quote column name in arg

* snowflake wont let you safe cast something to itself

* warning to future readers [skip ci]

* Add singular test to check for multi row/multi column setup

* forgot to save comment [skip ci]

* Rename to get_single_value

Co-authored-by: crlough-gitkraken <[email protected]>
Co-authored-by: Joel Labes <[email protected]>

* Remove rc1 requirement for utils v1

* Recency truncate date option (#731)

* WIP changing recency test

* Add tests

* cast to timestamp for bq

* forgot the curlies

* avoid lateral column aliasing

* ts not dt

* cast source as timestamp

* don't cast inside test

* cast as date instead of truncate

* Update recency.sql

* log bq events

* store pg artifacts

* int tests dir

* Correctly store artifacts

* try casting to date or datetime

* order of operations more like order of ooperations

* dt -> ts

* Do I really have to cast this?

* Revert "Do I really have to cast this?"

This reverts commit 21e2c0d.

* Output a warning when star finds no columns, not '*' (#732)

* Change star() behaviour when no columns returned

* Code review: return a * in compile mode

* README changes

* Delete xdb_deprecation_warning.sql

* Update README.md

* Remove from ToC

* Update toc

* Fix surrogate key variable example

Co-authored-by: Deanna Minnick <[email protected]>
Co-authored-by: Liam O'Boyle <[email protected]>
Co-authored-by: Alex Malins <[email protected]>
Co-authored-by: Alex Malins <[email protected]>
Co-authored-by: zachoj10 <[email protected]>
Co-authored-by: Grace Goheen <[email protected]>
Co-authored-by: deanna-minnick <[email protected]>
Co-authored-by: Simon Quvang <[email protected]>
Co-authored-by: miles <[email protected]>
Co-authored-by: Connor <[email protected]>
Co-authored-by: crlough <[email protected]>
Co-authored-by: fivetran-catfritz <[email protected]>
Co-authored-by: ian-fahey-dbt <[email protected]>
Co-authored-by: crlough-gitkraken <[email protected]>
@joellabes joellabes deleted the star-warning-not-star-when-no-columns branch December 6, 2022 00:17
@vperron
Copy link

vperron commented May 3, 2023

Hi ! I stumbled across this issue but it seems that in my case, sqlfluff does not enter in the WHICH = "compile" mode. I took a very deep look at the invocation of the DBT compiler and did not see anything too fancy, but it is possible that this is a regression and SQLFluff does not pass the compile mode anymore. Anyone else has noticed this ?

My config has nothing very fancy:

[tool.sqlfluff.core]
dialect = "postgres"
exclude_rules = "ambiguous.distinct,layout.long_lines"
templater = "dbt"
max_line_length = 119

[tool.sqlfluff.templater.jinja]
apply_dbt_builtins = true
load_macros_from_path = "dbt/packages"

[tool.pytest.ini_options]
addopts = "--ignore=dbt/packages"

and it started breaking immediately after the first inclusion of our star() macro against an empty database.

@vperron
Copy link

vperron commented May 3, 2023

To be clearer, it seems that the SQLFluff mitigation that is wisely decided here is not used by sqlfluff anymore, or there is some configuration switch that I still have to do.

@dbeatty10
Copy link
Contributor

What error message are you seeing @vperron ?

@vperron
Copy link

vperron commented May 3, 2023

Hello !

sqlfluff lint --exclude-rules ambiguous.distinct,layout.long_lines,references.consistent,references.from,references.qualification --disable-progress-bar --nocolor dags dbt tests
=== [dbt templater] Sorting Nodes...                                                                                                                                                                               
16:22:59  Unable to do partial parsing because env vars used in profiles.yml have changed                                                                                                                          
16:22:59  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:                                                                                                                                                                            
- models.dbt_pilotage.meta                                                                                                                                                                                         
=== [dbt templater] Compiling dbt project...                                                                                                                                                                       
=== [dbt templater] Project Compiled.                                                                                                                                                                              
== [dbt/models/marts/suivi_candidatures_prescripteurs_habilites.sql] FAIL                                                                                                                                          
L:   2 | P:  11 |  PRS | Line 2, Position 11: Found unparsable section: '\n                              
                       | /* no columns returned from sta...'                                                                                                                                                       
WARNING: Parsing errors found and dialect is set to 'postgres'. Have you configured your dialect correctly?
All Finished!               

The original source code is:

{{ dbt_utils.star(ref('candidatures_echelle_locale')) }},

The issue is that it is replaced by: (note the comma)

/* no columns returned from star() macro */,

which does not parse because of the comma. It clearly was the intent though, seemingly with this PR, to have:

*
/* No columns were returned. Maybe the relation doesn't exist yet 
or all columns were excluded. This star is only output during  
dbt compile, and exists to keep SQLFluff happy. */

instead, which would indeed have made SQLFluff and myself happy, but it chose the other path (the "non compile" path).

It seems from my debug sessions that the WHICH variable is set to None with my current sqlfluff setup, which seems to indicate either a configuration issue on my end (but it's not very exotic) or a regression somewhere.

@dbeatty10
Copy link
Contributor

Thanks for the write-up @vperron 🤩

Here's my very simple dbt project:

models/my_model.sql

select 1 as id

models/my_model_2.sql

select {{ dbt_utils.star(ref('my_model')) }}, 'me' as my_name

Starting with a completely empty database, here's the output of dbt compile:

select 
*
/* No columns were returned. Maybe the relation doesn't exist yet 
or all columns were excluded. This star is only output during  
dbt compile, and exists to keep SQLFluff happy. */
            , 'me' as my_name

Output of sqlfluff render models/my_model_2.sql:

select /* no columns returned from star() macro */, 'me' as my_name

Proposed next steps

@vperron Could you open a new issue in dbt-utils that references the relevant discussion here? We'll want to coordinate with the maintainers of SQLFluff on that new issue to determine a path forward.

cc: @alanmcruickshank

@tlfbrito
Copy link

tlfbrito commented May 4, 2023

I'm experiencing the same issue

@katieclaiborne
Copy link

@dbeatty10 @vperron, thanks for the investigation!

Has a new issue been created to determine a path forward? I'm happy to create one, but wanted to double-check here first.

@vperron
Copy link

vperron commented May 30, 2023

Hello ! No I am just coming back from 3 weeks of vacation, sorry about this. So yeah, a new issue should be created as @dbeatty10 pointed out. Specifically, it seems that dbt-utils is working just as expected but the link between sqlfluff and dbt seems to not use the compile mode: the WHICH variable is set to None, which is surprising.

I am a little overwhelmed today since I'm just getting back to work, but I'll happily add my input to the issue in a few days if you create one.

vperron added a commit to gip-inclusion/pilotage-airflow that referenced this pull request May 31, 2023
While waiting for dbt_utils.star() to be fixed with
dbt-labs/dbt-utils#732, we will
use this proxy macro to not having to repeat the tedious if/else
in all of our scripts.
@katieclaiborne
Copy link

Sure thing! I've just created #802.

YannickPassa pushed a commit to gip-inclusion/pilotage-airflow that referenced this pull request Jun 14, 2023
While waiting for dbt_utils.star() to be fixed with
dbt-labs/dbt-utils#732, we will
use this proxy macro to not having to repeat the tedious if/else
in all of our scripts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants