Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable support of sqlfluff context #548

Closed
KulykDmytro opened this issue Jan 11, 2024 · 3 comments · Fixed by #582
Closed

Enable support of sqlfluff context #548

KulykDmytro opened this issue Jan 11, 2024 · 3 comments · Fixed by #582
Labels
enhancement New feature or request

Comments

@KulykDmytro
Copy link

KulykDmytro commented Jan 11, 2024

Is your feature request related to a problem? Please describe

As soon as dialect-based parsing use sqlfluff to parse SQL it will be great to support sqlfluff context (from .sqlfluff files) where we can use/set templater and other options

Describe the solution you'd like

sqlfluff.core.linter.Linter have optional config: FluffConfig parameter. Need to pass it or ensure that default is used

Additional context

dmytro.kulyk@MAC-570022 datalake % sqllineage -f transform/airflow/dags/queries/aws_transforms/aggregate/pokermatch_payment.partitions_range.sql
Traceback (most recent call last):
  File "/opt/homebrew/bin/sqllineage", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqllineage/cli.py", line 125, in main
    runner.print_table_lineage()
  File "/opt/homebrew/lib/python3.11/site-packages/sqllineage/runner.py", line 175, in print_table_lineage
    print(str(self))
          ^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqllineage/runner.py", line 26, in wrapper
    self._eval()
  File "/opt/homebrew/lib/python3.11/site-packages/sqllineage/runner.py", line 195, in _eval
    stmt_holder = analyzer.analyze(stmt, session.metadata_provider)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/lib/python3.11/site-packages/sqllineage/core/parser/sqlfluff/analyzer.py", line 48, in analyze
    raise UnsupportedStatementException(
sqllineage.exceptions.UnsupportedStatementException: SQLLineage cannot parse SQL:{%- set task_id = 'ph2_transaction.data_processing' %}
{%- set transaction_date = ti.xcom_pull(task_ids=task_id, key='new_partitions') %}
{% if transaction_date|length > 0 %}
    select
        row_num,
        min(as_of_date) as start_date,
        max(as_of_date) as end_date,
        array_agg(as_of_date) as dates
    from (
        select
...

running sqlfluff directly:

dmytro.kulyk@MAC-570022 datalake % sqlfluff parse transform/airflow/dags/queries/aws_transforms/aggregate/pokermatch_payment.partitions_range.sql --verbose | more
==== sqlfluff ====
sqlfluff:                2.3.5 python:                 3.11.6
implementation:        cpython verbosity:                   1
dialect:                athena templater:               jinja
rules:                                all
=== [ path: transform/airflow/dags/queries/aws_transforms/aggregate/pokermatch_payment.partitions_range.sql ] ===

[L:  1, P:  1]      |file:
[L:  1, P:  1]      |    [META] placeholder:                                       [Type: 'templated', Raw: "{%- set task_id = 'ph2_transaction.data_processing' %}"]
[L:  1, P: 55]      |    [META] placeholder:                                       [Type: 'literal', Raw: '\n']
[L:  2, P:  1]      |    [META] placeholder:                                       [Type: 'templated', Raw: "{%- set transaction_date = ti.xcom_pull(task_ids=task_id, key='new_partitions') %}"]
[L:  2, P: 83]      |    newline:                                                  '\n'
[L:  3, P:  1]      |    [META] placeholder:                                       [Type: 'block_start', Raw: '{% if transaction_date|length > 0 %}', Block: 'd42817']
[L:  3, P: 37]      |    [META] indent:                                            [Block: 'd42817']
[L:  3, P: 37]      |    newline:                                                  '\n'
[L:  4, P:  1]      |    whitespace:                                               '    '
[L:  4, P:  5]      |    statement:
[L:  4, P:  5]      |        select_statement:
[L:  4, P:  5]      |            select_clause:
[L:  4, P:  5]      |                keyword:                                      'select'
[L:  4, P: 11]      |                [META] indent:
[L:  4, P: 11]      |                newline:                                      '\n'
[L:  5, P:  1]      |                whitespace:                                   '        '
[L:  5, P:  9]      |                select_clause_element:
[L:  5, P:  9]      |                    column_reference:
[L:  5, P:  9]      |                        naked_identifier:                     'row_num'
...

.sqlfluff example

[sqlfluff]
templater = jinja
output_line_length = 180
exclude_rules = AL07, AM02, AM03, AM05, CV10, LT08, ST05, ST06, ST07
warnings = RF01
dialect = athena
large_file_skip_char_limit = 0
large_file_skip_byte_limit = 0
max_line_length = 120

[sqlfluff:rules:references.consistent]
single_table_references = consistent

[sqlfluff:rules:references.keywords]
ignore_words =
    account, domain, language, operator, rank, result, source, type, uid, position, format, state, offset, name,
    location, role, user, comment, method, date, quarter, hour, week, weekday, days

[sqlfluff:templater:jinja]
library_path = operation/deploy/lint
apply_dbt_builtins = false
@KulykDmytro KulykDmytro added the enhancement New feature or request label Jan 11, 2024
@KulykDmytro KulykDmytro changed the title Enable support of sqlfluff context Enable support of sqlfluff context Jan 11, 2024
@reata
Copy link
Owner

reata commented Jan 13, 2024

It makes sense to be support analyzing templated SQL so lineage can analyzed based on what's in orchestration system instead of merely on SQL logs.

Quick question: do you require any configuration in sqlfluff other than templating? Because supports all sqlfluff configuration looks like a overkill for me.

@reata reata added the question Further information is requested label Jan 13, 2024
@KulykDmytro
Copy link
Author

we use both sqlfluff and sqllineage (which is using sqlfluff) and looks like templating configuration is ignored

@reata
Copy link
Owner

reata commented Feb 16, 2024

Thanks for your confirmation. I'll see what we can do.

@reata reata removed the question Further information is requested label Feb 16, 2024
@reata reata closed this as completed in #582 Apr 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants