Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: misidentify column name as lateral alias (#539) #540

Merged
merged 12 commits into from
Jan 29, 2024

Conversation

maoxingda
Copy link
Contributor

@maoxingda maoxingda commented Jan 9, 2024

fix #539

@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch 2 times, most recently from d3e987d to 047fc3d Compare January 9, 2024 02:06
@maoxingda maoxingda changed the title fix: misidentify-column-name-as-alias (#537) fix: misidentify-column-name-as-alias (#539) Jan 9, 2024
Copy link

codecov bot commented Jan 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (a93b894) 99.50% compared to head (df79cf0) 99.50%.

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #540   +/-   ##
=======================================
  Coverage   99.50%   99.50%           
=======================================
  Files          41       41           
  Lines        2208     2223   +15     
=======================================
+ Hits         2197     2212   +15     
  Misses         11       11           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch 2 times, most recently from 5be0927 to 2219a4f Compare January 9, 2024 10:06
@maoxingda
Copy link
Contributor Author

    insert into public.tgt_tbl1
    (
        id,
        id_original
    )
    select
        a || b || c || id as id,
        id as id_original        -- # noqa: E501 TODO: I need the metadata information for the table public.src_tbl1 to identify whether the column reference 'id' in this context is from the table public.src_tbl1 or from an alias reference, currently being used as an alias reference. Note: This decision may significantly deviate from the actual scenario.
    from
        public.src_tbl1

I need the help of a pro.😜😜😜

@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch from 2219a4f to 07c9a66 Compare January 9, 2024 14:49
@reata
Copy link
Owner

reata commented Jan 9, 2024

See my comment on #539 or email. Maybe we should take a step back and rethink on the approach.

@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch 2 times, most recently from 2adf762 to 67eaa50 Compare January 10, 2024 02:35
@maoxingda
Copy link
Contributor Author

maoxingda commented Jan 10, 2024

select
    a || b || c as c,
    c as d            -- Metadata for a subquery is needed in this context to confirm whether the reference to 'c' is from the subquery or an alias reference.
from
    (
        select
            1 as a,
            2 as b,
            3 as c
    )

The current parsing method has no knowledge of the subquery in this context. Because at this point, the subquery has not yet begun to be parsed.

@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch 2 times, most recently from 15f9569 to ba0f630 Compare January 13, 2024 01:09
@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch 3 times, most recently from 0364355 to 6ff5ce3 Compare January 13, 2024 13:54
@maoxingda maoxingda changed the title fix: misidentify-column-name-as-alias (#539) fix: misidentify-column-name-as-lateral-alias (#539) Jan 13, 2024
@maoxingda maoxingda changed the title fix: misidentify-column-name-as-lateral-alias (#539) fix: misidentify column name as lateral alias (#539) Jan 13, 2024
@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch 2 times, most recently from 732a466 to 39240e5 Compare January 13, 2024 14:31
Copy link
Owner

@reata reata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given lateral column alias reference is not universally supported, let's not make it default behavior.

@maoxingda maoxingda force-pushed the fix/misidentify-column-name-as-alias branch 5 times, most recently from c007801 to 1b8ca6d Compare January 14, 2024 11:21
@maoxingda maoxingda mentioned this pull request Jan 14, 2024
@maoxingda
Copy link
Contributor Author

After #552 merge, I will refactor the configuration of LATERAL_COLUMN_ALIAS_REFERENCE

@maoxingda maoxingda requested a review from reata January 17, 2024 02:18
@maoxingda
Copy link
Contributor Author

already refactor done, please re-review, thanks. ✅

@maoxingda
Copy link
Contributor Author

https://www.databricks.com/blog/introducing-support-lateral-column-alias

I don’t know much about databricks but it seems that it also supports lateral column alias (LCA) reference.

@maoxingda
Copy link
Contributor Author

https://sqlkover.com/cool-stuff-in-snowflake-part-4-aliasing-all-the-things/

It seems that snowflake also supports it, but unfortunately I don’t have the environment to verify it.

@maoxingda
Copy link
Contributor Author

https://forums.oracle.com/ords/apexds/post/how-to-reference-column-alias-names-in-other-columns-of-a-q-7737

It seems that snowflake also supports it, but unfortunately I don’t find the official document.

@reata
Copy link
Owner

reata commented Jan 17, 2024

Popularity is one of the considerations. The major reason I'd like to move this feature as configurable is that it won't function without metadata. And the assumption is that by default sqllineage only does static code analysis and metadata is not present.

@maoxingda
Copy link
Contributor Author

Agree. This feature is added because it is very pleasant to use.😜😜😜

Copy link
Owner

@reata reata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall code structure now looks good. I might still want to tweak a thing or two and more importantly add some documents on how to use this config. But you don't need to worry about this PR any more.

Target getting this merged by end of this week. I'll handle it if there's any conflict from master with other PR merged.

@maoxingda
Copy link
Contributor Author

👌

@reata reata force-pushed the fix/misidentify-column-name-as-alias branch from e118b50 to f132fc9 Compare January 21, 2024 09:28
@reata
Copy link
Owner

reata commented Jan 21, 2024

Postpone merging to next week.

Right now to_source_columns does part of the column to table/subquery resolution work. It would be better that we can handle this universally in _build_digraph method of class SQLLineageHolder.

I'm investigating whether we can move all logic to end_of_query_cleanup and not modify to_source_columns.

@reata reata force-pushed the fix/misidentify-column-name-as-alias branch from 26a82fb to 452e5d1 Compare January 27, 2024 14:15
@reata reata merged commit d0ca3dc into reata:master Jan 29, 2024
17 checks passed
delphisharp pushed a commit to delphisharp/sqllineage that referenced this pull request Feb 26, 2024
* fix: misidentify-column-name-as-alias (reata#539)

* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig

* adjust import order

* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery

* unknown

* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type

* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE

* refactor: rebase master and resolve conflict

* refactor: move logic from to_source_columns to end_of_query_cleanup

* refactor: rebase master and fix black format

* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide

* docs: starting version for each config

---------

Co-authored-by: reata <[email protected]>
reata added a commit to delphisharp/sqllineage that referenced this pull request Mar 10, 2024
* fix: misidentify-column-name-as-alias (reata#539)

* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig

* adjust import order

* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery

* unknown

* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type

* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE

* refactor: rebase master and resolve conflict

* refactor: move logic from to_source_columns to end_of_query_cleanup

* refactor: rebase master and fix black format

* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide

* docs: starting version for each config

---------

Co-authored-by: reata <[email protected]>
reata added a commit that referenced this pull request Apr 7, 2024
* fix: Set param from config #545
redo rebase and clean submit

* fix: misidentify column name as lateral alias (#540)

* fix: misidentify-column-name-as-alias (#539)

* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig

* adjust import order

* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery

* unknown

* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type

* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE

* refactor: rebase master and resolve conflict

* refactor: move logic from to_source_columns to end_of_query_cleanup

* refactor: rebase master and fix black format

* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide

* docs: starting version for each config

---------

Co-authored-by: reata <[email protected]>

* feat:SQLLineageConfig supports set value and thread safety

* fix: Fix mypy error

* fix: Fix pytest cov

* fix: Fix the scenario of direct assignment without using with. Add the test of multi-process scenario.

* fix: add  SQLLineageConfigLoader set function

* feat: disable setattr for SQLLineageConfig

* feat: make SQLLineageConfig context manager non-reentrant

* feat: disable set unknown config

* feat: access config in parallel

* chore: disable A005 for module name builtin conflict

* refactor: classmethod to staticmethod

---------

Co-authored-by: liuzhou <[email protected]>
Co-authored-by: maoxd <[email protected]>
Co-authored-by: reata <[email protected]>
delphisharp pushed a commit to delphisharp/sqllineage that referenced this pull request Apr 15, 2024
* fix: misidentify-column-name-as-alias (reata#539)

* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig

* adjust import order

* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery

* unknown

* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type

* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE

* refactor: rebase master and resolve conflict

* refactor: move logic from to_source_columns to end_of_query_cleanup

* refactor: rebase master and fix black format

* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide

* docs: starting version for each config

---------

Co-authored-by: reata <[email protected]>
delphisharp added a commit to delphisharp/sqllineage that referenced this pull request Apr 15, 2024
* fix: Set param from config reata#545
redo rebase and clean submit

* fix: misidentify column name as lateral alias (reata#540)

* fix: misidentify-column-name-as-alias (reata#539)

* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig

* adjust import order

* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery

* unknown

* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type

* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE

* refactor: rebase master and resolve conflict

* refactor: move logic from to_source_columns to end_of_query_cleanup

* refactor: rebase master and fix black format

* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide

* docs: starting version for each config

---------

Co-authored-by: reata <[email protected]>

* feat:SQLLineageConfig supports set value and thread safety

* fix: Fix mypy error

* fix: Fix pytest cov

* fix: Fix the scenario of direct assignment without using with. Add the test of multi-process scenario.

* fix: add  SQLLineageConfigLoader set function

* feat: disable setattr for SQLLineageConfig

* feat: make SQLLineageConfig context manager non-reentrant

* feat: disable set unknown config

* feat: access config in parallel

* chore: disable A005 for module name builtin conflict

* refactor: classmethod to staticmethod

---------

Co-authored-by: liuzhou <[email protected]>
Co-authored-by: maoxd <[email protected]>
Co-authored-by: reata <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make Lateral Column Alias Reference Configurable
2 participants