fix: misidentify column name as lateral alias (#539) #540
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #540   +/-   ##
=======================================
  Coverage   99.50%   99.50%
=======================================
  Files          41       41
  Lines        2208     2223   +15
=======================================
+ Hits         2197     2212   +15
  Misses         11       11
insert into public.tgt_tbl1
(
    id,
    id_original
)
select
    a || b || c || id as id,
    -- # noqa: E501 TODO: Metadata for the table public.src_tbl1 is needed to tell
    -- whether the column reference 'id' here comes from public.src_tbl1 or is an
    -- alias reference. It is currently treated as an alias reference.
    -- Note: this decision may deviate significantly from the actual behavior.
    id as id_original
from
    public.src_tbl1

I need the help of a pro. 😜😜😜
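The ambiguity around `id` can be illustrated with a minimal sketch. It is not sqllineage code: `classify_reference` and its arguments are hypothetical names, and the rule that a real source column takes precedence over an earlier alias of the same name is an assumption made for this illustration.

```python
from typing import Optional, Set


def classify_reference(
    name: str,
    earlier_aliases: Set[str],
    source_columns: Optional[Set[str]],
) -> str:
    """Classify a bare column name that appears in a SELECT list.

    Assumed rule for this sketch: a real column of the source table wins
    over an earlier alias with the same name. Without metadata the question
    cannot be settled, so the function says so explicitly.
    """
    if name not in earlier_aliases:
        # No earlier alias with this name, so it can only be a source column.
        return "source column"
    if source_columns is None:
        # The name collides with an alias and no metadata is available.
        return "ambiguous"
    return "source column" if name in source_columns else "lateral alias"


# `a || b || c || id as id, id as id_original`:
# when the second item is reached, `id` has already been defined as an alias.
aliases_before_second_item = {"id"}

without_metadata = classify_reference("id", aliases_before_second_item, None)
with_metadata = classify_reference(
    "id", aliases_before_second_item, {"a", "b", "c", "id"}
)
print(without_metadata, "|", with_metadata)
```

With the column set of `public.src_tbl1` supplied, the second `id` resolves to the table column; without it, static analysis alone cannot decide.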
See my comment on #539 or email. Maybe we should take a step back and rethink the approach.
select
    a || b || c as c,
    -- Metadata for the subquery is needed here to confirm whether the
    -- reference to 'c' comes from the subquery or is an alias reference.
    c as d
from
(
    select
        1 as a,
        2 as b,
        3 as c
)

The current parsing method has no knowledge of the subquery in this context, because at this point the subquery has not yet been parsed.
What if we move the parsing of the subquery ahead of the for loop, so that the subquery's metadata is available? That way, could we determine whether the column references in the SELECT clause come from the subquery or are lateral alias references?
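The ordering idea can be sketched in a few lines. This is a toy model rather than the sqllineage parser: `Query`, `resolve`, and the string labels are hypothetical, and each select item is reduced to its alias plus the bare names its expression references.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Query:
    # Each select item is (output alias, names referenced by its expression).
    select_items: List[Tuple[str, List[str]]]
    subquery: Optional["Query"] = None


def resolve(query: Query) -> Dict[str, Dict[str, str]]:
    """Label every referenced name in the outer select list."""
    # Step 1: look at the subquery BEFORE looping over the outer select list,
    # so its output columns are known while the outer items are resolved.
    known_columns = set()
    if query.subquery is not None:
        known_columns = {alias for alias, _ in query.subquery.select_items}

    # Step 2: walk the outer select list in order, tracking aliases seen so far.
    seen_aliases = set()
    result: Dict[str, Dict[str, str]] = {}
    for alias, refs in query.select_items:
        labels = {}
        for ref in refs:
            if ref in known_columns:
                labels[ref] = "subquery column"
            elif ref in seen_aliases:
                labels[ref] = "lateral alias"
            else:
                labels[ref] = "unknown"
        result[alias] = labels
        seen_aliases.add(alias)
    return result


# select a || b || c as c, c as d from (select 1 as a, 2 as b, 3 as c)
inner = Query([("a", []), ("b", []), ("c", [])])
outer = Query([("c", ["a", "b", "c"]), ("d", ["c"])], subquery=inner)
resolved = resolve(outer)

# The same outer query without any knowledge of the subquery.
blind = resolve(Query([("c", ["a", "b", "c"]), ("d", ["c"])]))
print(resolved["d"], blind["d"])
```

When the subquery's columns are known up front, `c` in `c as d` is labeled as a subquery column; without them, the only match is the earlier alias `c`, which reproduces the misidentification discussed here.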
Given that lateral column alias references are not universally supported, let's not make this the default behavior.
After #552 is merged, I will refactor the configuration of LATERAL_COLUMN_ALIAS_REFERENCE.
The refactor is done; please re-review. Thanks. ✅
https://www.databricks.com/blog/introducing-support-lateral-column-alias
I don’t know much about Databricks, but it seems that it also supports lateral column alias (LCA) references.
https://sqlkover.com/cool-stuff-in-snowflake-part-4-aliasing-all-the-things/
It seems that Snowflake also supports it, but unfortunately I don’t have an environment to verify it.
It seems that Snowflake also supports it, but unfortunately I can’t find the official documentation.
Popularity is one of the considerations. The major reason I'd like to make this feature configurable is that it won't function without metadata, and the assumption is that by default sqllineage only does static code analysis, with no metadata present.
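A minimal sketch of what an off-by-default switch could look like follows. The function and parameter names are hypothetical and do not show how sqllineage wires LATERAL_COLUMN_ALIAS_REFERENCE internally; the fallback to "lateral alias" when the switch is on but no metadata is given is likewise an assumption.

```python
from typing import Optional, Set


def resolve_reference(
    name: str,
    earlier_aliases: Set[str],
    source_columns: Optional[Set[str]] = None,
    lateral_alias_enabled: bool = False,  # hypothetical switch, off by default
) -> str:
    """Decide how a bare name in a SELECT list should be read."""
    if not lateral_alias_enabled or name not in earlier_aliases:
        # Default static analysis: every bare name is a source column.
        return "source column"
    if source_columns is not None and name in source_columns:
        # Metadata shows a real column with this name, so it takes precedence.
        return "source column"
    # Switch is on and nothing contradicts the alias reading (assumed fallback).
    return "lateral alias"


default_mode = resolve_reference("id", {"id"})
enabled_no_metadata = resolve_reference("id", {"id"}, lateral_alias_enabled=True)
enabled_with_metadata = resolve_reference(
    "id", {"id"}, source_columns={"id"}, lateral_alias_enabled=True
)
print(default_mode, enabled_no_metadata, enabled_with_metadata)
```

Keeping the switch off by default means the metadata-free path never has to guess; only users who opt in take on the alias interpretation.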
Agreed. This feature was added because it is very pleasant to use. 😜😜😜
The overall code structure now looks good. I might still want to tweak a thing or two and, more importantly, add some documentation on how to use this config. But you don't need to worry about this PR any more.
I'm targeting getting this merged by the end of this week. I'll handle any conflicts with master that come from other merged PRs.
👌
Postpone merging to next week. Right now I'm investigating whether we can move all logic to |
…COLUMN_ALIAS_REFERENCE
* fix: misidentify-column-name-as-alias (reata#539)
* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig
* adjust import order
* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery
* unknown
* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type
* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE
* refactor: rebase master and resolve conflict
* refactor: move logic from to_source_columns to end_of_query_cleanup
* refactor: rebase master and fix black format
* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide
* docs: starting version for each config
---------
Co-authored-by: reata
* fix: Set param from config #545 redo rebase and clean submit
* fix: misidentify column name as lateral alias (#540)
* fix: misidentify-column-name-as-alias (#539)
* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig
* adjust import order
* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery
* unknown
* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type
* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE
* refactor: rebase master and resolve conflict
* refactor: move logic from to_source_columns to end_of_query_cleanup
* refactor: rebase master and fix black format
* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide
* docs: starting version for each config
---------
Co-authored-by: reata
* feat:SQLLineageConfig supports set value and thread safety
* fix: Fix mypy error
* fix: Fix pytest cov
* fix: Fix the scenario of direct assignment without using with. Add the test of multi-process scenario.
* fix: add SQLLineageConfigLoader set function
* feat: disable setattr for SQLLineageConfig
* feat: make SQLLineageConfig context manager non-reentrant
* feat: disable set unknown config
* feat: access config in parallel
* chore: disable A005 for module name builtin conflict
* refactor: classmethod to staticmethod
---------
Co-authored-by: liuzhou
Co-authored-by: maoxd
Co-authored-by: reata
fix #539