Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UNION * Fails to resolve subsequent column lineage in UNION clause #629

Open
skada-coder opened this issue Jun 21, 2024 · 0 comments
Open
Labels
bug Something isn't working

Comments

@skada-coder
Copy link

skada-coder commented Jun 21, 2024

Describe the bug
When parsing a UNION clause using * and the MetadataProvider class it fails to resolve subsequent source columns in the UNION clause.

Below is a simplified example.

SQL
Paste the SQL text here. For example:

CREATE TABLE MAIN.FOOBAR AS (
    SELECT * FROM MAIN.BAR
    UNION
    SELECT * FROM MAIN.BAR2
    UNION
    SELECT * FROM MAIN.BAR3
)

Column Lineage fails to resolve columns from MAIN.BAR2 and MAIN.BAR3 but MAIN.BAR is fine

To Reproduce

from sqllineage.core.metadata.dummy import DummyMetaDataProvider
from sqllineage.runner import LineageRunner
metadata = {
    "main.bar": ["col1", "col2"],
    "main.bar2": ["col1", "col2"],
    "main.bar3": ["col1", "col2"]
}
provider = DummyMetaDataProvider(metadata)
sql1 = """
CREATE TABLE MAIN.FOOBAR AS (
    SELECT * FROM MAIN.BAR
    UNION
    SELECT * FROM MAIN.BAR2
    UNION
    SELECT * FROM MAIN.BAR3
)
"""
LineageRunner(sql1, metadata_provider=provider).print_column_lineage()

actual output is

main.foobar.col1 <- main.bar.col1
main.foobar.col2 <- main.bar.col2

I'd expect it to return this instead

main.foobar.col1 <- main.bar.col1
main.foobar.col2 <- main.bar.col2
main.foobar.col1 <- main.bar2.col1
main.foobar.col2 <- main.bar2.col2
main.foobar.col1 <- main.bar3.col1
main.foobar.col2 <- main.bar3.col2

Python version (available via python --version)

  • 3.11.5

SQLLineage version (available via sqllineage --version):

  • 1.5.3

Note that this issue does not occur in the non * case where the columns are all explicit in the UNION clause so looks like it only impacts * scenario.

We noticed the issue using the snowflake dialect for our own production use cases

@skada-coder skada-coder added the bug Something isn't working label Jun 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

1 participant