Dialect='tsql' should return warning when no semicolons are detected #422

Closed
crossxwill opened this issue Aug 3, 2023 · 3 comments · Fixed by #431
Labels
enhancement New feature or request

@crossxwill

The issue is related to #384 and #159. It's common (although frustrating) practice to omit semicolons in T-SQL scripts. For example:

-- sample_query.sql

SELECT CUST_ID, CUST_NAME, CUST_CITY, CUST_COUNTRY
INTO #TEMP_CUSTOMERS
FROM T_CUSTOMERS
WHERE CUST_COUNTRY = 'USA'

SELECT CUST.*
    , SALES.CUM_SALES
INTO #TEMP_FINAL_RESULTS
FROM #TEMP_CUSTOMERS AS CUST
LEFT JOIN T_SALES AS SALES
ON CUST.CUST_ID = SALES.CUST_ID
WHERE SALES.CUST_ID IS NULL

SELECT *
FROM #TEMP_FINAL_RESULTS

This causes LineageRunner to parse only the first statement and silently skip the remaining ones.

from sqllineage.runner import LineageRunner
import pandas as pd

# read the T-SQL script from sample_query.sql
with open('sample_query.sql', 'r') as f:
    sql_script = f.read()

# parse sql_script with LineageRunner using the T-SQL dialect
parsed_results = LineageRunner(sql_script, dialect="tsql")

# each lineage path runs from source column to target column;
# keep only its two endpoints so the frame always has exactly two columns
df = pd.DataFrame(
    [(str(path[0]), str(path[-1])) for path in parsed_results.get_column_lineage()],
    columns=['Source', 'Target'],
)

print(df)

The output is:

Source Target
<default>.t_customers.cust_city <default>.#temp_customers.cust_city
<default>.t_customers.cust_country <default>.#temp_customers.cust_country
<default>.t_customers.cust_id <default>.#temp_customers.cust_id
<default>.t_customers.cust_name <default>.#temp_customers.cust_name

The proposal is to return a warning message when dialect='tsql' and no semicolons are detected:

parsed_results = LineageRunner(sql_script, dialect="tsql")

"""Warning: No semicolons detected. Consider adding a semicolon after each statement."""
@reata
Owner

reata commented Aug 6, 2023

Adding a warning is easy, though I'm not familiar enough with T-SQL. Does splitting it into multiple statements make sense?

@reata reata added the question Further information is requested label Aug 6, 2023
@crossxwill
Author

crossxwill commented Aug 7, 2023

I suggest adding the warning message. I'm not familiar with how Microsoft parses multiple statements without semicolons.

@reata
Owner

reata commented Aug 13, 2023

A SyntaxWarning will be triggered in cases like this. We'll handle splitting statements without semicolons in #384.
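If a batch job should fail rather than silently drop statements, a SyntaxWarning can be escalated to an exception with the standard `warnings` module. In the sketch below, `run_lineage` is a hypothetical stand-in that only imitates the behaviour described in this thread; in a real script it would be replaced by the `LineageRunner(sql_script, dialect="tsql")` construction plus the `get_column_lineage()` call, both kept inside the `catch_warnings` block, since the thread does not say at which point the warning is raised.

```python
import warnings


def run_lineage(sql: str) -> list[str]:
    """Hypothetical stand-in for LineageRunner(sql, dialect="tsql").get_column_lineage().

    It only imitates the described behaviour: warn with SyntaxWarning when the
    script contains no semicolons, then return an (empty) lineage list.
    """
    if ";" not in sql:
        warnings.warn("No semicolons detected.", SyntaxWarning)
    return []


def lineage_or_fail(sql: str) -> list[str]:
    """Turn SyntaxWarning into an exception so missing statements cannot go unnoticed."""
    with warnings.catch_warnings():
        # the filter change is scoped to this block and restored afterwards
        warnings.simplefilter("error", SyntaxWarning)
        return run_lineage(sql)
```

Scoping the filter with `catch_warnings` keeps other code in the same process on the default warning behaviour.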

@reata reata added enhancement New feature or request and removed question Further information is requested labels Aug 13, 2023