Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support split SQL statements without semicolon in tsql #384

Closed
nitinpatwardhan opened this issue Jun 6, 2023 · 9 comments · Fixed by #442
Closed

support split SQL statements without semicolon in tsql #384

nitinpatwardhan opened this issue Jun 6, 2023 · 9 comments · Fixed by #442
Labels
enhancement New feature or request

Comments

@nitinpatwardhan
Copy link

nitinpatwardhan commented Jun 6, 2023

Hello,
I have Q about column lineage -f scratch.txt -l column
Issue 1:
I am trying to parse a large Store Procedure with multiple MS SQL statements and it produces bad results. So I tried to check and init__.py gives error at line # 27 if len(holder.write) > 1:
I believe this happens because generally in SP, the statements are not separated by semicolon ;

Is there a way to detect ; automatically and add at end of each statement?

For example
insert table x1 select b1 from foo
insert table x2 select b2 from bar

fails but when I add semicolon ; at end of first statement, it works. The problem is most do not write ; at end while writing Stored Procs. Is the a way out?

NOTE : For TSQL syntax is a bit diff but same issue
insert into x1 select b1 from foo
insert into x2 select b2 from bar

Issue 2: (This is a suggestion)
TSQL has syntax like
if object_id('tempdb..#tbl') is not null
drop table #tbl

Can you parse this and ignore it for lineage instead of creating error?

Thanks

@nitinpatwardhan nitinpatwardhan changed the title Question Question : Do we need ; at end every statement, how do we parse SQL is a Stored Proc without ; at end of each? Jun 6, 2023
@reata
Copy link
Owner

reata commented Jun 7, 2023

This is somewhat related to #159 . Semicolon is clearly a must in universal SQL world to split statement. But we can come to support stored procedure as a special case.

We have some preliminary support for tsql since v1.4.x, can you try --dialect=tsql to see how it goes? And if you can manage to provide a small and compact test case for stored procedure, that's even better.

In particular, I'd like to understand if your stored procured is defined like in the Microsoft docs:

CREATE PROCEDURE GetCustInfo (@CustomerID INT) AS  
   SELECT * FROM Customers WHERE CustID = @CustomerID  
   SELECT OrderID FROM Orders  
      WHERE CustID = @CustomerID AND Status = 'OPEN'

or just a bunch of statement without CREATE PROCEDURE and without semicolon like

SELECT * FROM Customers WHERE CustID = @CustomerID  
SELECT OrderID FROM Orders WHERE CustID = @CustomerID AND Status = 'OPEN'

Another silly question, you mention MS SQL vs TSQL, is there any significant difference? I usually treat them as the same thing.

@reata reata added the question Further information is requested label Jun 7, 2023
@nitinpatwardhan
Copy link
Author

  1. MS SQL same as TSQL
  2. Detecting end of statement and adding ; automatically is critical if this needs to be used for a large scale application such as finding lineage in whole Stored Proc. So I would say that processing of statements without semicolon is critical.
  3. To run, just put the statements above in your test UI and you will notice that in TSQL, the second statement is ignored.
  4. Create Procedure, If (as as stated above to drop table based on condition), should all be filtered out to process Stored Proc.
  5. After detecting multiple lines in SP (Without ;) they should be put i a list and processed one by one

These are just suggestion that will scale the usability of this nice tool

@reata
Copy link
Owner

reata commented Jun 11, 2023

You mentioned CREATE PROCEDURE should be filtered out. Is it supposed by sqllineage library, or you just can get them without any effort?

It's important because if it's in a CREATE PROCEDURE call then it's run-able statement, we can figure out a way to analyze the AST. Otherwise when sqllineage is fed with a bunch of statement without semicolon as delimiter, I assume the syntax is just not correct and we cannot get a correct AST to begin with.

@reata
Copy link
Owner

reata commented Aug 13, 2023

This syntax is tsql only. We will try to support split SQL statements without semicolon for tsql.

@reata reata added enhancement New feature or request and removed question Further information is requested labels Aug 13, 2023
@reata reata changed the title Question : Do we need ; at end every statement, how do we parse SQL is a Stored Proc without ; at end of each? support split SQL statements without semicolon in tsql Aug 13, 2023
@nitinpatwardhan
Copy link
Author

nitinpatwardhan commented Aug 13, 2023 via email

@reata
Copy link
Owner

reata commented Sep 2, 2023

@nitinpatwardhan Can you help test code in master branch? We provide a config that you can pass via environment variable "SQLLINEAGE_TSQL_NO_SEMICOLON". Note this must be used together with dialect=tsql, otherwsie a user warning will be triggered.

You can test by git clone and run

git clone https://github.com/reata/sqllineage.git
cd sqllineage
SQLLINEAGE_TSQL_NO_SEMICOLON=TRUE python -m sqllineage.cli -f test.sql --dialect=tsql

Or alternatively install from master (which requires NodeJS and NPM in your local environment)

pip install git+https://github.com/reata/sqllineage.git
SQLLINEAGE_TSQL_NO_SEMICOLON=TRUE sqllineage -f test.sql --dialect=tsql

Do let me know if you find any issue.

@nitinpatwardhan
Copy link
Author

nitinpatwardhan commented Sep 2, 2023 via email

@reata
Copy link
Owner

reata commented Sep 2, 2023

It doesn't have to be system environment variables. You can set them when starting sqllineage (like I showed in command). I will not make this default even when we release it in v1.4.8 as this will have side effects for other dialect. SQLs without semicolon in other dialect is supposed to raise a syntax exception by default. Limiting it to configurable with tsql is more reasonable.

This feature works for the following sql you provided:

insert into x1 select b1 from foo
insert into x2 select b2 from bar

I can't promise for CREATE PROCEDURE though. That's where I need your help.

@nitinpatwardhan
Copy link
Author

nitinpatwardhan commented Sep 2, 2023 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants