Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Set param from config #553

Merged
merged 14 commits into from
Apr 7, 2024
Merged

Conversation

delphisharp
Copy link
Contributor

@delphisharp delphisharp commented Jan 15, 2024

Dear ALL:

目前模型的默认模式名是通过环境变量传递的,我认为实现方法不够python。
所以,我实现了另一种解决方法。

修改 sqllineage/config.py 实现配置项的读取和写入, 读取方法保持和已有方法兼容。
修改 sqllineage/core/models.py 的Schema 类,在每次初始化的时候动态读取配置
修改 sqllineage/runner.py的init方法,传入变量,并在 SQLLineageConfig 赋值
请大家提出意见,谢谢
Currently the default schema name of the model is passed through environment variables, and I think the implementation method is not python enough.
So, I implemented another workaround.

Modify sqllineage/config.py to implement reading and writing of configuration items. The reading method remains compatible with the existing method.
Modify the Schema class of sqllineage/core/models.py and dynamically read the configuration every time it is initialized.
Modify the init method of sqllineage/runner.py, pass in variables, and assign values in SQLLineageConfig
Please give your opinions, thank you

重新建立分支并提交
Re-branch and submit

first step to solve #536

Copy link

codecov bot commented Jan 15, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.51%. Comparing base (c23d38d) to head (33bf65e).
Report is 3 commits behind head on master.

❗ Current head 33bf65e differs from pull request most recent head 687f7da. Consider uploading reports for the commit 687f7da to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #553   +/-   ##
=======================================
  Coverage   99.50%   99.51%           
=======================================
  Files          41       41           
  Lines        2219     2251   +32     
=======================================
+ Hits         2208     2240   +32     
  Misses         11       11           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@reata
Copy link
Owner

reata commented Jan 15, 2024

Do we have all the codes in this new branch? I remember we need to change the Schema model in #545

@delphisharp
Copy link
Contributor Author

Do we have all the codes in this new branch? I remember we need to change the Schema model in #545

Sorry, there were some problems when my local branch was synchronized from upstream changes of the master.
So a new branch was reopened.

@reata
Copy link
Owner

reata commented Jan 16, 2024

What's the difference between this and #555 ? Do they jointly solve #536 and is there any dependency, like I need to merge this PR first and then #555?

Copy link
Owner

@reata reata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change config at runtime is a dangerous thing as we discussed previously under multi-thread context. Although we normally encourage user to use multi-process for bulk process, but thread-safety is still something we need to guarantee given SQLLineageConfig is a global variable.

With that in mind, I'd like to propose this API leveraging Python context manager:

with SQLLineageConfig.set(DEFAULT_SCHEMA="ods", TSQL_NO_SEMICOLON=True):
    runner = LineageRunner(sql)
    print(runner.source_tables)

Once entering the with block, those config are temporarily set to another value for current thread and are only readable to current thread. And the temporary value is cleaned up after exit with block.

At the heart of the implementation, we should have a data structure like:

{
    "thread_1": {"DEFAULT_SCHEMA": "ods", "TSQL_NO_SEMICOLON": True},
    "thread_2": {"DEFAULT_SCHEMA": "default"}
}

which takes highest priority > environment variable > default value

sqllineage/config.py Outdated Show resolved Hide resolved
@reata reata changed the title fix: Set param from config #545 fix: Set param from config Jan 18, 2024
@reata reata mentioned this pull request Jan 18, 2024
liuzhou and others added 6 commits March 10, 2024 22:17
redo rebase and clean submit
* fix: misidentify-column-name-as-alias (reata#539)

* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig

* adjust import order

* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery

* unknown

* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type

* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE

* refactor: rebase master and resolve conflict

* refactor: move logic from to_source_columns to end_of_query_cleanup

* refactor: rebase master and fix black format

* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide

* docs: starting version for each config

---------

Co-authored-by: reata <[email protected]>
@reata reata force-pushed the set_param_from_config branch from a21e4e9 to b0260fc Compare March 10, 2024 14:20
Copy link
Owner

@reata reata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please stick to our design that user need to explicitly call set method instead of __setattr__:

with SQLLineageConfig.set(DEFAULT_SCHEMA="ods", TSQL_NO_SEMICOLON=True):
    runner = LineageRunner(sql)
    print(runner.source_tables)

Copy link
Owner

@reata reata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@reata reata merged commit f1972e0 into reata:master Apr 7, 2024
15 checks passed
delphisharp added a commit to delphisharp/sqllineage that referenced this pull request Apr 15, 2024
* fix: Set param from config reata#545
redo rebase and clean submit

* fix: misidentify column name as lateral alias (reata#540)

* fix: misidentify-column-name-as-alias (reata#539)

* add LATERAL_COLUMN_ALIAS_REFERENCE in SQLLineageConfig

* adjust import order

* add test_column_top_level_enable_lateral_ref_with_metadata_from_nested_subquery

* unknown

* refactor: rebase master and convert LATERAL_COLUMN_ALIAS_REFERENCE to bool type

* refactor: use as few condition as possible: SQLLineageConfig.LATERAL_COLUMN_ALIAS_REFERENCE

* refactor: rebase master and resolve conflict

* refactor: move logic from to_source_columns to end_of_query_cleanup

* refactor: rebase master and fix black format

* docs: LATERAL_COLUMN_ALIAS_REFERENCE how-to guide

* docs: starting version for each config

---------

Co-authored-by: reata <[email protected]>

* feat:SQLLineageConfig supports set value and thread safety

* fix: Fix mypy error

* fix: Fix pytest cov

* fix: Fix the scenario of direct assignment without using with. Add the test of multi-process scenario.

* fix: add  SQLLineageConfigLoader set function

* feat: disable setattr for SQLLineageConfig

* feat: make SQLLineageConfig context manager non-reentrant

* feat: disable set unknown config

* feat: access config in parallel

* chore: disable A005 for module name builtin conflict

* refactor: classmethod to staticmethod

---------

Co-authored-by: liuzhou <[email protected]>
Co-authored-by: maoxd <[email protected]>
Co-authored-by: reata <[email protected]>
@delphisharp delphisharp deleted the set_param_from_config branch April 15, 2024 03:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants