Column-level lineage cannot be 100% accurate without metadata, and there is no unified metadata service that covers all kinds of SQL systems. For the moment, column-to-table resolution in column-level lineage is done on a best-effort basis, meaning we can only provide possible table candidates in situations like select * or select col from tab1 join tab2.
Proposed Solution:
The proposal is to build a metadata provider interface that returns all the columns for a given table name. Implementations can vary, from a naive provider where the user stores the metadata in a dictionary to more complex ones that query a metadata service (for example, querying the Hive metastore via the Thrift API, executing show tables SQL, or querying information_schema).
This way, users can register their metadata for sqllineage to resolve against during lineage analysis.
We'll start with the naive solution, which walks us through the most common parts, and ultimately try to provide common implementations like HiveMetaStoreMetaDataProvider and SQLAlchemyMetaDataProvider, so users only need to feed in things like a database URL to get accurate column lineage with metadata assistance.
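To make the idea concrete, here is a minimal sketch of what such an interface and the naive dictionary-backed provider might look like. The names `MetaDataProvider`, `DictMetaDataProvider`, `get_table_columns` and `resolve_column` are hypothetical and chosen only for illustration; they are not sqllineage's actual API, and the resolution logic is a simplified stand-in for what the analyzer would do.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class MetaDataProvider(ABC):
    """Hypothetical interface: return all column names for a given table."""

    @abstractmethod
    def get_table_columns(self, table: str) -> List[str]:
        ...


class DictMetaDataProvider(MetaDataProvider):
    """Naive provider: the user registers metadata as a plain dictionary."""

    def __init__(self, metadata: Dict[str, List[str]]):
        self._metadata = metadata

    def get_table_columns(self, table: str) -> List[str]:
        # Tables without registered metadata yield an empty column list.
        return self._metadata.get(table, [])


def resolve_column(
    column: str, candidate_tables: List[str], provider: MetaDataProvider
) -> List[str]:
    """Narrow the candidate source tables for a column using metadata."""
    matches = [
        table
        for table in candidate_tables
        if column in provider.get_table_columns(table)
    ]
    # If metadata cannot narrow the choice, fall back to the best-effort
    # behaviour and keep every candidate.
    return matches or list(candidate_tables)


# select col from tab1 join tab2: which table does `col` come from?
provider = DictMetaDataProvider({"tab1": ["id", "col"], "tab2": ["id", "name"]})
resolved = resolve_column("col", ["tab1", "tab2"], provider)
ambiguous = resolve_column("id", ["tab1", "tab2"], provider)
unknown = resolve_column("missing", ["tab1", "tab2"], provider)
```

In this sketch, `col` resolves to `tab1` alone because only `tab1` declares it, while `id` stays ambiguous because both tables declare it. A provider backed by a metastore or by information_schema would only need to implement the same single method.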
This will be the major feature of the v1.5.x release.