Relationships in the data lineage generated by the SQLFlow.
-
fdd, the value of target column is come from source column, such as: t = s + 2
You may check
effectType
to see how the target column is changed.effectType = select
, the source data is from select.effectType = insert
, the source data is from insert.effectType = update
, the source data is from update.effectType = merge_update
, the source data is merge update.effectType = merege_insert
, the source data is from merege insert.effectType = create_view
, the source data is from create view.
-
fddi, the value of the target column is not derived from the source column directly, but it is affected by the source column. However, it's difficult to determine this kind of relation, take this syntax for example: t=fx(s). so the relationship of the source and target column is
fdd
in a function call unless we know clearly thatt
will not include value froms
.In this sample SQL, the relationship between
teur
andkamut
is fddi.select case when a.kamut=1 and b.teur IS null then 'no locks' when a.kamut=1 then b.teur else 'locks' end teur from tbl a left join TT b on (a.key=b.key)
-
fdr, the value of target column affected by the row number of the source table. It always happens when the aggregate function is used. the source column may be appeared in the where,group by clause. This kind of relationship may also appear between the target column and the source table.
-
frd, the row number of target column is affected by the value of source column. The source column usually appears in the where clause.
The meaning of the letter in fdd
, fddi
, fdr
, frd
, f: dataflow, d: data value, r: record set.
The first letter is always f,the second letter represents the source column,the third letter represents the target column, the fourth is reserved.
The value of the target column is derived from the source column directly.
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
vEmp.eName
<- fdd <- "eName"
<- fdd <- scott.emp.empName
The value of the target column is influenced by a source table itself, for example by the number of records. This is caused by the use of aggregate function in the query.
create view vSal as
SELECT a.deptno "Department",
a.num_emp/b.total_count "Employees",
a.sal_sum/b.total_sal "Salary"
FROM
(SELECT deptno, COUNT() num_emp, SUM(SAL) sal_sum
FROM scott.emp
Where city = 'NYC'
GROUP BY deptno) a,
(SELECT COUNT() total_count, SUM(sal) total_sal
FROM scott.emp
Where city = 'NYC') b
vSal.Salary
depends on the record number of table: scott.emp
.
This is due to the use of aggregate function COUNT()
, SUM()
in the query,
vSal.Salary
also depends on the scott.emp.deptno
column which is used in the
group by clause.
The city
column in the where clause also determines the value of vSal.Salary
.
The chain of the dataflow is:
vSal.Salary
<- fdd <- query_resultset.Salary
query_resultset.Salary
<- fdd <- sal_sum
query_resultset.Salary
<- fdd <- total_sal
sal_sum
<- fdd <- scott.emp.SAL
sal_sum
<- fdr <- scott.emp
sal_sum
<- fdr <- scott.emp.city
sal_sum
<- fdr <- scott.emp.deptno
total_sal
<- fdd <- scott.emp.sal
total_sal
<- fdr <- scott.emp
total_sal
<- fdr <- scott.emp.city
Some of the columns in source tables such as WHERE clause do not influence the value of target columns but are crucial for the selected row set, so they are also saved for impact analyses, with relationship to the target columns.
create view vEmp(eName) as
SELECT a.empName "eName"
FROM scott.emp a
Where sal > 1000
The value of vEmp.eName
doesn’t depends on scott.emp.sal
,
but the number of records in the vEmp
depends on the scott.emp.sal
,
so this tool record this kind of relationship as well
vEmp.eName
<- fdd <- query_resultset."eName"
<- fdd <- scott.emp.empName
vEmp.eName
<- fdd <- query_resultset."eName"
<- frd <- scott.emp.sal
The value of the target column depends on the value of the source column, but not come from the source column.
select
case when a.kamut=1 and b.teur IS null
then 'no locks'
when a.kamut=1
then b.teur
else 'locks'
end teur
from tbl a left join TT b on (a.key=b.key)
The value of the select result: teur
depends on the source column tbl.kamut
in the case expression, although it values is not derived from tbl.kamut
directly.
query_result.teur
<- fddi <- tbl.kamut
query_result.teur
<- frd <- tbl.key
query_result.teur
<- frd <- TT.key
During ETL, It's typically to work with the staging tables and later rename them to the prod table names, or even better, swap the prod and staging tables in a single operation. So there is a need to understand this kind of transition.
create table t2 as select c from t1;
alter table t2 rename to t3;
create table t4 as select * from t3;
OR
create table t2 as select c from t1;
alter table t2 swap with t3;
create table t4 as select * from t3;
We also use fdd
to reprenst those 2 relations.
alter table t2 rename to t3;
This relation should be built in the dataflow output.
t2
-> fdd -> t3
<relation id="1" type="fdd" effectType="renameTable">
<target id="5" target_name="t3" coordinate="x,y"/>
<source source_id="1" source_name="t2" coordinate="x,y"/>
</relation>
alter table t2 swap with t3;
This relation should be built in the dataflow output.
t2
-> fdd -> t3
<relation id="1" type="fdd" effectType="swapTable">
<target id="5" target_name="t2" coordinate="x,y"/>
<source source_id="1" source_name="t3" coordinate="x,y"/>
</relation>