Declarative ETL framework.
Performs ETL processing without loading any data into memory.
- BigQuery
init:
  credential-file: /.gcp/credential.json
service: bigquery
in:
  query: |
    select *
    from Datalake.users
    where
      _partitiontime > '${last_modified}'
  vars:
    - name: last_modified
      database: DWH
      table: users
      mode: meta
      field: last_modified_time
      default: '2020-01-01'
out:
  project: project
  database: DWH
  table: users
  mode: merge
  merge:
    order:
      - column: modified
        desc: True
    keys:
      - id
docker run -v path-to-credential.json:/.gcp/credential.json -v $PWD:/src davincistd/inbulk:0.1.0 inbulk /src/user.yaml
With the --dry-run option, the query to be executed is displayed.
docker run -v path-to-credential.json:/.gcp/credential.json -v $PWD:/src davincistd/inbulk:0.1.0 inbulk /src/user.yaml --dry-run
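The `${last_modified}` placeholder in the query is filled in at runtime, with `default` used when the table does not exist. The Python sketch below only illustrates that substitution. It is not inbulk's implementation: `expand_query` is a hypothetical helper, and the date `2021-06-01` is a made-up value standing in for whatever would be read from the table's metadata.

```python
from string import Template


# Hypothetical helper: not part of inbulk, it only mirrors the documented behaviour.
def expand_query(query, resolved, var_defs):
    """Replace ${name} placeholders, falling back to each variable's default."""
    values = {}
    for var in var_defs:
        name = var["name"]
        # Use the resolved value when there is one, otherwise the configured default.
        value = resolved.get(name)
        values[name] = var["default"] if value is None else value
    return Template(query).substitute(values)


query = "select * from Datalake.users where _partitiontime > '${last_modified}'"
var_defs = [{"name": "last_modified", "default": "2020-01-01"}]

# First run: the table does not exist yet, so nothing was resolved.
first_run = expand_query(query, {}, var_defs)

# Later run: a value was resolved (the date here is invented for illustration).
later_run = expand_query(query, {"last_modified": "2021-06-01"}, var_defs)

print(first_run)
print(later_run)
```

Because each run only selects partitions newer than the previously recorded value, the same configuration file can be reused for incremental loads.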
Field | Description |
---|---|
in.query | Query to execute |
in.vars | Settings for embedding existing table data and meta information in queries. A placeholder written as ${name} in the query is expanded at runtime. This can be used for incremental (differential) execution, for example. |
in.vars[].default | Value to use if the table does not exist. |
out.project | Destination GCP project |
out.database | Destination dataset name |
out.table | Destination table name |
out.mode | How rows are added to the destination table. One of append, replace, merge. |
out.merge | Merge settings. Set only when out.mode is merge. |
out.merge.order | List of fields that decide which row takes priority in case of duplication. |
out.merge.order[].column | Column to prioritize by in case of duplication (e.g. modified_at). |
out.merge.order[].desc | Set to True to sort the column in descending order, or False for ascending, so that rows come out in order of priority. Default is False. |
out.merge.keys | List of fields that must not be duplicated (e.g. id). |
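One way to read the `out.merge` settings: rows that share the same `keys` are duplicates, and `order` decides which one is kept. The Python sketch below illustrates that reading on small in-memory lists. It is an assumption about the semantics, not inbulk's code: `merge_rows` and the sample rows are hypothetical, and the real merge presumably runs inside BigQuery rather than in memory.

```python
# Hypothetical illustration of merge semantics; not inbulk's implementation.
def merge_rows(existing, incoming, keys, order):
    """Keep one row per key tuple, choosing the row that sorts first under `order`."""
    rows = list(existing) + list(incoming)
    # Apply the sort rules from last to first. Python's sort is stable,
    # so the first entry in `order` ends up as the primary priority.
    for rule in reversed(order):
        rows.sort(key=lambda r: r[rule["column"]], reverse=rule.get("desc", False))
    merged = {}
    for row in rows:
        key = tuple(row[k] for k in keys)
        merged.setdefault(key, row)  # the first (highest priority) row wins
    return list(merged.values())


# Made-up sample data
existing = [
    {"id": 1, "name": "old", "modified": "2020-01-01"},
    {"id": 2, "name": "bob", "modified": "2020-01-01"},
]
incoming = [{"id": 1, "name": "new", "modified": "2020-02-01"}]

result = merge_rows(
    existing,
    incoming,
    keys=["id"],
    order=[{"column": "modified", "desc": True}],
)
for row in sorted(result, key=lambda r: r["id"]):
    print(row)
```

With `desc: True` on `modified`, the most recently modified row for each `id` survives, which matches the configuration shown earlier.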