As a user, I'd like to generate template datasets that can be used immediately, without having to work out which fields need to be filled in with real data.
Root (snapshot)
```diff
 ---
 kind: DatasetSnapshot
 version: 1
 content:
   # A human-friendly alias of the dataset
   name: root
   # Root datasets are the points of entry of external data into the system
   # See: https://docs.kamu.dev/cli/ingest/
   kind: Root
   # List of metadata events that get the dataset into its initial state
   # See: https://docs.kamu.dev/odf/reference/#metadataevent
   metadata:
     # Specifies the source of data that can be periodically polled to refresh the dataset
     # See: https://docs.kamu.dev/odf/reference/#setpollingsource
     - kind: SetPollingSource
       # Where to fetch the data from.
       # Includes source URL, a protocol to use, cache control
       # See: https://docs.kamu.dev/odf/reference/#fetchstep
       fetch:
         kind: Url
+        url: https://example.com/city_populations_over_time.zip
       # OPTIONAL: How to prepare the binary data
       # Includes decompression, file filtering, format conversions
       prepare:
         - kind: Decompress
           format: Zip
       # How to interpret the data.
       # Includes data format, schema to apply, error handling
       # See: https://docs.kamu.dev/odf/reference/#readstep
       read:
         kind: Csv
         header: true
         timestampFormat: yyyy-M-d
         schema:
           - "date TIMESTAMP"
           - "city STRING"
           - "population STRING"
       # OPTIONAL: Pre-processing query that shapes the data.
       # Useful for converting text data read from CSVs into strict types
       # See: https://docs.kamu.dev/odf/reference/#transform
       preprocess:
         kind: Sql
         # Use one of the supported engines and a query in its dialect
         # See: https://docs.kamu.dev/cli/supported-engines/
         engine: datafusion
         query: |
           select
             date,
             city,
             -- remove commas between thousands
             cast(replace(population, ',', '') as bigint) as population
           from input
       # How to combine data ingested in the past with the new data.
       # See: https://docs.kamu.dev/odf/reference/#mergestrategy
       merge:
         kind: Ledger
         primaryKey:
           - date
           - city
     # Lets you manipulate names of the system columns to avoid conflicts
     # or use names better suited for your data.
     # See: https://docs.kamu.dev/odf/reference/#setvocab
     - kind: SetVocab
       eventTimeColumn: date
```
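The following is a rough Python sketch of what the root template does with each polled file once it is filled in. It mimics the `read`, `preprocess`, and `merge` steps in plain Python and is not how kamu executes them. It makes three assumptions: the CSV contents are made-up sample values, the function names are hypothetical, and `Ledger` is modelled as "append only rows whose `(date, city)` primary key has not been seen before", which is a simplified reading of the ODF merge-strategy reference.

```python
import csv
import io
from datetime import datetime

# Hypothetical contents of the CSV inside city_populations_over_time.zip
# (illustrative values only, not real data).
SAMPLE_CSV = 'date,city,population\n2020-1-1,CityA,"1,000"\n2020-1-1,CityB,"2,500"\n'


def read_batch(text):
    """Mimic the `read` step: header row, `date` parsed as yyyy-M-d,
    `population` kept as a string (as in "population STRING")."""
    return [
        {
            # %m and %d also accept single-digit values such as "1"
            "date": datetime.strptime(rec["date"], "%Y-%m-%d"),
            "city": rec["city"],
            "population": rec["population"],
        }
        for rec in csv.DictReader(io.StringIO(text))
    ]


def preprocess(rows):
    """Mimic the preprocess query: drop thousands separators, cast to integer."""
    return [{**r, "population": int(r["population"].replace(",", ""))} for r in rows]


def ledger_merge(existing, new, primary_key=("date", "city")):
    """Assumed Ledger behaviour: append only rows whose primary key is unseen."""
    seen = {tuple(r[k] for k in primary_key) for r in existing}
    merged = list(existing)
    for r in new:
        key = tuple(r[k] for k in primary_key)
        if key not in seen:
            seen.add(key)
            merged.append(r)
    return merged


# First poll ingests both rows.
dataset = ledger_merge([], preprocess(read_batch(SAMPLE_CSV)))
# A later poll returns the same file plus one new row; only the new row is kept.
second_csv = SAMPLE_CSV + '2021-1-1,CityA,"1,100"\n'
dataset = ledger_merge(dataset, preprocess(read_batch(second_csv)))

for row in dataset:
    print(row["date"].date(), row["city"], row["population"])
# 2020-01-01 CityA 1000
# 2020-01-01 CityB 2500
# 2021-01-01 CityA 1100
```

The second poll contains the two rows that were already ingested, and the ledger merge drops them. This is why `primaryKey` lists both `date` and `city`: neither column on its own identifies a record.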
Derivative (snapshot)
```diff
 ---
 kind: DatasetSnapshot
 version: 1
 content:
   # A human-friendly alias of the dataset
   name: 2
   # Derivative datasets produce data by transforming and combining
   # one or multiple existing datasets.
   # See: https://docs.kamu.dev/cli/transform/
   kind: Derivative
   # List of metadata events that get the dataset into its initial state
   # See: https://docs.kamu.dev/odf/reference/#metadataevent
   metadata:
     # Transformation that will be applied to produce new data
     # See: https://docs.kamu.dev/odf/reference/#settransform
     - kind: SetTransform
       # References the datasets that will be used as inputs.
       # Note: We are associating inputs by name, but could also use IDs.
       inputs:
         - datasetRef: com.example.city-populations
       # Transformation steps that use one of the supported engines and query dialects
       # See: https://docs.kamu.dev/cli/supported-engines/
       transform:
         kind: Sql
         engine: datafusion
         query: |
           select
             date,
             city,
             population + 1 as population
+          from `com.example.city-populations`
     # Lets you manipulate names of the system columns to avoid
     # conflicts or use names better suited for your data.
     # See: https://docs.kamu.dev/odf/reference/#setvocab
     - kind: SetVocab
       eventTimeColumn: date
```
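The following is a minimal Python sketch of the row-level effect of the derivative template's `SetTransform` query. It is an illustration only, not kamu's execution model. The input rows are hypothetical stand-ins for `com.example.city-populations` and the `transform` helper is a made-up name. The sketch assumes `population` already arrives as an integer, since `population + 1` only makes sense on a numeric column.

```python
from datetime import datetime

# Hypothetical rows standing in for the input dataset `com.example.city-populations`,
# assuming `population` was already cast to an integer upstream.
city_populations = [
    {"date": datetime(2020, 1, 1), "city": "CityA", "population": 1000},
    {"date": datetime(2020, 1, 1), "city": "CityB", "population": 2500},
]


def transform(rows):
    """Mimic: select date, city, population + 1 as population from <input>."""
    return [
        {"date": r["date"], "city": r["city"], "population": r["population"] + 1}
        for r in rows
    ]


derived = transform(city_populations)
for row in derived:
    print(row["date"].date(), row["city"], row["population"])
# 2020-01-01 CityA 1001
# 2020-01-01 CityB 2501
```

The `date` column passes through unchanged, which lets the `SetVocab` event in this template keep using `date` as the event time column.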