Commit 26ca651 "Generate CLI doc"
demusdev committed Dec 19, 2024 (1 parent: 459a08e)
Showing 1 changed file with 55 additions and 0 deletions: resources/cli-reference.md
@@ -15,6 +15,7 @@ To regenerate this schema from existing code, use the following command:
* `completions` — Generate tab-completion scripts for your shell
* `config` — Get or set configuration options
* `delete [rm]` — Delete a dataset
* `export` — Exports a dataset
* `ingest` — Adds data to the root dataset according to its push source configuration
* `init` — Initialize an empty workspace in the current directory
* `inspect` — Group of commands for exploring dataset metadata
@@ -254,6 +255,38 @@ Delete local datasets matching pattern:



## `kamu export`

Exports a dataset

**Usage:** `kamu export [OPTIONS] --output-format <OUTPUT_FORMAT> <DATASET>`

**Arguments:**

* `<DATASET>` — Local dataset reference

**Options:**

* `--output-path <OUTPUT_PATH>` — Export destination. Default is `<current workdir>/<dataset name>`
* `--output-format <OUTPUT_FORMAT>` — Output format

Possible values: `parquet`, `ndjson`, `csv`

* `--partition-size <PARTITION_SIZE>` — Number of records per file when exporting into a directory. Default is 5m. This is a soft limit: for the sake of export performance, the actual number of records per file may differ slightly
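As a rough illustration of how the partition size maps to a file count (the 12M record figure below is hypothetical, not from the kamu docs), the number of output files is approximately the record count divided by the partition size, rounded up:

```shell
# Hypothetical example: 12,000,000 records exported with the
# default partition size of 5,000,000 records per file.
records=12000000
partition_size=5000000

# Ceiling division: roughly 3 output files. Since the limit is
# soft, the real split may differ slightly.
files=$(( (records + partition_size - 1) / partition_size ))
echo "$files"   # → 3
```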

This command exports a dataset to a file or a set of files in the given format.

The output path may be either a file or a directory.
When the path has an extension and no trailing separator, it is treated as a file.
In all other cases the path is treated as a directory. Examples:
- `export/dataset.csv` is a file path
- `export/dataset.csv/` is a directory path
- `export/dataset/` is a directory path
- `export/dataset` is a directory path
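The file-vs-directory rule above can be sketched as a small shell helper (an illustration of the rule only, not part of the kamu CLI):

```shell
# Sketch of the output-path rule: a trailing separator always means
# a directory; otherwise the last path segment is checked for an
# extension.
classify() {
  case "$1" in
    */) echo "directory"; return ;;   # trailing separator
  esac
  base="${1##*/}"                     # last path segment
  case "$base" in
    *.*) echo "file" ;;               # segment has an extension
    *)   echo "directory" ;;
  esac
}

classify export/dataset.csv    # → file
classify export/dataset.csv/   # → directory
classify export/dataset        # → directory
```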




## `kamu ingest`

Adds data to the root dataset according to its push source configuration
@@ -438,6 +471,8 @@ List all datasets in the workspace
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `-w`, `--wide` — Show more details (repeat for more)

@@ -882,6 +917,8 @@ Lists known repositories
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)



@@ -981,6 +1018,8 @@ Lists remote aliases
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)



@@ -1012,6 +1051,8 @@ Searches for datasets in the registered repositories
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `--repo <REPO>` — Repository name(s) to search in

@@ -1057,6 +1098,8 @@ Executes an SQL query or drops you into an SQL shell
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `--engine <ENG>` — Engine type to use for this SQL session

@@ -1065,9 +1108,19 @@ Executes an SQL query or drops you into an SQL shell
* `--url <URL>` — URL of a running JDBC server (e.g. jdbc:hive2://example.com:10000)
* `-c`, `--command <CMD>` — SQL command to run
* `--script <FILE>` — SQL script file to execute
* `--output-path <OUTPUT_PATH>` — When set, the result is stored at the given path instead of being printed to stdout
* `--partition-size <PARTITION_SIZE>` — Number of records per file when the result is stored into a directory. Default is 5m. This is a soft limit: for the sake of export performance, the actual number of records per file may differ slightly

The SQL shell allows you to explore the data of all datasets in your workspace using one of the supported data processing engines. This can be a great way to prepare and test a query that you can later turn into a derivative dataset.

The output path may be either a file or a directory.
When the path has an extension and no trailing separator, it is treated as a file.
In all other cases the path is treated as a directory. Examples:
- `export/dataset.csv` is a file path
- `export/dataset.csv/` is a directory path
- `export/dataset/` is a directory path
- `export/dataset` is a directory path

**Examples:**

Drop into SQL shell:
@@ -1344,6 +1397,8 @@ Displays a sample of most recent records in a dataset
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `-n`, `--num-records <NUM>` — Number of records to display

