Commit 26ca651 "Generate CLI doc"
demusdev committed Dec 19, 2024 (1 parent: 459a08e)
Showing 1 changed file with 55 additions and 0 deletions: resources/cli-reference.md
@@ -15,6 +15,7 @@ To regenerate this schema from existing code, use the following command:
* `completions` — Generate tab-completion scripts for your shell
* `config` — Get or set configuration options
* `delete [rm]` — Delete a dataset
* `export` — Exports a dataset
* `ingest` — Adds data to the root dataset according to its push source configuration
* `init` — Initialize an empty workspace in the current directory
* `inspect` — Group of commands for exploring dataset metadata
@@ -254,6 +255,38 @@ Delete local datasets matching pattern:



## `kamu export`

Exports a dataset

**Usage:** `kamu export [OPTIONS] --output-format <OUTPUT_FORMAT> <DATASET>`

**Arguments:**

* `<DATASET>` — Local dataset reference

**Options:**

* `--output-path <OUTPUT_PATH>` — Export destination. Default is `<current workdir>/<dataset name>`
* `--output-format <OUTPUT_FORMAT>` — Output format

Possible values: `parquet`, `ndjson`, `csv`

* `--partition-size <PARTITION_SIZE>` — Number of records per file when exporting into a directory. Default is 5m. This is a soft limit: for the sake of export performance, the actual number of records per file may differ slightly
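As a rough illustration of how the partition size maps to a file count (the 12M record figure below is hypothetical, not from the kamu docs), the number of output files is approximately the record count divided by the partition size, rounded up:

```shell
# Hypothetical example: 12,000,000 records exported with the
# default partition size of 5,000,000 records per file.
records=12000000
partition_size=5000000

# Ceiling division: roughly 3 output files. Since the limit is
# soft, the real split may differ slightly.
files=$(( (records + partition_size - 1) / partition_size ))
echo "$files"   # → 3
```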

This command exports a dataset to a file or a set of files in the given format.

The output path may be either a file or a directory.
When the path has an extension and no trailing separator, it is treated as a file.
In all other cases the path is treated as a directory. Examples:
- `export/dataset.csv` is a file path
- `export/dataset.csv/` is a directory path
- `export/dataset/` is a directory path
- `export/dataset` is a directory path
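The file-vs-directory rule above can be sketched as a small shell helper (an illustration of the rule only, not part of the kamu CLI):

```shell
# Sketch of the output-path rule: a trailing separator always means
# a directory; otherwise the last path segment is checked for an
# extension.
classify() {
  case "$1" in
    */) echo "directory"; return ;;   # trailing separator
  esac
  base="${1##*/}"                     # last path segment
  case "$base" in
    *.*) echo "file" ;;               # segment has an extension
    *)   echo "directory" ;;
  esac
}

classify export/dataset.csv    # → file
classify export/dataset.csv/   # → directory
classify export/dataset        # → directory
```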




## `kamu ingest`

Adds data to the root dataset according to its push source configuration
@@ -438,6 +471,8 @@ List all datasets in the workspace
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `-w`, `--wide` — Show more details (repeat for more)

@@ -882,6 +917,8 @@ Lists known repositories
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)



@@ -981,6 +1018,8 @@ Lists remote aliases
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)



@@ -1012,6 +1051,8 @@ Searches for datasets in the registered repositories
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `--repo <REPO>` — Repository name(s) to search in

@@ -1057,6 +1098,8 @@ Executes an SQL query or drops you into an SQL shell
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `--engine <ENG>` — Engine type to use for this SQL session

@@ -1065,9 +1108,19 @@ Executes an SQL query or drops you into an SQL shell
* `--url <URL>` — URL of a running JDBC server (e.g. jdbc:hive2://example.com:10000)
* `-c`, `--command <CMD>` — SQL command to run
* `--script <FILE>` — SQL script file to execute
* `--output-path <OUTPUT_PATH>` — When set, the result is stored at the given path instead of being printed to stdout
* `--partition-size <PARTITION_SIZE>` — Number of records per file when the result is stored into a directory. Default is 5m. This is a soft limit: for the sake of export performance, the actual number of records per file may differ slightly

The SQL shell allows you to explore the data of all datasets in your workspace using one of the supported data processing engines. This can be a great way to prepare and test a query that you can later turn into a derivative dataset.

The output path may be either a file or a directory.
When the path has an extension and no trailing separator, it is treated as a file.
In all other cases the path is treated as a directory. Examples:
- `export/dataset.csv` is a file path
- `export/dataset.csv/` is a directory path
- `export/dataset/` is a directory path
- `export/dataset` is a directory path

**Examples:**

Drop into SQL shell:
@@ -1344,6 +1397,8 @@ Displays a sample of most recent records in a dataset
Array of arrays - compact and efficient and preserves column order
- `table`:
A pretty human-readable table
- `parquet`:
Parquet columnar storage. Only available when exporting to file(s)

* `-n`, `--num-records <NUM>` — Number of records to display

