Merged
11 changes: 11 additions & 0 deletions docs/ingest/etl/index.md
@@ -38,12 +38,21 @@ outlines how to use them effectively. Additionally, see support for {ref}`cdc` s
dbt is a SQL-first platform for transforming data in data warehouses using
SQL and Python. The data abstraction layer provided by dbt-core decouples
the models on which reports and dashboards rely from the source data.
- {ref}`dlt`

dlt is a popular, production-ready Python library for moving data:
think ELT as Python code.
> **Review comment (Member):** Software-defined ELT 😄

- {ref}`flink`

Apache Flink is a programming framework and distributed processing engine for
stateful computations over unbounded and bounded data streams, written in Java.

- {ref}`ingestr`

ingestr is a command-line application that allows copying data from any
source into any destination database.

- {ref}`kestra`

Kestra is an open-source workflow automation and orchestration toolkit with a rich
@@ -230,13 +239,15 @@ Load data from datasets and open table formats.
- {ref}`aws-lambda`
- {ref}`azure-functions`
- {ref}`dbt`
- {ref}`dlt`
- {ref}`dms`
- {ref}`dynamodb`
- {ref}`estuary`
- {ref}`flink`
- {ref}`hop`
- {ref}`iceberg`
- {ref}`influxdb`
- {ref}`ingestr`
- {ref}`kafka`
- {ref}`kestra`
- {ref}`kinesis`
106 changes: 106 additions & 0 deletions docs/integrate/dlt/index.md
@@ -0,0 +1,106 @@
(dlt)=
# dlt

```{div} .float-right .text-right
![dlt logo](https://cdn.sanity.io/images/nsq559ov/production/7f85e56e715b847c5519848b7198db73f793448d-82x25.svg?w=2000&auto=format){loading=lazy}[dlt]
<br><br>
<a href="https://github.com/crate/cratedb-examples/actions/workflows/framework-dlt.yml" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/github/actions/workflow/status/crate/cratedb-examples/framework-dlt.yml?branch=main&label=dlt" loading="lazy" alt="CI status: dlt"></a>
```
```{div} .clearfix
```

[dlt] (data load tool)—think ELT as Python code—is a popular,
production-ready Python library for moving data. It loads data from
various and often messy data sources into well-structured, live datasets.
dlt is used by {ref}`ingestr`.
> **Review comment (Member):** Is it relevant?
>
> **Reply (Member, Author):** Happy to change the wording any time. For the first version, I usually copy the upstream slogan verbatim. Sometimes it is good, sometimes not. Feel free to submit suggestions and patches on how to do it differently.
>
> Or are you specifically referring to how we cross-link to the page about ingestr?


::::{grid}

:::{grid-item}
- **Just code**: No need to use any backends or containers.

- **Platform agnostic**: Does not replace your data platform, deployments, or security
models. Simply import dlt in your favorite code editor, or add it to your Jupyter
Notebook.

- **Versatile**: You can load data from any source that produces Python data structures,
including APIs, files, databases, and more.
:::

::::
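Because dlt's `pipeline.run()` accepts any iterable of Python dictionaries, a data source can be as simple as a generator function. The following sketch uses hypothetical rows and table names; the connection URL matches the one used in the synopsis below, and assumes a local CrateDB instance with default credentials.

```python
# A plain generator works as a dlt data source: dlt infers the table
# schema from the yielded dictionaries. Rows here are illustrative only.
def summits():
    yield {"name": "Mont Blanc", "height": 4808}
    yield {"name": "Monte Rosa", "height": 4634}

# Generators are ordinary iterables, so the rows can be inspected
# directly before wiring up a pipeline.
rows = list(summits())
print(rows)

# Loading into CrateDB then only needs a pipeline, e.g.:
#
#   import dlt
#   pipeline = dlt.pipeline(
#       pipeline_name="python_objects",
#       destination=dlt.destinations.cratedb(
#           "postgresql://crate:crate@localhost:5432/"),
#       dataset_name="doc",
#   )
#   pipeline.run(summits(), table_name="summits")
```

This mirrors the "Versatile" point above: anything that produces Python data structures can feed a pipeline.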


## Synopsis

Prerequisites:
Install dlt and the CrateDB destination adapter:
```shell
pip install dlt dlt-cratedb
```

Load data from cloud storage or files into CrateDB.
```python
import dlt
from dlt.sources.filesystem import filesystem

resource = filesystem(
    bucket_url="s3://example-bucket",
    file_glob="*.csv",
)

pipeline = dlt.pipeline(
    pipeline_name="filesystem_example",
    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
    dataset_name="doc",
)

pipeline.run(resource)
```

Load data from SQL databases into CrateDB.
```python
import dlt
from dlt.sources.sql_database import sql_database

source = sql_database(
    "mysql+pymysql://[email protected]:4497/Rfam"
)

pipeline = dlt.pipeline(
    pipeline_name="sql_database_example",
    destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
    dataset_name="doc",
)

pipeline.run(source)
```

## Learn

::::{grid}

:::{grid-item-card} Examples: Use dlt with CrateDB
:link: https://github.com/crate/cratedb-examples/tree/main/framework/dlt
:link-type: url
Executable code examples that demonstrate how to use dlt with CrateDB.
:::

:::{grid-item-card} Adapter: The dlt destination adapter for CrateDB
:link: https://github.com/crate/dlt-cratedb
:link-type: url
The package is based on the dlt PostgreSQL adapter and enables you to
use dlt with CrateDB.
:::

:::{grid-item-card} See also: ingestr
:link: ingestr
:link-type: ref
The ingestr data import/export application uses dlt.
:::

::::



[dlt]: https://dlthub.com/
2 changes: 2 additions & 0 deletions docs/integrate/index.md
@@ -26,6 +26,7 @@ dbeaver/index
dbt/index
debezium/index
django/index
dlt/index
dms/index
dynamodb/index
estuary/index
@@ -36,6 +37,7 @@ grafana/index
hop/index
iceberg/index
influxdb/index
ingestr/index
kafka/index
kestra/index
kinesis/index
134 changes: 134 additions & 0 deletions docs/integrate/ingestr/index.md
@@ -0,0 +1,134 @@
(ingestr)=
# ingestr

```{div} .float-right .text-right
<a href="https://github.com/crate/cratedb-examples/actions/workflows/application-ingestr.yml" target="_blank" rel="noopener noreferrer">
<img src="https://img.shields.io/github/actions/workflow/status/crate/cratedb-examples/application-ingestr.yml?branch=main&label=ingestr" loading="lazy" alt="CI status: ingestr"></a>
```
```{div} .clearfix
```

[ingestr] is a command-line application for copying data from any source
to any destination database. It supports CrateDB on both the source and
destination sides. ingestr builds on {ref}`dlt`.

::::{grid}

:::{grid-item}
- **Single command**: ingestr allows copying & ingesting data from any source
to any destination with a single command.

- **Many sources & destinations**: ingestr supports all common source and
destination databases.

- **Incremental loading**: ingestr supports both full-refresh and
  incremental loading modes.
:::

:::{grid-item}
![ingestr in a nutshell](https://github.com/bruin-data/ingestr/blob/main/resources/demo.gif?raw=true){loading=lazy}
:::

::::


## Synopsis

Invoke ingestr for exporting data from CrateDB.
```shell
ingestr ingest \
    --source-uri 'crate://crate@localhost:4200/' \
    --source-table 'sys.summits' \
    --dest-uri 'duckdb:///cratedb.duckdb' \
    --dest-table 'dest.summits'
```

Invoke ingestr for loading data into CrateDB.
```shell
ingestr ingest \
    --source-uri 'csv://input.csv' \
    --source-table 'sample' \
    --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \
    --dest-table 'doc.sample'
```

:::{note}
There are subtle differences between the CrateDB source and destination URLs.
While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect,
`--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL
with a protocol scheme designating CrateDB. The source adapter uses
CrateDB's HTTP protocol, while the destination adapter uses CrateDB's
PostgreSQL wire protocol.
:::


## Coverage

ingestr supports migration from 20-plus databases, data platforms, and analytics
engines, including all [databases supported by SQLAlchemy].

:::{rubric} Traditional Databases
:::
CockroachDB, CrateDB, Firebird, HyperSQL (hsqldb), IBM DB2 and Informix,
Microsoft Access, Microsoft SQL Server, MonetDB, MySQL and MariaDB,
OpenGauss, Oracle, PostgreSQL, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere,
SQLite, TiDB, YDB, YugabyteDB

:::{rubric} Cloud Data Warehouses & Analytics
:::
Amazon Athena, Amazon Redshift, Databend, Databricks, Denodo, DuckDB,
EXASOL DB, Firebolt, Google BigQuery, Greenplum, IBM Netezza Performance Server,
Impala, Kinetica, Rockset, Snowflake, Teradata Vantage

:::{rubric} Specialized Data Stores
:::
Apache Drill, Apache Druid, Apache Hive and Presto, Clickhouse, Elasticsearch,
InfluxDB, MongoDB, OpenSearch

:::{rubric} Message Brokers
:::
Amazon Kinesis, Apache Kafka (Amazon MSK, Confluent Kafka, Redpanda, RobustMQ)

:::{rubric} File Formats
:::
CSV, JSONL/NDJSON, Parquet

:::{rubric} Object Stores
:::
Amazon S3, Google Cloud Storage

:::{rubric} SaaS Platforms & Services
:::
Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot,
Notion, Personio, Salesforce, Slack, Stripe, Zendesk, etc.


## Learn

::::{grid}

:::{grid-item-card} Documentation: ingestr CrateDB source
:link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#source
:link-type: url
Documentation about the CrateDB source adapter for ingestr.
:::

:::{grid-item-card} Documentation: ingestr CrateDB destination
:link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#destination
:link-type: url
Documentation about the CrateDB destination adapter for ingestr.
:::

:::{grid-item-card} Examples: Use ingestr with CrateDB
:link: https://github.com/crate/cratedb-examples/tree/main/application/ingestr
:link-type: url
An executable example rig that demonstrates how to use ingestr to
load data from Kafka into CrateDB.
:::

::::



[databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/
[ingestr]: https://bruin-data.github.io/ingestr/