Integrate: Add sections about dlt and ingestr #268
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,106 @@ | ||
| (dlt)= | ||
| # dlt | ||
|
|
||
| ```{div} .float-right .text-right | ||
| {loading=lazy}[dlt] | ||
| <br><br> | ||
| <a href="https://github.com/crate/cratedb-examples/actions/workflows/framework-dlt.yml" target="_blank" rel="noopener noreferrer"> | ||
| <img src="https://img.shields.io/github/actions/workflow/status/crate/cratedb-examples/framework-dlt.yml?branch=main&label=dlt" loading="lazy" alt="CI status: dlt"></a> | ||
| ``` | ||
| ```{div} .clearfix | ||
| ``` | ||
|
|
||
| [dlt] (data load tool)—think ELT as Python code—is a popular, | ||
| production-ready Python library for moving data. It loads data from | ||
| various and often messy data sources into well-structured, live datasets. | ||
| dlt is used by {ref}`ingestr`. | ||
|
Member
Is it relevant?
Member, Author
Happy to change the wording any time. For the first version, I usually copy the upstream slogan verbatim. Sometimes it is good, sometimes not. Feel free to submit any suggestions and patches on how to do it differently. Or are you specifically referring to how we cross-link to the page about ingestr?
||
|
|
||
| ::::{grid} | ||
|
|
||
| :::{grid-item} | ||
| - **Just code**: No need to use any backends or containers. | ||
|
|
||
| - **Platform agnostic**: Does not replace your data platform, deployments, or security | ||
| models. Simply import dlt in your favorite code editor, or add it to your Jupyter | ||
| Notebook. | ||
|
|
||
| - **Versatile**: You can load data from any source that produces Python data structures, | ||
| including APIs, files, databases, and more. | ||
| ::: | ||
|
|
||
| :::: | ||
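As a rough illustration of the "Versatile" point, the following sketch feeds plain Python dictionaries from a generator into a dlt pipeline. The `sample_records` generator, its rows, the pipeline name, and the table name are hypothetical and exist only for this example; the destination call and connection URL mirror the ones used in the synopsis of this page. It assumes `dlt` and `dlt-cratedb` are installed and that CrateDB is reachable on `localhost:5432`.

```python
from typing import Any, Dict, Iterator


def sample_records() -> Iterator[Dict[str, Any]]:
    """Yield plain Python dictionaries that dlt can load as table rows."""
    # Hypothetical sample rows, for illustration only.
    yield {"id": 1, "name": "alpha"}
    yield {"id": 2, "name": "beta"}


def load_into_cratedb() -> None:
    """Run a dlt pipeline that loads the sample records into CrateDB."""
    # Imported here so the generator above stays usable without dlt installed.
    import dlt

    pipeline = dlt.pipeline(
        pipeline_name="python_data_example",  # hypothetical name
        destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"),
        dataset_name="doc",
    )
    # `table_name` selects the table that receives the rows.
    load_info = pipeline.run(sample_records(), table_name="sample_records")
    print(load_info)
```

Calling `load_into_cratedb()` against a running CrateDB instance should create and fill the table; the generator can be swapped for any iterable of dictionaries, such as the parsed response of an API call.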
|
|
||
|
|
||
| ## Synopsis | ||
|
|
||
| Prerequisites: | ||
| Install dlt and the CrateDB destination adapter: | ||
| ```shell | ||
| pip install dlt dlt-cratedb | ||
| ``` | ||
|
|
||
| Load data from cloud storage or files into CrateDB. | ||
| ```python | ||
| import dlt | ||
| from dlt.sources.filesystem import filesystem | ||
|
|
||
| resource = filesystem( | ||
| bucket_url="s3://example-bucket", | ||
| file_glob="*.csv" | ||
| ) | ||
|
|
||
| pipeline = dlt.pipeline( | ||
| pipeline_name="filesystem_example", | ||
| destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"), | ||
| dataset_name="doc", | ||
| ) | ||
|
|
||
| pipeline.run(resource) | ||
| ``` | ||
|
|
||
| Load data from SQL databases into CrateDB. | ||
| ```python | ||
| import dlt | ||
| from dlt.sources.sql_database import sql_database | ||
|
|
||
| source = sql_database( | ||
| "mysql+pymysql://[email protected]:4497/Rfam" | ||
| ) | ||
|
|
||
| pipeline = dlt.pipeline( | ||
| pipeline_name="sql_database_example", | ||
| destination=dlt.destinations.cratedb("postgresql://crate:crate@localhost:5432/"), | ||
| dataset_name="doc", | ||
| ) | ||
|
|
||
| pipeline.run(source) | ||
| ``` | ||
|
|
||
| ## Learn | ||
|
|
||
| ::::{grid} | ||
|
|
||
| :::{grid-item-card} Examples: Use dlt with CrateDB | ||
| :link: https://github.com/crate/cratedb-examples/tree/main/framework/dlt | ||
| :link-type: url | ||
| Executable code examples that demonstrate how to use dlt with CrateDB. | ||
| ::: | ||
|
|
||
| :::{grid-item-card} Adapter: The dlt destination adapter for CrateDB | ||
| :link: https://github.com/crate/dlt-cratedb | ||
| :link-type: url | ||
| Based on the dlt PostgreSQL adapter, the package enables you to work | ||
| with dlt and CrateDB. | ||
| ::: | ||
|
|
||
| :::{grid-item-card} See also: ingestr | ||
| :link: ingestr | ||
| :link-type: ref | ||
| The ingestr data import/export application uses dlt. | ||
| ::: | ||
|
|
||
| :::: | ||
|
|
||
|
|
||
|
|
||
| [databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/ | ||
| [dlt]: https://dlthub.com/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| (ingestr)= | ||
| # ingestr | ||
|
|
||
| ```{div} .float-right .text-right | ||
| <a href="https://github.com/crate/cratedb-examples/actions/workflows/application-ingestr.yml" target="_blank" rel="noopener noreferrer"> | ||
| <img src="https://img.shields.io/github/actions/workflow/status/crate/cratedb-examples/application-ingestr.yml?branch=main&label=ingestr" loading="lazy" alt="CI status: ingestr"></a> | ||
| ``` | ||
| ```{div} .clearfix | ||
| ``` | ||
|
|
||
| [ingestr] is a command-line application for copying data from any source | ||
| to any destination database. It supports CrateDB on both the source and | ||
| destination sides. ingestr builds on {ref}`dlt`. | ||
|
|
||
| ::::{grid} | ||
|
|
||
| :::{grid-item} | ||
| - **Single command**: ingestr allows copying & ingesting data from any source | ||
| to any destination with a single command. | ||
|
|
||
| - **Many sources & destinations**: ingestr supports all common source and | ||
| destination databases. | ||
|
|
||
| - **Incremental loading**: ingestr supports both full-refresh and | ||
| incremental loading modes. | ||
| ::: | ||
|
|
||
| :::{grid-item} | ||
| {loading=lazy} | ||
| ::: | ||
|
|
||
| :::: | ||
|
|
||
|
|
||
| ## Synopsis | ||
|
|
||
| Invoke ingestr for exporting data from CrateDB. | ||
| ```shell | ||
| ingestr ingest \ | ||
| --source-uri 'crate://crate@localhost:4200/' \ | ||
| --source-table 'sys.summits' \ | ||
| --dest-uri 'duckdb:///cratedb.duckdb' \ | ||
| --dest-table 'dest.summits' | ||
| ``` | ||
|
|
||
| Invoke ingestr for loading data into CrateDB. | ||
| ```shell | ||
| ingestr ingest \ | ||
| --source-uri 'csv://input.csv' \ | ||
| --source-table 'sample' \ | ||
| --dest-uri 'cratedb://crate:@localhost:5432/?sslmode=disable' \ | ||
| --dest-table 'doc.sample' | ||
| ``` | ||
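When ingestr is driven from a script rather than typed by hand, the same invocation can be assembled programmatically. The sketch below is one possible way to do that in Python; `build_ingestr_command` is a hypothetical helper, not part of ingestr, and it only uses the four options shown in the synopsis.

```python
import shlex
from typing import List


def build_ingestr_command(
    source_uri: str, source_table: str, dest_uri: str, dest_table: str
) -> List[str]:
    """Assemble the argument list for `ingestr ingest` (hypothetical helper)."""
    return [
        "ingestr", "ingest",
        "--source-uri", source_uri,
        "--source-table", source_table,
        "--dest-uri", dest_uri,
        "--dest-table", dest_table,
    ]


command = build_ingestr_command(
    source_uri="csv://input.csv",
    source_table="sample",
    dest_uri="cratedb://crate:@localhost:5432/?sslmode=disable",
    dest_table="doc.sample",
)

# Print a shell-quoted line; pass `command` to subprocess.run() to execute it.
print(shlex.join(command))
```

Keeping the arguments as a list avoids shell-quoting pitfalls with characters such as `?` in the destination URL.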
|
|
||
| :::{note} | ||
| Note the subtle differences between the CrateDB source and destination URLs. | ||
| While `--source-uri=crate://...` addresses CrateDB's SQLAlchemy dialect, | ||
| `--dest-uri=cratedb://...` is effectively a PostgreSQL connection URL | ||
| with a URL scheme designating CrateDB. The source adapter uses | ||
| CrateDB's HTTP protocol, while the destination adapter uses CrateDB's | ||
| PostgreSQL interface. | ||
| ::: | ||
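To make the difference between the two URL flavors concrete, this small sketch splits both example URLs with Python's standard library and reports which CrateDB interface each one addresses. The `describe_cratedb_uri` helper and the `INTERFACE_BY_SCHEME` mapping are hypothetical; the mapping merely restates what the note above says.

```python
from urllib.parse import urlsplit

# Restates the note above: which CrateDB interface each URL scheme addresses.
INTERFACE_BY_SCHEME = {
    "crate": "HTTP (SQLAlchemy dialect)",
    "cratedb": "PostgreSQL interface",
}


def describe_cratedb_uri(uri: str) -> dict:
    """Split a CrateDB URI into scheme, host, port, and interface (hypothetical helper)."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "interface": INTERFACE_BY_SCHEME.get(parts.scheme, "unknown"),
    }


source = describe_cratedb_uri("crate://crate@localhost:4200/")
dest = describe_cratedb_uri("cratedb://crate:@localhost:5432/?sslmode=disable")
print(source)
print(dest)
```

The differing ports (4200 for the source, 5432 for the destination) reflect the two interfaces used in the examples above.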
|
|
||
|
|
||
| ## Coverage | ||
|
|
||
| ingestr supports migration from 20-plus databases, data platforms, and analytics | ||
| engines, including all [databases supported by SQLAlchemy]. | ||
|
|
||
| :::{rubric} Traditional Databases | ||
| ::: | ||
| CockroachDB, CrateDB, Firebird, HyperSQL (hsqldb), IBM DB2 and Informix, | ||
| Microsoft Access, Microsoft SQL Server, MonetDB, MySQL and MariaDB, | ||
| OpenGauss, Oracle, PostgreSQL, SAP ASE, SAP HANA, SAP Sybase SQL Anywhere, | ||
| SQLite, TiDB, YDB, YugabyteDB | ||
|
|
||
| :::{rubric} Cloud Data Warehouses & Analytics | ||
| ::: | ||
| Amazon Athena, Amazon Redshift, Databend, Databricks, Denodo, DuckDB, | ||
| EXASOL DB, Firebolt, Google BigQuery, Greenplum, IBM Netezza Performance Server, | ||
| Impala, Kinetica, Rockset, Snowflake, Teradata Vantage | ||
|
|
||
| :::{rubric} Specialized Data Stores | ||
| ::: | ||
| Apache Drill, Apache Druid, Apache Hive and Presto, ClickHouse, Elasticsearch, | ||
| InfluxDB, MongoDB, OpenSearch | ||
|
|
||
| :::{rubric} Message Brokers | ||
| ::: | ||
| Amazon Kinesis, Apache Kafka (Amazon MSK, Confluent Kafka, Redpanda, RobustMQ) | ||
|
|
||
| :::{rubric} File Formats | ||
| ::: | ||
| CSV, JSONL/NDJSON, Parquet | ||
|
|
||
| :::{rubric} Object Stores | ||
| ::: | ||
| Amazon S3, Google Cloud Storage | ||
|
|
||
| :::{rubric} SaaS Platforms & Services | ||
| ::: | ||
| Airtable, Asana, GitHub, Google Ads, Google Analytics, Google Sheets, HubSpot, | ||
| Notion, Personio, Salesforce, Slack, Stripe, Zendesk, etc. | ||
|
|
||
|
|
||
| ## Learn | ||
|
|
||
| ::::{grid} | ||
|
|
||
| :::{grid-item-card} Documentation: ingestr CrateDB source | ||
| :link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#source | ||
| :link-type: url | ||
| Documentation about the CrateDB source adapter for ingestr. | ||
| ::: | ||
|
|
||
| :::{grid-item-card} Documentation: ingestr CrateDB destination | ||
| :link: https://bruin-data.github.io/ingestr/supported-sources/cratedb.html#destination | ||
| :link-type: url | ||
| Documentation about the CrateDB destination adapter for ingestr. | ||
| ::: | ||
|
|
||
| :::{grid-item-card} Examples: Use ingestr with CrateDB | ||
| :link: https://github.com/crate/cratedb-examples/tree/main/application/ingestr | ||
| :link-type: url | ||
| An executable example rig that demonstrates how to use ingestr to | ||
| load data from Kafka into CrateDB. | ||
| ::: | ||
|
|
||
| :::: | ||
|
|
||
|
|
||
|
|
||
| [databases supported by SQLAlchemy]: https://docs.sqlalchemy.org/en/20/dialects/ | ||
| [ingestr]: https://bruin-data.github.io/ingestr/ |
Software-defined ELT 😄