Trino: Index page and starter tutorial #309
Walkthrough
Adds Trino integration documentation.
Sequence Diagram(s)
sequenceDiagram
rect rgb(230,240,255)
participant User as Developer (CLI)
participant Trino as Trino Coordinator
participant Connector as PostgreSQL Connector
participant Crate as CrateDB
end
User->>Trino: trino-cli -> execute query
Trino->>Connector: route query via catalog (postgresql.properties)
Connector->>Crate: SQL over PostgreSQL protocol (no DB name)
Crate-->>Connector: results (note: object quoting/type mappings)
Connector-->>Trino: results (may limit pushdown)
Trino-->>User: query results (with possible performance caveats)
Estimated code review effort: 2 (Simple), ~10 minutes
Pre-merge checks: 3 passed
@coderabbitai review and please refactor to active voice.
@amotl I'll review the PR and help refactor the content to use active voice as requested. Let me start by reviewing the changes.
Actions performed: review triggered.
Actionable comments posted: 0
🧹 Nitpick comments (8)
docs/integrate/trino/tutorial.md (6)
4-4: Refactor to active voice and consistent second person.
Tighten phrasing and avoid first-person singular. Suggested edits:
- [Trino](https://trino.io/) (formerly known as Presto SQL) is a distributed query engine, that allows running analytical queries across different data sources via SQL. One of those data sources can be CrateDB and this article is going to look at how to configure the connection.
+ [Trino](https://trino.io/) is a distributed SQL query engine. This tutorial shows how to configure Trino to connect to CrateDB.
- We assume a Trino client/server installation is already in place as per [Trino’s installation instructions](https://trino.io/docs/current/installation.html).
+ Assume you have a Trino client/server installation as per the [installation instructions](https://trino.io/docs/current/installation.html).
- For this post, I installed Trino on macOS using Homebrew with `brew install trino` and my installation directory is `/usr/local/Cellar/trino/375`. Depending on your installation method, there might be different ways to start the Trino server. For the sake of this post, I start it in my console from the installation directory with the command `./bin/trino-server run`. Your preferred way of starting might differ.
+ For example, on macOS you can `brew install trino`. Start the server with `trino-server run` from your installation’s `bin` directory. Depending on your installation, the command and paths may differ.
- Due to CrateDB’s PostgreSQL protocol compatibility, we can make use of Trino’s [PostgreSQL connector](https://trino.io/docs/current/connector/postgresql.html). Create a new file `/usr/local/Cellar/trino/375/libexec/etc/catalog/postgresql.properties` to configure the connection:
+ Because CrateDB speaks the PostgreSQL wire protocol, use Trino’s [PostgreSQL connector](https://trino.io/docs/current/connector/postgresql.html). Create a catalog properties file to configure the connection:
- Please replace the placeholders for the CrateDB hostname, username, and password to match your setup. Besides the connection details, the configuration has two particularities:
+ Replace the placeholders for the CrateDB hostname, username, and password. Besides the connection details, note two specifics:
- Once the PostgreSQL connector is configured, we can connect to the Trino server using its CLI:
+ After configuring the connector, connect to the Trino server using its CLI:
- A `SHOW TABLES` query should successfully list all existing tables in the specified CrateDB schema and you can proceed with querying them.
+ Run `SHOW TABLES` to list all tables in the specified CrateDB schema, then query them.
- As CrateDB differs in some aspects from PostgreSQL, there are a few particularities to consider for your queries:
+ Because CrateDB differs in some aspects from PostgreSQL, consider the following nuances when writing queries:
- With a few parameter tweaks, Trino can successfully connect to CrateDB. The information presented in this post is the result of a short compatibility test and is likely not exhaustive. If you use Trino with CrateDB and are aware of any additional aspects, please let us know!
+ With a few parameter tweaks, Trino connects to CrateDB. This guide reflects a short compatibility test and is not exhaustive. If you discover additional aspects, please let us know.
Also applies to: 8-8, 10-10, 14-14, 24-24, 30-30, 40-40, 42-42, 54-54
16-22: Add a language to the fenced code block (fixes MD040).
Use INI/properties highlighting for the catalog config:
```diff
-```
+```ini
 connector.name=postgresql
 connection-url=jdbc:postgresql://<CrateDB hostname>:5432/
 connection-user=<CrateDB username>
 connection-password=<CrateDB password>
 insert.non-transactional-insert.enabled=true
```
14-22: Prefer stable config paths over versioned Homebrew Cellar paths.
Point to TRINO_HOME or etc paths that survive upgrades:
```diff
-Due to CrateDB’s PostgreSQL protocol compatibility, we can make use of Trino’s [PostgreSQL connector](https://trino.io/docs/current/connector/postgresql.html). Create a new file `/usr/local/Cellar/trino/375/libexec/etc/catalog/postgresql.properties` to configure the connection:
+Because CrateDB speaks the PostgreSQL wire protocol, use Trino’s [PostgreSQL connector](https://trino.io/docs/current/connector/postgresql.html). Create a catalog file, for example:
+
+- macOS (Homebrew): `/usr/local/etc/trino/catalog/postgresql.properties` (or `/opt/homebrew/etc/trino/catalog/...` on Apple Silicon)
+- Linux (tarball/systemd): `$TRINO_HOME/etc/catalog/postgresql.properties` or `/etc/trino/catalog/postgresql.properties`
```
Would you like me to adjust the rest of the doc to reference these stable paths consistently?
26-29: Clarify the two specifics with tighter phrasing.
Minor wording to improve scanability:
-* No database name: With PostgreSQL, a JDBC connection URL usually ends with a database name. We intentionally omit the database name when connecting to CrateDB for compatibility reasons.
-CrateDB consists of a single database with multiple schemas, hence we do not specify a database name in the `connection-url`. If a database name is specified, you will run into an error message on certain operations (`ERROR: Table with more than 2 QualifiedName parts is not supported. Only <schema>.<tableName> works`).
-* Disabling transactions: Being a database with eventual consistency, CrateDB doesn’t support transactions. By default, the PostgreSQL connector will wrap `INSERT` queries into transactions and attempt to create a temporary table. We disable this behavior with the `insert.non-transactional-insert.enabled` parameter.
+* No database name: CrateDB provides a single database with multiple schemas, so omit the database name in `connection-url`. Specifying a database triggers errors for operations that include `catalog.schema.table` (e.g., `ERROR: Table with more than 2 QualifiedName parts is not supported. Only <schema>.<tableName> works`).
+* Non‑transactional inserts: CrateDB doesn’t support transactions. By default, the PostgreSQL connector wraps `INSERT` statements in a transaction and uses a temporary table. Disable this with `insert.non-transactional-insert.enabled=true`.
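The two specifics in this comment (no database name in the URL, non-transactional inserts enabled) can be sketched as a small generator for the catalog file. This is an illustration only, written in Python: the helper names `build_connection_url` and `render_catalog` are hypothetical and not part of Trino or CrateDB. The defaults (`127.0.0.1`, user `crate`, empty password) are assumed from the working-defaults convention recorded in this review, and the property keys are the ones quoted from the tutorial.

```python
def build_connection_url(host: str = "127.0.0.1", port: int = 5432) -> str:
    """Return a JDBC URL that carries no database name (hypothetical helper)."""
    # The URL deliberately ends with "/" and nothing after it, because
    # CrateDB exposes a single database with multiple schemas.
    return f"jdbc:postgresql://{host}:{port}/"


def render_catalog(
    host: str = "127.0.0.1",
    port: int = 5432,
    user: str = "crate",
    password: str = "",
) -> str:
    """Render the text of a postgresql.properties catalog file (hypothetical helper)."""
    lines = [
        "connector.name=postgresql",
        f"connection-url={build_connection_url(host, port)}",
        f"connection-user={user}",
        f"connection-password={password}",
        # CrateDB does not support transactions, so the connector must not
        # wrap INSERT statements in one.
        "insert.non-transactional-insert.enabled=true",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the rendered file; redirect it to your catalog directory as needed.
    print(render_catalog(), end="")
```

Keeping the URL construction in its own function makes the "no database name" rule explicit in one place instead of relying on whoever edits the properties file to remember it.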
45-45: Offer a practical workaround for `catalog.schema.table` on INSERT.
Until crate/crate#12658 is resolved, call out two options:
- Run INSERTs directly against CrateDB (psql/PgJDBC) outside Trino.
- Or create a view in CrateDB that Trino can target via two-part names and document that DML via Trino may be limited.
Add a short note after this bullet.
50-50: Set expectations on performance with a version stamp.
Add a sentence like “Tested with Trino X.Y and CrateDB Z.W” to time‑box the pushdown behavior, which changes across releases.
docs/integrate/trino/index.md (2)
4-6: Avoid hotlinking external images; vendor the logo.
Hotlinking the logo from Wikimedia introduces an external runtime dependency and potential license/availability issues. Store the asset locally (e.g., `_static/img/trino-logo.png`) and reference it.
I can add the asset and update references if you confirm the preferred path.
14-16: Tighten the “About” copy.
Slightly reduce marketing tone and keep active voice:
-[Trino] is a fast distributed SQL query engine for big data analytics
-that helps you explore your data universe.
+[Trino] is a fast, distributed SQL query engine for analytics.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/integrate/index.md (1 hunks)
docs/integrate/trino/index.md (1 hunks)
docs/integrate/trino/tutorial.md (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-14T19:02:43.308Z
Learnt from: amotl
PR: crate/cratedb-guide#253
File: docs/integrate/mindsdb/index.md:21-33
Timestamp: 2025-08-14T19:02:43.308Z
Learning: In CrateDB integration documentation examples, default connection parameters (user="crate", password="", host="127.0.0.1") are intentionally used to provide working out-of-the-box examples for users with local CrateDB instances, rather than using placeholder values that require customization.
Applied to files:
docs/integrate/trino/tutorial.md
📚 Learning: 2025-08-15T18:56:11.508Z
Learnt from: amotl
PR: crate/cratedb-guide#255
File: docs/integrate/influxdb/tutorial.md:120-123
Timestamp: 2025-08-15T18:56:11.508Z
Learning: In CrateDB integration tutorials, when using `ctk load table` with `--cluster-url="crate://userhost:port/schema/table"`, the verification query should reference the target schema.table combination specified in the cluster URL, not derive it from the source system's naming. For example, if loading from InfluxDB bucket/measurement `testdrive/demo` to CrateDB `doc/testdrive`, the verification query should be `SELECT * FROM doc.testdrive`.
Applied to files:
docs/integrate/trino/tutorial.md
🪛 markdownlint-cli2 (0.17.2)
docs/integrate/trino/tutorial.md
16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🔇 Additional comments (3)
docs/integrate/trino/tutorial.md (1)
16-22: Use working defaults for out‑of‑the‑box local setups.
Per prior guidance, prefer defaults over placeholders to let readers copy/paste and connect to a local CrateDB:
-```ini
-connector.name=postgresql
-connection-url=jdbc:postgresql://<CrateDB hostname>:5432/
-connection-user=<CrateDB username>
-connection-password=<CrateDB password>
-insert.non-transactional-insert.enabled=true
-```
+```ini
+connector.name=postgresql
+connection-url=jdbc:postgresql://127.0.0.1:5432/
+connection-user=crate
+connection-password=
+insert.non-transactional-insert.enabled=true
+```
This aligns with the “working defaults” preference you set earlier. If you prefer, we can keep both: a default block plus a placeholders block.
docs/integrate/index.md (1)
77-77: LGTM: ToC entry added for Trino.
Entry placement matches surrounding style. No action needed.
docs/integrate/trino/index.md (1)
20-28: Cross‑reference works as intended.
`grid-item-card` correctly links to `trino-tutorial` via `:link-type: ref`.
Please confirm the Sphinx extensions (e.g., sphinx-design) are enabled in this project build, as required by `grid`/`grid-item-card`.
Actionable comments posted: 0
🧹 Nitpick comments (1)
docs/feature/query/index.md (1)
157-158: Use active‑voice link text (keep targets unchanged).
Suggest making the visible link text active while still referencing the same targets.
- - [Analyzing Device Readings with Metadata Integration]
- - [Time Series: Analyzing Weather Data]
+ - [Analyze Device Readings with Metadata Integration][Analyzing Device Readings with Metadata Integration]
+ - [Analyze Weather Data (Time Series)][Time Series: Analyzing Weather Data]
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/feature/query/index.md(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build docs
🔇 Additional comments (2)
docs/feature/query/index.md (2)
241-242: Same verification for these targets.
Ensure the “project:” links for weather analysis and UNNEST resolve (see script above).
231-233: Confirm "project:" cross‑ref targets exist and Sphinx resolves them
- Cross‑refs to check: docs/feature/query/index.md (lines 231–233).
- Found labels:
- (timeseries-with-metadata) → docs/topic/timeseries/learn/with-metadata.md:2
- (timeseries-analysis-weather) → docs/topic/timeseries/learn/query.md:2
- Not found in repo: inserts_bulk_operations, inserts_unnest — add those labels at the intended targets or update the refs.
- I could not run a Sphinx build in the sandbox (`sphinx-build`/`tee`: command not found). Run locally to confirm: `sphinx-build -nW -b html docs/ _build/html` and fix any unresolved‑ref warnings.
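As a rough pre-check before the full Sphinx build, the label lookup described above can be approximated in a few lines of Python. This is not the script the reviewer ran; it is a sketch that assumes labels are declared with the MyST target syntax `(label)=` on a line of their own, and the function names `find_labels` and `missing_labels` are hypothetical. It does not replace `sphinx-build -nW`, which remains the authoritative check.

```python
import re

# Assumed MyST target syntax: "(label-name)=" alone on a line.
LABEL_RE = re.compile(r"^\(([\w.:-]+)\)=\s*$", re.MULTILINE)


def find_labels(markdown_text: str) -> set[str]:
    """Collect all target labels declared in one Markdown document."""
    return set(LABEL_RE.findall(markdown_text))


def missing_labels(wanted: list[str], documents: list[str]) -> list[str]:
    """Return the wanted labels that no document declares, sorted by name."""
    defined: set[str] = set()
    for text in documents:
        defined |= find_labels(text)
    return sorted(set(wanted) - defined)


if __name__ == "__main__":
    # Inline sample documents; in practice, read the files under docs/.
    docs = [
        "(timeseries-with-metadata)=\n# Heading\n",
        "(timeseries-analysis-weather)=\n# Heading\n",
    ]
    wanted = [
        "timeseries-with-metadata",
        "timeseries-analysis-weather",
        "inserts_bulk_operations",
        "inserts_unnest",
    ]
    print(missing_labels(wanted, docs))
```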
kneth left a comment
I have only minor, non-blocking comments.
docs/integrate/trino/tutorial.md
Outdated
## Connector configuration
Suggested change:
- Because CrateDB speaks the PostgreSQL wire protocol, use Trino’s [PostgreSQL connector](https://trino.io/docs/current/connector/postgresql.html). Create a catalog properties file to configure the connection:
+ Because CrateDB speaks the PostgreSQL wire protocol, you can use Trino’s [PostgreSQL connector](https://trino.io/docs/current/connector/postgresql.html). Create a catalog properties file to configure the connection:
Thanks. Fixed with 60f22fc.
docs/integrate/trino/tutorial.md
Outdated
Context:
- macOS (Homebrew): `/usr/local/etc/trino/catalog/postgresql.properties` (or `/opt/homebrew/etc/trino/catalog/...` on Apple Silicon)
- Linux (tarball/systemd): `$TRINO_HOME/etc/catalog/postgresql.properties` or `/etc/trino/catalog/postgresql.properties`
Suggested change:
- * Querying `OBJECT` columns: Columns of the data type `OBJECT` can usually be queried using the bracket notation, e.g. `SELECT my_object_column['my_object_key'] FROM my_table`. In Trino’s SQL dialect, the identifier needs to be wrapped in double quotes, such as `SELECT "my_object_column['my_object_key']" FROM my_table`.
+ * Querying `OBJECT` columns: Columns of the data type `OBJECT` can usually be queried using the bracket notation e.g., `SELECT my_object_column['my_object_key'] FROM my_table`. In Trino’s SQL dialect, the identifier needs to be wrapped in double quotes, such as `SELECT "my_object_column['my_object_key']" FROM my_table`.
Thanks. Fixed with 60f22fc.
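The quoting rule from the `OBJECT` columns bullet in this thread can be sketched as a tiny helper. It is a minimal illustration in Python, assuming the whole subscript expression is passed to Trino as one double-quoted identifier, as the tutorial text describes; the function name `trino_object_ref` is hypothetical, and doubling embedded double quotes follows standard SQL identifier quoting rather than anything stated in the tutorial.

```python
def trino_object_ref(column: str, key: str) -> str:
    """Build a double-quoted identifier for an OBJECT sub-column (hypothetical helper)."""
    # CrateDB-style subscript expression, e.g. my_object_column['my_object_key'].
    expression = f"{column}['{key}']"
    # Wrap the whole expression in double quotes for Trino; escape any
    # embedded double quotes by doubling them (standard SQL rule).
    return '"' + expression.replace('"', '""') + '"'


if __name__ == "__main__":
    ref = trino_object_ref("my_object_column", "my_object_key")
    print(f"SELECT {ref} FROM my_table")
```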
About
Continue adding tutorials from the community forum.
Preview
References