-
Notifications
You must be signed in to change notification settings - Fork 40
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
How to query Iceberg topics using Snowflake and Open Catalog (#957)
Co-authored-by: Joyce Fee <[email protected]>
- Loading branch information
Showing
4 changed files
with
229 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
= Integrate Redpanda with Iceberg | ||
:description: Generate Iceberg tables for your Redpanda topics for data lakehouse access. | ||
:page-layout: index |
214 changes: 214 additions & 0 deletions
214
modules/manage/pages/iceberg/redpanda-topics-iceberg-snowflake-catalog.adoc
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
= Query Iceberg Topics using Snowflake and Open Catalog | ||
:description: Add Redpanda topics as Iceberg tables that you can query in Snowflake using an Open Catalog integration. | ||
:page-categories: Iceberg, Tiered Storage, Management, High Availability, Data Replication, Integration | ||
:page-beta: true | ||
|
||
[NOTE] | ||
==== | ||
include::shared:partial$enterprise-license.adoc[] | ||
==== | ||
|
||
This guide walks you through querying Redpanda topics as Iceberg tables in https://docs.snowflake.com/en/user-guide/tables-iceberg[Snowflake^], with AWS S3 as object storage and a catalog integration using https://other-docs.snowflake.com/en/opencatalog/overview[Open Catalog^]. | ||
|
||
== Prerequisites | ||
|
||
* xref:manage:tiered-storage.adoc#configure-object-storage[Object storage configured] for your cluster and xref:manage:tiered-storage.adoc#enable-tiered-storage[Tiered Storage enabled] for the topics for which you want to generate Iceberg tables. | ||
** The S3 bucket URI so that you can configure it as external storage for Open Catalog. | ||
* A Snowflake account. | ||
* An Open Catalog account. To https://other-docs.snowflake.com/en/opencatalog/create-open-catalog-account[create an Open Catalog account], you require ORGADMIN access in Snowflake. | ||
* An internal catalog created in Open Catalog with your Tiered Storage AWS S3 bucket configured as external storage. | ||
** Follow this guide to https://other-docs.snowflake.com/en/opencatalog/create-catalog#create-a-catalog-using-amazon-simple-storage-service-amazon-s3[create a catalog] with the S3 bucket configured as external storage. You require admin permissions to carry out these steps in AWS: | ||
. If you don't already have one, create an IAM policy that gives Open Catalog read and write access to your S3 bucket. | ||
. Create an IAM role and attach the IAM policy to the role. | ||
. After creating a new catalog in Open Catalog, grant the catalog's AWS IAM user access to the S3 bucket. | ||
* A Snowflake https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume[external volume] set up using the Tiered Storage bucket. | ||
** Follow this guide to https://docs.snowflake.com/en/user-guide/tables-iceberg-configure-external-volume-s3[configure the external volume with S3]. You can use the same IAM policy as the catalog for the external volume's IAM role and user. | ||
|
||
== Set up catalog integration using Open Catalog | ||
|
||
=== Create a new Open Catalog service connection for Redpanda | ||
|
||
To create a new service connection to integrate the Iceberg-enabled topics into Open Catalog: | ||
|
||
. In Open Catalog, select *Connections*, then *+ Connection*. | ||
. In *Configure Service Connection*, provide a name. Open Catalog creates a new principal with this name. | ||
. Make sure *Create new principal role* is toggled on. | ||
. Enter a name for the principal role. Then, click *Create*. | ||
|
||
After you create the connection, you are provided the client ID and client secret. Save these credentials to add to your cluster configuration in a later step. | ||
|
||
=== Create a catalog role | ||
|
||
Grant privileges to the principal created in the previous step: | ||
|
||
. In Open Catalog, select *Catalogs*, and select your catalog. | ||
. On the *Roles* tab of your catalog, click *+ Catalog Role*. | ||
. Give the catalog role a name. | ||
. Under *Privileges*, select `CATALOG_MANAGE_CONTENT`. This provides full management https://other-docs.snowflake.com/en/opencatalog/access-control#catalog-privileges[privileges] for the catalog. Then, click *Create*. | ||
. On the *Roles* tab of the catalog, click *Grant to Principal Role*. | ||
. Select the catalog role you just created. | ||
. Select the principal role you created earlier. Click *Grant*. | ||
|
||
=== Update cluster configuration | ||
|
||
To configure your Redpanda cluster to enable Iceberg on a topic, as well as set up the integration with Open Catalog: | ||
|
||
. Edit your cluster configuration to set the `iceberg_enabled` property to `true`, and set the catalog integration properties listed in the example below. You must restart your cluster if you change this configuration for a running cluster. You can run `rpk cluster config edit` to update these properties: | ||
+ | ||
[,bash] | ||
---- | ||
iceberg_enabled: true | ||
iceberg_catalog_type: rest | ||
iceberg_rest_catalog_endpoint: https://<snowflake-orgname>-<open-catalog-account-name>.snowflakecomputing.com/polaris/api/catalog | ||
iceberg_rest_catalog_client_id: <open-catalog-connection-client-id> | ||
iceberg_rest_catalog_client_secret: <open-catalog-connection-client-secret> | ||
iceberg_rest_catalog_prefix: <open-catalog-name> | ||
# Optional | ||
iceberg_translation_interval_ms_default: 1000 | ||
iceberg_catalog_commit_interval_ms: 1000 | ||
---- | ||
+ | ||
Use your own values for the following placeholders: | ||
+ | ||
-- | ||
- `<snowflake-orgname>` and `<open-catalog-account-name>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI] is composed of these values. | ||
+ | ||
TIP: In Snowflake, navigate to **Admin**, then **Accounts**. Click the ellipsis near your Open Catalog account name, and select **Manage URLs**. The **Current URL** contains `<snowflake-orgname>` and `<open-catalog-account-name>`. | ||
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step. | ||
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step. | ||
- `<open-catalog-name>`: The name of your catalog in Open Catalog. | ||
-- | ||
+ | ||
[,bash,role=no-copy] | ||
---- | ||
Successfully updated configuration. New configuration version is 2. | ||
---- | ||
|
||
. You must restart your cluster so that the configuration changes take effect. | ||
|
||
. Enable the integration for a topic by configuring the topic property `redpanda.iceberg.mode`. This mode creates an Iceberg table for the topic consisting of two columns, one for the record metadata including the key, and another binary column for the record's value. See xref:manage:iceberg/topic-iceberg-integration.adoc#enable-iceberg-integration[Enable Iceberg integration] for more details on Iceberg modes. The following examples show how to use xref:get-started:rpk-install.adoc[`rpk`] to either create a new topic, or alter the configuration for an existing topic, to set the Iceberg mode to `key_value`. | ||
+ | ||
.Create a new topic and set `redpanda.iceberg.mode`: | ||
[,bash] | ||
---- | ||
rpk topic create <topic-name> --topic-config=redpanda.iceberg.mode=key_value | ||
---- | ||
+ | ||
.Set `redpanda.iceberg.mode` for an existing topic: | ||
[,bash] | ||
---- | ||
rpk topic alter-config <topic-name> --set redpanda.iceberg.mode=key_value | ||
---- | ||
|
||
. Produce to the topic. For example, | ||
+ | ||
[,bash] | ||
---- | ||
echo "hello world\nfoo bar\nbaz qux" | rpk topic produce <topic-name> --format='%k %v\n' | ||
---- | ||
|
||
You should see the topic as a table in Open Catalog. | ||
|
||
. In Open Catalog, select *Catalogs*, then open your catalog. | ||
. Under your catalog, you should see the `redpanda` namespace, and a table with the name of your topic. The `redpanda` namespace and the table are automatically added for you. | ||
|
||
== Query Iceberg table in Snowflake | ||
|
||
To query the topic in Snowflake, you must create a https://docs.snowflake.com/en/user-guide/tables-iceberg#catalog-integration[catalog integration^] so that Snowflake has access to the table data and metadata. | ||
|
||
=== Configure catalog integration with Snowflake | ||
|
||
. Run the https://docs.snowflake.com/sql-reference/sql/create-catalog-integration-open-catalog[`CREATE CATALOG INTEGRATION`] command in Snowflake: | ||
+ | ||
[,sql] | ||
---- | ||
CREATE CATALOG INTEGRATION <catalog-integration-name> | ||
CATALOG_SOURCE = POLARIS | ||
TABLE_FORMAT = ICEBERG | ||
CATALOG_NAMESPACE = 'redpanda' | ||
REST_CONFIG = ( | ||
CATALOG_URI = '<open-catalog-uri>' | ||
WAREHOUSE = '<open-catalog-name>' | ||
) | ||
REST_AUTHENTICATION = ( | ||
TYPE = OAUTH | ||
OAUTH_CLIENT_ID = '<open-catalog-connection-client-id>' | ||
OAUTH_CLIENT_SECRET = '<open-catalog-connection-client-secret>' | ||
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL') | ||
) | ||
REFRESH_INTERVAL_SECONDS = 30 | ||
ENABLED = TRUE; | ||
---- | ||
+ | ||
Use your own values for the following placeholders: | ||
+ | ||
- `<catalog-integration-name>`: Provide a name for your Iceberg catalog integration in Snowflake. | ||
- `<open-catalog-uri>`: Your https://docs.snowflake.com/en/sql-reference/sql/create-catalog-integration-open-catalog#required-parameters[Open Catalog account URI] (`https://<snowflake-orgname>-<account-name>.snowflakecomputing.com/polaris/api/catalog`). | ||
- `<open-catalog-name>`: The name of your catalog in Open Catalog. | ||
- `<open-catalog-connection-client-id>`: The client ID of the service connection you created in an earlier step. | ||
- `<open-catalog-connection-client-secret>`: The client secret of the service connection you created in an earlier step. | ||
|
||
. Run the following command to verify that the catalog is integrated correctly: | ||
+ | ||
[,sql] | ||
---- | ||
SELECT SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog-integration-name>'); | ||
---- | ||
+ | ||
[,bash,role="no-copy no-placeholders"] | ||
---- | ||
# Example result for redpanda.iceberg.mode=key_value | ||
+-----------------------------------------------------------------------+ | ||
| SYSTEM$LIST_ICEBERG_TABLES_FROM_CATALOG('<catalog_integration_name>') | | ||
+-----------------------------------------------------------------------+ | ||
| [{"namespace":"redpanda","name":"<table_name>"}] | | ||
+-----------------------------------------------------------------------+ | ||
---- | ||
|
||
=== Create Iceberg table in Snowflake | ||
|
||
After creating the catalog integration, you must create an externally-managed table in Snowflake. You must run your Snowflake queries against this table. | ||
|
||
In your Snowflake database, run the https://docs.snowflake.com/en/sql-reference/sql/create-iceberg-table-rest[CREATE ICEBERG TABLE] command. The following example also specifies that the table should automatically refresh metadata: | ||
|
||
[,sql] | ||
---- | ||
CREATE ICEBERG TABLE <table-name> | ||
CATALOG = '<catalog-integration-name>' | ||
EXTERNAL_VOLUME = '<iceberg-external-volume-name>' | ||
CATALOG_TABLE_NAME = '<topic-name>' | ||
AUTO_REFRESH = TRUE | ||
---- | ||
|
||
Use your own values for the following placeholders: | ||
|
||
- `<table-name>`: Provide a name for your table in Snowflake. | ||
- `<catalog-integration-name>`: The name of the catalog integration you configured in an earlier step. | ||
- `<iceberg-external-volume-name>`: The name of the external volume you configured using the Tiered Storage bucket. | ||
- `<topic-name>`: The name of the table in your catalog, which is the same as your Redpanda topic name. | ||
|
||
=== Query table | ||
|
||
To verify that Snowflake has successfully created the table containing the topic data, run the following: | ||
|
||
[,sql] | ||
---- | ||
SELECT * FROM <table-name>; | ||
---- | ||
|
||
Your query results should look like the following: | ||
|
||
[,bash,role=no-copy] | ||
---- | ||
# Example for redpanda.iceberg.mode=key_value with 3 records produced to topic | ||
+--------------------------------------------------------------------------------------------------------------+------------+ | ||
| REDPANDA | VALUE | | ||
+--------------------------------------------------------------------------------------------------------------+------------+ | ||
| { "partition": 0, "offset": 0, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "68656C6C6F"} | 776F726C64 | | ||
| { "partition": 0, "offset": 1, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "666F6F"} | 626172 | | ||
| { "partition": 0, "offset": 2, "timestamp": "2025-02-07 16:29:50.122", "headers": null, "key": "62617A" } | 717578 | | ||
+--------------------------------------------------------------------------------------------------------------+------------+ | ||
---- |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters