Revamping the README #89
@@ -16,184 +16,76 @@ | |
| --> | ||
|
|
||
| # Polaris Catalog | ||
|  | ||
|
|
||
| <a href="https://www.snowflake.com/blog/polaris-catalog-open-source/" target="_blank">Polaris Catalog</a> is an open source catalog for Apache Iceberg :tm:. Polaris Catalog implements Iceberg’s open <a href="https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml" target="_blank">REST API</a> for multi-engine interoperability with Apache Doris :tm:, Apache Flink® , Apache Spark :tm:, StarRocks and Trino. | ||
| Polaris is an open-source, fully-featured catalog for Apache Iceberg™. It implements Iceberg's | ||
| [REST API](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml), | ||
| enabling seamless multi-engine interoperability across a wide range of platforms, including Apache Doris™, Apache Flink®, | ||
| Apache Spark™, StarRocks, and Trino. | ||
|
|
||
|  | ||
| Documentation is available at https://polaris.io, including | ||
| [Polaris management API doc](https://polaris.io/index.html#tag/polaris-management-service_other) | ||
| and [Apache Iceberg REST API doc](https://polaris.io/index.html#tag/Configuration-API). | ||
|
|
||
| ## Status | ||
|
|
||
| ## Status | ||
Polaris Catalog is open source under an Apache 2.0 license.

- ⭐ Star this repo if you’d like to bookmark and come back to it!
- 📖 Read the <a href="https://www.snowflake.com/blog/polaris-catalog-open-source/" target="_blank">announcement blog post</a> for more details!

## API Docs

API docs are hosted via GitHub Pages at https://polaris.io. Every update to the main branch also updates the hosted docs.

The Polaris management API docs are found [here](https://polaris.io/index.html#tag/polaris-management-service_other).

The Apache Iceberg REST API docs are found [here](https://polaris.io/index.html#tag/Configuration-API).

Docs are generated using [Redocly](https://redocly.com/docs/cli/installation). They can be regenerated by running the following commands
from the project root directory:

```bash
docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli join spec/docs.yaml spec/polaris-management-service.yml spec/rest-catalog-open-api.yaml -o spec/index.yaml --prefix-components-with-info-prop title
docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli build-docs spec/index.yaml --output=docs/index.html --config=spec/redocly.yaml
```

# Setup

## Requirements / Setup

- Java JDK >= 21, see [CONTRIBUTING.md](./CONTRIBUTING.md#java-version-requirements).
- Gradle - This is included in the project and can be run using `./gradlew` in the project root.
- Docker (Suggested Version: 27+) - If you want to run the project in a containerized environment or run integration tests.

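As a quick pre-flight check for the requirements above, you can compare a tool's major version against the minimum. The helper below is a hypothetical sketch, and `major_at_least` is not part of the project. It assumes version strings of the form `MAJOR.MINOR.PATCH` and looks only at the major component.

```bash
# Hypothetical helper (not part of Polaris): succeed if the major component of
# a version string such as "27.1.1" is at least the required major version.
major_at_least() {
  required=$1
  major=${2%%.*}    # keep everything before the first '.'
  # Non-numeric input makes the test fail (non-zero status) instead of passing.
  [ "$major" -ge "$required" ] 2>/dev/null
}

# Example, assuming the docker CLI is on PATH and the daemon is reachable:
# major_at_least 27 "$(docker version --format '{{.Server.Version}}')" \
#   || echo "Docker 27+ is suggested"
```

The `21+` and `27+` thresholds listed here only need the major number. A full semantic-version comparison would need more care.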

Command-Line getting started
-------------------
Polaris is a multi-module project with three modules:

## Building and Running

Polaris is organized into the following modules:
- `polaris-core` - The main Polaris entity definitions and core business logic
- `polaris-server` - The Polaris REST API server
- `polaris-eclipselink` - The EclipseLink implementation of the MetaStoreManager interface

Build the binary (the first build may require installing a newer JDK version). This build runs the integration tests by default,
so make sure Docker is running: the integration tests require a running Docker daemon.

```
./gradlew build
```

To build without running the tests:

```
./gradlew assemble
```

To run the Polaris server locally on `localhost:8181`:

```
./gradlew runApp
```

The server starts in in-memory mode and prints its auto-generated credentials to STDOUT in a message like the following:

```text
realm: default-realm root principal credentials: <id>:<secret>
```

These credentials can be used as "Client ID" and "Client Secret" in OAuth2 requests (e.g. the `curl` command below).

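If you want to feed those credentials into scripts, you can split the `<id>:<secret>` part of that line with POSIX parameter expansion. This is a hypothetical sketch. The helper name `parse_root_credentials` is illustrative, and the values `abc123` and `s3cr3t` are made-up samples. The sketch assumes the client ID contains no `:`, so everything after the first `:` is treated as the secret.

```bash
# Hypothetical helper (not part of Polaris): split the server's credentials line
#   realm: default-realm root principal credentials: <id>:<secret>
# into CLIENT_ID and CLIENT_SECRET. Assumes the client ID contains no ':'.
parse_root_credentials() {
  creds=${1##*"credentials: "}   # drop everything up to "credentials: "
  CLIENT_ID=${creds%%:*}         # text before the first ':'
  CLIENT_SECRET=${creds#*:}      # text after the first ':'
}

parse_root_credentials 'realm: default-realm root principal credentials: abc123:s3cr3t'
echo "id=$CLIENT_ID secret=$CLIENT_SECRET"   # prints: id=abc123 secret=s3cr3t
```

The resulting values go into the `client_id` and `client_secret` fields of the token request.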

While the Polaris server is running, run the regression (end-to-end) tests from another terminal:

```
./regtests/run.sh
```

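`./gradlew runApp` takes a moment before it starts listening. A script that launches the server and then runs the tests may therefore want to wait for port `8181` first. The sketch below is an assumption, not part of the repository, and `retry` is a hypothetical helper. The commented example relies on how `curl` reports status. Without `-f`, it exits with status 0 once it gets any HTTP response. While the connection is still refused, it exits with a non-zero status.

```bash
# Hypothetical helper (not part of Polaris): run a command until it succeeds,
# trying at most $1 times and sleeping $2 seconds between attempts.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while ! "$@"; do
    if [ "$i" -ge "$attempts" ]; then
      return 1    # give up after the last attempt
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Example: wait up to about 60 seconds for the server, then run the regression tests.
# retry 60 1 curl -s -o /dev/null http://localhost:8181/ && ./regtests/run.sh
```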

Docker Instructions
-------------------

Build the image:

```
docker build -t localhost:5001/polaris:latest .
```

Run it in standalone mode. This starts a single container and maps the container's port `8181` to port `8181` on localhost:

```
docker run -p 8181:8181 localhost:5001/polaris:latest
```

# Running the tests

## Unit and Integration tests

Unit and integration tests are run using Gradle. To run all tests, use the following command:

```bash
./gradlew test
```

## Regression tests

Regression tests, or functional tests, are stored in the `regtests` directory. They can be executed in a Docker
environment by using the `docker-compose.yml` file in the project root:

```bash
docker compose up --build --exit-code-from regtest
```

They can also be executed outside of Docker by following the setup instructions in
the [README](regtests/README.md).

# Kubernetes Instructions

You can run Polaris as a mini-deployment locally. This will create two pods that bind themselves to port `8181`:

```
./setup.sh
```

You can check the pod and deployment status like so:

```
kubectl get pods
kubectl get deployment
```

If things aren't working as expected, you can troubleshoot like so:

```
kubectl describe deployment polaris-deployment
```

## Creating a Catalog manually

Before connecting with Spark, you'll need to create a catalog. To create a catalog, generate a token for the root
principal:

Polaris is built using Gradle with Java 21+ and Docker 27+.
- `./gradlew build` - To build and run tests. Make sure Docker is running, as the integration tests depend on it.
- `./gradlew assemble` - To build without running tests.
- `./gradlew test` - To run unit tests and integration tests.
- `./gradlew runApp` - To run the Polaris server locally on `localhost:8181`.
  - The server starts in in-memory mode and prints its auto-generated credentials to STDOUT in a message like `realm: default-realm root principal credentials: <id>:<secret>`.
  - These credentials can be used as "Client ID" and "Client Secret" in OAuth2 requests (e.g. the `curl` command below).
- `./regtests/run.sh` - To run the regression (end-to-end) tests in another terminal while the server is running.

Running in Docker
- `docker build -t localhost:5001/polaris:latest .` - To build the image.
- `docker run -p 8181:8181 localhost:5001/polaris:latest` - To run the image in standalone mode.
- `docker compose up --build --exit-code-from regtest` - To run regression tests in a Docker environment.

Running in Kubernetes
- `./setup.sh` - To run Polaris as a mini-deployment locally. This will create two pods that bind themselves to port `8181`.
- `kubectl get pods` - To check the status of the pods.
- `kubectl get deployment` - To check the status of the deployment.
- `kubectl describe deployment polaris-deployment` - To troubleshoot if things aren't working as expected.

Building docs

Contributor (Author): To simplify the README more, we can move the doc building part to a more detailed doc. WDYT?

Contributor: From my POV it is good to have all essential building tips in the top-level README. I suppose contributors will be changing docs often.

- Docs are generated using [Redocly](https://redocly.com/docs/cli/installation). To regenerate them, run the following
commands from the project root directory:

```bash
docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli join spec/docs.yaml spec/polaris-management-service.yml spec/rest-catalog-open-api.yaml -o spec/index.yaml --prefix-components-with-info-prop title
docker run -p 8080:80 -v ${PWD}:/spec docker.io/redocly/cli build-docs spec/index.yaml --output=docs/index.html --config=spec/redocly.yaml
```

Contributor: LGTM! This is a much tidier way of presenting this info -- agree with you that the earlier links are no longer relevant now that everything is on a single GitHub Page. 🚢

```bash
curl -i -X POST \
  http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials&client_id=<principalClientId>&client_secret=<mainSecret>&scope=PRINCIPAL_ROLE:ALL'
```

The response output will contain an access token:

```json
{
  "access_token": "ver:1-hint:1036-ETMsDgAAAY/GPANareallyverylongstringthatissecret",
  "token_type": "bearer",
  "expires_in": 3600
}
```

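The token request is an OAuth2 `client_credentials` grant sent as a form-encoded body. Instead of editing that body by hand, you can assemble it from the credentials. This is a hypothetical sketch, and the function name `token_request_body` is illustrative, not from the repository. It assumes the ID and secret contain no characters that need URL-encoding, such as `&`, `=`, `+`, or `%`.

```bash
# Hypothetical helper (not part of Polaris): print the form body for the
# client_credentials token request. Assumes the ID and secret need no URL-encoding.
token_request_body() {
  printf 'grant_type=client_credentials&client_id=%s&client_secret=%s&scope=PRINCIPAL_ROLE:ALL' "$1" "$2"
}

# Example, assuming a running server on localhost:8181:
# curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
#   -d "$(token_request_body "$CLIENT_ID" "$CLIENT_SECRET")"
```

If a secret can contain reserved characters, use curl's `--data-urlencode` option instead, once per field. It is the safer choice.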

Set the `PRINCIPAL_TOKEN` variable to the value of the `access_token` field, then use curl to invoke the
`createCatalog` API:

```bash
$ export PRINCIPAL_TOKEN=ver:1-hint:1036-ETMsDgAAAY/GPANareallyverylongstringthatissecret

$ curl -i -X POST -H "Authorization: Bearer $PRINCIPAL_TOKEN" -H 'Accept: application/json' -H 'Content-Type: application/json' \
  http://${POLARIS_HOST:-localhost}:8181/api/management/v1/catalogs \
  -d '{"name": "polaris", "id": 100, "type": "INTERNAL", "readOnly": false, "storageConfigInfo": {"storageType": "FILE"}, "properties": {"default-base-location": "file:///tmp/polaris"}}'
```

## Connecting from an Engine
To connect from an engine like Spark, first create a catalog with these steps:

Contributor (Author): How about creating a default catalog to make it even easier for first-timer? We can remove this section completely by doing that.

```bash
# Generate a token for the root principal, replacing <CLIENT_ID> and <CLIENT_SECRET> with
# the values from the Polaris server output.
export PRINCIPAL_TOKEN=$(curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
  -d 'grant_type=client_credentials&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&scope=PRINCIPAL_ROLE:ALL' \
  | jq -r '.access_token')

# Create a catalog named `polaris`
curl -i -X POST -H "Authorization: Bearer $PRINCIPAL_TOKEN" -H 'Accept: application/json' -H 'Content-Type: application/json' \
  http://localhost:8181/api/management/v1/catalogs \
  -d '{"name": "polaris", "id": 100, "type": "INTERNAL", "readOnly": false, "storageConfigInfo": {"storageType": "FILE"}, "properties": {"default-base-location": "file:///tmp/polaris"}}'
```

Comment on lines +77 to +79. Contributor (Author): This script has been verified manually.

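The `-d` payload of the `createCatalog` call is plain JSON, so you can template it when you want a different catalog name or base location. The function below is a hypothetical sketch, and `catalog_payload` is not part of the project. It keeps the other fields exactly as in the request above, including `"id": 100`. It assumes the name and location contain no characters that would need JSON escaping, such as a double quote or a backslash.

```bash
# Hypothetical helper (not part of Polaris): print a createCatalog request body
# for a FILE-backed INTERNAL catalog. Assumes $1 and $2 need no JSON escaping.
catalog_payload() {
  printf '{"name": "%s", "id": 100, "type": "INTERNAL", "readOnly": false, "storageConfigInfo": {"storageType": "FILE"}, "properties": {"default-base-location": "%s"}}' "$1" "$2"
}

# Example, reusing PRINCIPAL_TOKEN from the token step:
# curl -i -X POST -H "Authorization: Bearer $PRINCIPAL_TOKEN" \
#   -H 'Accept: application/json' -H 'Content-Type: application/json' \
#   http://localhost:8181/api/management/v1/catalogs \
#   -d "$(catalog_payload polaris file:///tmp/polaris)"
```

With the arguments `polaris` and `file:///tmp/polaris`, the printed body is identical to the inline JSON in the request above.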

This creates a catalog called `polaris`. From here, you can use Spark to create namespaces, tables, etc.

You must run the following as the first query in your spark-sql shell to actually use Polaris:

```
use polaris;
```

From here, you can use Spark to create namespaces, tables, etc. More details can be found in the
[Quick Start Guide](https://polaris.io/#section/Quick-Start/Using-Iceberg-and-Polarise).

### Trademark Attribution

Comment: We may not need these k8s debug hints as well, but I'm fine with it.