Add Polaris synchronization and migration tool to polaris-tools. #4
Merged: eric-maynard merged 19 commits into apache:main from mansehajsingh:polaris-migrator-only on Apr 18, 2025
Commits
f2cd430 Initial commit of synchronizer code (mansehajsingh)
b03958e Added optional principal migrations (mansehajsingh)
b24f648 Updated tests to accommodate principal creation (mansehajsingh)
2afaff4 Added migration of principal roles to principals (mansehajsingh)
05f2988 Addressed comments (mansehajsingh)
05a2803 Updated tests (mansehajsingh)
a0d024d Add generic Polaris entity source and target- not tied to API (mansehajsingh)
e7431f9 Updated docs (mansehajsingh)
8604742 Remove type in options (mansehajsingh)
67c3e41 update docs (mansehajsingh)
2ea47ed Added license headers (mansehajsingh)
e06f149 Merge pull request #4 from mansehajsingh/generalize-polaris-service (mansehajsingh)
fbf51a5 Add hard failure flag (mansehajsingh)
c4a2539 make flag final (mansehajsingh)
5f1d35a Added explanation to README.md (mansehajsingh)
4491f9f Make ETagManager configurable (mansehajsingh)
0525d9a Add configurable oauth server for omnipotent principal (mansehajsingh)
6f551b4 Set iceberg write access as connection property explicitly outside of… (mansehajsingh)
8cb745d Add external id to aws storage config ignore list (mansehajsingh)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,74 @@ | ||
|
|
||
| ### Java ### | ||
| # Compiled class file | ||
| *.class | ||
|
|
||
| # Log file | ||
| *.log | ||
|
|
||
| # BlueJ files | ||
| *.ctxt | ||
|
|
||
| # Mobile Tools for Java (J2ME) | ||
| .mtj.tmp/ | ||
|
|
||
| # Package Files # | ||
| *.jar | ||
| *.war | ||
| *.nar | ||
| *.ear | ||
| *.zip | ||
| *.tar.gz | ||
| *.rar | ||
|
|
||
| # virtual machine crash logs, see http://www.java.com/en/download/help/error_hotspot.xml | ||
| hs_err_pid* | ||
|
|
||
| #misc | ||
| target/ | ||
| dependency-reduced-pom.xml | ||
| *.patch | ||
| *.DS_Store | ||
| .DS_Store | ||
|
|
||
| #intellij | ||
| *.iml | ||
| .idea | ||
| *.ipr | ||
| *.iws | ||
|
|
||
| # vscode | ||
| .vscode | ||
|
|
||
| # node | ||
| node_modules/ | ||
| ui/src/generated/ | ||
|
|
||
| # Eclipse IDE | ||
| .classpath | ||
| .factorypath | ||
| .project | ||
| .settings | ||
| .checkstyle | ||
| out/ | ||
|
|
||
| # gradle | ||
| .gradle/ | ||
| build/ | ||
| gradle/wrapper/gradle-wrapper.jar | ||
| version.txt | ||
|
|
||
| # Python venv | ||
| venv/ | ||
|
|
||
| # Maven flatten plugin | ||
| .flattened-pom.xml | ||
|
|
||
| # Site | ||
| site/site | ||
|
|
||
| # Ignore Gradle project-specific cache directory | ||
| .gradle | ||
|
|
||
| # Ignore Gradle build output directory | ||
| build |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,173 @@ | ||
| # Objective | ||
|
|
||
| To provide users of [Apache Polaris (Incubating)](https://github.com/apache/polaris) with a tool to easily and efficiently | ||
| migrate their entities from one Polaris instance to another. | ||
|
|
||
| Polaris is a catalog of catalogs, so migrating every catalog in a Polaris instance one at a time quickly becomes cumbersome. | ||
| Additionally, while catalog-by-catalog migration of Iceberg entities is achievable using the | ||
| existing generic [iceberg-catalog-migrator](../iceberg-catalog-migrator/README.md), that tool will not migrate | ||
| Polaris-specific entities such as principal roles, catalog roles, and grants. | ||
|
|
||
| ## Use Cases | ||
| * **Migration:** A user may have an active Polaris deployment that they want to migrate to a managed cloud offering like | ||
| [Snowflake Open Catalog](https://www.snowflake.com/en/product/features/open-catalog/). | ||
| * **Preventing Vendor Lock-In:** A user may currently have a managed Polaris offering and want the freedom to switch providers or to host Polaris themselves. | ||
| * **Backup:** Modern data solutions often require redundancy. This tool can be run as a periodic cron job to keep snapshots of a Polaris instance. | ||
|
|
||
| In the case of migration to/from a cloud offering, access to the Polaris metastore is possibly limited or entirely restricted. | ||
| This tool instead uses the Polaris REST API to perform the migration/synchronization. | ||
|
|
||
| The tool currently supports migrating the following Polaris Management entities: | ||
| * Optionally, Principals (with `--sync-principals` flag). Credentials will be different on the target instance. | ||
| * Optionally, assignment of Principal Roles to Principals (with `--sync-principals` flag) | ||
| * Principal Roles | ||
| * Catalogs | ||
| * Catalog Roles | ||
| * Assignment of Catalog Roles to Principal Roles | ||
| * Grants | ||
|
|
||
| The tool currently supports migrating the following Iceberg entities: | ||
| * Namespaces | ||
| * Tables | ||
|
|
||
| # Building the Tool from Source | ||
|
|
||
| **Prerequisite:** Java must be installed on your machine to use this CLI tool (Java 21 is the recommended minimum version). | ||
|
|
||
| ``` | ||
| ./gradlew build # build and run tests | ||
| ./gradlew assemble # build without running tests | ||
| ``` | ||
|
|
||
| By default, the built JAR is written to `cli/build/libs/`. | ||
|
|
||
| # Migrating between Polaris Instances | ||
|
|
||
| ### Step 1: Create a principal with read-only access to catalog internals on the source Polaris instance. | ||
|
|
||
| **This step only has to be completed once.** | ||
|
|
||
| Polaris is built with a separation between access and metadata management permissions. The `service_admin` | ||
| may have permissions to create access related entities like principal roles, catalog roles, and grants, but may not necessarily | ||
| possess the ability to view Iceberg content of catalogs, like namespaces and tables. We need to create a super user principal | ||
| that has access to all entities on the source Polaris instance in order to migrate them. | ||
|
|
||
| To do this, we can use the `create-omnipotent-principal` command to create a principal, principal role, | ||
| and a catalog role per catalog with the appropriate grants to read all entities on the source Polaris instance. | ||
|
|
||
| **Example:** Create a **read-only** principal on the source Polaris instance, and replace it if it already exists, | ||
| with 10 concurrent catalog setup threads: | ||
| ``` | ||
| java -jar cli/build/libs/polaris-synchronizer-cli.jar create-omnipotent-principal \ | ||
| --polaris-api-connection-properties base-url=http://localhost:8181 \ | ||
| --polaris-api-connection-properties oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \ | ||
| --polaris-api-connection-properties client-id=root \ | ||
| --polaris-api-connection-properties client-secret=<client_secret> \ | ||
| --polaris-api-connection-properties scope=PRINCIPAL_ROLE:ALL \ | ||
| --replace \ | ||
| --concurrency 10 | ||
| ``` | ||
|
|
||
| Upon finishing execution, the tool will output the principal name and client credentials for this | ||
| principal. **Make sure to note these down as they will be necessary for the migration step.** | ||
|
|
||
| **Example Output:** | ||
| ``` | ||
| ====================================================== | ||
| Omnipotent Principal Credentials: | ||
| name = omnipotent-principal-XXXXX | ||
| clientId = ff7s8f9asbX10 | ||
| clientSecret = <client-secret> | ||
| ====================================================== | ||
| ``` | ||
|
|
||
| Additionally, at the end of execution the command will output a list of catalogs for which catalog setup failed. | ||
| **These catalogs may experience failure during migration**. | ||
|
|
||
| **Example Output:** | ||
| ``` | ||
| Encountered issues creating catalog roles for the following catalogs: [catalog-1, catalog-2] | ||
| ``` | ||
|
|
||
| ### Step 2: Create a principal with read-write access to catalog internals on the target Polaris instance. | ||
|
|
||
| **This step only has to be completed once.** | ||
|
|
||
| The same `create-omnipotent-principal` command can also be used to now create a **read-write** principal on the target | ||
| Polaris instance so that the tool can create entities on the target. | ||
|
|
||
| To create a read-write principal, we simply specify the `--write-access` option. | ||
|
|
||
| **Example:** Create a read-write principal on your target Polaris instance, replacing it if it exists, with 10 concurrent | ||
| catalog setup threads. | ||
| ``` | ||
| java -jar cli/build/libs/polaris-synchronizer-cli.jar \ | ||
| create-omnipotent-principal \ | ||
| --polaris-api-connection-properties base-url=http://localhost:8181 \ | ||
| --polaris-api-connection-properties oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \ | ||
| --polaris-api-connection-properties client-id=root \ | ||
| --polaris-api-connection-properties client-secret=<client_secret> \ | ||
| --polaris-api-connection-properties scope=PRINCIPAL_ROLE:ALL \ | ||
| --replace \ | ||
| --concurrency 10 \ | ||
| --write-access | ||
| ``` | ||
|
|
||
| As in the previous step, the tool will output the principal name and client credentials. Note these down as well, | ||
| since they are needed for the subsequent steps. | ||
|
|
||
| **Example Output:** | ||
| ``` | ||
| ====================================================== | ||
| Omnipotent Principal Credentials: | ||
| name = omnipotent-principal-YYYYY | ||
| clientId = 0af20a3a0037a40d | ||
| clientSecret = <client-secret> | ||
| ====================================================== | ||
| ``` | ||
|
|
||
| > :warning: `service_admin` is not guaranteed to have access-management grants on every catalog. These are usually | ||
| > delegated to the `catalog_admin` role, which is automatically granted to whichever principal role was used to create | ||
| > the catalog. As a result, the tool can detect such a catalog when run with `service_admin` level access, but | ||
| > it cannot create an omnipotent principal for that catalog. To remedy this, create a catalog role with `CATALOG_MANAGE_ACCESS` | ||
| > grants for the catalog, and assign it to the principal used to run this tool (presumably, a principal with the `service_admin` | ||
| > principal role). Re-running `create-omnipotent-principal` should then be able to create the relevant entities for that catalog. | ||
|
|
||
| ### Step 3: Running the Migration/Synchronization | ||
|
|
||
| Running the synchronization requires minimal reconfiguration, can be run idempotently, and will attempt to only copy over the | ||
| diff between the source and target Polaris instances. This can be achieved using the `sync-polaris` command. | ||
|
|
||
| > :warning: If you want to migrate principals and their assignments to principal roles as well, run the tool with the | ||
| > `--sync-principals` flag. Note that this will reset the client credentials for each such principal on the target | ||
| > Polaris instance. The new credentials are logged to stdout, and only for newly created or overwritten principals. | ||
| > Manage this output securely: client credentials should only ever be stored in a secure vault. | ||
|
|
||
| **Example:** Run the synchronization between a source Polaris instance and a target Polaris instance, | ||
| each authenticated with client credentials. | ||
| ``` | ||
| java -jar cli/build/libs/polaris-synchronizer-cli.jar sync-polaris \ | ||
| --source-properties base-url=http://localhost:8181 \ | ||
| --source-properties client-id=root \ | ||
| --source-properties client-secret=<client_secret> \ | ||
| --source-properties oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \ | ||
| --source-properties scope=PRINCIPAL_ROLE:ALL \ | ||
| --source-properties omnipotent-principal-name=omnipotent-principal-XXXXX \ | ||
| --source-properties omnipotent-principal-client-id=589550e8b23d271e \ | ||
| --source-properties omnipotent-principal-client-secret=<omni_client_secret> \ | ||
| --source-properties omnipotent-principal-oauth2-server-uri=http://localhost:8181/api/catalog/v1/oauth/tokens \ | ||
| --target-properties base-url=http://localhost:5858 \ | ||
| --target-properties client-id=root \ | ||
| --target-properties client-secret=<client_secret> \ | ||
| --target-properties oauth2-server-uri=http://localhost:5858/api/catalog/v1/oauth/tokens \ | ||
| --target-properties scope=PRINCIPAL_ROLE:ALL \ | ||
| --target-properties omnipotent-principal-name=omnipotent-principal-YYYYY \ | ||
| --target-properties omnipotent-principal-client-id=9b8ac0f1e4e2e614 \ | ||
| --target-properties omnipotent-principal-client-secret=<omni_client_secret> \ | ||
| --target-properties omnipotent-principal-oauth2-server-uri=http://localhost:5858/api/catalog/v1/oauth/tokens | ||
| ``` | ||
|
|
||
| > :warning: The tool will not migrate `service_admin`, `catalog_admin`, or the omnipotent principals from the source, | ||
| > nor will it remove or modify them or their assignments to principals/principal roles on the target. This is because | ||
| > the tool itself runs with the permission levels of these principals and roles, and we do not want to modify | ||
| > the tool's permissions at runtime. | ||
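Steps 1 to 3 of the README leave one manual hand-off: the credentials printed by `create-omnipotent-principal` have to be copied into the `omnipotent-principal-*` properties of `sync-polaris`. The following is a minimal Python sketch of how that could be scripted. It is not part of this PR. The helper names (`parse_omnipotent_credentials`, `omnipotent_properties`, `build_sync_command`) are hypothetical, and it assumes the credentials block reaches stdout exactly as in the README's example output, with no log prefix on each line. Only flags and property keys that appear in the README are used.

```python
import re
import shlex

# Matches lines such as "name = omnipotent-principal-XXXXX" in the credentials block.
# Assumption: each key starts its own line, as in the README's example output.
_CRED_LINE = re.compile(r"^\s*(name|clientId|clientSecret)\s*=\s*(\S+)\s*$")


def parse_omnipotent_credentials(stdout_text):
    """Extract name, clientId and clientSecret from the printed credentials block."""
    creds = {}
    for line in stdout_text.splitlines():
        match = _CRED_LINE.match(line)
        if match:
            creds[match.group(1)] = match.group(2)
    missing = {"name", "clientId", "clientSecret"} - creds.keys()
    if missing:
        raise ValueError(f"credentials block is missing: {sorted(missing)}")
    return creds


def omnipotent_properties(creds, oauth2_server_uri):
    """Map parsed credentials onto the omnipotent-principal-* property keys."""
    return {
        "omnipotent-principal-name": creds["name"],
        "omnipotent-principal-client-id": creds["clientId"],
        "omnipotent-principal-client-secret": creds["clientSecret"],
        "omnipotent-principal-oauth2-server-uri": oauth2_server_uri,
    }


def build_sync_command(jar_path, source_props, target_props, sync_principals=False):
    """Assemble the sync-polaris argument list; each property becomes one repeated flag."""
    cmd = ["java", "-jar", jar_path, "sync-polaris"]
    for key, value in source_props.items():
        cmd += ["--source-properties", f"{key}={value}"]
    for key, value in target_props.items():
        cmd += ["--target-properties", f"{key}={value}"]
    if sync_principals:
        cmd.append("--sync-principals")
    return cmd


# Sample text copied from the README's example output for Step 1.
sample_output = """\
======================================================
Omnipotent Principal Credentials:
name = omnipotent-principal-XXXXX
clientId = ff7s8f9asbX10
clientSecret = <client-secret>
======================================================
"""

if __name__ == "__main__":
    creds = parse_omnipotent_credentials(sample_output)
    token_uri = "http://localhost:8181/api/catalog/v1/oauth/tokens"
    # Abbreviated: a real run also needs client-id, client-secret and scope on both sides.
    source = {"base-url": "http://localhost:8181", **omnipotent_properties(creds, token_uri)}
    target = {"base-url": "http://localhost:5858"}
    command = build_sync_command("cli/build/libs/polaris-synchronizer-cli.jar", source, target)
    print(shlex.join(command))  # shell-quoted form, for inspection only
```

Building the arguments as a list rather than a single string avoids shell quoting problems with secrets, and the list can be passed directly to `subprocess.run`. The printed form exposes the client secret, so in practice it should not be logged.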