NoSQL: Metastore implementation #3237
dennishuo
left a comment
Still want to take a closer look at the rename/move logic, but the basic updateIfNotChanged flow looks good to me.
Left some initial comments. In general, it would be nice to expand javadoc comments whenever there are implicit format/parsing conventions defined in code, such as in the GrantTriplet (or class javadoc can link to wherever it's explained elsewhere if I just didn't see the file where it's documented).
    return identifier(namespace.levels());
  }

  static ContentIdentifier identifierFromLocationString(String locationString) {
All the other factory methods are mostly self-explanatory, but this one could use some javadocs explaining what locationString is, expected format, etc.
yup, added
    var off = -1;
    for (var i = 0; i < len; i++) {
      var c = locationString.charAt(i);
      checkArgument(c >= ' ', "Control characters are forbidden in locations");
Is this really intended to support all chars with an ASCII code point >= ' ', or is there a more intuitive regex? In particular, though I don't know all ASCII codes offhand, a cursory search indicates that, for example, 127 is the "Delete" control character even though it comes after all the basic symbols and letters.
Maybe Character.isJavaIdentifierPart will help provide a better standard constrained set while also making it easier to reason about when reading the code.
This identifierFromLocationString is used (only) for location-overlap handling, assuming that the storage locations are already validated. Adding assumptions here about the actual format of object-store specific conventions or valid file-path chars feels a bit too risky?
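To make the trade-off in this thread concrete, here is a minimal sketch comparing three character checks. The class and method names are hypothetical and not part of the PR; only `passesSpaceOrAbove` mirrors the `c >= ' '` condition shown in the diff, and the other two are illustrative alternatives, not a proposal for the actual implementation.

```java
/** Hypothetical helper for illustration only; not part of the Polaris code base. */
final class LocationCharChecks {
  private LocationCharChecks() {}

  /** Mirrors the reviewed condition: every char must be at or above the space character. */
  static boolean passesSpaceOrAbove(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (s.charAt(i) < ' ') {
        return false;
      }
    }
    return true;
  }

  /** Assumed stricter variant: rejects every ISO control char, which includes DEL (127). */
  static boolean hasNoIsoControl(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (Character.isISOControl(s.charAt(i))) {
        return false;
      }
    }
    return true;
  }

  /** Variant floated in review: every char must be a Java identifier part. */
  static boolean allJavaIdentifierParts(String s) {
    for (int i = 0; i < s.length(); i++) {
      if (!Character.isJavaIdentifierPart(s.charAt(i))) {
        return false;
      }
    }
    return true;
  }
}
```

With these definitions, a string containing DEL passes the `>= ' '` check but fails the `Character.isISOControl` check. A typical location such as `s3://bucket/path` fails the identifier-part check because `:` and `/` are not identifier parts, which supports the concern about over-constraining locations. Note also that `Character.isJavaIdentifierPart` treats DEL as an ignorable identifier character, so it would not exclude it on its own.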
...etastore/src/main/java/org/apache/polaris/persistence/nosql/metastore/ContentIdentifier.java
...e/metastore/src/main/java/org/apache/polaris/persistence/nosql/metastore/NoSqlMetaStore.java
...e/src/main/java/org/apache/polaris/persistence/nosql/metastore/mutation/MutationResults.java
...tastore/src/main/java/org/apache/polaris/persistence/nosql/metastore/privs/GrantTriplet.java
...e/src/main/java/org/apache/polaris/persistence/nosql/metastore/mutation/MutationAttempt.java
      mutationResults.entityResult(ENTITY_NOT_FOUND);
      return;
    }
    var originalRef = byName.get(originalNameKey);
What are the scenarios that can lead to this byName not having this originalNameKey that we just got from byId? Would it indicate byId and byName getting out of sync somehow?
There's no conceivable scenario, actually.
Ideally I'd prefer to not have the byId index at all and just refer to entities by name.
But that requires a bigger change.
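For readers following the byId/byName question, a minimal in-memory sketch of the invariant being discussed. Everything here is hypothetical (class name, key types, the exception used); it only illustrates that a name key obtained from `byId` should always resolve in `byName`, and that a miss would mean the two indexes have diverged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical two-index store for illustration; not the NoSQL metastore types. */
final class TwoIndexSketch {
  private final Map<Long, String> byId = new HashMap<>();   // entity id -> name key
  private final Map<String, Long> byName = new HashMap<>(); // name key -> entity id

  /** Registers an entity, keeping both indexes in step. */
  void put(long id, String name) {
    checkNameFree(name, id);
    String previous = byId.put(id, name);
    if (previous != null) {
      byName.remove(previous);
    }
    byName.put(name, id);
  }

  /** Returns false if the id is unknown (the "entity not found" case). */
  boolean rename(long id, String newName) {
    String originalNameKey = byId.get(id);
    if (originalNameKey == null) {
      return false;
    }
    Long originalRef = byName.get(originalNameKey);
    if (originalRef == null || originalRef != id) {
      // Should be unreachable while both indexes are only updated together.
      throw new IllegalStateException("byId and byName are out of sync for id " + id);
    }
    checkNameFree(newName, id);
    byName.remove(originalNameKey);
    byName.put(newName, id);
    byId.put(id, newName);
    return true;
  }

  Optional<Long> idOf(String name) {
    return Optional.ofNullable(byName.get(name));
  }

  private void checkNameFree(String name, long id) {
    Long owner = byName.get(name);
    if (owner != null && owner != id) {
      throw new IllegalArgumentException("Name already in use: " + name);
    }
  }
}
```

Dropping `byId` entirely, as suggested above, would remove the need for the sync check, at the cost of addressing entities by name everywhere.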
...e/src/main/java/org/apache/polaris/persistence/nosql/metastore/mutation/MutationAttempt.java
  private static final Logger LOGGER = LoggerFactory.getLogger(MutationAttempt.class);

  public static ObjBase objForChangeComparison(
Please add javadoc comment for this one explaining intent and how it intends to canonicalize whichever fields for comparison.
Is this to detect cases where a real update actually happens to result in an entity whose full contents are unchanged, or is there a common concurrency situation that is expected to cause "no change" mutations (e.g. on some kind of retry?)
It's to detect whether an update is actually changing something.
Does this mean that if someone does craft an update request which happens to be a no-op, that we'll skip the database write, meaning:
- The updateTimestamp won't change
- The entityVersion won't change
but event and audit-logging features would still show a successful update at the new timestamp? I can't remember if the current AtomicOperationMetaStore/TransactionalMetaStores do anything similar, or if this is divergent behavior.
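The behavior asked about here can be sketched as follows. This is a hedged illustration under assumed names (`Entity`, `forChangeComparison`, `update`) and an assumed choice of which fields count as bookkeeping; it is not the `objForChangeComparison` implementation from the PR.

```java
import java.util.Map;

/** Hypothetical single-entity store for illustration; not the Polaris persistence types. */
final class NoOpUpdateSketch {
  /** Assumed entity shape: payload fields plus bookkeeping fields. */
  record Entity(String name, Map<String, String> properties, int entityVersion, long updateTimestamp) {
    /** Canonical form for comparison: bookkeeping fields are zeroed so only payload matters. */
    Entity forChangeComparison() {
      return new Entity(name, Map.copyOf(properties), 0, 0L);
    }
  }

  private Entity stored;
  private int writes;

  NoOpUpdateSketch(Entity initial) {
    this.stored = initial;
  }

  /** Returns true if a write happened, false if the update was skipped as a no-op. */
  boolean update(String newName, Map<String, String> newProperties, long now) {
    Entity candidate = new Entity(newName, newProperties, stored.entityVersion() + 1, now);
    if (candidate.forChangeComparison().equals(stored.forChangeComparison())) {
      return false; // payload unchanged: version and timestamp stay as they were
    }
    stored = candidate;
    writes++;
    return true;
  }

  Entity stored() {
    return stored;
  }

  int writes() {
    return writes;
  }
}
```

In this sketch a caller that logs "update succeeded" regardless of the return value would record an event at a time that never appears as `updateTimestamp`, which is the discrepancy raised above. Whether audit logging should key off the boolean is a design question the thread leaves open.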
...etastore/src/main/java/org/apache/polaris/persistence/nosql/metastore/ContentIdentifier.java
dennishuo
left a comment
Overall LGTM. I'm not sure about the "skip no-op updates" optimization if it diverges from the existing persistence impls, or what it would mean for discrepancies between audit logs and updateTimestamp in persistence, but I'm okay with revisiting that discussion later or in a follow-up.
Thanks for the reviews!
I think it depends on the, say, "interpretation" of what counts as an update.