Conversation
Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.
There was a problem hiding this comment.
Thanks @snazy !
-
The maintenance configs are not scoped down to NoSql which is required as per concencus, dev list discussion :
Prev discussion : #3135 (comment)
dev list discussion: https://lists.apache.org/thread/vmdpb45j1nmmlhswz7fm87gxgoldmr28 -
Unclear why maintenance module should be part of Polaris repo and not Polaris tools ? In Nosql presentation this module is not mentioned either
https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
can you please open dev list thread for the same -
Using https://github.com/projectnessie/cel-java/ for rules, why such rules are required in the first place ? why not use https://github.com/google/cel-java directly which is actively maintained by google developers vs using https://github.com/projectnessie/cel-java/ which has just done dependency updates this year https://github.com/projectnessie/cel-java/pulls?page=8&q=is%3Apr+is%3Aclosed
| } | ||
| } | ||
| }, | ||
| x -> {}); |
There was a problem hiding this comment.
can we have meaningful name what does x denotes ?
...va/org/apache/polaris/persistence/nosql/metastore/maintenance/CatalogsMaintenanceConfig.java
Outdated
Show resolved
Hide resolved
| * Polaris stores a history of changes per kind of object (principals, principal roles, grants, | ||
| * immediate tasks, catalog roles and catalog state). | ||
| * | ||
| * <p>The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL |
There was a problem hiding this comment.
why are we using this library to define the rules, have we considered alternatives ? what kind of rules is required ?
There was a problem hiding this comment.
projectnessie/cel-java is an implementation of the CEL spec in java. Unlike Google's impl. it also works with Jackson.
IMHO, CEL is simple and performant, while wielding a decent expressive power. I think it is a reasonable initial option for the rules language. I suggest merging this PR witl CEL for the sake of making progress. It is certainly open for discussion if alternatives are proposed later.
...va/org/apache/polaris/persistence/nosql/metastore/maintenance/CatalogsMaintenanceConfig.java
Outdated
Show resolved
Hide resolved
The end-to-end maintenance workflow naturally includes some "service" code and "tool / CLI" code. This PR contains only the service portion, which has to align with the NoSQL metastore implementation, so it belongs in this repository, IMHO. |
dennishuo
left a comment
There was a problem hiding this comment.
Looks like understanding #3028 and #3077 are somewhat important to understanding this PR in context and there's not a lot of discussion in those to help explain the overall maintenance design here.
Looks like the main maintenance README https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/README.md just points at https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/api/src/main/java/org/apache/polaris/persistence/nosql/maintenance/api/package-info.java which is certainly useful for understanding the high-level mark-and-sweep approach, but doesn't quite cover how admins are actually expected to interact with the maintenance service (e.g. is it normally supposed to be bundled into the main Polaris service fat jar but invoked as a separate main? Is it supposed to be an always-running singleton service that runs in an infinite loop? Scheduled as a crontab? etc)
Maybe providing a user guide README (or pointing at it in these PRs if it already exists somewhere else) could help answer some of the questions others had such as how to actually configure the CEL expressions and what exactly is meant by The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL script</a>
Seeing the end-to-end admin flow would also help contextualize the split of "server-side" and "client-side" code related to this maintenance, that @dimas-b talked about
...va/org/apache/polaris/persistence/nosql/metastore/maintenance/CatalogsMaintenanceConfig.java
Outdated
Show resolved
Hide resolved
| String DEFAULT_CATALOGS_HISTORY_RETAIN = "false"; | ||
| String DEFAULT_CATALOG_ROLES_RETAIN = "false"; | ||
| String DEFAULT_CATALOG_POLICIES_RETAIN = "ageDays < 30 || commits <= 1"; | ||
| String DEFAULT_CATALOG_STATE_RETAIN = "ageDays < 30 || commits <= 1"; |
There was a problem hiding this comment.
Is the ageDays condition the only thing to distinguish between "staged but not yet committed" states that are being assembled as part of a transaction and old states that were once committed/valid but are expired now?
There was a problem hiding this comment.
This has nothing to do with "not yet committed" data.
This maintenance cleans up stale + unreferenced as well as "unwanted" history. What "unwanted" means is defined by these defaults, which can optionally be customized by users.
There was a problem hiding this comment.
I do not think this is the "only thing". This rule is just a fairly safe common sense default.
Hypothetically (ATM), the Admin tool could involve the maintenance code to perform its job. Other implementations are possible too. It's an open-ended design where downstream projects are free to implement their own triggering mechanisms. I imagine k8s jobs is one of possible options. |
|
Thanks all for the helpful reviews! Documentation of "everything" is definitely planned to be added to the web site once everything is in. I have renamed the config mapping. |
…ava/org/apache/polaris/persistence/nosql/metastore/maintenance/CatalogsMaintenanceConfig.java Co-authored-by: Prashant Singh <35593236+singhpk234@users.noreply.github.com>
dimas-b
left a comment
There was a problem hiding this comment.
I propose to merge this PR in it current state for the sake of making progress (3 weeks in review already :) ). If further changes are needed, let's discuss and address on main.
|
@snazy can you please explain how were the feedbacks mentioned here addressed : #3268 (review) specially request for raising this is the devlist since this module hasn't been discussed anywhere. I am not sure until we have resolved the feedbacks how can we move forward, if the feedbacks is not addressed for 3 week hence the pr is opened for 3 weeks. |
|
@singhpk234 : The config concern was in fact addressed. Config names are like Re: the location of this code - I commented above. I do believe it explained the intention. A possible user-side tool is provided in #3395 Re: CEL, please see my comment above. This is why Nessie's CEL impl. is convenient. Please note that |
|
I appreciate the additional context and the link to the user-side tool. However, my primary concern remains the introduction of new constructs—like retention expressions—that don’t currently exist in Polaris. This feels like a significant architectural shift that warrants a broader discussion on the dev list to reach project-wide consensus. Regarding the cel-java dependency: my concern isn't its origin, but its long-term maintenance health. Given that we have diverging views on this merge (+1 vs -1), I believe the most collaborative path forward is to revert the PR for now. This would allow us to have that broader discussion on the dev list without the pressure of the code already being in the main branch. |
|
Let's discuss on |
|
About the concern or about the revert ? |
It is a real concern. Before the concern was resolved, I don't think the PR should be merged. It's not acceptable to do so in an ASF project. |
|
I mean any/all issues brought up here are probably best discussed on |
|
@dimas-b , would you mind sharing links about the discussion and conclusion of consensus of the following item?
|
|
@flyrain : my points about these items are available in this PR's comments trail. The need for rules is evident from the related java code. If you prefer to go deeper into these aspects please bring your concerns to the |
|
Related |
Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.
* feat: Add AWS STS Session Tags support for credential vending (apache#3327) * feat: Add AWS STS Session Tags support for credential vending Fixes apache#3325 * Minor fixes * Review comments * Review comments. * Remove request_id from the PR * Review comment: added activated roles * Review comments * Update AwsSessionTagsBuilder.java * Spotless fixes * Update StorageCredentialCacheKey.java * fix failing tests * Review comments. * Reverting last commit based on review comments for cache collisions. --------- Co-authored-by: Anand Kumar Sankaran <anand.sankaran@workday.com> * chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.24-2.1767639862 (apache#3383) * Added HttpRoute and Gateway to Helm Chart (apache#3314) * Added httproute and gateway * added to readme * updated helm site docs * updated changelog * Added tests * fixed broken test * fixed test part 2 * removed odd comment * added check for httproute and gateway * shuffled the gateway documentation * better gateway instructions * removed extra case in validateRouting * Errorprone: prepare for v2.46.0 (apache#3384) This tackles the current failure in apache#3382: `-XDaddTypeAnnotationsToSymbol=true is required by Error Prone on JDK 21` * NoSQL: Metastore maintenance (apache#3268) Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database. * Update release workflows to use the new GPG private key (apache#3399) * [tech debt] Cleanup `gradle/libs.versions.toml` (apache#3394) This change removes unreferenced dependency references, inlines single usages of a version-reference and adds a dependency for checkstyle to get that version managed by renovate. * fix(deps): update dependency org.keycloak:keycloak-admin-client to v26.0.8 (apache#3405) * fix(deps): update dependency com.google.errorprone:error_prone_core to v2.46.0 (apache#3382) * [doc]: Add Minio OSS disclaimer (apache#3390) * [Releasy] Let Maven artifact publication propagate failures (apache#3407) The Gradle build to publish the Maven artifacts is invoked like `./gradlew ... | tee <log>`. The (overall) exit code of pipes is the exit code of the _last_ command. The exit codes of all pipe "parts" is available in bash's `PIPESTATUS` array and needs to be checked. * Fix Gradle up-to-date of jars (apache#3401) The change apache#1036 added the Maven pom.xml to all built jars. However, the `GenerateMavenPom` Gradle task type is never up-to-date, in consequence no jar can ever be up-to-date, leading to unnecessarily longer build times. This change ensures that the pom.xml is included in release builds, but not in developer/snapshot builds. * fix(deps): update dependency net.ltgt.gradle:gradle-errorprone-plugin to v4.4.0 (apache#3406) * fix(deps): update dependency com.puppycrawl.tools:checkstyle to v13 (apache#3403) * fix(deps): update dependency io.opentelemetry:opentelemetry-bom to v1.58.0 (apache#3408) * fix(deps): update dependency software.amazon.awssdk:bom to v2.41.5 (apache#3416) * chore(deps): update dependency jupyterlab to v4.5.2 (apache#3419) * Release workflows should retry svn checkout in case of failure (apache#3393) * Change parser option to be required (apache#3413) * Support hierarchical namespace in Azure (apache#3347) * Support hierarchical namespace in Azure * Add `hierarchical` to `AzureStorageConfigInfo` (the default is unset translating to current behaviour). * Use `DataLakeDirectoryClient` instead of `DataLakeFileSystemClient` for generating SAS tokens when `hierarchical` is set to `true`. * Add `cloudTest` classes for testing with credential vending in ADLS. * chore(test): Increase Authorization Test Coverage (apache#3332) * increase test coverage * update docstrings * clean up * make use of same helper method * remove duplicate tests from OpaIntegrationTest * use tempdir * fix(deps): update dependency io.micrometer:micrometer-bom to v1.16.2 (apache#3422) * chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.24-2.1767878250 (apache#3421) * Last merged commit 1996156 * Add free-disk-space action * Add free-disk-space action to regtest + spark_client_regtests --------- Co-authored-by: Anand K Sankaran <lists@anands.net> Co-authored-by: Anand Kumar Sankaran <anand.sankaran@workday.com> Co-authored-by: Mend Renovate <bot@renovateapp.com> Co-authored-by: cccs-cat001 <56204545+cccs-cat001@users.noreply.github.com> Co-authored-by: Pierre Laporte <pierre@pingtimeout.fr> Co-authored-by: Yong Zheng <yongzheng0809@gmail.com> Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com> Co-authored-by: Sung Yun <107272191+sungwy@users.noreply.github.com>
Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.