Skip to content

NoSQL: Metastore maintenance#3268

Merged
snazy merged 4 commits intoapache:mainfrom
snazy:nosql-ms-maint
Jan 9, 2026
Merged

NoSQL: Metastore maintenance#3268
snazy merged 4 commits intoapache:mainfrom
snazy:nosql-ms-maint

Conversation

@snazy
Copy link
Member

@snazy snazy commented Dec 14, 2025

Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.

Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.
Copy link
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @snazy !

  1. The maintenance configs are not scoped down to NoSql which is required as per concencus, dev list discussion :
    Prev discussion : #3135 (comment)
    dev list discussion: https://lists.apache.org/thread/vmdpb45j1nmmlhswz7fm87gxgoldmr28

  2. Unclear why maintenance module should be part of Polaris repo and not Polaris tools ? In Nosql presentation this module is not mentioned either
    https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
    can you please open dev list thread for the same

  3. Using https://github.com/projectnessie/cel-java/ for rules, why such rules are required in the first place ? why not use https://github.com/google/cel-java directly which is actively maintained by google developers vs using https://github.com/projectnessie/cel-java/ which has just done dependency updates this year https://github.com/projectnessie/cel-java/pulls?page=8&q=is%3Apr+is%3Aclosed

}
}
},
x -> {});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have meaningful name what does x denotes ?

* Polaris stores a history of changes per kind of object (principals, principal roles, grants,
* immediate tasks, catalog roles and catalog state).
*
* <p>The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we using this library to define the rules, have we considered alternatives ? what kind of rules is required ?

Copy link
Contributor

@dimas-b dimas-b Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

projectnessie/cel-java is an implementation of the CEL spec in java. Unlike Google's impl. it also works with Jackson.

IMHO, CEL is simple and performant, while wielding a decent expressive power. I think it is a reasonable initial option for the rules language. I suggest merging this PR witl CEL for the sake of making progress. It is certainly open for discussion if alternatives are proposed later.

@singhpk234 singhpk234 requested a review from dennishuo December 15, 2025 04:25
@dimas-b
Copy link
Contributor

dimas-b commented Dec 15, 2025

Unclear why maintenance module should be part of Polaris repo and not Polaris tools ?

The end-to-end maintenance workflow naturally includes some "service" code and "tool / CLI" code. This PR contains only the service portion, which has to align with the NoSQL metastore implementation, so it belongs in this repository, IMHO.

Copy link
Contributor

@dennishuo dennishuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like understanding #3028 and #3077 are somewhat important to understanding this PR in context and there's not a lot of discussion in those to help explain the overall maintenance design here.

Looks like the main maintenance README https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/README.md just points at https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/api/src/main/java/org/apache/polaris/persistence/nosql/maintenance/api/package-info.java which is certainly useful for understanding the high-level mark-and-sweep approach, but doesn't quite cover how admins are actually expected to interact with the maintenance service (e.g. is it normally supposed to be bundled into the main Polaris service fat jar but invoked as a separate main? Is it supposed to be an always-running singleton service that runs in an infinite loop? Scheduled as a crontab? etc)

Maybe providing a user guide README (or pointing at it in these PRs if it already exists somewhere else) could help answer some of the questions others had such as how to actually configure the CEL expressions and what exactly is meant by The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL script</a>

Seeing the end-to-end admin flow would also help contextualize the split of "server-side" and "client-side" code related to this maintenance, that @dimas-b talked about

String DEFAULT_CATALOGS_HISTORY_RETAIN = "false";
String DEFAULT_CATALOG_ROLES_RETAIN = "false";
String DEFAULT_CATALOG_POLICIES_RETAIN = "ageDays < 30 || commits <= 1";
String DEFAULT_CATALOG_STATE_RETAIN = "ageDays < 30 || commits <= 1";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ageDays condition the only thing to distinguish between "staged but not yet committed" states that are being assembled as part of a transaction and old states that were once committed/valid but are expired now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has nothing to do with "not yet committed" data.
This maintenance cleans up stale + unreferenced as well as "unwanted" history. What "unwanted" means is defined by these defaults, which can optionally be customized by users.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think this is the "only thing". This rule is just a fairly safe common sense default.

@dimas-b
Copy link
Contributor

dimas-b commented Dec 16, 2025

Seeing the end-to-end admin flow would also help contextualize the split of "server-side" and "client-side" code related to this maintenance, that @dimas-b talked about

Hypothetically (ATM), the Admin tool could involve the maintenance code to perform its job.

Other implementations are possible too. It's an open-ended design where downstream projects are free to implement their own triggering mechanisms. I imagine k8s jobs is one of possible options.

@snazy
Copy link
Member Author

snazy commented Dec 20, 2025

Thanks all for the helpful reviews!

Documentation of "everything" is definitely planned to be added to the web site once everything is in.

I have renamed the config mapping.

snazy and others added 2 commits January 7, 2026 16:27
…ava/org/apache/polaris/persistence/nosql/metastore/maintenance/CatalogsMaintenanceConfig.java

Co-authored-by: Prashant Singh <35593236+singhpk234@users.noreply.github.com>
Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I propose to merge this PR in it current state for the sake of making progress (3 weeks in review already :) ). If further changes are needed, let's discuss and address on main.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jan 7, 2026
@snazy snazy merged commit 27c3f96 into apache:main Jan 9, 2026
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jan 9, 2026
@snazy snazy deleted the nosql-ms-maint branch January 9, 2026 10:22
@singhpk234
Copy link
Contributor

singhpk234 commented Jan 9, 2026

@snazy can you please explain how were the feedbacks mentioned here addressed : #3268 (review) specially request for raising this is the devlist since this module hasn't been discussed anywhere.

I am not sure until we have resolved the feedbacks how can we move forward, if the feedbacks is not addressed for 3 week hence the pr is opened for 3 weeks.

@dimas-b
Copy link
Contributor

dimas-b commented Jan 10, 2026

@singhpk234 : The config concern was in fact addressed. Config names are like polaris.persistence.nosql.maintenance. If something was unintentionally missed in config names, let's fix-forward on main.

Re: the location of this code - I commented above. I do believe it explained the intention. A possible user-side tool is provided in #3395

Re: CEL, please see my comment above. This is why Nessie's CEL impl. is convenient. Please note that projectnessie/cel-java is a pure CEL impl. It has nothing to do with the Nessie Catalog. If this is still a concern, let's discuss (as I noted above) and possible use alternatives.

@singhpk234
Copy link
Contributor

I appreciate the additional context and the link to the user-side tool. However, my primary concern remains the introduction of new constructs—like retention expressions—that don’t currently exist in Polaris. This feels like a significant architectural shift that warrants a broader discussion on the dev list to reach project-wide consensus.

Regarding the cel-java dependency: my concern isn't its origin, but its long-term maintenance health. Given that we have diverging views on this merge (+1 vs -1), I believe the most collaborative path forward is to revert the PR for now. This would allow us to have that broader discussion on the dev list without the pressure of the code already being in the main branch.

@dimas-b
Copy link
Contributor

dimas-b commented Jan 10, 2026

Let's discuss on dev

@singhpk234
Copy link
Contributor

About the concern or about the revert ?

@flyrain
Copy link
Contributor

flyrain commented Jan 10, 2026

If this is still a concern, let's discuss (as I noted above) and possible use alternatives.

It is a real concern. Before the concern was resolved, I don't think the PR should be merged. It's not acceptable to do so in an ASF project.

@dimas-b
Copy link
Contributor

dimas-b commented Jan 12, 2026

I mean any/all issues brought up here are probably best discussed on dev. I suppose an email thread per topic would be advisable on dev.

@flyrain
Copy link
Contributor

flyrain commented Jan 12, 2026

@dimas-b , would you mind sharing links about the discussion and conclusion of consensus of the following item?

Using https://github.com/projectnessie/cel-java/ for rules, why such rules are required in the first place ? why not use https://github.com/google/cel-java directly which is actively maintained by google developers vs using https://github.com/projectnessie/cel-java/ which has just done dependency updates this year https://github.com/projectnessie/cel-java/pulls?page=8&q=is%3Apr+is%3Aclosed

@dimas-b
Copy link
Contributor

dimas-b commented Jan 12, 2026

@flyrain : my points about these items are available in this PR's comments trail.

The need for rules is evident from the related java code.

If you prefer to go deeper into these aspects please bring your concerns to the dev ML and I'll be happy to discuss further.

@dimas-b
Copy link
Contributor

dimas-b commented Jan 14, 2026

evindj pushed a commit to evindj/polaris that referenced this pull request Jan 26, 2026
Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.
snazy added a commit to snazy/polaris that referenced this pull request Feb 11, 2026
* feat: Add AWS STS Session Tags support for credential vending (apache#3327)

* feat: Add AWS STS Session Tags support for credential vending
Fixes apache#3325

* Minor fixes

* Review comments

* Review comments.

* Remove request_id from the PR

* Review comment: added activated roles

* Review comments

* Update AwsSessionTagsBuilder.java

* Spotless fixes

* Update StorageCredentialCacheKey.java

* fix failing tests

* Review comments.

* Reverting last commit based on review comments for cache collisions.

---------

Co-authored-by: Anand Kumar Sankaran <anand.sankaran@workday.com>

* chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.24-2.1767639862 (apache#3383)

* Added HttpRoute and Gateway to Helm Chart (apache#3314)

* Added httproute and gateway

* added to readme

* updated helm site docs

* updated changelog

* Added tests

* fixed broken test

* fixed test part 2

* removed odd comment

* added check for httproute and gateway

* shuffled the gateway documentation

* better gateway instructions

* removed extra case in validateRouting

* Errorprone: prepare for v2.46.0 (apache#3384)

This tackles the current failure in apache#3382: `-XDaddTypeAnnotationsToSymbol=true is required by Error Prone on JDK 21`

* NoSQL: Metastore maintenance (apache#3268)

Implementation of the NoSQL meta-store maintenance implementation. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.

* Update release workflows to use the new GPG private key (apache#3399)

* [tech debt] Cleanup `gradle/libs.versions.toml` (apache#3394)

This change removes unreferenced dependency references, inlines single usages of a version-reference and adds a dependency for checkstyle to get that version managed by renovate.

* fix(deps): update dependency org.keycloak:keycloak-admin-client to v26.0.8 (apache#3405)

* fix(deps): update dependency com.google.errorprone:error_prone_core to v2.46.0 (apache#3382)

* [doc]: Add Minio OSS disclaimer (apache#3390)

* [Releasy] Let Maven artifact publication propagate failures (apache#3407)

The Gradle build to publish the Maven artifacts is invoked like `./gradlew ... | tee <log>`. The (overall) exit code of pipes is the exit code of the _last_ command. The exit codes of all pipe "parts" is available in bash's `PIPESTATUS` array and needs to be checked.

* Fix Gradle up-to-date of jars (apache#3401)

The change apache#1036 added the Maven pom.xml to all built jars. However, the `GenerateMavenPom` Gradle task type is never up-to-date, in consequence no jar can ever be up-to-date, leading to unnecessarily longer build times.

This change ensures that the pom.xml is included in release builds, but not in developer/snapshot builds.

* fix(deps): update dependency net.ltgt.gradle:gradle-errorprone-plugin to v4.4.0 (apache#3406)

* fix(deps): update dependency com.puppycrawl.tools:checkstyle to v13 (apache#3403)

* fix(deps): update dependency io.opentelemetry:opentelemetry-bom to v1.58.0 (apache#3408)

* fix(deps): update dependency software.amazon.awssdk:bom to v2.41.5 (apache#3416)

* chore(deps): update dependency jupyterlab to v4.5.2 (apache#3419)

* Release workflows should retry svn checkout in case of failure (apache#3393)

* Change parser option to be required (apache#3413)

* Support hierarchical namespace in Azure (apache#3347)

* Support hierarchical namespace in Azure

* Add `hierarchical` to `AzureStorageConfigInfo` (the default is unset translating to current behaviour).

* Use `DataLakeDirectoryClient` instead of `DataLakeFileSystemClient` for generating SAS tokens
  when `hierarchical` is set to `true`.

* Add `cloudTest` classes for testing with credential vending in ADLS.

* chore(test): Increase Authorization Test Coverage (apache#3332)

* increase test coverage

* update docstrings

* clean up

* make use of same helper method

* remove duplicate tests from OpaIntegrationTest

* use tempdir

* fix(deps): update dependency io.micrometer:micrometer-bom to v1.16.2 (apache#3422)

* chore(deps): update registry.access.redhat.com/ubi9/openjdk-21-runtime docker tag to v1.24-2.1767878250 (apache#3421)

* Last merged commit 1996156

* Add free-disk-space action

* Add free-disk-space action to regtest + spark_client_regtests

---------

Co-authored-by: Anand K Sankaran <lists@anands.net>
Co-authored-by: Anand Kumar Sankaran <anand.sankaran@workday.com>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: cccs-cat001 <56204545+cccs-cat001@users.noreply.github.com>
Co-authored-by: Pierre Laporte <pierre@pingtimeout.fr>
Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>
Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Sung Yun <107272191+sungwy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants