Make database column events.request_id nullable #2566
Conversation
The request ID is an optional piece of information coming from the REST request. This PR makes the corresponding column nullable in the database schema. It also annotates the `ModelEvent.principalName` and `PolarisEvent.principalName` fields as nullable in code (the corresponding column was already nullable in the database schema).
```sql
CREATE TABLE IF NOT EXISTS events (
    realm_id TEXT NOT NULL,
    catalog_id TEXT NOT NULL,
    event_id TEXT NOT NULL,
```
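The effect of the change can be sketched as follows. This is an illustrative sketch only, using an in-memory SQLite database rather than the actual Polaris schema: once `request_id` is declared without `NOT NULL`, events that arrive without a REST request ID can still be persisted.

```python
import sqlite3

# Illustrative sketch (not the real Polaris schema): request_id is
# declared nullable, so an event with no REST request ID still inserts.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS events (
        realm_id   TEXT NOT NULL,
        catalog_id TEXT NOT NULL,
        event_id   TEXT NOT NULL,
        request_id TEXT  -- nullable: the REST request ID is optional
    )
    """
)
conn.execute(
    "INSERT INTO events (realm_id, catalog_id, event_id, request_id) "
    "VALUES (?, ?, ?, ?)",
    ("realm1", "cat1", "evt-1", None),  # no request ID available
)
row = conn.execute(
    "SELECT request_id FROM events WHERE event_id = 'evt-1'"
).fetchone()
print(row[0])  # None
```

With the previous `NOT NULL` constraint, the same insert would have failed with an integrity error.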
Side note: assuming `event_id` values are unique, this table can grow indefinitely, I guess... Do we have a mechanism to cap its size?
Does capping the size mean not persisting events once the number of records exceeds a threshold, or failing until it's fixed? My understanding is that failing until we fix it would be the better way to go if we put a cap in place, since we want to repurpose this table for auditing. For example, we could move the last x months of events into a cold store, and then expose a view to users that does a union over both stores.
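The hot/cold split with a unioning view suggested above could be sketched like this. The table and view names (`events_cold`, `all_events`) are hypothetical, and an in-memory SQLite database stands in for the real stores:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hot store holds recent events; cold store holds aged-out events.
# Names are hypothetical, for illustration only.
conn.executescript(
    """
    CREATE TABLE events      (event_id TEXT NOT NULL, ts INTEGER NOT NULL);
    CREATE TABLE events_cold (event_id TEXT NOT NULL, ts INTEGER NOT NULL);
    -- A view that unions both stores, so readers see one table.
    CREATE VIEW all_events AS
        SELECT event_id, ts FROM events
        UNION ALL
        SELECT event_id, ts FROM events_cold;
    """
)
conn.execute("INSERT INTO events      VALUES ('evt-new', 200)")
conn.execute("INSERT INTO events_cold VALUES ('evt-old', 100)")
rows = conn.execute("SELECT event_id FROM all_events ORDER BY ts").fetchall()
print([r[0] for r in rows])  # ['evt-old', 'evt-new']
```

In practice the cold store would likely live in cheaper storage rather than a second table in the same database; the view simply hides that split from consumers.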
I would expect the listener to handle capping the size -- or some other maintenance service responsible for pruning this table.
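A maintenance-style pruning pass like the one suggested here might look as follows. This is a sketch under assumptions: the timestamp column name (`ts`) and the 30-day retention window are invented for illustration, and SQLite stands in for the real database.

```python
import sqlite3
import time

# Assumed retention policy for illustration; not from the Polaris code.
RETENTION_SECONDS = 30 * 24 * 3600

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_id TEXT NOT NULL, ts INTEGER NOT NULL)")
now = int(time.time())
conn.execute("INSERT INTO events VALUES ('evt-old', ?)",
             (now - RETENTION_SECONDS - 1,))
conn.execute("INSERT INTO events VALUES ('evt-new', ?)", (now,))

# Prune everything older than the retention cutoff.
cutoff = now - RETENTION_SECONDS
deleted = conn.execute("DELETE FROM events WHERE ts < ?", (cutoff,)).rowcount
conn.commit()

remaining = [r[0] for r in conn.execute("SELECT event_id FROM events")]
print(deleted, remaining)  # 1 ['evt-new']
```

Whether this runs inside the listener or in a separate scheduled service is exactly the open question in this thread; the DELETE itself is the same either way.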
My primary concern is about naive users who deploy pre-built binaries and start accumulating data in this table 🤔
Created an issue to track this: #2573
dimas-b
left a comment
Proposed changes LGTM 👍
singhpk234
left a comment
LGTM as well!
* Support sdist client distribution (apache#2557)

  This PR addresses the binary distribution issue described in apache#2419. The goal is to include only the files required for an end user to build the client locally (the repository already supports wheel distribution). For the sdist build, this PR takes a slightly different approach than the symbolic-link solution proposed in apache#2419. Instead of using symbolic links, it copies the necessary files from the project root into the client directory (if they do not already exist) and then uses that directory during sdist mode. This approach avoids errors caused by Poetry's path checks, since symbolic links pointing outside the client directory fail validation.
* Python client: remove tox (apache#2562)
* Update dependency io.smallrye.common:smallrye-common-annotation to v2.13.9 (apache#2567)
* Fix H2 JDBC schema init script (apache#2564)

  Change the `additional_properties` column type from JSONB to TEXT in schema-v3.sql, since JSONB is not a valid H2 type.
* Pin virtualenv version to fix Python client installation issue (apache#2569)

  ```
  Package operations: 1 install, 1 update, 0 removals

  - Updating virtualenv (20.32.0 -> 20.34.0)
  - Installing pyiceberg (0.10.0): Failed

  AttributeError
  'PythonInfo' object has no attribute 'tcl_lib'
  at ~/tmp/3/polaris/polaris-venv/lib/python3.13/site-packages/virtualenv/activation/via_template.py:50 in replacements
       46│     "__VIRTUAL_ENV__": str(creator.dest),
       47│     "__VIRTUAL_NAME__": creator.env_name,
       48│     "__BIN_NAME__": str(creator.bin_dir.relative_to(creator.dest)),
       49│     "__PATH_SEP__": os.pathsep,
    →  50│     "__TCL_LIBRARY__": creator.interpreter.tcl_lib or "",
       51│     "__TK_LIBRARY__": creator.interpreter.tk_lib or "",
       52│ }
  ```

  Currently users may get the above error when running `./polaris` for the first time. This is caused by an upstream bug in `virtualenv>=20.33.0` and a bug in Poetry that mistakenly upgrades the package even when it is not compatible: python-poetry/poetry#10504 (comment). This PR fixes the issue by pinning the `virtualenv` version to match what's in upstream Poetry: https://github.com/python-poetry/poetry/blob/a8f0889a54a545ec4f7ceed7bf41f8c2a7677bbb/pyproject.toml#L31
* Make column events.request_id nullable (apache#2566)

  The request ID is an optional piece of information coming from the REST request. This PR makes the corresponding column nullable in the database schema. It also annotates the `ModelEvent.principalName` and `PolarisEvent.principalName` fields as nullable in code (the corresponding column was already nullable in the database schema).
* Remove DROP statements from SQL init scripts (apache#2565)

  SQL init scripts must be idempotent, because they may be invoked several times by a Polaris server during realm bootstrapping (the script is invoked once per realm). It is therefore not possible to put any DROP statements in the scripts.
* Avoid using jackson method for parsing YAML from any URL in RootCredentialsSet (apache#2543)

  As discussed in FasterXML/jackson-core#803, this method can lead to hidden issues and has been deprecated. Instead, we manage URL streams locally in RootCredentialsSet and permit only those URLs that do not have a host component (such as files and Java resources), which makes sense from a general security perspective too.
* Fix password in README.md for `./gradlew run` (apache#2572)

  Use the password that matches what the `run` task actually configures.
* Site: Add the blog link in the website (apache#2575)
* Revert "fix password in README.md for `./gradlew run` (apache#2572)" (apache#2576)

  This reverts commit 08086b3.
* Site: Remove the dummy post (apache#2579)

  Co-authored-by: Yufei Gu <yufei.apache.org>
* Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.23-6.1757607786 (apache#2577)
* Last merged commit 6c4e1b8

---------

Co-authored-by: Yong Zheng <yongzheng0809@gmail.com>
Co-authored-by: Mend Renovate <bot@renovateapp.com>
Co-authored-by: Alexandre Dutra <adutra@apache.org>
Co-authored-by: Honah (Jonas) J. <honahx@apache.org>
Co-authored-by: Dmitri Bourlatchkov <dmitri.bourlatchkov@gmail.com>
Co-authored-by: Yufei Gu <yufei@apache.org>