Skip to content

[#6076] improve(core): Support model pre event to Gravitino server #6250

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 55 commits into from
Mar 11, 2025

Conversation

Abyss-lord
Copy link
Collaborator

@Abyss-lord Abyss-lord commented Jan 15, 2025

What changes were proposed in this pull request?

Support model pre event to Gravitino server, based on #6129 , Both synchronize Dispatcher changes with each other

https://docs.google.com/document/d/1_aVCd_tKiEebpzp9tg07Lzdk1j6EalNn8YOIdRn-3z4/edit?tab=t.0#heading=h.k85t4bueowc5

PreEvent

PreEvent OperationType ModelCatalog
RegisterModelPreEvent REGISTER_MODEL registerModel
GetModelPreEvent LOAD_MODEL getModel
DeleteModelEvent Delete_MODEL deleteModel
ListModelPreEvent LIST_MODEL listModels
LinkModelVersionPreEvent LINK_MODEL_VERSION linkModelVersion
GetModelVersionPreEvent GET_MODEL_VERSION getModelVersion
DeleteModelVersionPreEvent Delete_MODEL_VERSION deleteModelVersion
ListModelVersionsPreEvent LIST_MODEL_VERSIONS listModelVersions

ModelEventDispatcher

image

Why are the changes needed?

Fix: #6076

Does this PR introduce any user-facing change?

No

How was this patch tested?

Model Event

  1. testRegisterModelEvent
  2. testGetModelEvent
  3. testDeleteExistsModelEvent
  4. testDeleteNotExistsModelEvent
  5. testListModelEvent

Model Version Event

  1. testLinkModelVersionEvent
  2. testGetModelVersionEventViaVersion
  3. testGetModelVersionEventViaAlias
  4. testDeleteModelVersionEventViaVersion
  5. testDeleteModelVersionEventViaAlias
  6. testDeleteModelVersionEventViaVersionNotExists
  7. testListModelVersionsEvent

@Abyss-lord Abyss-lord changed the title [#6067] improve(CLI): Support model pre event to Gravitino server [#6076] improve(CLI): Support model pre event to Gravitino server Jan 16, 2025
@Abyss-lord
Copy link
Collaborator Author

@FANNG1 @xunliu @orenccl could you please review this PR when you have time? I’d really appreciate your feedback.

Support model event to Gravitino server. Add list, get, create and delete events to both model and model version.
Fix some error and add ModelEventDispatcher to Gravitino server.
…t been refactored

rename drop event and operation type.
Support model pre event to Gravitino server.

[apache#6067] improve(CLI): Support model pre event to Gravitino server

add model version related pre event to Gravitino server.

[apache#6067] improve(CLI): Support model pre event to Gravitino server

add preEvent support for ModelEventDispatcher.

[apache#6067] improve(CLI): Support model pre event to Gravitino server

add test cases.
@Abyss-lord
Copy link
Collaborator Author

@FANNG1 should we add ModelEventDispatcher to the Context in this pr? Or wait until all the events are complete?

@FANNG1
Copy link
Contributor

FANNG1 commented Feb 24, 2025

@Abyss-lord sorry for the delay, could you check the comments and add document in https://github.com/apache/gravitino/blob/main/docs/gravitino-server-config.md?plain=1#L129 ?

@tengqm
Copy link
Collaborator

tengqm commented Feb 25, 2025

Just took a quick glance over the eventing code base ... and I got a question.
Given that we want to emit events for every operation on every entity, the current case-by-case implementation is problematic. It doesn't scale well.

Driven by this observation, I'm thinking if this can be drastically simplified. For example, we can use a handful of abstractions rather than tens of them to model events. Maybe we simply don't need all the fields in every event? I mean we can have a gemeroc link field pointing to the URI of the relevant object?

@FANNG1
Copy link
Contributor

FANNG1 commented Feb 25, 2025

Just took a quick glance over the eventing code base ... and I got a question. Given that we want to emit events for every operation on every entity, the current case-by-case implementation is problematic. It doesn't scale well.

Driven by this observation, I'm thinking if this can be drastically simplified. For example, we can use a handful of abstractions rather than tens of them to model events. Maybe we simply don't need all the fields in every event? I mean we can have a gemeroc link field pointing to the URI of the relevant object?

Yes, adding events one by one for all operations is painful. This is a trade-off, in this one-by-one way, the user could handle the event more flexibly, and if do with some general abstraction, we may lose some specific information in the event, any suggestions?

@tengqm
Copy link
Collaborator

tengqm commented Feb 25, 2025

if do with some general abstraction, we may lose some specific information in the event, any suggestions?

It all depends on the use case, IMHO. We can always add "specific information" later on when needed. At this moment, we may want to rethink whether we really want to dump all details into events. If the event is about to be triggered for all requests and responses, we can have a nice abstraction, which is just to dump the request/response body. It is gonna be a heavy burden for such a detailed dump, especially for LIST operations.

@FANNG1
Copy link
Contributor

FANNG1 commented Feb 26, 2025

Totally, please try to avoid doing another IO to get model or model version information for the event, using the parameter in the request seems enough.

FourFriends and others added 7 commits March 2, 2025 16:10
… roles is bound to many metadata. (apache#6455)

### What changes were proposed in this pull request?

fix issue apache#6238
improve performance when a single role is bound to many metadata.

### Why are the changes needed?

Use batch queries when getting role securable object full names instead
of loop queries to get each securable object full name.

Fix: apache#6238

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Unit tests and integration tests have all passed, this feature has been
running internally at Xiaomi for two weeks.

Co-authored-by: luoxin5 <[email protected]>
…o configure gvfs-fuse. (apache#6465)

### What changes were proposed in this pull request?

Fix the bug of using a relative path to configure gvfs-fuse.

### Why are the changes needed?

Fix: apache#6464

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

Manually test
…FSFileSystemProvider` problem (apache#6463)

### What changes were proposed in this pull request?

In the current code base, we need to add `catalog-hadoop` to make GVFS
client works or class `HDFSFileSystemProvider` and
`LocalFileSystemProvider` can't be found.

### Why are the changes needed?

It's a bug.

Fix: apache#6462 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

ITs, UTs and test locally.
### What changes were proposed in this pull request?
This pr removes outdated tips from `web-ui.md` because, as indicated in
the subsequent text, operations such as creating or modifying schemas,
tables, or filesets can now be performed through the web UI.

### Why are the changes needed?
Remove outdated tips from `web-ui.md`


### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
manual review
…on parameters through environment variables (apache#6458)

### What changes were proposed in this pull request?

Add the environment variables `GRAVITINO_JDBC_USER`,
`GRAVITINO_JDBC_PASSWORD`, and `GRAVITINO_S3_ENDPOINT` to the
`rewrite_config.py`, so database user and password can be set by the
user in a container environment as well as a custom S3 endpoint.

### Why are the changes needed?

Currently, JDBC user and PW are hardcoded to `iceberg` and cannot be
changed when using the container images. One workaround is including the
values in the JDBC connection string, but this neither elegant nor in
line with how containers commonly handle this issue.

Additionally, it wasn't possible to set the value of
`gravitino.iceberg-rest.s3-endpoint` in a container environment, which
can be useful in a variety of situations when using a provider other
than AWS.

### Does this PR introduce _any_ user-facing change?

Added the three aforementioned two environment variables that can be
used by the user.

### How was this patch tested?

Was tested locally only, since this is a minimal change.
### What changes were proposed in this pull request?
fix uri title of version detail

### Why are the changes needed?
N/A

Fix: apache#6478

### Does this PR introduce _any_ user-facing change?
N/A

### How was this patch tested?
manually
…che#6405)

### What changes were proposed in this pull request?

 Add Readme document of gvfs-fuse

### Why are the changes needed?

Fix: apache#6404

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

No
@Abyss-lord
Copy link
Collaborator Author

@FANNG1 @tengqm I’ve finished updating the code and design doc. Please take a look at the PR again when you have time.

@Abyss-lord Abyss-lord requested a review from FANNG1 March 3, 2025 01:30
@Abyss-lord
Copy link
Collaborator Author

@FANNG1 Plz approve the workflows.

@Abyss-lord
Copy link
Collaborator Author

@FANNG1 It was my mistake that caused the problem?Why do so many workflows fail?

@FANNG1
Copy link
Contributor

FANNG1 commented Mar 3, 2025

@FANNG1 It was my mistake that caused the problem?Why do so many workflows fail?

seems not related, the docker container is not started successfully.

new RegisterModelPreEvent(PrincipalUtils.getCurrentUserName(), ident, registerRequest));
try {
Model model = dispatcher.registerModel(ident, comment, properties);
// TODO: ModelEvent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my view, this is not a good practice for raising PRs.
Each PR is supposed to be:

  • small, e.g. <500 lines of code changes;
  • self-contained, with all irrelevant changes stripped off;
  • good enough, i.e. not half-baked;
  • easy to roll-back, if some bugs are introduced unfortunately

It is perfectly okay to raise 10+ smaller PRs a day, because we know that small steps forward make big progress.

For this specific PR, I'd suggest we implement Model Event dispatcher with test cases;
then we raise PRs for each specific events, each with its own test cases.

WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it's preferred to make the PR small, but for the event listener PRs, most codes are similar and cohesive. I'm afraid it may be hard to propose the PR and review to split smaller PRs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whatever ... if we believe this code is good enough, not half-baked, I'm fine with it.

@Abyss-lord Abyss-lord requested a review from FANNG1 March 3, 2025 09:51
@Abyss-lord
Copy link
Collaborator Author

@FANNG1 I’ve finished updating the code. Please take a look at the PR again when you have time.

new RegisterModelPreEvent(PrincipalUtils.getCurrentUserName(), ident, registerRequest));
try {
Model model = dispatcher.registerModel(ident, comment, properties);
// TODO: ModelEvent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally, it's preferred to make the PR small, but for the event listener PRs, most codes are similar and cohesive. I'm afraid it may be hard to propose the PR and review to split smaller PRs.

@FANNG1
Copy link
Contributor

FANNG1 commented Mar 4, 2025

Generally LGTM except minor comments, @jerryshao do you have time to review again?

@Abyss-lord
Copy link
Collaborator Author

@FANNG1 @jerryshao I’ve finished updating the code(add alias and version to GetModelVersionPreEvent ). Please take a look at the PR again when you have time.

@FANNG1 FANNG1 closed this Mar 6, 2025
@FANNG1 FANNG1 reopened this Mar 6, 2025
@Abyss-lord Abyss-lord requested a review from FANNG1 March 10, 2025 01:43
@Abyss-lord
Copy link
Collaborator Author

@FANNG1 @jerryshao I’ve finished updating the code. Please take a look at the PR again when you have time.

@Abyss-lord Abyss-lord changed the title [#6076] improve(CLI): Support model pre event to Gravitino server [#6076] improve(core): Support model pre event to Gravitino server Mar 10, 2025
@FANNG1 FANNG1 merged commit c9c9a99 into apache:main Mar 11, 2025
28 checks passed
@FANNG1
Copy link
Contributor

FANNG1 commented Mar 11, 2025

merged to main, thanks @Abyss-lord for your contribution and @tengqm for the review.

@Abyss-lord Abyss-lord deleted the model-pre-event branch March 11, 2025 05:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Subtask] Support model pre event to Gravitino server