HDDS-5984. S3 Event Notification design doc #8449

Closed

chungen0126 wants to merge 2 commits into apache:master from chungen0126:HDDS-5984

Contributor

chungen0126 commented May 14, 2025

What changes were proposed in this pull request?

This is the design proposal for the S3 Event Notification feature. The design document is shared in the Markdown file. Please comment in the document if you have any feedback.

What is the link to the Apache JIRA

How was this patch tested?

NA


          HDDS-5984. S3 Event Notification design doc

f12d05b

adoroszlai added documentation design labels

jojochuang requested review from ChenSammi and ivandika3

May 14, 2025 21:15

Contributor

jojochuang commented May 14, 2025

@ivandika3 please review as it may be related to your S3 real time replication project.

jojochuang added the s3 label

peterxcli reviewed

Member

peterxcli left a comment

Thanks for this design!
I think we could add a NotificationManager that receives event messages from OMExecutionFlow and stores them in a queue. If the sender clients support batch operations, this would give better throughput. Additionally, the logic that sits between OMExecutionFlow, OMMetadataManager and NotificationSenderManager could move into this class, making OMExecutionFlow cleaner.
(It could use a better name; NotificationManager is likely to collide with existing names.)
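
To make the queue-and-batch suggestion concrete, here is a minimal sketch. It assumes events are plain strings and that a separate sender thread drains the queue; `EventNotificationQueue`, `offer` and `drainBatch` are hypothetical names for illustration and are not part of Ozone or of the design doc.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a bounded queue between the request path and the senders.
class EventNotificationQueue {
  private final BlockingQueue<String> pending;
  private final int maxBatchSize;

  EventNotificationQueue(int capacity, int maxBatchSize) {
    this.pending = new LinkedBlockingQueue<>(capacity);
    this.maxBatchSize = maxBatchSize;
  }

  // Called from the request path. Never blocks; returns false when the queue is full.
  boolean offer(String event) {
    return pending.offer(event);
  }

  // Called by a sender thread. Removes up to maxBatchSize events in one call,
  // so a sender that supports batching can push them together.
  List<String> drainBatch() {
    List<String> batch = new ArrayList<>(maxBatchSize);
    pending.drainTo(batch, maxBatchSize);
    return batch;
  }

  // Number of events still waiting to be sent.
  int size() {
    return pending.size();
  }
}
```

A bounded queue keeps memory usage predictable, at the cost of having to decide what happens to events that are rejected when it is full.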

hadoop-hdds/docs/content/design/s3-event-notification.md Outdated


		### Overview

		A callback is introduced post-Ratis commit and pre-client response to handle event notification logic.

Member

peterxcli May 21, 2025

Do you mean the notification logic would happen after the Ratis commit and before OMKeyRequest#validateAndUpdateCache? If an exception occurred there, the action on that key would not be applied, but its notification would still be sent out.

Contributor Author

chungen0126 May 22, 2025

No, actually the callback will run after the request has been completed. Maybe I wasn't clear enough.

hadoop-hdds/docs/content/design/s3-event-notification.md


		Furthermore, the creation and reuse of notification client instances (such as Kafka or RabbitMQ producers) within the Ozone Manager must be carefully controlled. Poor resource management in this area could increase memory or thread usage and degrade OM performance over time.

		### Metrics

Member

peterxcli May 21, 2025 (edited)

We could add some metrics for: event_triggered, event_lost, push_ok, push_fail and push_pending.
ref: https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats

Contributor Author

chungen0126 May 22, 2025

What's the difference between event_lost and push_fail?

Member

peterxcli May 23, 2025

Sorry I didn't notice that I pasted the wrong link. Updated, please take another look~

https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats

What's the difference between event_lost and push_fail?

In https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats:

pubsub_event_lost: running counter of events that had topics associated with them but that were not pushed to any of the endpoints
pubsub_push_fail: running counter, for all notifications, of events failed to be pushed to their endpoint

Member

peterxcli May 23, 2025

I'm not sure whether the notification here, which can be created through the Java API, is similar to a topic in Ceph pubsub.

If it is, then in Ozone (or in this design) the definitions would become:

event_lost: running counter of events that had a notification config associated with them but were not pushed to any of the targets
push_fail: running counter, for all notifications, of events that failed to be pushed to their target
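
The distinction between the two counters can be sketched as follows. This is one possible reading of the definitions above, assuming that "not pushed to any of the targets" means no target received the event; `NotificationMetricsSketch` and `record` are hypothetical names, and push_pending is left out because it would be a gauge on queue depth rather than a running counter.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of the proposed counters; not an Ozone class.
class NotificationMetricsSketch {
  final AtomicLong eventTriggered = new AtomicLong();
  final AtomicLong eventLost = new AtomicLong();
  final AtomicLong pushOk = new AtomicLong();
  final AtomicLong pushFail = new AtomicLong();

  // Records one event, given the push outcome per target (true = delivered).
  void record(List<Boolean> targetResults) {
    eventTriggered.incrementAndGet();
    boolean deliveredSomewhere = false;
    for (boolean ok : targetResults) {
      if (ok) {
        pushOk.incrementAndGet();
        deliveredSomewhere = true;
      } else {
        // Counted once per failed target push.
        pushFail.incrementAndGet();
      }
    }
    if (!deliveredSomewhere) {
      // Counted once per event that reached no target at all.
      eventLost.incrementAndGet();
    }
  }
}
```

Under this reading, an event with two targets where one push fails increments push_fail but not event_lost; only an event that fails on every target counts as lost.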


          address comments

caeacaf

peterxcli reviewed

Member

peterxcli left a comment

Thanks for updating this document!

hadoop-hdds/docs/content/design/s3-event-notification.md


		### Overview

		A callback is introduced after the Ratis request is completed and before the response is returned to the client, to handle event notification logic.

Member

peterxcli May 23, 2025

Will the event notification logic be handled asynchronously? If not, I’m concerned that the sender could slow down the end-to-end response time. This also raises another question: if the event notification fails, should the client still be informed that the original request succeeded, or should the failure be communicated to the client as well?
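
For illustration, one possible answer to both questions is an asynchronous, best-effort hand-off: the response is built first, the notification is submitted to an executor, and a send failure is only counted rather than reported to the client. This is a sketch under that assumption and not the behaviour the design doc commits to; `AsyncNotifierSketch`, `handleRequest` and `notifyAsync` are hypothetical names.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch: the notification never delays or fails the client response.
class AsyncNotifierSketch {
  private final ExecutorService executor;
  private final Consumer<String> sender;
  final AtomicLong failures = new AtomicLong();

  AsyncNotifierSketch(ExecutorService executor, Consumer<String> sender) {
    this.executor = executor;
    this.sender = sender;
  }

  // Runs the sender on the executor; a failure is counted and then swallowed.
  CompletableFuture<Void> notifyAsync(String event) {
    return CompletableFuture
        .runAsync(() -> sender.accept(event), executor)
        .exceptionally(t -> {
          failures.incrementAndGet();
          return null;
        });
  }

  // Stand-in for the request path: build the response, then hand off the event.
  String handleRequest(String key) {
    String response = "OK:" + key;
    notifyAsync("ObjectCreated:" + key);
    return response;
  }
}
```

The trade-off is that a client can see a successful response for a request whose notification was never delivered, which is where a counter such as event_lost would become relevant.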

peterxcli mentioned this pull request

HDDS-13513 Ozone Event Notification Design #8871

Open

github-actions bot commented Nov 12, 2025

This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days.

github-actions bot added the stale label

github-actions bot commented Nov 19, 2025

Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it.

github-actions bot closed this

Contributor Author

chungen0126 commented Nov 21, 2025

This PR is now covered by #8871. Please refer to that one.

Labels

design documentation s3 stale