-
Notifications
You must be signed in to change notification settings - Fork 588
HDDS-5984. S3 Event Notification design doc #8449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@ivandika3 please review as it may be related to your S3 real time replication project. |
peterxcli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this design!
I think we can add a NotificationManager, which would receive event message from OMExecutionFlow and store in queue. So if those sender client support batch operation, we would have better throughput. Additionally, the logic between OMExecutionFlow, OMMetadataManager and NotificationSenderManager can move into itself, make OMExecutionFlow cleaner.
(can have a better name, NotificationManager is easy to collide in name)
|
|
||
| ### Overview | ||
|
|
||
| A callback is introduced post-Ratis commit and pre-client response to handle event notification logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean the notification logic would happend after ratis commit and before OMKeyRequest#validateAndUpdateCache here? If there are some exception happened, then the action on that key wouldn't be done, but the notification of it would be sent out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, actually the callback will run after the request has been completed. Maybe I wasn't clear enough.
|
|
||
| Furthermore, the creation and reuse of notification client instances (such as Kafka or RabbitMQ producers) within the Ozone Manager must be carefully controlled. Poor resource management in this area could increase memory or thread usage and degrade OM performance over time. | ||
|
|
||
| ### Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can add some metrics about: event_triggered, event_lost, push_ok, push_fail and push_pending.
ref: https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the difference between event_lost and push_fail?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't notice that I pasted the wrong link. Updated, please take another look~
https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats
What's the difference between event_lost and push_fail?
In https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats:
pubsub_event_lost: running counter of events that had topics associated with them but that were not pushed to any of the endpointspubsub_push_fail: running counter, for all notifications, of events failed to be pushed to their endpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I not sure if the notification here that can be create through java api is similar to topic in ceph pubsub.
And if yes, then:
In ozone(or this design), it would become:
event_lost: running counter of events that had notification config associated with them but that were not pushed to any of the targetpush_fail: running counter, for all notifications, of targets failed to be pushed to their endpoint
peterxcli
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this document!
|
|
||
| Furthermore, the creation and reuse of notification client instances (such as Kafka or RabbitMQ producers) within the Ozone Manager must be carefully controlled. Poor resource management in this area could increase memory or thread usage and degrade OM performance over time. | ||
|
|
||
| ### Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I didn't notice that I pasted the wrong link. Updated, please take another look~
https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats
What's the difference between event_lost and push_fail?
In https://docs.ceph.com/en/quincy/radosgw/notifications/#notification-performance-stats:
pubsub_event_lost: running counter of events that had topics associated with them but that were not pushed to any of the endpointspubsub_push_fail: running counter, for all notifications, of events failed to be pushed to their endpoint
|
|
||
| ### Overview | ||
|
|
||
| A callback is introduced after the Ratis request is completed and before the response is returned to the client, to handle event notification logic. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will the event notification logic be handled asynchronously? If not, I’m concerned that the sender could slow down the end-to-end response time. This also raises another question: if the event notification fails, should the client still be informed that the original request succeeded, or should the failure be communicated to the client as well?
|
|
||
| Furthermore, the creation and reuse of notification client instances (such as Kafka or RabbitMQ producers) within the Ozone Manager must be carefully controlled. Poor resource management in this area could increase memory or thread usage and degrade OM performance over time. | ||
|
|
||
| ### Metrics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I not sure if the notification here that can be create through java api is similar to topic in ceph pubsub.
And if yes, then:
In ozone(or this design), it would become:
event_lost: running counter of events that had notification config associated with them but that were not pushed to any of the targetpush_fail: running counter, for all notifications, of targets failed to be pushed to their endpoint
|
This PR has been marked as stale due to 21 days of inactivity. Please comment or remove the stale label to keep it open. Otherwise, it will be automatically closed in 7 days. |
|
Thank you for your contribution. This PR is being closed due to inactivity. If needed, feel free to reopen it. |
|
This PR is now covered by #8871 . Please see/refer to that one. |
What changes were proposed in this pull request?
This is the design proposal for the S3 Event Notification feature. The design document is shared in the Markdown file. Please comment in the document if you have any feedback.
What is the link to the Apache JIRA
HDDS-5984
How was this patch tested?
NA