Demo zero copy grpc #966

Draft: wants to merge 4 commits into base release-2.5.1
Conversation

@duongkame duongkame commented Nov 14, 2023

What changes were proposed in this pull request?

Demo zero-copy in GrpcService, including GrpcClientProtocolService and GrpcServerProtocolService (appendEntries).
This PR is for an early review to get suggestions for correctly shaping the code.

Zero-copy is done by a simple trick: any ByteString parsed from protobuf refers to the original Netty buffers instead of keeping a separate copy on the heap. This avoids copying data into heap memory and thus saves both the buffer-copy cost and the GC pressure from intermediate heap buffers.
Yet it comes with a challenge: the application must explicitly close the original Netty buffers once it knows the parsed protobuf objects (and their descendants) are no longer needed. In Ratis, that means deciding when a LogEntryProto is no longer used.
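The required buffer lifecycle is essentially Netty-style manual reference counting. A minimal, self-contained sketch (the class name and methods are illustrative, not from this PR; the real code tracks Netty ByteBufs behind an InputStream handle):

```java
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch of a reference-counted wrapper around a direct buffer.
 * Every consumer that keeps a reference (e.g. a parsed LogEntryProto whose
 * ByteStrings alias this buffer) must retain() it, and must release() it
 * when done; the buffer can only be reclaimed once the count reaches zero.
 */
final class RefCountedBuffer {
    private final ByteBuffer buffer;
    private final AtomicInteger refCount = new AtomicInteger(1);

    RefCountedBuffer(ByteBuffer buffer) {
        this.buffer = buffer;
    }

    /** Take an additional reference; returns this for chaining. */
    RefCountedBuffer retain() {
        refCount.incrementAndGet();
        return this;
    }

    /** Drop one reference; returns true when the last reference is gone. */
    boolean release() {
        return refCount.decrementAndGet() == 0;
    }

    boolean isReleased() {
        return refCount.get() <= 0;
    }
}
```

The point of the sketch is that, unlike heap copies, nothing here is reclaimed automatically: a missed release() leaks the direct buffer, which is exactly why the cleanup strategy below matters.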

Today, Ratis caches LogEntryProto objects in SegmentedRaftLogCache. However, for data-intensive applications like Apache Ozone, cached log entries have their StateMachine data truncated, and Ratis relies on the StateMachine to cache that data. This behavior is controlled by the config raft.server.log.statemachine.data.caching.enabled.

This demo solves the cleanup problem with a DirectBufferCleaner that keeps track of all open original buffers (handled through an InputStream interface). The cleaner is invoked when:

  1. SegmentedRaftLogCache evicts a LogEntryProto: while this sounds like the point at which we are sure Ratis no longer needs a particular log entry, it does not release memory fast enough for Raft groups with raft.server.log.statemachine.data.caching.enabled, because the log size with StateMachine data truncated does not reflect the real size of the original buffer, and this defers cache eviction. We need another strategy for data-intensive StateMachines.
  2. On the leader replica, when both followers have caught up with a particular index and that index has been applied to the StateMachine, it is safe to discard the original buffers of the log entry. On a follower replica, it is safe to release the buffers after a particular index is applied. This is done for data-intensive StateMachines.
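Strategy 2 above amounts to tracking the buffers backing each log index and releasing everything up to the highest safely-applied index. A hedged sketch of that bookkeeping (names are illustrative; the PR's DirectBufferCleaner tracks buffers via an InputStream handle rather than a Closeable):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical sketch: associates each log index with a handle to its
 * original direct buffers, and releases all handles up to (and including)
 * a given index once it is known to be applied and replicated.
 */
final class BufferCleanerSketch {
    private final ConcurrentSkipListMap<Long, Closeable> buffersByIndex =
        new ConcurrentSkipListMap<>();

    /** Register the buffer handle backing the entry at logIndex. */
    void track(long logIndex, Closeable bufferHandle) {
        buffersByIndex.put(logIndex, bufferHandle);
    }

    /**
     * Release every tracked buffer for indices <= appliedIndex, e.g. when
     * the followers have caught up and the index is applied to the StateMachine.
     */
    void releaseUpTo(long appliedIndex) {
        Map<Long, Closeable> done = buffersByIndex.headMap(appliedIndex, true);
        for (Closeable handle : done.values()) {
            try {
                handle.close();
            } catch (IOException ignored) {
                // a release failure should not block releasing the rest
            }
        }
        done.clear();
    }

    int trackedCount() {
        return buffersByIndex.size();
    }
}
```

The ordered map makes the "release everything up to index N" operation a cheap head-map sweep, which fits the monotonically advancing applied index in Raft.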

A quick thought: since a data-intensive StateMachine may cache data referring to the original buffers, we may need a new StateMachine API to tell the StateMachine when it should evict data (up to a particular index).
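Such an API could be a single callback. This is purely a sketch of the idea floated above, not an existing Ratis interface:

```java
/**
 * Hypothetical addition (not an existing Ratis API): lets the server tell a
 * data-intensive StateMachine that its cached data for log indices up to and
 * including lastReleasableIndex can be evicted, so the original direct
 * buffers referenced by that data can finally be released.
 */
interface StateMachineCacheEviction {
    void evictCache(long lastReleasableIndex);
}
```

The server would invoke this once an index is both applied and replicated, mirroring the release conditions listed above.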

This demo also includes a fix to prevent RaftId subclasses (RaftPeerId, RaftGroupId) from referring to the original data source, because that is not zero-copy friendly. This fix will go in as a separate PR.

Also, the code in this demo is not well structured. For ease of building the demo, I put DirectBufferCleaner in ratis-server so that it can be invoked directly from ratis-server code. The component should live in ratis-grpc and be invoked by subscribing to events from RaftServer.

What is the link to the Apache JIRA?

https://issues.apache.org/jira/browse/RATIS-1925
https://issues.apache.org/jira/browse/RATIS-1934

How was this patch tested?

To be tested.

@duongkame duongkame marked this pull request as draft November 14, 2023 18:09
@duongkame (Contributor, Author) commented:

@szetszwo
