
Add support for partial http requests handling with pipelining#111258

Closed
mhl-b wants to merge 2 commits into elastic:partial-rest-requests from mhl-b:split-http-pipelining

Conversation

@mhl-b
Contributor

@mhl-b mhl-b commented Jul 25, 2024

UPDATE:
Created a leaner and cleaner version here #111438.
Most of the changes here are irrelevant in the new PR.


This PR adds support to Netty4HttpPipeliningHandler for handling parts of an HTTP request. Currently we always aggregate HTTP request content before passing it to RestController. With this change, Netty4HttpPipeliningHandler can receive both FullHttpRequests and HttpRequests/HttpContents with the correct sequence number.

To make this possible I made several changes to the Netty channel pipeline.

  1. Add an HTTP pipeline sequence number to the parts of an HTTP request. Previously, Netty4HttpPipeliningHandler kept track of all incoming FullHttpRequests and increased the sequence number. With this change, the sequence number is assigned earlier: after HttpRequestDecoder, but before decompression and aggregation. It happens in Netty4InboundHttpPipeliningHandler.

  2. Propagate the sequence number through HttpObjectAggregator and HttpContentDecompressor. Since Netty does not know about our pipelining implementation, I had to wrap both classes so they can consume and produce "pipelined" versions of HttpObjects. Besides the wrapper handler classes, I introduced strongly-typed versions of HttpRequest, HttpContent, LastHttpContent, and FullHttpRequest that carry the pipeline sequence number, for example:
    public record PipelinedHttpContent(HttpContent httpContent, int sequence)

  3. Make HTTP aggregation optional. In this PR we still fully aggregate all requests, but I added a predicate that controls which requests can skip aggregation, along with unit tests.
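
The strongly-typed wrappers from point 2 can be sketched in isolation. The `HttpObject`/`HttpContent` interfaces below are minimal stand-ins for Netty's types, and only two of the four wrappers are shown; this is an illustration of the shape, not the PR's actual code.

```java
// Minimal stand-ins for Netty's inbound HTTP types (assumptions for illustration).
interface HttpObject {}
interface HttpContent extends HttpObject {}
interface LastHttpContent extends HttpContent {}

// Marker interface: any HttpObject that carries a pipeline sequence number.
interface PipelinedHttpObject {
    int sequence();
}

// Strongly-typed wrappers pairing an HttpObject with its sequence number,
// mirroring the PipelinedHttpContent record quoted above.
record PipelinedHttpContent(HttpContent httpContent, int sequence) implements PipelinedHttpObject, HttpContent {}

record PipelinedLastHttpContent(LastHttpContent lastContent, int sequence) implements PipelinedHttpObject, LastHttpContent {}
```

Because each wrapper still implements the corresponding HTTP interface, downstream handlers that only care about `HttpContent` keep working unchanged, while pipelining-aware handlers can check for the `PipelinedHttpObject` marker.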

After the change, the pipeline looks like this:

| from                        | to                                 | note                       |
|-----------------------------|------------------------------------|----------------------------|
| HttpRequestDecoder          | HttpRequestDecoder                 |                            |
| Netty4HttpHeaderValidator   | Netty4HttpHeaderValidator          |                            |
|                             | Netty4InboundHttpPipeliningHandler | emits PipelinedHttpObjects |
| HttpContentDecompressor     | Netty4ContentDecompressor          | wrapped decompressor       |
| HttpObjectAggregator        | Netty4HttpAggregator               | optional aggregation       |
| Netty4HttpPipeliningHandler | Netty4HttpPipeliningHandler        |                            |

The most interesting changes are in Netty4ContentDecompressor, Netty4HttpAggregator, Netty4InboundHttpPipeliningHandler, and Netty4HttpPipeliningHandler, plus the related unit tests at the end. The rest is boilerplate.

Update:
Added an example of a RestHandler with chunked content. I use the reactive streams model. Rather than exposing a BytesReference as content, I expose a Flow.Publisher<HttpContent>. The RestHandler subscribes to the publisher and consumes parts at its own rate via Subscription.request(long n). The publisher reads from the Netty channel on demand and invokes onNext(HttpContent) when the number of requested parts is greater than 0.

Content publisher - Netty4RequestContentPublisher
Rest handler example - Netty4RequestContentPublisherIT

I omitted error handling and edge cases for brevity. There are many things to consider with the publisher state machine: for example avoiding multiple subscriptions, handling terminal states, handling errors, and handling subscriber cancellation. I did some drafts, and I can say it's pretty easy to add them.
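
The on-demand model described above can be sketched with the JDK's `java.util.concurrent.Flow` API. This is a deliberately simplified illustration, not the PR's Netty4RequestContentPublisher: chunks are plain Strings instead of HttpContent, there is a single subscriber, no thread-safety, and only the happy path.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Flow;

// Sketch of an on-demand content publisher: chunks arriving from the channel
// are buffered and delivered only while the subscriber has outstanding demand.
class ContentPublisherSketch implements Flow.Publisher<String> {
    private final Queue<String> buffered = new ArrayDeque<>(); // chunks read from the channel
    private Flow.Subscriber<? super String> subscriber;        // single known subscriber
    private long demand;                                       // outstanding request(n) count

    @Override
    public void subscribe(Flow.Subscriber<? super String> s) {
        this.subscriber = s;
        s.onSubscribe(new Flow.Subscription() {
            @Override
            public void request(long n) {
                demand += n;
                drain(); // deliver any already-buffered parts the handler asked for
            }

            @Override
            public void cancel() {} // omitted, like in the PR draft
        });
    }

    // Stands in for the netty channel read callback delivering the next part.
    void sendChunk(String chunk) {
        buffered.add(chunk);
        drain();
    }

    private void drain() {
        while (demand > 0 && !buffered.isEmpty()) {
            demand--;
            subscriber.onNext(buffered.poll());
        }
    }
}
```

A RestHandler-style consumer would call `Subscription.request(1)` each time it finishes processing a part, so backpressure falls out of the demand counter rather than from buffering the whole body.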

@mhl-b mhl-b added >enhancement :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team. v8.16.0 labels Jul 25, 2024
@mhl-b mhl-b requested review from DaveCTurner and Tim-Brooks July 25, 2024 03:01
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@mhl-b mhl-b requested a review from ywangd July 25, 2024 03:02
Member

@DaveCTurner DaveCTurner left a comment


Thanks Mikhail. I'll take a more detailed look in due course but two things straight away:

  1. Could we have a test case which shows how a REST handler would use this API? Probably an ESNetty4IntegTestCase derivative would be best; see e.g. Netty4ChunkedContinuationsIT for an example of the sort of thing I mean.

  2. Could we merge this into a feature branch for now rather than main? That way we've somewhere to coordinate the work to adopt this in the bulk indexing path, but we're not committing to it until we have the whole solution in place and can run some benchmarks?

@mhl-b mhl-b changed the base branch from main to partial-rest-requests July 25, 2024 15:32
@mhl-b
Contributor Author

mhl-b commented Jul 25, 2024

@DaveCTurner

  1. Makes sense, will do
  2. Feature branch - done

Member

@ywangd ywangd left a comment


This seems to be an exciting piece of work! Thanks for working on it. I had a first read-through and left some minor comments. I plan to come back to it and have a closer look. In the meantime, I second David's comment about having a test case that demonstrates how this new feature is leveraged.

Comment on lines +36 to +50
out.replaceAll(obj -> {
    if (obj instanceof PipelinedHttpObject) {
        return obj;
    } else if (obj instanceof FullHttpRequest request) {
        return new PipelinedFullHttpRequest(request, sequence);
    } else if (obj instanceof HttpRequest request) {
        return new PipelinedHttpRequest(request, sequence);
    } else if (obj instanceof LastHttpContent lastContent) {
        return new PipelinedLastHttpContent(lastContent, sequence);
    } else if (obj instanceof HttpContent content) {
        return new PipelinedHttpContent(content, sequence);
    } else {
        throw new IllegalArgumentException();
    }
});
Member


Might be more efficient to have a delegating List that does this wrapping in its add method and pass it to super.decode(...). Maybe a premature optimisation for this early stage.
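
The delegating-List idea could look roughly like the generic sketch below, with a hypothetical wrap function standing in for the pipelined wrapping; it is an illustration of the suggestion, not code from the PR.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of a delegating List that wraps each element as it is added, so objects
// produced by super.decode(...) are wrapped once, with no replaceAll pass afterwards.
class WrappingList<T> extends AbstractList<T> {
    private final List<T> delegate = new ArrayList<>();
    private final UnaryOperator<T> wrap;

    WrappingList(UnaryOperator<T> wrap) {
        this.wrap = wrap;
    }

    @Override
    public boolean add(T obj) {
        return delegate.add(wrap.apply(obj)); // wrap on insertion
    }

    @Override
    public T get(int index) {
        return delegate.get(index);
    }

    @Override
    public int size() {
        return delegate.size();
    }
}
```

Passing such a list to the decoder trades the second `replaceAll` traversal for one virtual call per `add`.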


@Override
public boolean acceptInboundMessage(Object msg) throws Exception {
    return msg instanceof PipelinedHttpObject;
Member


For my education: we don't need to call super.acceptInboundMessage here?

Contributor Author


The wrapped decompressor should call this method on every read(ctx, msg). I added this check to avoid surprises in the decode method, which assumes the message has a sequence number.

@mhl-b
Contributor Author

mhl-b commented Jul 28, 2024

@DaveCTurner @ywangd
I updated the PR description with the approach and implementation of a RestHandler with chunked request content.
Please see Netty4RequestContentPublisherIT for the API design. TL;DR: I use reactive streams.

*/
public class Netty4HttpAggregator extends HttpObjectAggregator {

    private static final Predicate<PipelinedHttpRequest> IGNORE_TEST = (req) -> req.uri().startsWith("/_test/request-stream") == false;
Contributor Author


added to the source for the sake of PR size

} else {
    netty4HttpRequest = new Netty4HttpRequest(readSequence++, fullHttpRequest);
assert currentRequest != null;
currentRequest.contentPublisher().sendChunk((HttpContent) msg);
Contributor Author


I attach currentRequest to the Netty4PipeliningHandler, so subsequent chunks can be published right away to the RestHandler from here.

Comment on lines +57 to +67
Netty4HttpRequest(int sequence, HttpRequest request, Netty4RequestContentPublisher contentPublisher) {
    this.sequence = sequence;
    this.nettyRequest = request;
    this.nettyContent = new DefaultHttpContent(Unpooled.EMPTY_BUFFER);
    this.contentPublisher = contentPublisher;
    this.content = BytesArray.EMPTY;
    this.released = new AtomicBoolean(false);
    this.pooled = false;
    this.headers = getHttpHeadersAsMap(request.headers());
    this.inboundException = null;
}
Contributor Author


Add support to Netty4HttpRequest for being an HttpRequest plus an optional HttpContent, rather than always a FullHttpRequest. I do drop trailing headers for non-full requests; I'm not sure whether we use them or not. Haven't found traces of them yet.


import java.util.concurrent.Flow;

public class Netty4RequestContentPublisher implements Flow.Publisher<HttpContent> {
Contributor Author


bare bones, nothing fancy, no error handling, no state management, no multiple subscriptions, just happy path

@mhl-b mhl-b requested review from DaveCTurner and ywangd July 28, 2024 04:43
Member

@DaveCTurner DaveCTurner left a comment


A reactive stream is the right conceptual model for sure but I'm not convinced we want to stick to these exact APIs. There's always exactly one subscriber, known ahead of time, so there's no need for a separate subscribe() API; we really want .request() to request a certain number of bytes but will typically deliver this as a single chunk, which differs from the reactive streams API that expects the argument to be the number of chunks; finally the Throwable in the onError path is very concerning. Remember this code is all on an incredibly hot path, we need to avoid unnecessary abstractions wherever possible.
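
A slimmed-down shape along the lines described here might look like the sketch below: one consumer set up front, demand expressed in bytes, and each delivery a single chunk. The names and the in-memory source are purely hypothetical, not an API from this PR or its successor.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical streamed-content API: no subscribe()/Subscription, no Throwable path;
// request() takes a byte count and the next chunk arrives via a single known handler.
interface StreamedContent {
    void setHandler(Consumer<byte[]> handler); // exactly one consumer, known ahead of time
    void request(long bytes);                  // demand in bytes, not reactive-streams chunk counts
}

// Toy in-memory implementation standing in for a netty channel read.
class InMemoryContent implements StreamedContent {
    private final byte[] data;
    private int pos;
    private Consumer<byte[]> handler;

    InMemoryContent(byte[] data) {
        this.data = data;
    }

    @Override
    public void setHandler(Consumer<byte[]> handler) {
        this.handler = handler;
    }

    @Override
    public void request(long bytes) {
        if (pos >= data.length) {
            return; // content exhausted
        }
        int len = (int) Math.min(bytes, data.length - pos);
        byte[] chunk = Arrays.copyOfRange(data, pos, pos + len);
        pos += len;
        handler.accept(chunk); // delivered as one chunk, possibly shorter than requested
    }
}
```

Dropping the generic Publisher/Subscriber machinery removes an allocation and an indirection per chunk, which matters on a hot network path.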

@mhl-b mhl-b removed the request for review from Tim-Brooks July 30, 2024 02:23
@mhl-b
Contributor Author

mhl-b commented Jul 30, 2024

@DaveCTurner @ywangd
I created a new PR, #111438.
I found a simpler and smaller implementation that does not have the problems introduced here, so I think continuing the discussion there would be easier since there's less clutter/noise.

In the new PR there are no changes to pipelining sequencing, no more PipelinedObjects, a simplified wrapper for HttpObjectAggregator that does not intercept the decode method, and a simplified interface for streamed content. I still use reactive streams as a model, but only with the minimal required methods.

@ywangd
Member

ywangd commented Jul 31, 2024

Would it be OK to close this PR if it is not intended to be continued? It would make it easier to tell which one to follow.

@mhl-b mhl-b closed this Aug 1, 2024