
Add support for partial http requests handling with pipelining#111258

Closed
mhl-b wants to merge 2 commits into elastic:partial-rest-requests from mhl-b:split-http-pipelining

Conversation

@mhl-b
Contributor

@mhl-b mhl-b commented Jul 25, 2024

UPDATE:
Created a leaner and cleaner version here #111438.
Most of the changes here are irrelevant in the new PR.


This PR adds support to Netty4HttpPipeliningHandler for handling parts of an HTTP request. Currently we always aggregate HTTP request content before passing it to RestController. With this change, Netty4HttpPipeliningHandler can receive both FullHttpRequests and HttpRequests/HttpContents with the correct sequence number.

To make this possible I made several changes to the Netty channel pipeline.

  1. Add an HTTP pipeline sequence number to the parts of an HTTP request. Previously, Netty4HttpPipeliningHandler kept track of all incoming FullHttpRequests and increased the sequence number. With this change, the sequence number is assigned earlier: after HttpRequestDecoder, but before decompression and aggregation. It happens in Netty4InboundHttpPipeliningHandler.

  2. Propagate the sequence number through HttpObjectAggregator and HttpContentDecompressor. Since Netty does not know about our pipelining implementation, I had to wrap both classes so they can consume and produce "pipelined" versions of HttpObjects. Besides the wrapper handler classes, I introduced strongly-typed versions of HttpRequest, HttpContent, LastHttpContent, and FullHttpRequest that carry the pipeline sequence number, for example:
    public record PipelinedHttpContent(HttpContent httpContent, int sequence)

  3. Make HTTP aggregation optional. In this PR we still fully aggregate all requests, but I added a predicate that controls which requests can skip aggregation, along with unit tests.
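
The strongly-typed wrappers from point 2 can be sketched in isolation. The `HttpObject`/`HttpContent` interfaces below are minimal stand-ins for Netty's types, and only two of the four wrappers are shown; this is an illustration of the shape, not the PR's actual code.

```java
// Minimal stand-ins for Netty's inbound HTTP types (assumptions for illustration).
interface HttpObject {}
interface HttpContent extends HttpObject {}
interface LastHttpContent extends HttpContent {}

// Marker interface: any HttpObject that carries a pipeline sequence number.
interface PipelinedHttpObject {
    int sequence();
}

// Strongly-typed wrappers pairing an HttpObject with its sequence number,
// mirroring the PipelinedHttpContent record quoted above.
record PipelinedHttpContent(HttpContent httpContent, int sequence) implements PipelinedHttpObject, HttpContent {}

record PipelinedLastHttpContent(LastHttpContent lastContent, int sequence) implements PipelinedHttpObject, LastHttpContent {}
```

Because each wrapper still implements the corresponding HTTP interface, downstream handlers that only care about `HttpContent` keep working unchanged, while pipelining-aware handlers can check for the `PipelinedHttpObject` marker.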

After the change, the pipeline looks like this:

| from                        | to                                 | note                       |
|-----------------------------|------------------------------------|----------------------------|
| HttpRequestDecoder          | HttpRequestDecoder                 |                            |
| Netty4HttpHeaderValidator   | Netty4HttpHeaderValidator          |                            |
|                             | Netty4InboundHttpPipeliningHandler | emits PipelinedHttpObjects |
| HttpContentDecompressor     | Netty4ContentDecompressor          | wrapped decompressor       |
| HttpObjectAggregator        | Netty4HttpAggregator               | optional aggregation       |
| Netty4HttpPipeliningHandler | Netty4HttpPipeliningHandler        |                            |

The most interesting changes are in Netty4ContentDecompressor, Netty4HttpAggregator, Netty4InboundHttpPipeliningHandler, and Netty4HttpPipeliningHandler, plus the related unit tests at the end. The rest is boilerplate.

Update:
Added an example of a RestHandler with chunked content. I use the reactive streams model. Rather than exposing a BytesReference as content, I expose a Flow.Publisher<HttpContent>. The RestHandler subscribes to the publisher and consumes parts at its own rate via Subscription.request(long n). The publisher reads from the Netty channel on demand and invokes onNext(HttpContent) when the number of requested parts is greater than 0.

Content publisher - Netty4RequestContentPublisher
Rest handler example - Netty4RequestContentPublisherIT

I omitted error handling and edge cases for brevity. There are many things to consider with the publisher state machine: for example avoiding multiple subscriptions, handling terminal states, handling errors, and handling subscriber cancellation. I did some drafts, and I can say it's pretty easy to add them.
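
The on-demand model described above can be sketched with the JDK's `java.util.concurrent.Flow` API. This is a deliberately simplified illustration, not the PR's Netty4RequestContentPublisher: chunks are plain Strings instead of HttpContent, there is a single subscriber, no thread-safety, and only the happy path.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Flow;

// Sketch of an on-demand content publisher: chunks arriving from the channel
// are buffered and delivered only while the subscriber has outstanding demand.
class ContentPublisherSketch implements Flow.Publisher<String> {
    private final Queue<String> buffered = new ArrayDeque<>(); // chunks read from the channel
    private Flow.Subscriber<? super String> subscriber;        // single known subscriber
    private long demand;                                       // outstanding request(n) count

    @Override
    public void subscribe(Flow.Subscriber<? super String> s) {
        this.subscriber = s;
        s.onSubscribe(new Flow.Subscription() {
            @Override
            public void request(long n) {
                demand += n;
                drain(); // deliver any already-buffered parts the handler asked for
            }

            @Override
            public void cancel() {} // omitted, like in the PR draft
        });
    }

    // Stands in for the netty channel read callback delivering the next part.
    void sendChunk(String chunk) {
        buffered.add(chunk);
        drain();
    }

    private void drain() {
        while (demand > 0 && !buffered.isEmpty()) {
            demand--;
            subscriber.onNext(buffered.poll());
        }
    }
}
```

A RestHandler-style consumer would call `Subscription.request(1)` each time it finishes processing a part, so backpressure falls out of the demand counter rather than from buffering the whole body.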

@mhl-b mhl-b added >enhancement :Distributed/Network Http and internode communication implementations Team:Distributed Meta label for distributed team. v8.16.0 labels Jul 25, 2024
@mhl-b mhl-b requested review from DaveCTurner and Tim-Brooks July 25, 2024 03:01
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@mhl-b mhl-b requested a review from ywangd July 25, 2024 03:02
Member

@DaveCTurner DaveCTurner left a comment


Thanks Mikhail. I'll take a more detailed look in due course but two things straight away:

  1. Could we have a test case which shows how a REST handler would use this API? Probably an ESNetty4IntegTestCase derivative would be best; see e.g. Netty4ChunkedContinuationsIT for an example of the sort of thing I mean.

  2. Could we merge this into a feature branch for now rather than main? That way we've somewhere to coordinate the work to adopt this in the bulk indexing path, but we're not committing to it until we have the whole solution in place and can run some benchmarks?

@mhl-b mhl-b changed the base branch from main to partial-rest-requests July 25, 2024 15:32
@mhl-b
Contributor Author

mhl-b commented Jul 25, 2024

@DaveCTurner

  1. Makes sense, will do
  2. Feature branch - done

Member

@ywangd ywangd left a comment


This seems to be an exciting piece of work! Thanks for working on it. I had a first read-through and left some minor comments. I plan to come back to it and have a closer look. In the meantime, I second David's comment about having a test case that demonstrates how this new feature is leveraged.

Comment on lines +36 to +50
out.replaceAll(obj -> {
    if (obj instanceof PipelinedHttpObject) {
        return obj;
    } else if (obj instanceof FullHttpRequest request) {
        return new PipelinedFullHttpRequest(request, sequence);
    } else if (obj instanceof HttpRequest request) {
        return new PipelinedHttpRequest(request, sequence);
    } else if (obj instanceof LastHttpContent lastContent) {
        return new PipelinedLastHttpContent(lastContent, sequence);
    } else if (obj instanceof HttpContent content) {
        return new PipelinedHttpContent(content, sequence);
    } else {
        throw new IllegalArgumentException();
    }
});
Member


Might be more efficient to have a delegating List that does this wrapping in its add method and pass it to super.decode(...). Maybe a premature optimisation for this early stage.
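
The delegating-List idea could look roughly like the generic sketch below, with a hypothetical wrap function standing in for the pipelined wrapping; it is an illustration of the suggestion, not code from the PR.

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of a delegating List that wraps each element as it is added, so objects
// produced by super.decode(...) are wrapped once, with no replaceAll pass afterwards.
class WrappingList<T> extends AbstractList<T> {
    private final List<T> delegate = new ArrayList<>();
    private final UnaryOperator<T> wrap;

    WrappingList(UnaryOperator<T> wrap) {
        this.wrap = wrap;
    }

    @Override
    public boolean add(T obj) {
        return delegate.add(wrap.apply(obj)); // wrap on insertion
    }

    @Override
    public T get(int index) {
        return delegate.get(index);
    }

    @Override
    public int size() {
        return delegate.size();
    }
}
```

Passing such a list to the decoder trades the second `replaceAll` traversal for one virtual call per `add`.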


@Override
public boolean acceptInboundMessage(Object msg) throws Exception {
    return msg instanceof PipelinedHttpObject;
Member


For my education: we don't need to call super.acceptInboundMessage here?

Contributor Author


The wrapped decompressor should call this method on every read(ctx, msg). I added this check to avoid surprises in the decode method, which assumes the message has a sequence number.

@mhl-b
Contributor Author

mhl-b commented Jul 28, 2024

@DaveCTurner @ywangd
I updated the PR description with the approach and implementation of a RestHandler with chunked request content.
Please see Netty4RequestContentPublisherIT for the API design. TL;DR: I use reactive streams.

*/
public class Netty4HttpAggregator extends HttpObjectAggregator {

    private static final Predicate<PipelinedHttpRequest> IGNORE_TEST = (req) -> req.uri().startsWith("/_test/request-stream") == false;
Contributor Author


added to the source for the sake of PR size

} else {
    netty4HttpRequest = new Netty4HttpRequest(readSequence++, fullHttpRequest);
assert currentRequest != null;
currentRequest.contentPublisher().sendChunk((HttpContent) msg);
Contributor Author


I attach currentRequest to the Netty4PipeliningHandler, so subsequent chunks can be published right away to the RestHandler from here.

Comment on lines +57 to +67
Netty4HttpRequest(int sequence, HttpRequest request, Netty4RequestContentPublisher contentPublisher) {
    this.sequence = sequence;
    this.nettyRequest = request;
    this.nettyContent = new DefaultHttpContent(Unpooled.EMPTY_BUFFER);
    this.contentPublisher = contentPublisher;
    this.content = BytesArray.EMPTY;
    this.released = new AtomicBoolean(false);
    this.pooled = false;
    this.headers = getHttpHeadersAsMap(request.headers());
    this.inboundException = null;
}
Contributor Author


Add support to Netty4HttpRequest for being an HttpRequest plus an optional HttpContent, rather than always a FullHttpRequest. I do drop trailing headers for non-full requests; I'm not sure whether we use them or not. Haven't found traces of them yet.


import java.util.concurrent.Flow;

public class Netty4RequestContentPublisher implements Flow.Publisher<HttpContent> {
Contributor Author


bare bones, nothing fancy, no error handling, no state management, no multiple subscriptions, just happy path

@mhl-b mhl-b requested review from DaveCTurner and ywangd July 28, 2024 04:43
Member

@DaveCTurner DaveCTurner left a comment


A reactive stream is the right conceptual model for sure but I'm not convinced we want to stick to these exact APIs. There's always exactly one subscriber, known ahead of time, so there's no need for a separate subscribe() API; we really want .request() to request a certain number of bytes but will typically deliver this as a single chunk, which differs from the reactive streams API that expects the argument to be the number of chunks; finally the Throwable in the onError path is very concerning. Remember this code is all on an incredibly hot path, we need to avoid unnecessary abstractions wherever possible.
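
A slimmed-down shape along the lines described here might look like the sketch below: one consumer set up front, demand expressed in bytes, and each delivery a single chunk. The names and the in-memory source are purely hypothetical, not an API from this PR or its successor.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical streamed-content API: no subscribe()/Subscription, no Throwable path;
// request() takes a byte count and the next chunk arrives via a single known handler.
interface StreamedContent {
    void setHandler(Consumer<byte[]> handler); // exactly one consumer, known ahead of time
    void request(long bytes);                  // demand in bytes, not reactive-streams chunk counts
}

// Toy in-memory implementation standing in for a netty channel read.
class InMemoryContent implements StreamedContent {
    private final byte[] data;
    private int pos;
    private Consumer<byte[]> handler;

    InMemoryContent(byte[] data) {
        this.data = data;
    }

    @Override
    public void setHandler(Consumer<byte[]> handler) {
        this.handler = handler;
    }

    @Override
    public void request(long bytes) {
        if (pos >= data.length) {
            return; // content exhausted
        }
        int len = (int) Math.min(bytes, data.length - pos);
        byte[] chunk = Arrays.copyOfRange(data, pos, pos + len);
        pos += len;
        handler.accept(chunk); // delivered as one chunk, possibly shorter than requested
    }
}
```

Dropping the generic Publisher/Subscriber machinery removes an allocation and an indirection per chunk, which matters on a hot network path.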

@mhl-b mhl-b removed the request for review from Tim-Brooks July 30, 2024 02:23
@mhl-b
Contributor Author

mhl-b commented Jul 30, 2024

@DaveCTurner @ywangd
I created a new PR, #111438.
I found a simpler and smaller implementation that does not have the problems introduced here, so I think continuing the discussion there would be easier since there's less clutter/noise.

In the new PR there are no changes to pipelining sequencing, no more PipelinedObjects, a simplified wrapper for HttpObjectAggregator that does not intercept the decode method, and a simplified interface for streamed content. I still use reactive streams as a model, but only with the minimal required methods.

@ywangd
Member

ywangd commented Jul 31, 2024

Would it be OK to close this PR if it is not intended to be continued? It would make it easier to tell which one to follow.

@mhl-b mhl-b closed this Aug 1, 2024