WIP: Lazy Parsing #331

Draft
wants to merge 42 commits into
base: master

samoconnor
Contributor

This branch adds a number of modules (with passing tests) but does not integrate them into the HTTP request/response processing code.

Lazy parsing has potential performance and security benefits for HTTP clients and servers.

Consider a multi-layered server request routing framework. The top-level request handling function may only need to look at a single auth-token header to know that it should just close the connection. It can avoid wasting time parsing the rest of the headers, which reduces the impact of DoS attacks (and makes it immune to malformed-header attacks designed to waste memory and/or time processing headers). If the request is authorised, the top-level router may only need to look at the target URI to decide what to do next, so it does not need to parse all the headers. This pattern can continue down the tree of handlers, so that each layer decodes only the parts of the header that it actually needs.

There are similar benefits for clients too. In many cases the client of a web services API call just wants to check for 200 OK and then read some JSON from the response body. The headers may be full of all kinds of metadata from various intermediate proxies that is never used.

This PR is intended to be a first step at adding some components that can later be used to implement lazy HTTP Message processing.

(The so/lazyintegrate branch has a working (but not optimal) integration of the LazyHTTP Parser/Generator with Messages.jl.)

LazyStrings.jl

This module defines AbstractString methods for accessing sub-strings whose length is not known in advance. Length is lazily determined during iteration.
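To illustrate the idea of a sub-string whose end is discovered during iteration, here is a minimal sketch. The type `LazyUntil` and its fields are hypothetical names made up for this example; it is a plain iterator rather than a full `AbstractString`, and it is not the LazyStrings.jl implementation.

```julia
# Hypothetical illustration, not the LazyStrings.jl API.
# A view into `s` that starts at byte index `i` and ends just before the
# first occurrence of `stop`. The end index is never stored; it is found
# while iterating.
struct LazyUntil
    s::String
    i::Int
    stop::Char
end

# The length is unknown until iteration reaches the delimiter.
Base.IteratorSize(::Type{LazyUntil}) = Base.SizeUnknown()
Base.eltype(::Type{LazyUntil}) = Char

function Base.iterate(l::LazyUntil, i::Int = l.i)
    i > ncodeunits(l.s) && return nothing   # ran off the end of the buffer
    c, next = iterate(l.s, i)
    c == l.stop && return nothing           # delimiter reached: sub-string ends
    return c, next
end

# Byte 5 is the start of the request target in this request line.
token = LazyUntil("GET /index.html HTTP/1.1\r\n", 5, ' ')
println(join(token))
```

Constructing `token` does no scanning at all; the work happens only when something iterates it, for example a comparison that can stop at the first mismatching character.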

LazyHTTP.jl

This module defines RequestHeader and ResponseHeader types for lazy parsing of HTTP headers.
RequestHeader has properties: method, target and version.
ResponseHeader has properties: version and status.
Both types have an AbstractDict-like interface for accessing header fields.

The implementation simply stores a reference to the input string. Parsing is deferred until the properties or header fields are accessed. The value objects returned by the parser are also lazy. They store a reference to the input string and the start index of the value. Parsing of the value content is deferred until needed by the AbstractString interface.

          ┌▶"GET / HTTP/1.1\\r\\n" *
          │ "Content-Type: text/plain\\r\\r\\r\\n"
          │  ▲          ▲
          │  │          │
FieldName(s, i=17)      │        == "Content-Type"
          └──────────┐  │
          FieldValue(s, i=28)    == "text/plain"
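The "store a reference plus a start index, parse later" idea can be sketched roughly as follows. `FieldRef`, `colon_index`, `field_name` and `field_value` are hypothetical names for this example only; the real `FieldName` and `FieldValue` types in the PR differ (for instance they implement the AbstractString interface), and this sketch assumes a well-formed field line.

```julia
# Hypothetical sketch; these are not the LazyHTTP.jl definitions.
struct FieldRef
    s::String   # the whole header buffer (shared, never copied)
    i::Int      # byte index of the first character of the field line
end

# Locate the ':' that separates name and value, scanning from `i`.
function colon_index(f::FieldRef)
    c = findnext(isequal(':'), f.s, f.i)
    c === nothing && throw(ArgumentError("no ':' in field line"))
    return c
end

# The name is parsed only when asked for.
field_name(f::FieldRef) = SubString(f.s, f.i, prevind(f.s, colon_index(f)))

# The value is parsed only when asked for: skip ':' and optional
# whitespace, then scan to the end of the line.
function field_value(f::FieldRef)
    start = colon_index(f) + 1
    while start <= ncodeunits(f.s) && f.s[start] in (' ', '\t')
        start += 1
    end
    cr = findnext(isequal('\r'), f.s, start)
    stop = cr === nothing ? lastindex(f.s) : prevind(f.s, cr)
    return SubString(f.s, start, stop)
end

s = "GET / HTTP/1.1\r\nContent-Type: text/plain\r\n\r\n"
f = FieldRef(s, 17)            # constructing the reference does no parsing
println(field_name(f))
println(field_value(f))
```

Both accessors return `SubString` views into the original buffer, so nothing is copied even when a field is eventually read.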

isvalid.jl

Base.isvalid(h::RequestHeader; obs=false)
Base.isvalid(h::ResponseHeader; obs=false)

Regexes for checking the validity of HTTP headers (for cases where a lazy parser does not notice invalidity, but validity is important).
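As a rough idea of what such a check can look like, the sketch below validates a single field line with a simplified pattern based on the RFC 7230 `token` grammar. `FIELD_LINE` and `is_valid_field_line` are hypothetical names; the regexes actually used in isvalid.jl are not reproduced here, and the `obs` keyword is not modelled.

```julia
# Hypothetical, simplified pattern; not the regexes from isvalid.jl.
# The field name is an RFC 7230 token; the value is visible ASCII with
# interior spaces or tabs, optionally surrounded by whitespace.
const FIELD_LINE = r"\A[!#$%&'*+\-.^_`|~0-9A-Za-z]+:[ \t]*(?:[\x21-\x7e](?:[ \t\x21-\x7e]*[\x21-\x7e])?)?[ \t]*\r\n\z"

is_valid_field_line(line::AbstractString) = occursin(FIELD_LINE, line)

println(is_valid_field_line("Content-Type: text/plain\r\n"))
println(is_valid_field_line("Content Type: text/plain\r\n"))
```

A lazy parser that only scans for the `:` and the line ending would accept the second line without complaint, which is the kind of case an explicit validity check is meant to catch.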

Nibbles.jl

Iterate over byte-vectors 4-bits at a time.

Used for decoding HPack's Huffman code.
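A minimal sketch of 4-bit iteration is shown below. `NibbleIterator` is a hypothetical name and the high-nibble-first ordering is an assumption made for this example; it is not the Nibbles.jl implementation.

```julia
# Hypothetical sketch, not the Nibbles.jl implementation.
struct NibbleIterator
    bytes::Vector{UInt8}
end

Base.length(n::NibbleIterator) = 2 * length(n.bytes)
Base.eltype(::Type{NibbleIterator}) = UInt8

# State `k` counts the nibbles already produced.
# Assumption: the high nibble of each byte is yielded before the low nibble.
function Base.iterate(n::NibbleIterator, k::Int = 0)
    k >= 2 * length(n.bytes) && return nothing
    byte = n.bytes[(k >> 1) + 1]
    nibble = iseven(k) ? byte >> 4 : byte & 0x0f
    return nibble, k + 1
end

println(collect(NibbleIterator([0xab, 0x1f])))
```

Each yielded value is in the range 0 to 15, so it can index directly into a 16-entry table, which is one common way to drive a table-based Huffman decoder.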

HPack.jl

Lazy parsing and string comparison for RFC 7541, "HPACK: Header Compression for HTTP/2".

Use start-of-line index in FieldValue (be more lazy)

Add findstart() function to find start of value when iterating.

Docstrings
…ng stroage that does not implement `ncodeunits`
Make AWS signing code lazy header compatible
@samoconnor
Contributor Author

The CI is passing sometimes and timing out sometimes:
https://travis-ci.org/JuliaWeb/HTTP.jl/builds/442040263?utm_source=github_status&utm_medium=notification

The HPack test takes 55MB of simulated HTTP/2 header streams from https://github.com/http2jp/hpack-test-case and processes them in a variety of ways to ensure that lazy random access works, and full iteration works, and random access after full iteration works, and vice versa... All this takes a bit of time. I might have to add an env var for HTTP_JL_RUN_FULL_HPACK_TEST...
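One way such a switch could look in the test script is sketched below. Only the variable name comes from the comment above; the helper `run_full_hpack_test` and the "1"/"true" convention are assumptions for illustration, and the `println` calls stand in for the real test runs.

```julia
# Assumption: the full test is opt-in, enabled by setting the variable
# to "1" or "true". The helper name is hypothetical.
run_full_hpack_test() =
    lowercase(get(ENV, "HTTP_JL_RUN_FULL_HPACK_TEST", "0")) in ("1", "true")

if run_full_hpack_test()
    println("running full HPACK corpus")      # placeholder for the full run
else
    println("running reduced HPACK subset")   # placeholder for a quick subset
end
```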

@samoconnor
Contributor Author

With the MbedTLS hanging bug fixed, CI is now passing.

@quinnj
Member

quinnj commented Oct 24, 2018

This is exciting! Looks like this needs a rebase and perhaps a little squashing along the way? Any recommendations on where to start reviewing or what to look for/worry about?

@samoconnor
Contributor Author

Yeah there are most definitely a bunch of commits fiddling with CI/toml etc that I can squash.
I can rebase and do that today if you're ready to take a look.

> Any recommendations on where to start reviewing or what to look for/worry about?

I was thinking you should start by just doing a sanity check that this PR is really a no-op as far as current exported functionality goes. Aside from all-new files, the changes should be:

  • *.toml contains mysterious stuff that Pkg3 wants.
  • Little tweaks in IODebug.jl and AWS4AuthRequest.jl to be compatible with new structs.
  • Include new files in HTTP.jl (but the newly included modules don't export anything).

Aside from that I'm happy to answer questions about the new code if you have any.
I've tried to put a reasonable amount of explanatory documentation in the code, but it would be good to know if there are places where stuff doesn't make sense.

@quinnj
Member

quinnj commented May 30, 2019

@samoconnor, this would be really great functionality to have, especially the http2 support. Will you be able to pick this back up? Otherwise, I could try to dive in and get it merged in.
