Regression with chunked transfer encoding? #340

Closed · pb- opened this issue Mar 20, 2017 · 58 comments

Comments
@pb- commented Mar 20, 2017

I'm using https://httpbin.org and it looks like chunked transfer encoding no longer works as expected:

$ curl -X POST https://httpbin.org/post \
 -d '{"message":"BLA"}' \
 -H 'Content-Type: application/json' \
 -H 'Transfer-Encoding: chunked'
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Connect-Time": "1", 
    "Connection": "close", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "Total-Route-Time": "0", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "curl/7.50.1", 
    "Via": "1.1 vegur", 
    "X-Request-Id": "2a06bd5b-06dc-4ced-ac89-fe83cbb6771c"
  }, 
  "json": null, 
  "origin": "<removed>", 
  "url": "https://httpbin.org/post"
}

Observe that data and json do not contain the posted data. The same request works as expected without chunked transfer encoding:

$ curl -X POST https://httpbin.org/post \
 -d '{"message":"BLA"}' \
 -H 'Content-Type: application/json'
{
  "args": {}, 
  "data": "{\"message\":\"BLA\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Connect-Time": "0", 
    "Connection": "close", 
    "Content-Length": "17", 
    "Content-Type": "application/json", 
    "Host": "httpbin.org", 
    "Total-Route-Time": "0", 
    "User-Agent": "curl/7.50.1", 
    "Via": "1.1 vegur", 
    "X-Request-Id": "dfbaa440-9871-4a3c-b482-35a68459722a"
  }, 
  "json": {
    "message": "BLA"
  }, 
  "origin": "<removed>", 
  "url": "https://httpbin.org/post"
}

Am I missing something?

@sigmavirus24 (Contributor)

You're not missing anything. This does look like a regression.

For @kennethreitz:

>>> import requests, pprint
>>> r = requests.post('https://httpbin.org/post', data=iter(('foo', 'bar', 'bogus')))
>>> r.request.headers['Transfer-Encoding']
'chunked'
>>> r.json()['data']
u''
>>> pprint.pprint(r.json())
{u'args': {},
 u'data': u'',
 u'files': {},
 u'form': {},
 u'headers': {u'Accept': u'*/*',
              u'Accept-Encoding': u'gzip, deflate',
              u'Connect-Time': u'0',
              u'Connection': u'close',
              u'Host': u'httpbin.org',
              u'Total-Route-Time': u'0',
              u'Transfer-Encoding': u'chunked',
              u'User-Agent': u'python-requests/2.13.0',
              u'Via': u'1.1 vegur',
              u'X-Request-Id': u'0c60c510-80e2-4d97-a02a-11b006f38376'},
 u'json': None,
 u'origin': u'...',
 u'url': u'https://httpbin.org/post'}
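
(Side note on why requests chose chunked here: an iterator body has no determinable length, so requests omits Content-Length and falls back to Transfer-Encoding: chunked. A quick check with the prepare API, assuming requests 2.x behaviour:)

>>> import requests
>>> req = requests.Request('POST', 'https://httpbin.org/post',
...                        data=iter(('foo', 'bar', 'bogus'))).prepare()
>>> req.headers.get('Transfer-Encoding')
'chunked'
>>> req.headers.get('Content-Length') is None
True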

@kennethreitz (Contributor)

I don't remember this previously being a problem when hosting on Heroku.

@sigmavirus24 (Contributor)

> I don't remember this previously being a problem when hosting on Heroku.

Sadly, it's reproducible.

@kennethreitz (Contributor)

@sigmavirus24 you reproduced it on another app on Heroku?

@kennethreitz (Contributor)

I'm down with adding an "on Heroku" mode to httpbin that removes that endpoint from the HTML but keeps it in the distributed version.

@sigmavirus24 (Contributor)

@kennethreitz I didn't try reproducing with a different app; I was just able to reproduce that httpbin seems broken. I can try spinning up a different app over the weekend to see if I can reproduce this there too.

@kennethreitz (Contributor)

@sigmavirus24 sounds great

@graingert (Contributor) commented Apr 4, 2017

Python 3.6.1 (default, Mar 22 2017, 06:17:05) 
[GCC 6.3.0 20170321] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from httpbin import app as httpbin_app
>>> from pytest_httpbin import serve, certs
>>> server = serve.Server(application=httpbin_app)
>>> server.start()
>>> server
<Server(<class 'pytest_httpbin.serve.Server'>, started 140641123542784)>
>>> server.port
35117
$ curl -X POST http://localhost:35117/post \
 -d '{"message":"BLA"}' \
 -H 'Content-Type: application/json'
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "", 
    "Content-Type": "application/json", 
    "Host": "localhost:35117", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "curl/7.51.0"
  }, 
  "json": null, 
  "origin": "127.0.0.1", 
  "url": "http://localhost:35117/post"
}

@Lukasa (Contributor) commented Apr 4, 2017

OK, so that seems to be a Werkzeug issue. Using Twisted as the WSGI server (twist web --wsgi httpbin.app) I get this:

{
  "args": {}, 
  "data": "{\"message\": \"bla\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Length": "18", 
    "Content-Type": "application/json", 
    "Host": "localhost:8080", 
    "User-Agent": "curl/7.51.0"
  }, 
  "json": {
    "message": "bla"
  }, 
  "origin": "127.0.0.1", 
  "url": "http://localhost:8080/post"
}

So my guess would be that this is an interaction between the Werkzeug server and httpbin.

@Lukasa (Contributor) commented Apr 4, 2017

However, the chunked problem is alive and well. Again, testing with Twisted:

>>> import requests
>>> from pprint import pprint as p
>>> def gen():
...     yield b"hello"
...     yield b"world"
... 
>>> r = requests.post('http://localhost:8080/post', data=gen())
>>> p(r.json())
{'args': {},
 'data': '',
 'files': {},
 'form': {},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate',
             'Connection': 'keep-alive',
             'Content-Length': '',
             'Content-Type': '',
             'Host': 'localhost:8080',
             'Transfer-Encoding': 'chunked',
             'User-Agent': 'python-requests/2.13.0'},
 'json': None,
 'origin': '127.0.0.1',
 'url': 'http://localhost:8080/post'}

Wireshark shows the chunked data being sent, so that issue is somewhere in the httpbin codebase.
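
(For reference, the standard RFC 7230 framing for that generator body looks like this on the wire: a hex chunk-size line, the chunk data, then a zero-size terminating chunk, all CRLF-terminated. This is the general format, not a literal dump from the capture:)

5\r\n
hello\r\n
5\r\n
world\r\n
0\r\n
\r\n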

@Lukasa (Contributor) commented Apr 4, 2017

So I just double-checked, and Twisted has definitely received the chunked data and de-chunked it before passing it to Flask. So this rather suggests that Flask might be dropping the ball here: by the time the request gets to httpbin the data is missing, so presumably somewhere in Flask the data is getting lost.
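
(To make this concrete, here is a minimal sketch of the mechanism, assuming Werkzeug's get_input_stream behaviour at the time; the exact code path is my inference, not something confirmed in this thread:)

from io import BytesIO
from werkzeug.wsgi import get_input_stream

# A chunked request carries no Content-Length, so CONTENT_LENGTH is absent.
# The server (Twisted here) has already de-chunked the body into wsgi.input.
environ = {'wsgi.input': BytesIO(b'helloworld')}

# Without wsgi.input_terminated, Werkzeug cannot know where the body ends,
# so it falls back to an empty stream rather than risk blocking on the socket.
print(get_input_stream(environ).read())   # b''

# When the server promises EOF at the end of the body, the real stream is used.
environ['wsgi.input_terminated'] = True
print(get_input_stream(environ).read())   # b'helloworld'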

@graingert (Contributor)

@Lukasa pallets/flask#367 ?

@Lukasa (Contributor) commented Apr 4, 2017

No. Flask isn't seeing the chunked data here: Twisted has parsed it before Flask ever gets a chance to see it.

@graingert (Contributor)

And another pytest-httpbin issue: kevin1024/pytest-httpbin#33 (comment)

@Lukasa (Contributor) commented Apr 4, 2017

OK, so I wrote a really simple repro of this with Flask. Run the following Flask application:

from flask import Flask, request

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def hello_world():
    return request.data

With the following test script:

import requests

def gen():
    yield b"hello"
    yield b"world"

print("Making first request")
r = requests.post('http://localhost:8000/', data=b'helloworld')
print("Got: %s" % r.content.decode('utf-8'))
print("Making second request")
r = requests.post('http://localhost:8000/', data=gen())
print("Got: %s" % r.content.decode('utf-8'))

Gives the output:

Making first request
Got: helloworld
Making second request
Got: 

I ran this test with Gunicorn. This suggests that the Flask data is going... somewhere. I don't know where yet.

@Lukasa (Contributor) commented Apr 4, 2017

Reported as pallets/flask#2229.

@kennethreitz (Contributor)

✨🍰✨

@graingert (Contributor) commented May 8, 2017

@tmenier this issue has never been closed...?

@davidism

The way to support this is to wrap your app in the following middleware when running with Gunicorn (or another server that supports chunked encoding).

class InputTerminated:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        # The server has already de-chunked the body and guarantees EOF,
        # so tell Werkzeug it is safe to read wsgi.input to the end.
        environ['wsgi.input_terminated'] = True
        return self.wsgi_app(environ, start_response)

app.wsgi_app = InputTerminated(app.wsgi_app)
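
(For example, assuming the Flask repro app from earlier in this thread is saved as app.py with this wrapper applied, and Gunicorn serving on port 8000; the names and port are illustrative:)

$ gunicorn -b 127.0.0.1:8000 app:app
$ curl http://127.0.0.1:8000/ -H 'Transfer-Encoding: chunked' -d helloworld
helloworld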

@skyline75489 commented Jul 31, 2017

curl doesn't seem to be working well with httpbin right now:

$ curl -F [email protected] http://localhost:5000/post --header "Transfer-Encoding: chunked" --header "Expect:"
curl: (56) Recv failure: Connection was reset

I'm using curl 7.51.0 and httpbin 0.5.0 running with python -m httpbin.core.

With curl 7.54.0 the response seems OK but there's no file in it:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {},
  "headers": {
    "Accept": "*/*",
    "Content-Type": "multipart/form-data; boundary=------------------------5f6ff6e1cc372527",
    "Host": "localhost:5000",
    "Transfer-Encoding": "chunked",
    "User-Agent": "curl/7.54.0"
  },
  "json": null,
  "origin": "127.0.0.1",
  "url": "http://localhost:5000/post"
}

@kennethreitz (Contributor)

That did not seem to fix this problem.

@kevin1024 (Contributor)

OK, so 2e94212 broke pytest-httpbin (and caused some issues with requests as well), since pytest-httpbin uses Werkzeug.

But since it didn't fix anything, I think we will remove it for the next httpbin release.

kevin1024 added a commit that referenced this issue Aug 27, 2017
This reverts commit 2e94212.

This didn't fix the problem, and caused other issues when using werkzeug
to execute httpbin, so let's remove it.  See also: #340
kennethreitz added a commit that referenced this issue Dec 18, 2017
Added support for chunked encoded requests when running gunicorn. #340
quinnj referenced this issue in JuliaWeb/HTTP.jl Jan 16, 2018
Replace FIFOBuffer implementation with BufferStream/IOBuffer wrapper.
@tmenier commented Feb 17, 2018

Can I ask why this issue was closed? From my tests it doesn't appear to be fixed. Just me? Or hasn't the fix been deployed yet?

@davidism

@tmenier the fix is not part of a released version yet.

@javabrett (Contributor) commented Feb 18, 2018

@tmenier if you build the project's Dockerfile you should be able to:

docker build --tag httpbin .
docker run -d --name httpbin -p 8080:8080 httpbin
curl -X POST http://localhost:8080/post  -d '{"message":"BLA"}'  -H 'Content-Type: application/json'  -H 'Transfer-Encoding: chunked'
{
  "args": {}, 
  "data": "{\"message\":\"BLA\"}", 
  "files": {}, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Content-Type": "application/json", 
    "Host": "localhost:8080", 
    "Transfer-Encoding": "chunked", 
    "User-Agent": "curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.21 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2"
  }, 
  "json": {
    "message": "BLA"
  }, 
  "origin": "172.17.0.1", 
  "url": "http://localhost:8080/post"
}

Edit: Of course you don't need to build the container yourself, as the current kennethreitz/httpbin image also has the fix.

A couple of follow-up questions of my own:

  • Should https://httpbin.org or some sub-page return the exact version number of the project, either in the response or as a header? It might help with this sort of enquiry.
  • Above I wrote:

> I have a general question about writing a test to demonstrate this bug and any candidate fix.

... asking whether there are any current project integration-style tests that run via the Gunicorn server, versus just running in Flask. If there aren't, is it worthwhile adding something? That would help with tests that require Gunicorn to assert a fix.

@kennethreitz (Contributor)

deployed latest to production

@davidism

@kennethreitz I think this exact fix is in Werkzeug now, so you could technically remove this patch if you're using the latest version of Werkzeug.

@javabrett (Contributor)

@davidism nice, we should try that.

Should we spin up a framework to run some integration tests? I'm still a little uneasy that all existing tests might only run through the internals of Flask.

@davidism

I may have been misremembering; it's been a while. I think Werkzeug's dev server now sets wsgi.input_terminated correctly, and I reached out to other WSGI servers, but not all of them set the flag yet.

@javabrett (Contributor)

pallets/werkzeug@8eb665a, since Werkzeug 0.13.

httpbin now pins werkzeug = ">=0.14.1", but it was locked to 0.12.1 at the time the patch was merged, so yes, it's probably time to re-test without it.

@javabrett (Contributor)

I reverted my patch commits and tested (manually; there is no automated test available for this) in Docker (so running Gunicorn), and httpbin did not return the sent data - it returned "json": null. So something is still not right.

@samoconnor

Still not working for POST with Transfer-Encoding: chunked:

using HTTP
f = Base.BufferStream() ; write(f, "hey"); close(f)
HTTP.post("http://httpbin.org/post"; body = f)

wireshark:

POST /post HTTP/1.1
Host: httpbin.org
Transfer-Encoding: chunked

3
hey
0

HTTP/1.1 411 Length Required
Date: Mon, 19 Feb 2018 01:14:20 GMT
Connection: close
Content-Type: text/html
Server: meinheld/0.6.1
Via: 1.1 vegur

<html><head><title>Length Required</title></head><body><p>Length Required.</p></body></html>

@javabrett (Contributor)

@samoconnor yes I see the same, and also note that httpbin.org is not hosted using Gunicorn, at least not directly. I don't have the technical details of that deployment. The fix as it stands only works for Gunicorn-hosted httpbin.

Response headers from the localhost Docker-hosted instance:

HTTP/1.1 200 OK
Server: gunicorn/19.7.1
...
X-Powered-By: Flask

Response headers from httpbin.org:

HTTP/1.1 411 Length Required
Server: meinheld/0.6.1
Via: 1.1 vegur

So clearly the hosting is different, and likely there is some reverse-proxying. Might have to wait for a response from someone familiar with the hosting set-up.

In the interim I can assure you the Docker version works ...

@javabrett (Contributor)

I can further confirm that if the currently published Docker image is run with a non-standard launch command, instead running the following (borrowing from https://github.com/kennethreitz/httpbin/blob/master/Procfile#L1):

gunicorn httpbin:app -b 0.0.0.0:8080 --log-file - --worker-class="egg:meinheld#gunicorn_worker"

... also yields:

curl -v -X POST http://localhost:8080/post -d '{"message":"BLA"}'  -H 'Content-Type: application/json'  -H 'Transfer-Encoding: chunked' 
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 8080 (#0)
> POST /post HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.54.0
> Accept: */*
> Content-Type: application/json
> Transfer-Encoding: chunked
> 
> 11
* upload completely sent off: 24 out of 17 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 411 Length Required
< Content-Type: text/html
< Server: meinheld/0.6.1
< 
* Closing connection 0
<html><head><title>Length Required</title></head><body><p>Length Required.</p></body></html>

This problem seems to occur well before Gunicorn or the middleware itself processes the request - a problem in the Meinheld Gunicorn worker, perhaps?

@samoconnor

Should this issue be re-opened?

@javabrett (Contributor)

I can reproduce the 411 coming from meinheld for a chunked request, but I don't know how to fix that yet.

@kennethreitz (Contributor)

We don't have to use meinheld.

@kennethreitz (Contributor)

Is the problem reproduced when meinheld is dropped?

@javabrett (Contributor)

This problem cannot be reproduced (e.g. in a Docker container build running Gunicorn) without --worker-class="egg:meinheld#gunicorn_worker". This is in meinheld; I found the logic where it returns 411 and wrote a failing test. I will raise an issue there and reference this one.

@javabrett (Contributor)

My code change to meinheld was merged, so when it is next released it will no longer generate the 411 error response when Transfer-Encoding: chunked is present and Content-Length is missing or zero, which is as it should be. I don't think there are nightly builds other than on Travis.

@kennethreitz (Contributor)

How long do you think that will be? We can drop meinheld in the meantime, if needed. It's not necessary. Just helps handle bursts of traffic.

@kennethreitz (Contributor)

Sometimes this service receives over 8k requests per second.

@javabrett (Contributor)

There are no recent meinheld releases, but that could be needs-driven, as there aren't a lot of unreleased commits: mopemope/meinheld@v0.6.1...master. I guess we could request a release.

If httpbin.org is very busy, perhaps it could retain meinheld and the chunked-request problem until the release, but other less-busy servers could have meinheld disabled for now. I don't know if that's a good idea or not. I guess it depends on whether stability or perf issues are expected on certain servers without meinheld.

virgiliojr94 pushed a commit to virgiliojr94/httpbin-chat2desk that referenced this issue Jul 1, 2023
This reverts commit 2e94212.

This didn't fix the problem, and caused other issues when using werkzeug
to execute httpbin, so let's remove it.  See also: postmanlabs#340
virgiliojr94 pushed a commit to virgiliojr94/httpbin-chat2desk that referenced this issue Jul 1, 2023
postmanlabs#340.

In this commit:

- when we see a Transfer-Encoding: chunked request, and the server is gunicorn,
  we set environ['wsgi.input_terminated'], which is required by Werkzeug in the
  absence of Content-Length, or it will empty the data stream.
- for chunked requests to non-gunicorn servers, return 501 Not Implemented.
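
(A rough sketch of the middleware that commit message describes; the class name and details are illustrative, not lifted from the actual commit:)

class GunicornChunkedMiddleware:
    def __init__(self, wsgi_app):
        self.wsgi_app = wsgi_app

    def __call__(self, environ, start_response):
        if environ.get('HTTP_TRANSFER_ENCODING', '').lower() == 'chunked':
            if environ.get('SERVER_SOFTWARE', '').startswith('gunicorn'):
                # gunicorn de-chunks the body and signals EOF, so Werkzeug
                # may safely read wsgi.input without a Content-Length.
                environ['wsgi.input_terminated'] = True
            else:
                # Other servers may not de-chunk; refuse explicitly rather
                # than silently return an empty body.
                start_response('501 Not Implemented',
                               [('Content-Type', 'text/plain')])
                return [b'Chunked requests are not supported by this server.']
        return self.wsgi_app(environ, start_response)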
virgiliojr94 pushed a commit to virgiliojr94/httpbin-chat2desk that referenced this issue Jul 1, 2023
Added support for chunked encoded requests when running gunicorn. postmanlabs#340