Encode JSON responses on a thread in C #10844
Conversation
There were two issues: 1) `channel` is actually a private type, so it's hard to type `.site`, and 2) `.site` was actually overwriting an existing member, so we need to rename it.
Is the slow bit iterating a list of events? I wonder if we could iterate that manually and spit out the [ / ] manually as well, encoding each event with the fast C encoder. (Essentially a custom producer for this case.)
This isn't specific to events, any large response will suffer from the same problem. Most of the time this will be due to events, but special casing things like
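As a sketch of the idea floated above (not code from the PR; all names here are hypothetical): a producer could emit the surrounding `[` / `]` itself and encode each element with the one-shot, C-backed encoder, so no single encode call dominates.

```python
import json

# One-shot JSONEncoder.encode uses CPython's C accelerator when available.
_encoder = json.JSONEncoder(separators=(",", ":"))

def iter_encode_list(items):
    """Yield JSON byte chunks for `items`, encoding one element at a time."""
    yield b"["
    first = True
    for item in items:
        if not first:
            yield b","
        first = False
        yield _encoder.encode(item).encode("utf-8")
    yield b"]"
```

A real producer would hand these chunks to the transport as it asks for them, rather than joining them eagerly.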
@@ -184,7 +184,7 @@ async def _unsafe_process(self) -> None:

        should_notify_at = max(notif_ready_at, room_ready_at)

-       if should_notify_at < self.clock.time_msec():
+       if should_notify_at <= self.clock.time_msec():
This looks like it has nothing to do with the changes in the rest of the PR, right? Wrong: without this change the tests would tight loop and then OOM. What I think is going on is that moving the encoding of JSON to a thread has subtly changed the timings of the tests, causing us to hit this line such that `should_notify_at` is now, causing us to not send the response but then schedule the function to be rerun in 0 seconds, causing the tests to enter a tight loop and never make any progress.
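The off-by-one described above can be illustrated with a tiny sketch (the helper name `should_send` is mine, not from the PR): with `<`, a wake-up that lands exactly at `should_notify_at` neither sends nor moves time forward, so the callback keeps being rescheduled for "now".

```python
def should_send(should_notify_at: int, now_msec: int) -> bool:
    # With `<` instead of `<=`, the boundary case below returns False and
    # the caller reschedules itself for 0 seconds later: a tight loop.
    return should_notify_at <= now_msec
```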
Seems reasonable to me
Python implementation (rather than the C backend), which is *much* more
expensive.
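To see the fallback the comment above refers to: in CPython's `json/encoder.py`, the C-accelerated path is gated on a `_one_shot` flag, which the public `JSONEncoder.iterencode` passes as `False`, while `encode` passes `True`. Both produce identical output; only the speed differs. A minimal sketch:

```python
import json
import json.encoder

enc = json.JSONEncoder()
obj = {"events": list(range(5))}

# Identical output from both paths...
assert "".join(enc.iterencode(obj)) == enc.encode(obj)

# ...but encode() can use the C accelerator (_json.make_encoder), while the
# public iterencode() always walks the pure-Python code path.
print("C encoder available:", json.encoder.c_make_encoder is not None)
```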
and presumably holds the global interpreter lock, making thread pooling it useless?
ja
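The approach the PR takes instead is to run the fast one-shot encode on a worker thread. As a hedged stand-in for Twisted's reactor threadpool (Synapse uses `defer_to_thread`; the stdlib `ThreadPoolExecutor` here is only an illustrative substitute), the shape of the change is roughly:

```python
import json
from concurrent.futures import ThreadPoolExecutor

_encoder = json.JSONEncoder(separators=(",", ":"))
_pool = ThreadPoolExecutor(max_workers=1)

def encode_json_off_main_thread(obj):
    """Return a Future resolving to the encoded JSON string.

    The one-shot C-backed encode runs on the worker thread, so the main
    (reactor) thread is not blocked for the duration of a large encode.
    """
    return _pool.submit(_encoder.encode, obj)
```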
Note to self: I should probably change this to use a byte producer rather than
I'm surprised about that. I'd have thought we'd have checked it before switching to iterencode, and it doesn't fit with my memory of how the json encoder works. I'll assume you've double-checked, though!
Yeah, it is surprising. FWIW the key bit here is, where
@@ -105,7 +105,7 @@ def _throw(*args):
    def _callback(request, **kwargs):
        d = Deferred()
        d.addCallback(_throw)
-       self.reactor.callLater(1, d.callback, True)
+       self.reactor.callLater(0.5, d.callback, True)
This is needed because we time out responses that take 1s or longer to process. I'm not 100% sure why the change in this PR would make this suddenly start failing, but 🤷
I've updated this with a commit to use a producer rather than writing all the content at once, for the reasons mentioned in #3701
I think I need to ask you to break this up. There's a lot changing, including a bunch of prep work, and going through commit-by-commit isn't working, as the implementation has obviously evolved a bit as you've been working on it. sorry.
synapse/http/server.py
            request.write(json_str)
            request.finish()
        except RuntimeError as e:
            logger.info("Connection disconnected before response was written: %r", e)
is this really the only possible reason that a RuntimeError can be raised?
    return NOT_DONE_YET


def _write_json_bytes_to_request(request: Request, json_bytes: bytes) -> None:
is this specific to being json bytes, or could it be used for any byte sequence?
    # note that this is zero-copy (the bytesio shares a copy-on-write buffer with
    # the original `bytes`).
    bytes_io = BytesIO(json_bytes)

    producer = NoRangeStaticProducer(request, bytes_io)
    producer.start()
    _write_json_bytes_to_request(request, json_bytes)
why is this changing in this PR? It seems unrelated to how we encode JSON responses?
I've split this up into separate PRs, with the final one being: #10905
Currently we use `JsonEncoder.iterencode` to write JSON responses, which ensures that we don't block the main reactor thread when encoding huge objects. The downside to this is that `iterencode` falls back to using a pure Python encoder that is much less efficient and can easily burn a lot of CPU for huge responses. To fix this, while still ensuring we don't block the reactor loop, we encode the JSON on a threadpool using the standard `JsonEncoder.encode` function, which is backed by a C library.

Doing so, however, requires `respond_with_json` to have access to the reactor, which it previously didn't. There are two ways of doing this: `DirectServeJsonResource`, which doesn't currently take a reactor but is exposed to modules and so is a PITA to change; or `SynapseRequest`, which requires updating a bunch of servlet types.

I went with the latter as that is just a mechanical change, and I think it makes sense as a request already has a reactor associated with it (via its HTTP channel).
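To sketch the overall write path the description refers to (all names here, including `FakeRequest` and the helper, are hypothetical simplifications; Synapse's real code goes through Twisted's `Request` and a producer class):

```python
from io import BytesIO

class FakeRequest:
    """Minimal stand-in for a Twisted Request, for illustration only."""
    def __init__(self):
        self.chunks = []
        self.finished = False

    def write(self, data: bytes) -> None:
        self.chunks.append(data)

    def finish(self) -> None:
        self.finished = True

def write_json_bytes(request, json_bytes: bytes, chunk_size: int = 4) -> None:
    # BytesIO shares the underlying buffer with `json_bytes` rather than
    # copying it, mirroring the zero-copy comment in the diff above.
    bio = BytesIO(json_bytes)
    while True:
        chunk = bio.read(chunk_size)
        if not chunk:
            break
        request.write(chunk)
    request.finish()
```

The real producer additionally cooperates with the transport's backpressure (pausing when the write buffer is full), which this sketch omits.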