Encode JSON responses on a thread in C #10844
Conversation
There were two issues: 1) `channel` is actually a private type, so it's hard to type `.site`, and 2) `.site` was actually overwriting an existing member, so we need to rename it.
Is the slow bit iterating a list of events? I wonder if we could iterate that manually and spit out the [ / ] manually as well, encoding each event with the fast C encoder. (Essentially a custom producer for this case.)
This isn't specific to events, any large response will suffer from the same problem. Most of the time this will be due to events, but special casing things like
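As a sketch of the idea floated above (not code from the PR; all names here are hypothetical): a producer could emit the surrounding `[` / `]` itself and encode each element with the one-shot, C-backed encoder, so no single encode call dominates.

```python
import json

# One-shot JSONEncoder.encode uses CPython's C accelerator when available.
_encoder = json.JSONEncoder(separators=(",", ":"))

def iter_encode_list(items):
    """Yield JSON byte chunks for `items`, encoding one element at a time."""
    yield b"["
    first = True
    for item in items:
        if not first:
            yield b","
        first = False
        yield _encoder.encode(item).encode("utf-8")
    yield b"]"
```

A real producer would hand these chunks to the transport as it asks for them, rather than joining them eagerly.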
@@ -184,7 +184,7 @@ async def _unsafe_process(self) -> None:

        should_notify_at = max(notif_ready_at, room_ready_at)

-       if should_notify_at < self.clock.time_msec():
+       if should_notify_at <= self.clock.time_msec():
This looks like it has nothing to do with the changes in the rest of the PR, right? Wrong: without this change the tests would tight loop and then OOM. What I think is going on is that moving the encoding of JSON to a thread has subtly changed the timings of the tests, causing us to hit this line such that `should_notify_at` is now, causing us to not send the response but then schedule the function to be rerun in 0 seconds, causing the tests to enter a tight loop and never make any progress.
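The off-by-one described above can be illustrated with a tiny sketch (the helper name `should_send` is mine, not from the PR): with `<`, a wake-up that lands exactly at `should_notify_at` neither sends nor moves time forward, so the callback keeps being rescheduled for "now".

```python
def should_send(should_notify_at: int, now_msec: int) -> bool:
    # With `<` instead of `<=`, the boundary case below returns False and
    # the caller reschedules itself for 0 seconds later: a tight loop.
    return should_notify_at <= now_msec
```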
Seems reasonable to me
Python implementation (rather than the C backend), which is *much* more
expensive.
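To see the fallback the comment above refers to: in CPython's `json/encoder.py`, the C-accelerated path is gated on a `_one_shot` flag, which the public `JSONEncoder.iterencode` passes as `False`, while `encode` passes `True`. Both produce identical output; only the speed differs. A minimal sketch:

```python
import json
import json.encoder

enc = json.JSONEncoder()
obj = {"events": list(range(5))}

# Identical output from both paths...
assert "".join(enc.iterencode(obj)) == enc.encode(obj)

# ...but encode() can use the C accelerator (_json.make_encoder), while the
# public iterencode() always walks the pure-Python code path.
print("C encoder available:", json.encoder.c_make_encoder is not None)
```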
and presumably holds the global interpreter lock, making thread pooling it useless?
ja
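The approach the PR takes instead is to run the fast one-shot encode on a worker thread. As a hedged stand-in for Twisted's reactor threadpool (Synapse uses `defer_to_thread`; the stdlib `ThreadPoolExecutor` here is only an illustrative substitute), the shape of the change is roughly:

```python
import json
from concurrent.futures import ThreadPoolExecutor

_encoder = json.JSONEncoder(separators=(",", ":"))
_pool = ThreadPoolExecutor(max_workers=1)

def encode_json_off_main_thread(obj):
    """Return a Future resolving to the encoded JSON string.

    The one-shot C-backed encode runs on the worker thread, so the main
    (reactor) thread is not blocked for the duration of a large encode.
    """
    return _pool.submit(_encoder.encode, obj)
```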
Note to self: I should probably change this to use a byte producer rather than
I'm surprised about that. I'd have thought we'd have checked it before switching to iterencode, and it doesn't fit with my memory of how the json encoder works. I'll assume you've double-checked, though!
Yeah, it is surprising. FWIW the key bit here is, where
@@ -105,7 +105,7 @@ def _throw(*args):
    def _callback(request, **kwargs):
        d = Deferred()
        d.addCallback(_throw)
-       self.reactor.callLater(1, d.callback, True)
+       self.reactor.callLater(0.5, d.callback, True)
This is needed because we time out responses that take 1s or longer to process. I'm not 100% sure why the change in this PR would make this suddenly start failing, but 🤷
I've updated this with a commit to use a producer rather than writing all the content at once, for the reasons mentioned in #3701
I think I need to ask you to break this up. There's a lot changing, including a bunch of prep work, and going through commit-by-commit isn't working, as the implementation has obviously evolved a bit as you've been working on it. sorry.
synapse/http/server.py
            request.write(json_str)
            request.finish()
        except RuntimeError as e:
            logger.info("Connection disconnected before response was written: %r", e)
is this really the only possible reason that a RuntimeError can be raised?
    return NOT_DONE_YET


def _write_json_bytes_to_request(request: Request, json_bytes: bytes) -> None:
is this specific to being json bytes, or could it be used for any byte sequence?
    # note that this is zero-copy (the bytesio shares a copy-on-write buffer with
    # the original `bytes`).
    bytes_io = BytesIO(json_bytes)

    producer = NoRangeStaticProducer(request, bytes_io)
    producer.start()
    _write_json_bytes_to_request(request, json_bytes)
why is this changing in this PR? It seems unrelated to how we encode JSON responses?
I've split this up into separate PRs, with the final one being: #10905
Currently we use `JsonEncoder.iterencode` to write JSON responses, which ensures that we don't block the main reactor thread when encoding huge objects. The downside to this is that `iterencode` falls back to using a pure Python encoder that is much less efficient and can easily burn a lot of CPU for huge responses. To fix this, while still ensuring we don't block the reactor loop, we encode the JSON on a threadpool using the standard `JsonEncoder.encode` function, which is backed by a C library.

Doing so, however, requires `respond_with_json` to have access to the reactor, which it previously didn't. There are two ways of doing this: `DirectServeJsonResource`, which doesn't currently take a reactor but is exposed to modules and so is a PITA to change; or `SynapseRequest`, which requires updating a bunch of servlet types.

I went with the latter as that is just a mechanical change, and I think it makes sense as a request already has a reactor associated with it (via its HTTP channel).
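To sketch the overall write path the description refers to (all names here, including `FakeRequest` and the helper, are hypothetical simplifications; Synapse's real code goes through Twisted's `Request` and a producer class):

```python
from io import BytesIO

class FakeRequest:
    """Minimal stand-in for a Twisted Request, for illustration only."""
    def __init__(self):
        self.chunks = []
        self.finished = False

    def write(self, data: bytes) -> None:
        self.chunks.append(data)

    def finish(self) -> None:
        self.finished = True

def write_json_bytes(request, json_bytes: bytes, chunk_size: int = 4) -> None:
    # BytesIO shares the underlying buffer with `json_bytes` rather than
    # copying it, mirroring the zero-copy comment in the diff above.
    bio = BytesIO(json_bytes)
    while True:
        chunk = bio.read(chunk_size)
        if not chunk:
            break
        request.write(chunk)
    request.finish()
```

The real producer additionally cooperates with the transport's backpressure (pausing when the write buffer is full), which this sketch omits.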