Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Synapse can fail to move large media uploads from temporary storage to disk in time before the client disconnects #13009

Open
reivilibre opened this issue Jun 9, 2022 · 3 comments
Labels
A-Media-Repository Uploading, downloading images and video, thumbnailing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@reivilibre
Copy link
Contributor

This issue is a distilled version of #12937.

When a file is uploaded to Synapse's media store, Twisted buffers it into a temporary file and gives Synapse an IO handle to it once it's been fully received.
The issue is that, with sufficiently large files and slow storage, it's possible that it will take more than 30 seconds to copy the bytes out of that temporary file into the correct location in the media store.

Clients such as Element Web will time out 30 seconds after having sent the last byte. (It also wouldn't seem unusual for a reverse proxy to do something similar.)
When this happens, the disconnection causes Twisted to close the temporary file and so Synapse's copy is interrupted.

The error produced in the logs looks as follows:

synapse.http.site - 362 - INFO - POST-8959 - Connection from client lost before response was sent
synapse.http.server - 183 - ERROR - POST-8959 - Failed handle request via 'UploadResource': <XForwardedForRequest at 0x7f5bada94c40 method='POST' uri='/_matrix/media/r0/upload?filename=2.0gb' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 366, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 396, in _async_render
    callback_return = await raw_callback_return
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/upload_resource.py", line 96, in _async_render_POST
    content_uri = await self.media_repo.create_content(
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_repository.py", line 178, in create_content
    fname = await self.media_storage.store_file(content, file_info)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 94, in store_file
    await self.write_to_file(source, f)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 101, in write_to_file
    await defer_to_thread(self.reactor, _write_file_synchronously, source, output)
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/local/lib/python3.9/site-packages/synapse/logging/context.py", line 970, in g
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 312, in _write_file_synchronously
    shutil.copyfileobj(source, dest)
  File "/usr/local/lib/python3.9/shutil.py", line 205, in copyfileobj
    buf = fsrc_read(length)
ValueError: I/O operation on closed file
synapse.http.server - 795 - WARNING - POST-8959 - Not sending response to request <XForwardedForRequest at 0x7f5bada94c40 method='POST' uri='/_matrix/media/r0/upload?filename=2.0gb' clientproto='HTTP/1.1' site='8008'>, already disconnected.

We would benefit overall by writing the bytes straight into a file on the correct filesystem as they're received from the client (either to the correct location, or to a temporary location with a rename operation upon completion). This would also mean that the client doesn't have to 'wait around' for the file copy to happen, since it's been happening during the upload (and in most cases, disk I/O is probably faster than the client's upload speed anyway!). In short: it will reduce upload latency a bit for large files.

(An uglier workaround would be to prevent the file copy from blocking the response, returning the MXC URL before it's even ready and finishing the copy in the background. I don't think it's great to give an MXC URL that's not ready — it could lead to races.)

Either solution probably involves some amount of spelunking into how Twisted manages these temporary files.

@reivilibre reivilibre added S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. labels Jun 9, 2022
@Vetchu
Copy link
Contributor

Vetchu commented Jul 12, 2022

I think it relates to how twisted reactors are used and old connections are cleaned up (along with any streams [including IO stream of file in synapse] opened by them) after all response data was sent back to client - I'm working on a fix for that.

@H-Shay H-Shay added the O-Occasional Affects or can be seen by some users regularly or most users rarely label Aug 25, 2022
@YamatoRyou
Copy link

YamatoRyou commented Sep 10, 2022

#12937
I use other clients to reproduce the file upload behavior.
The test object is a 3.5 GB file, the client is matrix-commander.
The maximum file size for the homeserver.yaml configuration is 8192 MB.

{2CD75CAC-D017-5CAD-0AF6-5E13D9C57A60}

Final result:
Upload failed, matrix-commander returned M_UNKNOWN Internal server error.
This is an additional finding, I'm giving feedback here first, not sure if it helps to track down this issue.

@MadLittleMods MadLittleMods added the A-Media-Repository Uploading, downloading images and video, thumbnailing label Apr 25, 2023
@nooblag
Copy link

nooblag commented Jul 12, 2023

Hi all, I'm just chiming in to follow this, as I have the same issue with any uploads larger than ~300MB (the I/O isn't sufficiently fast enough at peak times o keep up with the 30 second timeout documented above.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Media-Repository Uploading, downloading images and video, thumbnailing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Major Major functionality / product severely impaired, no satisfactory workaround. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

6 participants