Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Weirdly large file upload failed. #12937

Closed
YamatoRyou opened this issue Jun 1, 2022 · 11 comments
Closed

Weirdly large file upload failed. #12937

YamatoRyou opened this issue Jun 1, 2022 · 11 comments
Labels
A-Media-Repository Uploading, downloading images and video, thumbnailing

Comments

@YamatoRyou
Copy link

YamatoRyou commented Jun 1, 2022

Description

Failed to upload large file. The failure only occurs after the upload progress is complete.
homeserver.yaml configuration allows a single file upload with a maximum size of 2048 MB.
I tested a total of 11 files, ranging in size from 1.0 GB ~ 2.0 GB. There is a failure when uploading a 1.7 GB file.
{2A4C654A-FDEE-69BF-5BDE-BF959730F3B4}

Log when upload fails:

synapse.http.site - 362 - INFO - POST-8959 - Connection from client lost before response was sent
synapse.http.server - 183 - ERROR - POST-8959 - Failed handle request via 'UploadResource': <XForwardedForRequest at 0x7f5bada94c40 method='POST' uri='/_matrix/media/r0/upload?filename=2.0gb' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 366, in _async_render_wrapper
    callback_return = await self._async_render(request)
  File "/usr/local/lib/python3.9/site-packages/synapse/http/server.py", line 396, in _async_render
    callback_return = await raw_callback_return
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/upload_resource.py", line 96, in _async_render_POST
    content_uri = await self.media_repo.create_content(
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_repository.py", line 178, in create_content
    fname = await self.media_storage.store_file(content, file_info)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 94, in store_file
    await self.write_to_file(source, f)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 101, in write_to_file
    await defer_to_thread(self.reactor, _write_file_synchronously, source, output)
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 244, in inContext
    result = inContext.theWork()  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/threadpool.py", line 260, in <lambda>
    inContext.theWork = lambda: context.call(  # type: ignore[attr-defined]
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 117, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python3.9/site-packages/twisted/python/context.py", line 82, in callWithContext
    return func(*args, **kw)
  File "/usr/local/lib/python3.9/site-packages/synapse/logging/context.py", line 970, in g
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/synapse/rest/media/v1/media_storage.py", line 312, in _write_file_synchronously
    shutil.copyfileobj(source, dest)
  File "/usr/local/lib/python3.9/shutil.py", line 205, in copyfileobj
    buf = fsrc_read(length)
ValueError: I/O operation on closed file
synapse.http.server - 795 - WARNING - POST-8959 - Not sending response to request <XForwardedForRequest at 0x7f5bada94c40 method='POST' uri='/_matrix/media/r0/upload?filename=2.0gb' clientproto='HTTP/1.1' site='8008'>, already disconnected.

Client used for testing:

  • Element Desktop for Windows 1.10.13
  • Element Web (Chromium 102 for x64 Windows)
  • Element for Android 1.4.14

Steps to reproduce

  • Log in to your account;
  • Upload a large file to chatroom, such as 2 GB;
  • Wait for the upload to finish.

Version information

  • Homeserver: n/a

If not matrix.org:

  • Version: 1.60.0

  • Install method: Docker

  • Platform: Synology DiskStation Manager 6.2.3.25426 (Linux Kernel 3.10.105)
@richvdh
Copy link
Member

richvdh commented Jun 1, 2022

looks the same as #6192

@richvdh richvdh closed this as completed Jun 1, 2022
@YamatoRyou
Copy link
Author

YamatoRyou commented Jun 1, 2022

looks the same as #6192

Nope, because:

  • Even without reverse proxy (change server address to http://192.168.x.x:8008), this problem still exists.
  • The consequence of configuring a reverse proxy according to this issue is that uploading files still fails, and the developer tools return an "HTTP 413" error. Before that, a "(canceled)" error was returned.

@richvdh
Copy link
Member

richvdh commented Jun 1, 2022

I don't think #6192 is particularly confined to reverse-proxies, but ok.

Your logs show fairly clearly that the client is closing the connection:

synapse.http.site - 362 - INFO - POST-8959 - Connection from client lost before response was sent

... can you share a tcpdump of the traffic between the client and the server demonstrating the problem, along with logs with timestamps?

@richvdh richvdh reopened this Jun 1, 2022
@H-Shay H-Shay added the X-Needs-Info This issue is blocked awaiting information from the reporter label Jun 1, 2022
@YamatoRyou
Copy link
Author

YamatoRyou commented Jun 1, 2022

I don't think #6192 is particularly confined to reverse-proxies, but ok.

Your logs show fairly clearly that the client is closing the connection:

synapse.http.site - 362 - INFO - POST-8959 - Connection from client lost before response was sent

... can you share a tcpdump of the traffic between the client and the server demonstrating the problem, along with logs with timestamps?

log_and_dump.zip

File Description:
Logs: In order to reduce useless logs as much as possible, and avoid destroying critical information, I recreated new Docker containers and started logging from scratch. Some of these personal privacy-related parts have been erased. Log content: Start from container The time until the file upload failed.
Packet dump: I use WinDump (the computer running WinDump and the test client are the same), the command to get the dump file is: windump -i 2 -w C:\dump.data dst host 192.168.x.x and tcp port 8008. If you analyze this packet dump and find that it is not what you want, then I don't know what to do.

@squahtx
Copy link
Contributor

squahtx commented Jun 1, 2022

What kind of storage does /tmp and the media store live on?

Twisted initially saves the upload to a temporary file, and then synapse copies it to the media store.
At 50 MB/s, it'd take 40 seconds to copy the file, during which I can believe a client would time out.

Twisted handles client disconnection by closing the IO wrapping the temporary file, which would cause a ValueError: I/O operation on closed file in the thread doing the file copy.

@squahtx
Copy link
Contributor

squahtx commented Jun 1, 2022

For the record, Element Web times out the request 30 seconds after it has uploaded the last byte of the file: https://github.com/matrix-org/matrix-js-sdk/blob/12253064d14b6a3f3b043cf3bdd382d520d0c13b/src/http-api.ts#L419-L423

@squahtx
Copy link
Contributor

squahtx commented Jun 1, 2022

Twisted initially saves the upload to a temporary file, and then synapse copies it to the media store.

We could avoid the slow copy by uploading straight to the media store, or a temporary file on the same mount?

On Linux >= 3.11, it's possible to create a temporary file using O_TMPFILE and upgrade it to a permanent file later using linkat (see the notes on O_TMPFILE at https://man7.org/linux/man-pages/man2/open.2.html). We might have to override SynapseRequest.gotLength to create the temporary file ourselves instead of letting twisted do it.

@H-Shay H-Shay removed the X-Needs-Info This issue is blocked awaiting information from the reporter label Jun 2, 2022
@reivilibre
Copy link
Contributor

log_and_dump.zip

This zip file is showing up as password-protected so I can't access the dumps and logs. Can you share a version without the encryption (or if this was intentional, get in touch with the password?)? :)

@reivilibre reivilibre added X-Needs-Info This issue is blocked awaiting information from the reporter and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Jun 7, 2022
@YamatoRyou
Copy link
Author

YamatoRyou commented Jun 7, 2022

log_and_dump.zip

This zip file is showing up as password-protected so I can't access the dumps and logs. Can you share a version without the encryption (or if this was intentional, get in touch with the password?)? :)

Since the last file has not been downloaded for a long time, I removed the decompression password.
I repackaged one.
The content is the same as the previous one.

Unzip password: ***
Please download this file as soon as possible.

log_and_dump_2.zip

@reivilibre
Copy link
Contributor

The logs you've shared with me are showing that the request is taking 31 seconds to be processed, which going by the above (copied below) would mean that Element Web would cut off the upload

Element Web times out the request 30 seconds after it has uploaded the last byte of the file: matrix-org/matrix-js-sdk@1225306/src/http-api.ts#L419-L423

As a note: I'm afraid the TCP dump is not very revealing; the packets seem to have been truncated very short (I'm not sure why, perhaps worth trying with an explicit snaplen using -s 2000 next time :|).
I also think the TCP dump only includes traffic in one direction (I think it may have been useful to see the server's responses).
The TCP dump also seems to cover a different timespan to the logs (it would have been preferable to have them the same).

However I think @squahtx has made some valid points which probably explain what is happening here (especially noting that your logs show that the request is taking 31 seconds, which is more than the 30 seconds than Element Web will cut off after!).

Broadly:

  • Client uploads bytes whilst Twisted writes them into a temporary file
  • Client starts 30 second timer
  • Synapse starts to copy file to the right place
  • Client's timer triggers, disconnects
  • Twisted closes the temporary file, leading to Synapse not being able to finish the copy

The remedy suggested by Sean also makes perfect sense to me: it should be possible to write the file once only and then rename it to the correct place instantly.

@reivilibre
Copy link
Contributor

I'm tracking this as #13009 so that it's easier to pick up without having to follow the stream of investigation here.

If you think I've missed any other problems that are contributing to this issue, please let me know!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Media-Repository Uploading, downloading images and video, thumbnailing
Projects
None yet
Development

No branches or pull requests

6 participants