This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Server-wide media retention policy #6832

Closed
babolivier opened this issue Feb 3, 2020 · 14 comments · Fixed by #12732
Labels
A-Disk-Space things which fill up the disk T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. z-feature (Deprecated Label)

Comments

@babolivier
Contributor

babolivier commented Feb 3, 2020

A feature that's both interesting to have and fairly well requested is the ability to configure a media retention policy at the server level.

A first approach would have been to base the TTL of a media item on its upload date, but then we'd likely delete still-in-use avatars, media used in community descriptions, etc.

Therefore, a preferred approach is to base that TTL on the date the media was last accessed, to ensure we don't delete media that is still being used. FTR, that date of last access is stored by Synapse for both remote and local media, so it's technically doable.
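
To illustrate the difference between the two approaches, here is a minimal Python sketch. The `Media` record and its fields (`created_ts`, `last_access_ts`) are hypothetical names chosen for illustration and are not Synapse's actual schema; timestamps are assumed to be milliseconds since the epoch.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Media:
    media_id: str
    created_ts: int      # upload time in ms (hypothetical field)
    last_access_ts: int  # last access time in ms (hypothetical field)


def expired_by_upload(media, now_ms, ttl_ms):
    """Select media whose upload date is older than the TTL."""
    return [m.media_id for m in media if now_ms - m.created_ts > ttl_ms]


def expired_by_last_access(media, now_ms, ttl_ms):
    """Select media that nobody has accessed within the TTL."""
    return [m.media_id for m in media if now_ms - m.last_access_ts > ttl_ms]


now = 400 * DAY_MS
ttl = 90 * DAY_MS
media = [
    # Uploaded long ago, but still fetched yesterday (e.g. an avatar).
    Media("avatar", created_ts=0, last_access_ts=399 * DAY_MS),
    # Uploaded long ago and not accessed for over a year.
    Media("old_upload", created_ts=10 * DAY_MS, last_access_ts=20 * DAY_MS),
]

by_upload = expired_by_upload(media, now, ttl)       # includes the avatar
by_access = expired_by_last_access(media, now, ttl)  # spares the avatar
```

With an upload-based TTL the still-in-use avatar is selected for deletion; with a last-access TTL only the stale upload is.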

Another thing to consider is that we currently don't have any way of deleting media in Synapse, so that'd need to be added.

Also, we'd need to figure out how this feature would handle quarantined media.

cc @rxl881

Related: #6459, #3479, https://github.com/matrix-org/matrix-doc/issues/790

@babolivier babolivier self-assigned this Feb 3, 2020
@richvdh
Member

richvdh commented Feb 5, 2020

there are open issues/MSCs around deleting media: we should hunt them down and link them

@babolivier
Contributor Author

> there are open issues/MSCs around deleting media: we should hunt them down and link them

I've updated the issue's description with the few issues I could find. I didn't find any MSC relating to that though.

@richvdh
Member

richvdh commented Feb 5, 2020

Thanks. MSC2278 seems to be the MSC I was thinking of.

@babolivier
Contributor Author

After chatting with @neilisfragile and @lampholder about the quarantine concern, the conclusion is that a media retention policy should ignore quarantined media.

@IF-Adin

IF-Adin commented Mar 2, 2020

I would like to add that avatars and room avatars should probably be stored in a different way. They mostly seem to be cached, so they are probably not accessed all that often in smaller groups, which means they would be purged, too.

Maybe add a tag or something to all user and room avatars?

Or was that what you mean by quarantined data?

@IF-Adin

IF-Adin commented Mar 24, 2020

Never mind, as it turns out I cannot read; you have already thought of that.

@MamasLT

MamasLT commented May 27, 2020

Great points made by babolivier.

There are a few options for dealing with media:
https://github.com/matrix-org/synapse/blob/master/docs/admin_api/purge_remote_media.rst
https://github.com/matrix-org/synapse/blob/master/docs/admin_api/media_admin_api.md

But they don't seem to do the main thing: deleting media on the local server. I agree that there should be a server-wide media retention policy, set in the config in the same way as message retention is now.
The quarantine, as I understand it, should be done just before deleting the particular media files, so there won't be any issues for clients.
Quarantining and then deleting all media manually (not sure if there is any other way at this time) sounds a bit overcomplicated. Cheers!

@babolivier babolivier removed their assignment Jul 29, 2020
@NHAS

NHAS commented Aug 24, 2020

Not sure if this is the correct place to note this.
However, when a general "event" retention policy is applied to a server, it doesn't remove media such as images or uploaded files.
This means there are a large number of files that have no event referencing them. This issue would help somewhat to solve that. However, the best solution would be for the server retention policy to delete said media.

@anoadragon453
Member

The before_ts parameter of the Delete Local Media Admin API should cover this use case. Setting the keep_profiles parameter to true ensures that room and user avatars are not removed in the process.

This endpoint calls the MediaRepository.delete_old_local_media method. Adding a config option to configure a background job that regularly calls this method with a determined retention period seems plausible to me.

Of course, note that this will only work with media uploaded to Synapse's media repository, rather than using tools such as matrix-media-repo.
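
A periodic job along those lines would need to turn a retention period into a `before_ts` value and pass it to the endpoint. The sketch below only builds the request URL and performs no network call. The path is an assumption based on Synapse's media admin API documentation (older versions include the server name in the path), so check the docs for your version; `build_delete_url` and `retention_days` are hypothetical names.

```python
import time
from urllib.parse import urlencode

DAY_MS = 24 * 60 * 60 * 1000


def build_delete_url(base_url, retention_days, now_ms=None, keep_profiles=True):
    """Build the URL for deleting local media not accessed within the retention period."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # before_ts is a Unix timestamp in milliseconds: media last accessed
    # before this point is eligible for deletion.
    before_ts = now_ms - retention_days * DAY_MS
    query = urlencode({
        "before_ts": before_ts,
        "keep_profiles": "true" if keep_profiles else "false",
    })
    # Assumed endpoint path; verify against your Synapse version.
    return f"{base_url}/_synapse/admin/v1/media/delete?{query}"


# Fixed clock so the result is reproducible.
url = build_delete_url("https://homeserver.example", 90, now_ms=100 * DAY_MS)
```

The resulting URL would be sent as a POST request authenticated with a server admin's access token.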

@reivilibre
Contributor

Deleting media automatically based on TTL is going to be troublesome as Matrix is used in new ways.

Let's say that the problem we actually want to solve is to remove media that has no real references left (a reasonable request, given the disk space this must take up). Unfortunately the homeserver doesn't know which media are pointed to by encrypted events...

As an idea, I wonder if we could implement a hybrid scheme:

  • TTL for media in general (based on last access time)
  • media with known references does not get expired (or has a longer expiry — configurable)

Finally, an MSC could be written to allow some encrypted events to 'opt-in' to declaring which media they point to, in their unsigned portion. This would be useful for applications such as file storage on Matrix, where you don't want your infrequently-accessed files to just go missing one day, but as it's optional, chat clients (etc) can still choose to be secretive and withhold that reference — they will just have to accept that the media can be deleted whilst it's still potentially accessible.
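
The hybrid scheme above could be sketched roughly as follows. This is an illustration under stated assumptions, not an implementation from Synapse: the `Media` record, the `has_known_reference` flag and the `expired` function are all hypothetical, and how the server would learn about references is left open.

```python
from dataclasses import dataclass
from typing import Optional

DAY_MS = 24 * 60 * 60 * 1000


@dataclass
class Media:
    media_id: str
    last_access_ts: int        # ms since epoch (hypothetical field)
    has_known_reference: bool  # e.g. referenced by an unencrypted event (hypothetical)


def expired(media, now_ms, default_ttl_ms, referenced_ttl_ms: Optional[int] = None):
    """Return IDs of media past their TTL.

    Media with a known reference uses referenced_ttl_ms instead of the
    default TTL; None means such media never expires.
    """
    result = []
    for m in media:
        ttl = referenced_ttl_ms if m.has_known_reference else default_ttl_ms
        if ttl is None:
            continue  # referenced media is kept indefinitely
        if now_ms - m.last_access_ts > ttl:
            result.append(m.media_id)
    return result


now = 400 * DAY_MS
media = [
    Media("unreferenced", last_access_ts=300 * DAY_MS, has_known_reference=False),
    Media("referenced_recent", last_access_ts=300 * DAY_MS, has_known_reference=True),
    Media("referenced_stale", last_access_ts=0, has_known_reference=True),
]

never_expire_refs = expired(media, now, default_ttl_ms=90 * DAY_MS)
longer_ttl_refs = expired(media, now, default_ttl_ms=90 * DAY_MS,
                          referenced_ttl_ms=365 * DAY_MS)
```

Making the TTL for referenced media optional covers both variants in the list: no expiry at all, or a longer, configurable one.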

@reivilibre reivilibre added the T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. label Mar 2, 2022
@ptman
Contributor

ptman commented Mar 3, 2022

@reivilibre very good point. I think user and room avatars are also in the media repo (mxc://) so garbage collecting those would cause problems. Any other uses for media that aren't suitable for garbage collection? Access time is better than creation/modification time, but still problematic in many use cases.

@anoadragon453
Member

> The before_ts parameter of the Delete Local Media Admin API should cover this usecase. Setting the keep_profiles parameter to true ensures that room and user avatars are not removed in the process.

Note that this API does operate on last access time, which is updated as users browse encrypted rooms.

> I think user and room avatars are also in the media repo (mxc://) so garbage collecting those would cause problems.

This endpoint won't delete room and user avatars if keep_profiles is true.

> Access time is better than creation/modification time, but still problematic in many usecases.

We also won't be aware of events referencing local media in rooms that our homeserver isn't in. To that end, last_access time (with a suitably conservative threshold) is probably the best media TTL solution we have available at the moment.

@tastytea

tastytea commented May 2, 2022

> Any other uses for media that aren't suitable for garbage collection?

It would be unfortunate if media from MSC2545: Image Packs would get deleted by this.

@clokep
Member

clokep commented May 31, 2022

> > Any other uses for media that aren't suitable for garbage collection?
>
> It would be unfortunate if media from MSC2545: Image Packs would get deleted by this.

I filed #12928 about this.


10 participants