Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix several issues with beatmap online ID management #30453

Merged
merged 7 commits into from
Oct 30, 2024

Conversation

bdach
Copy link
Collaborator

@bdach bdach commented Oct 30, 2024

RFC.

This is an all-in-one diff with assorted changes after today's daily challenge fiasco. Caution, wall of text incoming.

Taking it commit by commit:

Adjust test to match desired reality

Self-explanatory.

Remove problematic online ID check

Would "close" #27332 by essentially throwing up hands and deciding that it's probably fine not to check the garbage IDs in the .osu file.

Only use MD5 when performing metadata lookups

Both online and offline using the cache.

The rationale behind this change is that after 1a2e323, TestPartiallyMaliciousSet() fails in a way that cannot be reconciled without this sort of change.

The test exercises a scenario where the beatmap being imported has an online ID in the .osu file, but its hash does not match the online hash of the beatmap. This turns out to be a more frequent scenario than envisioned because of users doing stupid things with manual file editing rather than reporting issues properly.

The scenario is realistic only because the behaviour of the endpoint responsible for looking up beatmaps is such that if multiple parameters are given (e.g. all three of beatmap MD5, online ID, and filename), it will try the three in succession, and the local metadata cache implementation reflected this implementation.

Because online ID and filename are inherently unreliable in this scenario due to being directly manipulable by clueless or malicious users, neither should not be used as a fallback.

Remove no longer applicable test

After 2c2f307 the behaviour set up on the mock in the test in question is no longer realistic. Online metadata lookups will no longer fall back to online ID or filename.

Prefer not deleted models when picking model instances for reuse when importing

This fell out while investigating why the issue with online IDs mismatching in the .osu could be worked around by importing the map three times in total when starting from it not being available locally.

Here follows an explanation of why that "helped".

Import 1:

  • The beatmap set is imported normally.
  • Online metadata population sees the online ID mismatch and resets it on the problematic beatmap.

Import 2:

  • The existing beatmap set is found, but deemed not reusable because of the single beatmap having its ID reset to -1.
  • The existing beatmap set is marked deleted, and all the IDs of its beatmaps are reset to -1.
  • The beatmap set is reimported afresh.
  • Online metadata population still sees the online ID mismatch and resets it on the problematic beatmap.

Note that at this point the first import is still physically present in the database but marked deleted.

Import 3:

  • When trying to find the existing beatmap set to see if it can be reused, the one pending deletion and with its IDs reset - the remnant from import 1 - is returned.
  • Because of this, validateOnlineIds() resets online IDs on the model representing the current reimport.
  • The beatmap set is reimported yet again.
  • With the online ID reset, the online metadata population check for online ID mismatch does not run because the IDs were reset to -1 earlier.

Preferring undeleted models when picking the model instance for reuse prevents this scenario.

bdach added 5 commits October 30, 2024 07:35
Both online and offline using the cache.

The rationale behind this change is that in the current state of
affairs, `TestPartiallyMaliciousSet()` fails in a way that cannot be
reconciled without this sort of change.

The test exercises a scenario where the beatmap being imported has an
online ID in the `.osu` file, but its hash does not match the online
hash of the beatmap. This turns out to be a more frequent scenario than
envisioned because of users doing stupid things with manual file editing
rather than reporting issues properly.

The scenario is realistic only because the behaviour of the endpoint
responsible for looking up beatmaps is such that if multiple parameters
are given (e.g. all three of beatmap MD5, online ID, and filename), it
will try the three in succession:

	https://github.com/ppy/osu-web/blob/f6b341813be270de59ec8f9698428aa6422bc8ae/app/Http/Controllers/BeatmapsController.php#L260-L266

and the local metadata cache implementation reflected this
implementation.

Because online ID and filename are inherently unreliable in this
scenario due to being directly manipulable by clueless or malicious
users, neither should not be used as a fallback.
After dd06dd0e699311494412e36bc3f37bb055a01477 the behaviour set up on
the mock in the test in question is no longer realistic. Online
metadata lookups will no longer fall back to online ID or filename.
… importing

This fell out while investigating why the issue with online IDs
mismatching in the `.osu` could be worked around by importing the map
three times in total when starting from it not being available locally.

Here follows an explanation of why that "helped".

Import 1:
- The beatmap set is imported normally.
- Online metadata population sees the online ID mismatch and resets it
  on the problematic beatmap.

Import 2:
- The existing beatmap set is found, but deemed not reusable
  because of the single beatmap having its ID reset to -1.
- The existing beatmap set is marked deleted, and all the IDs of
  its beatmaps are reset to -1.
- The beatmap set is reimported afresh.
- Online metadata population still sees the online ID mismatch
  and resets it on the problematic beatmap.

Note that at this point the first import *is still physically present
in the database* but marked deleted.

Import 3:
- When trying to find the existing beatmap set to see if it can be
  reused, *the one pending deletion and with its IDs reset -
  - the remnant from import 1 - is returned*.
- Because of this, `validateOnlineIds()` resets online IDs
  *on the model representing the current reimport*.
- The beatmap set is reimported yet again.
- With the online ID reset, the online metadata population check for
  online ID mismatch does not run because *the IDs were reset to -1*
  earlier.

Preferring undeleted models when picking the model instance for reuse
prevents this scenario.
@bdach bdach added type:online realm deals with local realm database next release Pull requests which are almost there. We'll aim to get them in the next release, but no guarantees! labels Oct 30, 2024
@bdach bdach requested a review from a team October 30, 2024 07:25
@peppy peppy self-requested a review October 30, 2024 07:33
// unfortunately it appears that historically stable mappers would apply crude hacks to fix unspecified "issues" with submission
// which would amount to reusing online IDs of other beatmaps.
// this means that the online ID in the `.osu` file is not reliable, and this cannot be fixed server-side
// because updating the maps retroactively would break stable (by losing all of users' local scores).

if (beatmapInfo.OnlineID == -1 && result.MD5Hash != beatmapInfo.MD5Hash)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a quick read of the code, I'm not sure that we can ever hit this scenario anymore. The code run prior to this is a tryLookup which is based on md5 matching alone, so at this point it should be guaranteed that the MD5 matches.

I may be wrong, but we can probably do away with this method altogether?

Copy link
Collaborator Author

@bdach bdach Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, and the assertion holds up as far as I can tell. Removed in 2b0fd35 (alongside a test that made no sense anymore due to the same exact thing - the mock behaved in a way incongruous with the actual reality).

The scenario that remaining guard was trying to protect against is
obviated by and no longer possible after
776fabd.
@peppy
Copy link
Member

peppy commented Oct 30, 2024

Not 100% about the filename removal. Stable still uses this as a fallback in order to let users update to the latest version in-game. Thoughts?

@bdach
Copy link
Collaborator Author

bdach commented Oct 30, 2024

My thoughts are that online ID in the beatmap file and the beatmap filename are similarly reliable, i.e. not at all, and if we're bringing back one, we may as well bring both back, and just rely on md5 checks at score submission time to prevent funny business.

Just depends on how many layers of safety you're comfortable with I guess. I prefer more, which is why I deleted anything that isn't md51. I am not sure how to conclusively determine whether excluding filename is going to be a problem or not.

Footnotes

  1. Even the reliability of that could be considered in question because it's known to be a broken hash algo, but it's much more difficult to exploit that one, and moving to anything else is gonna take 2 weeks minimum

@peppy
Copy link
Member

peppy commented Oct 30, 2024

My thoughts are that it hasn't ever presented an issue to date in stable, so we could probably still keep it around. I think it's the only way we can recover the case where a user downloads a beatmap with .osu files that are outdated (ie. MD5 doesn't match database hash). This can regularly happen in cases like:

  • Beatmap pack archives (aren't updated when rare fixes are made to ranked maps)
  • Mirrors incorrectly having outdated version of beatmaps (brought up in recent threads)

@bdach
Copy link
Collaborator Author

bdach commented Oct 30, 2024

I don't have willpower to fight on that point, but you do realise I will have to revert 2b0fd35 etc and bring the hash equality checks back if I'm to allow filename again? Because otherwise I can take any set and rename any difficulty file to any old thing and it will show as ranked so long as it matches something else.

Unless you're again saying that that's fine and that the onus of checking the hash should be on the score submission process?

@peppy
Copy link
Member

peppy commented Oct 30, 2024

Unless you're again saying that that's fine and that the onus of checking the hash should be on the score submission process?

I think this is valid thinking, yeah. After all, a user could just edit things in realm post-import. I feel we were doing far too much on import to try and get things in a good state, and probably need to fall back to "what worked in stable" for now

@bdach
Copy link
Collaborator Author

bdach commented Oct 30, 2024

Alright, filename lookup restored with no guardrails, online ID still excluded.

I have tested that the submission process correctly detects this and prevents score token issuance when attempting to play a doctored beatmap.

@peppy peppy enabled auto-merge October 30, 2024 09:57
@peppy peppy disabled auto-merge October 30, 2024 10:22
@peppy peppy merged commit 50be7fb into ppy:master Oct 30, 2024
8 of 9 checks passed
@bdach bdach deleted the everything-is-terrible branch October 30, 2024 10:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
next release Pull requests which are almost there. We'll aim to get them in the next release, but no guarantees! realm deals with local realm database size/L type:online
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants