Fix caching behavior to always use hashed URLs for the cache filenames #503
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This change fixes a significant bug in caching behavior.
Cross-link: resolves GHSA-q6mv-284r-mp36
Why
Under the pre-existing default behavior, two different remote (HTTPS) schemas with the same filename would be cached under the same local filename. e.g.,
https://example.com/foo/schema.json
andhttps://example.com/bar/schema.json
both cache asschema.json
.As a result, cache confusion / poisoning issues would easily occur for users interacting with these schemas.
Prior to this change, users would either have to disable caching or use
--cache-filename
to get correct caching behavior.What
To fix, caching is changed here to always use URL hashes, as was developed for the
refs/
cache.This behavior is refined and relocated to be used by both caching paths in the codebase.
The
--cache-filename
option is no longer important or useful to support, so it is made non-functional and deprecated in this release.It will be removed in a future release; timeline TBD.
Additional Notes
Both caching codepaths -- initial schema and refs -- now use sha256 hashing. This results in a mild (read: negligible) slowdown in exchange for driving the risk of a collision even lower, to effectively 0.
No effort is made to "clean up" stale files in the cache. As documented, users can always blow the whole cache, as in
rm -r ~/.cache/check_jsonschema
, if they want to remove these files.Cache cleanup would also play poorly with users who have different tool versions installed simultaneously in different projects.
I tried some implementation options which required a more significant refactor of the caching layer, but discarded them as
which evaluate negatively against the criterion of "ship a bugfix confidently and expeditiously so that users can upgrade".