-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Head request is not cached even with cacheable_methods=("GET", HEAD") without a previous GET #337
Comments
This PR aims to improve plugin's performance regarding HTTP requests which are performed to retrieve remote image length: - add a new option `cache_dir` - rely on https://pypi.org/project/CacheControl/ to manage local cache requests For now, it works only for GET requests. See upstream issue: psf/cachecontrol#337
@woodruffw Not sure, I should have written "possibly related issue". I've edited. |
Any chance to see it fixed @woodruffw? Sorry, I cant' help on this since it's far away from my skills. I just need if it's something planned or not, to adapt my proper plans consequently. |
I'll try and find the time to make a fix for this over the coming weekend, but I can't make any promises, sorry. If this is an urgent behavioral change for you, I'd suggest working around it for now 🙂 |
I took a quick look at this, and I think it's unfortunately going to be non-trivial to implement:
To perform a general fix here, we'd probably need to update the cache keying logic to treat Taking a step back: could you share a bit more about your use case? You mentioned that you're performing |
To highlight this underlying architectural decision, here's an example of how CacheControl leaks the cached import requests
from cachecontrol import CacheControl
from cachecontrol.cache import DictCache
sess = CacheControl(
requests.Session(), cache=DictCache(), cacheable_methods=("HEAD", "GET")
)
# misses the cache
resp1 = sess.head("https://example.com")
assert resp1.request.method == "HEAD"
assert not resp1.content
# primes the cache
resp2 = sess.get("https://example.com")
assert resp2.request.method == "GET"
assert resp2.content
# hits the cache from the previous GET
resp3 = sess.head("https://example.com")
assert resp3.request.method == "HEAD"
# fails because the cached GET is returned as HEAD
assert not resp3.content, resp3.content This is arguably incorrect, since a TL;DR: I think that |
Hello,
Thanks for this package, it sounds really useful and fit my needs.
I'm trying to use it to improve the Mkdocs RSS plugin when it comes to retrieve a remote image length as expected by the enclosure tag. For now, a HEAD request is tried to read the value from response content-length header. If it fails,
See related code: https://github.com/Guts/mkdocs-rss-plugin/blob/68c62e5b579b408dbc9999b251bb7c13c562cee8/mkdocs_rss_plugin/util.py#L620-L668
Here comes my quick & dirty dev script to test it quickly:
But reading the log, I can see that the cache is not used, nor even stored:
BUT if I make a GET request to the same resource before the HEAD, the cache is stored AND even read for the HEAD:
Logs:
Possibly related issue: #216
The text was updated successfully, but these errors were encountered: