Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support urllib3 v2 #128

Merged
merged 12 commits into from
Sep 26, 2023
Merged

Support urllib3 v2 #128

merged 12 commits into from
Sep 26, 2023

Conversation

Mr0grog
Copy link
Member

@Mr0grog Mr0grog commented Sep 25, 2023

Back in #118 we “fixed” things for urllib3 v2 by marking this package as only compatible with v1, so users wouldn't wind up with bad dependency combinations. However, we ultimately need to actually support urllib3 v2 and remove that limitation, which I intend to do here. Will fix #116.

  • Widen dependency specification for urllib3.
  • Add both versions of urllib3 to CI test matrix.
  • Make tests pass with urllib3 v2.

@Mr0grog Mr0grog force-pushed the 116-actually-really-support-urllib3v2 branch from 9b16d02 to 651bb6e Compare September 25, 2023 01:29
@Mr0grog Mr0grog marked this pull request as ready for review September 25, 2023 23:56
@Mr0grog
Copy link
Member Author

Mr0grog commented Sep 26, 2023

Well, this got a little complicated:

  • The actual updates for urllib3 v2 compatibility (seen in wayback/_client.py) are a bit hacky, but relatively straightforward. Instead of replacing the from_httplib() static method, which doesn’t have a replacement that is as overridable in v2, we patch the HTTPHeaderDict class to repair the headers.

  • VCR became a really complicated configuration. When VCR added support for urllib3 v2, it did it in such a way that the same request and response generate different cassette files for v1 and v2 of urllib3. Namely, the bodies of gzipped responses are recorded decompressed for v2, but still have the headers and such declaring them as gzipped. These cassette files can be written and read safely with either v1 or v2 of urllib3, but cannot be written with one and read with the other. More info in this VCR issue: [4.3.1] urllib3>=2 causes vcrpy to save/read gzip responses uncompressed kevin1024/vcrpy#719

    The solution I wound up with here (see wayback/tests/support.py) is to have a custom serializer that compresses the bodies when serializing to disk and decompresses them when deserializing from disk if urllib3 v2 is in use (it does nothing for v1, which records the actual raw bytes just fine). This is imperfect (we could wind up with bytes that don’t match the content-length header, or other weird edge cases), but it lets the same cassette files be read or written regardless of which version of urllib3 you are testing or developing with.

    I originally went for a simpler solution that just had separate cassette files for each urllib3 version (you can see that in a664a95). After thinking about it, though, I realized this puts a huge burden on contributors, who have to have two development environments set up (one with each urllib3 version) and then record cassettes in both environments when making new tests. Super complex! The custom serializer is way more code, but keeps development straightforward.

    This includes a change to every cassette file, but in all but one case those are just adding .yaml to the end of the file names (I noticed this was missing while messing with all this stuff).

    Some recordings and test implementations have minor changes because stuff changed upstream (but in an innocuous way; if you are testing against a recording it would have kept working, but you wouldn’t have been able to re-generate the recording because some things we expected to fail on the Wayback Machine’s servers started succeeding, which is good. I replaced them with other things that still fail in the same way.).

This should fail for urllib3 v2 and pass for v1, since no changes have been made to the actual source code yet, and v2 should be incompatible. First stop on the way to fully fixing #116.
Unfortunately, VCR currently records cassettes differently with urllib3 v1 vs v2 (in v2, it records decompressed bodies rather than raw bytes). To handle this, we need to have separate cassettes for the different versions of urllib3. More details in kevin1024/vcrpy#719.
I had hoped the underlying issue was fixed in the Wayback Machine, but it is not. I couldn't find a nicer way to do this in urllib3 v2, so the hack is unfortunately even hackier than it used to be.
A few tests couldn't have their cassettes regenerated because things have changed upstream (in ways that are fine, but the tests need to account for them):
- The memento in the basic `get_memento` test has a newer "last" memento.
- The non-playbackable memento has started playing back! But I found another that still doesn't play back and switched to use that.
- A little more care was needed with regenerating the rate limit tests, since these can make wayback put a longer delay on us during the test.
After thinking about this some more, I decided some more hackery here was a worthwhile cost vs. forcing contributors to figure out how to record separate cassettes for each urllib3 version. We now have a custom serializer for VCR that attempts to paper over the recording differences by automatically compressing and decompressing bodies in cassette files that should have been compressed in urllib3 v2. It is basically a pass-through on urllib3 v1.
@Mr0grog Mr0grog force-pushed the 116-actually-really-support-urllib3v2 branch from c0df3e4 to 7d44077 Compare September 26, 2023 00:37
@Mr0grog
Copy link
Member Author

Mr0grog commented Sep 26, 2023

Rebased on main to mix correctly with new changes there.

@Mr0grog Mr0grog merged commit 2d7f999 into main Sep 26, 2023
16 checks passed
@Mr0grog Mr0grog deleted the 116-actually-really-support-urllib3v2 branch September 26, 2023 00:56
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Not compatible with urllib3 2.0
1 participant