Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow mixing IDN and params #265

Merged
merged 17 commits into from
Mar 29, 2020
Merged

Allow mixing IDN and params #265

merged 17 commits into from
Mar 29, 2020

Conversation

twm
Copy link
Contributor

@twm twm commented Jan 27, 2020

Fixes #264.

@twm twm marked this pull request as ready for review January 29, 2020 06:32
@twm twm requested a review from a team January 29, 2020 06:32
@twm twm mentioned this pull request Mar 18, 2020
Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor tweaks in-line.

from urllib import urlencode

try:
# The old location was quixotically deprecated and might actually be
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"quixotically" feels like maybe a little bit of unnecessary editorializing ;-). Do you have a link you want to drop in here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have... strong feelings here.

I do not think that deprecating this import alias is a meaningful improvement to the Python standard library. This alias could be left in place, undocumented, forever, at little cost.

I have fixed this deprecation warning in countless places since it was introduced, both publicly and privately. I'm really quite tired of it. It's makework that doesn't deliver value to anyone.

I feel that this change is frankly disrespectful to library maintainers. I mean, here are some CPython devs trying to get the html5lib maintainer to do a release so that they don't break his code: html5lib/html5lib-python#419 (comment) This is not a good look.

And then, well, they eventually decide not to do it yet. Note Victor's response:

FYI [removal of] aliases to ABC in collections have been reverted in Python 3.9, but they will be removed again in Python 3.10: https://bugs.python.org/issue39674

if isinstance(url, unicode):
parsed_url = URL.from_text(url)
else:
parsed_url = URL.from_text(url.decode('ascii'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ascii really the right choice here? This method's supposed to return a Deferred, so this creates an awkward error-handling situation at the very least. I'd probably just go with charmap personally, but if you want a UnicodeDecodeError maybe make it asynchronous?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the Deferred bit (which seems a defect orthogonal to the one I'm addressing here), I don't really understand the objection here. Could you file a fresh issue with a use case?

src/treq/client.py Show resolved Hide resolved

if isinstance(url, unicode):
url = URL.fromText(url).asURI().asText().encode('ascii')
url = bytes(parsed_url)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you mind leaving the explicit .as_uri().to_text().encode() idiom here, rather than invoking it indirectly via __bytes__ / __str__? This works, but it's not documented super strictly in hyperlink and it's more or less a concession to requests' API design, not something I want to rely on really heavily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it looks like they're equivalent:

https://github.com/python-hyper/hyperlink/blob/56ed2291c0fc6a402b91db38d18d1ad4aed8efbf/src/hyperlink/_url.py#L1584-L1591

I'd hoped that URL might avoid some of those copies, but 🤷‍♂️.

if parsed_url.query:
qs.extend([parsed_url.query, b'&'])
Parameter names and values are coerced to unicode. As a special case,
`bytes` are decoded as ASCII.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Again, why ASCII rather than latin-1?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches what the urllib.parse functions did. TBH I'd rather be stricter about types than this, it's just here for compatibility with the previous implementation. Switching to charmap may make sense, but I'd want to add additional test cases to formalize the expected behavior.

src/treq/client.py Outdated Show resolved Hide resolved
src/treq/test/test_client.py Show resolved Hide resolved
src/treq/client.py Show resolved Hide resolved
src/treq/client.py Show resolved Hide resolved
setup.py Outdated Show resolved Hide resolved
@twm
Copy link
Contributor Author

twm commented Mar 26, 2020

Thank you for the review, @glyph! I will try to address the rest of your feedback soon.

@twm twm merged commit a673d24 into master Mar 29, 2020
@twm twm deleted the hyperlink-url-212 branch March 29, 2020 02:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode URL (IDN) support doesn't mix with params
2 participants