-
Notifications
You must be signed in to change notification settings - Fork 3.2k
[core] remove chardet dep from azure.core.rest #19962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 4 commits
a13b248
2c9d57b
9de3b62
2be52d4
916aafc
a7a1a85
d4c38a7
be24aa7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -50,11 +50,6 @@ | |
| from urlparse import urlparse # type: ignore | ||
| except ImportError: | ||
| from urllib.parse import urlparse | ||
| try: | ||
| import cchardet as chardet | ||
| except ImportError: # pragma: no cover | ||
| import chardet # type: ignore | ||
| from ..exceptions import ResponseNotReadError | ||
|
|
||
| ################################### TYPES SECTION ######################### | ||
|
|
||
|
|
@@ -285,22 +280,21 @@ def from_pipeline_transport_request_helper(request_class, pipeline_transport_req | |
| ) | ||
|
|
||
| def get_charset_encoding(response): | ||
| # type: (...) -> Optional[str] | ||
| content_type = response.headers.get("Content-Type") | ||
|
|
||
| if not content_type: | ||
| return None | ||
| _, params = cgi.parse_header(content_type) | ||
| encoding = params.get('charset') # -> utf-8 | ||
| if encoding is None: | ||
| if content_type in ("application/json", "application/rdap+json"): | ||
| # RFC 7159 states that the default encoding is UTF-8. | ||
| # RFC 7483 defines application/rdap+json | ||
| encoding = "utf-8" | ||
| else: | ||
| try: | ||
| encoding = chardet.detect(response.content)["encoding"] | ||
| except ResponseNotReadError: | ||
| pass | ||
| if encoding is None or not lookup_encoding(encoding): | ||
| return None | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We want to default to "utf-8"("utf-8-sig"), right?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. After talking with @lmazuel, we don't default the value of |
||
| return encoding | ||
|
|
||
| def decode_to_text(encoding, content): | ||
| # type: (Optional[str], bytes) -> str | ||
| if encoding == "utf-8": | ||
| encoding = "utf-8-sig" | ||
| if encoding: | ||
| return content.decode(encoding) | ||
| return codecs.getincrementaldecoder("utf-8-sig")(errors="replace").decode(content) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -138,7 +138,7 @@ def test_response_no_charset_with_ascii_content(send_request): | |
|
|
||
| assert response.headers["Content-Type"] == "text/plain" | ||
| assert response.status_code == 200 | ||
| assert response.encoding == 'ascii' | ||
| assert response.encoding is None | ||
| assert response.text == "Hello, world!" | ||
|
|
||
|
|
||
|
|
@@ -151,7 +151,7 @@ def test_response_no_charset_with_iso_8859_1_content(send_request): | |
| request=HttpRequest("GET", "/encoding/iso-8859-1"), | ||
| ) | ||
| assert response.text == u"Accented: Österreich" | ||
| assert response.encoding == 'ISO-8859-1' | ||
| assert response.encoding is None | ||
|
|
||
| def test_response_set_explicit_encoding(send_request): | ||
| # Deliberately incorrect charset | ||
|
|
@@ -168,7 +168,7 @@ def test_json(send_request): | |
| request=HttpRequest("GET", "/basic/json"), | ||
| ) | ||
| assert response.json() == {"greeting": "hello", "recipient": "world"} | ||
| assert response.encoding == 'utf-8-sig' # for requests, we use utf-8-sig instead of utf-8 bc of requests behavior | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we still need these three tests? Maybe one is good enought?
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not quite sure which tests you mean by the three tests, but I feel like more tests is better, so might as well not get rid of them |
||
| assert response.encoding is None | ||
|
|
||
| def test_json_with_specified_encoding(send_request): | ||
| response = send_request( | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to clarify this is a breaking change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure, @annatisch do you think we should clarify it's breaking, because this is in a provisional package? Worried we're being scary for non-azure.core.rest users for nothing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xiangyan99 talked to @lmazuel and he recommended adding a section like "Breaking changes in provisional package", so I added that to the changelog and listed this fix under there. Thanks!