Implement charset handling in WebRequestConcern
- The `force_encoding` and `unzip` options in WebsiteAgent are moved to WebRequestConcern so that other users of the concern, such as RssAgent, can benefit from them.

- WebRequestConcern detects a charset specified in the Content-Type header to decode the content properly. If the charset is missing, the content is assumed to be encoded in UTF-8 unless it has a binary MIME type. Not all Faraday adapters handle character encodings, and Faraday passes through whatever the backend returns, so we need to do this on our own. (cf. lostisland/faraday#139)

- WebRequestConcern now converts text contents to UTF-8, so agents can handle non-UTF-8 data without having to deal with encodings themselves. Previously, WebsiteAgent in "json"/"text" modes and RssAgent would suffer from encoding errors when dealing with non-UTF-8 contents. WebsiteAgent in "html"/"xml" modes did not have this problem because Nokogiri would always return results in UTF-8 independent of the input encoding.

This should fix #608.
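The decoding rule described in this message can be illustrated with a small standalone sketch. This is not the code from the commit: the module name `CharsetSketch`, the method `decode_body`, and the `BINARY_TYPE` pattern are all hypothetical, and the set of MIME types treated as binary is an assumption made for the example. It only shows the order of precedence the message describes: an explicit `force_encoding`, then the charset from Content-Type, then UTF-8 unless the type looks binary.

```ruby
# Hypothetical illustration of the charset rule; not the actual
# WebRequestConcern implementation.
module CharsetSketch
  # Assumption: which MIME types count as "binary" is chosen for this example.
  BINARY_TYPE = %r{\A(?:image|audio|video)/|\Aapplication/(?:octet-stream|pdf|zip)\z}i

  module_function

  # Returns the body converted to UTF-8, or tagged as binary and otherwise
  # untouched when no charset is given and the MIME type looks binary.
  def decode_body(body, content_type, force_encoding: nil)
    mime, *params = content_type.to_s.split(';').map(&:strip)
    # Pick the charset parameter, with or without surrounding quotes.
    charset = params.map { |p| p[/\Acharset\s*=\s*"?([^"]+)"?\z/i, 1] }.compact.first

    encoding =
      if force_encoding
        Encoding.find(force_encoding)   # raises ArgumentError for unknown names
      elsif charset
        Encoding.find(charset)
      elsif mime.to_s.match?(BINARY_TYPE)
        return body.dup.force_encoding(Encoding::BINARY)
      else
        Encoding::UTF_8                 # default for text without a charset
      end

    # Label the raw bytes with their source encoding, then transcode to UTF-8,
    # substituting a replacement character where conversion is not possible.
    body.dup.force_encoding(encoding)
        .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
  end
end
```

For example, `CharsetSketch.decode_body(raw, "text/plain; charset=ISO-8859-1")` would hand an agent a UTF-8 string regardless of what the server sent, which is the property the "json"/"text" modes and RssAgent previously lacked.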
Showing 3 changed files with 65 additions and 21 deletions.