Commit 2d26aeb

Improve multi-byte cutter/chunk (#233)
* 🐛 multi-byte cutter/chunk is not accurate enough on u16, u32 (le/be)
* 🔖 bump version 3.0.1-dev
1 parent 5ec4a27 commit 2d26aeb

3 files changed: +7 -2 lines changed

Diff for: CHANGELOG.md

+5
@@ -2,6 +2,11 @@
 All notable changes to charset-normalizer will be documented in this file. This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
 
+## [3.0.1](https://github.com/Ousret/charset_normalizer/compare/3.0.0...master) (unreleased)
+
+### Fixed
+- Multi-bytes cutter/chunk generator did not always cut correctly (PR #233)
+
 ## [3.0.0](https://github.com/Ousret/charset_normalizer/compare/2.1.1...3.0.0) (2022-10-20)
 
 ### Added

Diff for: charset_normalizer/utils.py

+1 -1
@@ -396,7 +396,7 @@ def cut_sequence_chunks(
 
 # multi-byte bad cutting detector and adjustment
 # not the cleanest way to perform that fix but clever enough for now.
-if is_multi_byte_decoder and i > 0 and sequences[i] >= 0x80:
+if is_multi_byte_decoder and i > 0:
 
     chunk_partial_size_chk: int = min(chunk_size, 16)
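
The commit message says the old check was not accurate enough for UTF-16 and UTF-32. The sketch below is one illustration of why, not the library's code: `old_guard` and `new_guard` are hypothetical helpers that isolate the two conditions from the diff, assuming `is_multi_byte_decoder` is true. In UTF-16 LE, every byte of some non-ASCII text can fall below `0x80`, so the old condition would skip the adjustment even when the cut lands in the middle of a code unit.

```python
def old_guard(sequences: bytes, i: int) -> bool:
    # Condition before the commit (multi-byte decoder assumed).
    return i > 0 and sequences[i] >= 0x80


def new_guard(sequences: bytes, i: int) -> bool:
    # Condition after the commit (multi-byte decoder assumed).
    return i > 0


text = "你好世界"
payload = text.encode("utf_16_le")  # 2 bytes per character, all below 0x80 here

offset = 3  # odd offset: starts in the middle of a 2-byte code unit
misaligned = payload[offset:offset + 4].decode("utf_16_le")
aligned = payload[offset - 1:offset + 3].decode("utf_16_le")

print(old_guard(payload, offset))  # the old check does not fire
print(new_guard(payload, offset))  # the new check does
print(misaligned in text, aligned in text)
```

The misaligned slice still decodes without raising, but the result is not part of the original text, which is why the byte-value test alone cannot detect a bad cut on these encodings.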

Diff for: charset_normalizer/version.py

+1 -1
@@ -2,5 +2,5 @@
 Expose version
 """
 
-__version__ = "3.0.0"
+__version__ = "3.0.1-dev"
 VERSION = __version__.split(".")
