UTF-C is a compression scheme for short UTF-8 messages that contain non-ASCII characters (see the comparisons below).
Note
The texts used here in various languages were translated with Google Translate.
In this example (the Greek text "Ζω στην Ευρώπη"), UTF-C removes 6 of the 26 bytes.
Uncompressed(26): [206, 150, 207, 137, 32, 207, 131, 207, 132, 206, 183, 206, 189, 32, 206, 149, 207, 133, 207, 129, 207, 142, 207, 128, 206, 183]
Compressed(20): [26, 206, 150, 207, 137, 32, 131, 132, 206, 183, 189, 32, 149, 207, 133, 129, 142, 128, 206, 183]
In this example (the Japanese text "私はヨーロッパに住んでいます"), UTF-C removes 13 of the 42 bytes.
Uncompressed(42): [231, 167, 129, 227, 129, 175, 227, 131, 168, 227, 131, 188, 227, 131, 173, 227, 131, 131, 227, 131, 145, 227, 129, 171, 228, 189, 143, 227, 130, 147, 227, 129, 167, 227, 129, 132, 227, 129, 190, 227, 129, 153]
Compressed(29): [42, 231, 167, 129, 227, 129, 175, 227, 131, 168, 188, 173, 131, 145, 227, 129, 171, 228, 189, 143, 227, 130, 147, 227, 129, 167, 132, 190, 153]
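The two byte dumps above suggest how the savings come about: the first compressed byte matches the uncompressed length, and the leading bytes of a multi-byte character seem to be dropped whenever they equal those of the previous multi-byte character. The following is a minimal sketch of that idea, inferred only from the dumps and not taken from the crate's source. The names `compress_sketch` and `decompress_sketch` are hypothetical, and the single length byte (inputs up to 255 bytes) is an assumption; how the real format handles longer inputs is not shown here.

```rust
/// Hypothetical encoder: writes the input length, then every character with
/// its leading bytes omitted when they repeat the previous multi-byte prefix.
fn compress_sketch(text: &str) -> Option<Vec<u8>> {
    // Assumption: one length byte, so longer inputs are out of scope.
    if text.len() > 255 {
        return None;
    }
    let mut out = vec![text.len() as u8];
    let mut last_prefix: Vec<u8> = Vec::new();
    let mut buf = [0u8; 4];
    for ch in text.chars() {
        let bytes = ch.encode_utf8(&mut buf).as_bytes();
        // Split into the final byte and everything before it (the prefix).
        let (tail, prefix) = bytes.split_last()?;
        // ASCII has an empty prefix and leaves the remembered prefix alone.
        if !prefix.is_empty() && prefix != last_prefix.as_slice() {
            out.extend_from_slice(prefix);
            last_prefix.clear();
            last_prefix.extend_from_slice(prefix);
        }
        out.push(*tail);
    }
    Some(out)
}

/// Hypothetical decoder for the sketch above.
fn decompress_sketch(data: &[u8]) -> Option<String> {
    let (&len, body) = data.split_first()?;
    let mut out: Vec<u8> = Vec::with_capacity(len as usize);
    let mut prefix: Vec<u8> = Vec::new();
    let mut iter = body.iter().copied();
    while let Some(b) = iter.next() {
        match b {
            // ASCII byte: copy as-is.
            0x00..=0x7F => out.push(b),
            // Lone continuation byte: reuse the remembered prefix.
            0x80..=0xBF => {
                if prefix.is_empty() {
                    return None;
                }
                out.extend_from_slice(&prefix);
                out.push(b);
            }
            // Lead byte: a new prefix follows, then the final byte.
            _ => {
                let extra = match b {
                    0xC0..=0xDF => 0,
                    0xE0..=0xEF => 1,
                    _ => 2,
                };
                prefix.clear();
                prefix.push(b);
                for _ in 0..extra {
                    prefix.push(iter.next()?);
                }
                out.extend_from_slice(&prefix);
                out.push(iter.next()?);
            }
        }
    }
    if out.len() != len as usize {
        return None;
    }
    String::from_utf8(out).ok()
}
```

Fed the 26 uncompressed bytes of the first example, `compress_sketch` yields the 20 bytes listed under Compressed(20), and `decompress_sketch` restores the original text.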
Important
Please create your own comparison and check if this compression is suitable for your project!
# ...
[dependencies]
utf-c = { path = "./utf-c/" }
# We will use `GzEncoder` and `GzDecoder`.
flate2 = "1.0.35"
[profile.release]
strip = true # Automatically strip symbols from the binary
opt-level = 3 # Optimize for speed (use "s" or "z" to optimize for size)
lto = true # Enable link time optimization
codegen-units = 1 # Use a single codegen unit: slower builds, better optimization
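For readers who want to build their own comparison with this profile, a timing loop along the following lines may be enough. It is a sketch that uses only the standard library; `time_runs` and `report` are hypothetical helper names, the column widths in `report` are a guess at the log layout, and the closure passed to `time_runs` stands in for whichever compression or decompression call is being measured.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `op` `iterations` times and returns the total
/// elapsed wall-clock time.
fn time_runs<T>(iterations: u32, mut op: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box discourages the optimizer from discarding the result.
        std::hint::black_box(op());
    }
    start.elapsed()
}

/// Hypothetical helper: formats one result line, optionally with the
/// size of the compressed output in bytes.
fn report(name: &str, stage: &str, elapsed: Duration, size: Option<usize>) -> String {
    let mut line = format!(
        "[{:<6} | {:<13}] finished after {} µs",
        name,
        stage,
        elapsed.as_micros()
    );
    if let Some(bytes) = size {
        line.push_str(&format!(" ({} bytes)", bytes));
    }
    line
}
```

Measuring in a release build matters here, since the profile settings above do not apply to debug builds.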
CPU: Broadcom BCM2712D0 quad-core Arm Cortex-A76 processor @ 2.4GHz
SIMD: NEON
RAM: 1x 16GB LPDDR4X-4267
"👁👄👁" compression and decompression 50000x (12 bytes)
[flate2 | compression ] finished after 926231 µs (31 bytes)
[flate2 | decompression] finished after 335222 µs
[utf-c | compression ] finished after 3325 µs (7 bytes)
[utf-c | decompression] finished after 3546 µs
"טקסט זה נדחס עם UTF-C ו-GZIP ולאחר מכן הושווה. טקסט זה תורגם עם Google Translate ואנו מקווים שהוא תורגם כהלכה, אך אין ערובה לכך" compression and decompression 50000x (204 bytes)
[flate2 | compression ] finished after 1507926 µs (160 bytes)
[flate2 | decompression] finished after 466058 µs
[utf-c | compression ] finished after 51067 µs (129 bytes)
[utf-c | decompression] finished after 53226 µs
CPU: Intel Core i5-14400F (10c/16t)
SIMD: SSE2
RAM: 2x 8GB DDR4-3600
"👁👄👁" compression and decompression 50000x (12 bytes)
[flate2 | compression ] finished after 363896 µs (31 bytes)
[flate2 | decompression] finished after 179868 µs
[utf-c | compression ] finished after 1755 µs (7 bytes)
[utf-c | decompression] finished after 1887 µs
"טקסט זה נדחס עם UTF-C ו-GZIP ולאחר מכן הושווה. טקסט זה תורגם עם Google Translate ואנו מקווים שהוא תורגם כהלכה, אך אין ערובה לכך" compression and decompression 50000x (204 bytes)
[flate2 | compression ] finished after 517129 µs (160 bytes)
[flate2 | decompression] finished after 235639 µs
[utf-c | compression ] finished after 15883 µs (129 bytes)
[utf-c | decompression] finished after 18662 µs