security: replace unmaintained encoding crate with encoding_rs #704

Open: joaommartins wants to merge 2 commits into Enet4:master from joaommartins:encoding_rs-conversion

joaommartins commented Oct 6, 2025

Summary

This PR resolves the security vulnerability RUSTSEC-2021-0153 by replacing the unmaintained encoding crate with the actively maintained encoding_rs crate.

Changes Made

  • Dependency Migration: Updated encoding/Cargo.toml to use encoding_rs = "0.8.35" instead of encoding = "0.2.33".
  • API Preservation: Completely rewrote encoding/src/text.rs using encoding_rs while maintaining full backward compatibility with the existing TextCodec trait interface.
  • Character Set Support: All DICOM character sets remain supported, including:
    • ISO-IR 6 (default ASCII).
    • ISO-IR variants (13, 87, 100, 101, 109, 110, 126, 127, 138, 144, 149, 166, 192).
    • Chinese and Japanese encodings (GB18030, GBK, ISO-2022-JP with special compatibility handling).
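The API-preservation point can be illustrated with a minimal sketch. `TextCodecSketch` below is a hypothetical, simplified stand-in and not the actual `TextCodec` trait from dicom-encoding; the error type and the `round_trip` helper are likewise assumptions made for illustration. The idea is that callers depend only on the trait, so the decoding backend can be swapped without touching them.

```rust
use std::borrow::Cow;

/// Hypothetical, simplified stand-in for a text codec trait.
/// The real `TextCodec` trait in dicom-encoding may differ.
trait TextCodecSketch {
    /// The DICOM defined term this codec handles.
    fn name(&self) -> Cow<'static, str>;
    /// Decode raw bytes into a Rust string.
    fn decode(&self, bytes: &[u8]) -> Result<String, String>;
    /// Encode a Rust string into raw bytes.
    fn encode(&self, text: &str) -> Result<Vec<u8>, String>;
}

/// Codec for ISO_IR 192 (UTF-8), backed only by the standard library here.
struct Utf8Codec;

impl TextCodecSketch for Utf8Codec {
    fn name(&self) -> Cow<'static, str> {
        Cow::Borrowed("ISO_IR 192")
    }

    fn decode(&self, bytes: &[u8]) -> Result<String, String> {
        String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
    }

    fn encode(&self, text: &str) -> Result<Vec<u8>, String> {
        Ok(text.as_bytes().to_vec())
    }
}

/// Callers see only the trait, so the backend can change underneath them.
fn round_trip(codec: &dyn TextCodecSketch, text: &str) -> Result<String, String> {
    let bytes = codec.encode(text)?;
    codec.decode(&bytes)
}

fn main() {
    let codec = Utf8Codec;
    let out = round_trip(&codec, "Müller^Jürgen").expect("round trip should succeed");
    println!("{}: {}", codec.name(), out);
}
```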

Character Set Mapping: encoding → encoding_rs

| DICOM Character Set | Old (encoding) | New (encoding_rs) | Reason |
|---|---|---|---|
| ISO_IR 6 (Default) | encoding::all::ASCII | WINDOWS_1252 | WINDOWS_1252 is a superset of ASCII and matches ISO-8859-1 outside the 0x80-0x9F range, with better real-world compatibility |
| ISO_IR 13 (Japanese) | encoding::all::WINDOWS_31J | SHIFT_JIS | Standard Japanese encoding that includes the JIS X 0201 character repertoire |
| ISO_IR 87 (Japanese) | encoding::all::ISO_2022_JP | ISO_2022_JP | Direct mapping with compatibility shim for escape sequences |
| ISO_IR 100 (Latin-1) | encoding::all::ISO_8859_1 | WINDOWS_1252 | WINDOWS_1252 handles extended characters in clinical text |
| ISO_IR 101 (Latin-2) | encoding::all::ISO_8859_2 | ISO_8859_2 | Direct 1:1 mapping |
| ISO_IR 109 (Latin-3) | encoding::all::ISO_8859_3 | ISO_8859_3 | Direct 1:1 mapping |
| ISO_IR 110 (Latin-4) | encoding::all::ISO_8859_4 | ISO_8859_4 | Direct 1:1 mapping |
| ISO_IR 126 (Greek) | encoding::all::ISO_8859_7 | ISO_8859_7 | Direct 1:1 mapping |
| ISO_IR 127 (Arabic) | encoding::all::ISO_8859_6 | ISO_8859_6 | Direct 1:1 mapping |
| ISO_IR 138 (Hebrew) | encoding::all::ISO_8859_8 | ISO_8859_8 | Direct 1:1 mapping |
| ISO_IR 144 (Cyrillic) | encoding::all::ISO_8859_5 | ISO_8859_5 | Direct 1:1 mapping |
| ISO_IR 149 (Korean) | encoding::all::WINDOWS_949 | EUC_KR | Standard Korean encoding for the KS X 1001 character set |
| ISO_IR 166 (Thai) | encoding::all::WINDOWS_874 | WINDOWS_874 | Direct 1:1 mapping |
| ISO_IR 192 (UTF-8) | encoding::all::UTF_8 | UTF_8 | Direct 1:1 mapping |
| GB18030 (Chinese) | encoding::all::GB18030 | GB18030 | Direct 1:1 mapping |
| GBK (Chinese) | encoding::all::GBK | GBK | Direct 1:1 mapping |
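
As a rough illustration of the mapping above, the lookup could be sketched as a single `match`. The function name `encoding_rs_name` is hypothetical and is not taken from the PR; to keep the sketch dependency-free it returns the name of the encoding_rs constant as a string rather than the `&'static Encoding` value itself.

```rust
/// Map a DICOM Specific Character Set term to the name of the
/// encoding_rs constant listed in the mapping table.
/// Returns `None` for terms the table does not cover.
/// (Hypothetical helper, not the PR's actual code.)
fn encoding_rs_name(dicom_term: &str) -> Option<&'static str> {
    // DICOM values may carry padding spaces, so trim before matching.
    let name = match dicom_term.trim() {
        "ISO_IR 6" | "ISO_IR 100" => "WINDOWS_1252",
        "ISO_IR 13" => "SHIFT_JIS",
        "ISO_IR 87" => "ISO_2022_JP",
        "ISO_IR 101" => "ISO_8859_2",
        "ISO_IR 109" => "ISO_8859_3",
        "ISO_IR 110" => "ISO_8859_4",
        "ISO_IR 126" => "ISO_8859_7",
        "ISO_IR 127" => "ISO_8859_6",
        "ISO_IR 138" => "ISO_8859_8",
        "ISO_IR 144" => "ISO_8859_5",
        "ISO_IR 149" => "EUC_KR",
        "ISO_IR 166" => "WINDOWS_874",
        "ISO_IR 192" => "UTF_8",
        "GB18030" => "GB18030",
        "GBK" => "GBK",
        _ => return None,
    };
    Some(name)
}

fn main() {
    for term in ["ISO_IR 100", "ISO_IR 149 ", "ISO_IR 999"] {
        match encoding_rs_name(term) {
            Some(name) => println!("{:?} -> encoding_rs::{}", term, name),
            None => println!("{:?} -> not covered by the table", term),
        }
    }
}
```

In a real implementation each arm would presumably return the corresponding `encoding_rs` static instead of its name.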

Key Compatibility Notes:

  • Substitutions were made for cases where encoding_rs did not have a 1:1 encoding available.
  • All encoding_rs mappings are supersets or exact equivalents of the original character sets.
  • WINDOWS_1252 substitutions provide enhanced compatibility for clinical text containing smart quotes, em-dashes, and similar characters.
  • ISO-2022-JP includes special handling to strip trailing escape sequences for backward compatibility.
  • No functional regressions in the existing test suite: all DICOM text covered by the tests decodes identically.
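
The ISO-2022-JP note can be sketched as follows. This is a guess at the shape of such a shim, not the PR's code: it assumes the trailing sequence in question is `ESC ( B` (the switch back to ASCII) and that it is removed from encoded output. The helper name `strip_trailing_ascii_escape` is hypothetical.

```rust
/// `ESC ( B`: the ISO-2022-JP escape sequence that switches back to ASCII.
/// Assumption: this is the trailing sequence the compatibility shim removes.
const ESC_ASCII: [u8; 3] = [0x1B, 0x28, 0x42];

/// Return `bytes` without one trailing `ESC ( B`, if present.
/// Input that does not end in that sequence is returned unchanged.
fn strip_trailing_ascii_escape(bytes: &[u8]) -> &[u8] {
    bytes.strip_suffix(&ESC_ASCII[..]).unwrap_or(bytes)
}

fn main() {
    // Illustrative bytes: ESC $ B (switch to JIS X 0208), two double-byte
    // characters, then a trailing ESC ( B.
    let encoded = [0x1B, 0x24, 0x42, 0x3B, 0x33, 0x45, 0x44, 0x1B, 0x28, 0x42];
    let stripped = strip_trailing_ascii_escape(&encoded);
    println!("before: {:02X?}", encoded);
    println!("after:  {:02X?}", stripped);
}
```

Only a single trailing sequence is removed, so escape sequences in the middle of a value are left intact.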

Testing

  • All text encoding tests pass (32 tests).
  • Full workspace test suite passes (377+ tests across all crates).
  • Verified with cargo deny check advisories - no security vulnerabilities remain.
  • Confirmed with OSV scanner - RUSTSEC-2021-0153 resolved.

Closes #577.

joaommartins force-pushed the encoding_rs-conversion branch from 940afca to 6ae60a2 on October 6, 2025 23:21

joaommartins (Author) commented:

Bump @Enet4

jayvdb (Contributor) commented Dec 17, 2025

ping @Enet4

Enet4 added the labels A-lib (Area: library) and C-encoding (Crate: dicom-encoding) on Dec 17, 2025
Enet4 self-requested a review on December 17, 2025 11:53
- Replace encoding 0.2.33 with encoding_rs 0.8.35 to resolve RUSTSEC-2021-0153
- Rewrite text encoding implementation while preserving TextCodec API compatibility
- Maintain support for all DICOM character sets (ISO-IR variants, UTF-8, CJK encodings)
- Add special handling for ISO-2022-JP encoding compatibility
- All existing tests pass, confirming no functional regressions

Fixes: RUSTSEC-2021-0153 (encoding crate is unmaintained)
joaommartins force-pushed the encoding_rs-conversion branch from 76f617b to 213197d on December 17, 2025 22:15
Development

Successfully merging this pull request may close these issues.

RUSTSEC-2021-0153: encoding is unmaintained

3 participants