Conversation

@rkuester
Contributor

This is a draft PR for running CI tests on the full change. The commits
along this branch will be individually submitted for review.

See the linked issue for a description of the change.

BUG=implements #3256

Implement a unified module for creating, reading, and modifying TFLite
models with a clean API. The module eliminates manual index tracking
and buffer management through automatic bookkeeping, supporting both
declarative and imperative construction styles.

Core design uses first-class Buffer objects that can be shared between
tensors, with automatic deduplication during build. Tensors reference
Buffers directly, matching the TFLite schema structure. The compiler
automatically extracts inline tensor declarations, builds operator code
tables, and handles index assignment according to TFLite conventions.

Supports quantization parameters (per-tensor and per-channel), metadata
key-value pairs, and read-modify-write workflows. The read() function
preserves the object graph structure, enabling models to be read,
modified, and rebuilt.

Add comprehensive test coverage for core functionality, advanced
features, quantization, and modification workflows.
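
A minimal sketch of the read-modify-write style described above, assuming the import path and that Buffer wraps an ndarray; the actual model_editor signatures may differ:

```python
import numpy as np
# Import path is an assumption; adjust to the actual package layout.
from tflite_micro.tensorflow.lite.micro.compression import model_editor

with open("model.tflite", "rb") as f:
    model = model_editor.read(f.read())  # preserves the object graph

# Objects, not indices: a tensor references its Buffer directly, and a
# Buffer may be shared by several tensors.
tensor = model.subgraphs[0].tensors[0]
tensor.buffer = model_editor.Buffer(np.zeros(tensor.shape, dtype=np.int8))

# Index assignment and buffer deduplication happen automatically here.
flatbuffer = model.build()
```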
…_editor

Replace model_facade with model_editor in compress.py and tests.
model_editor provides a cleaner API with better buffer and metadata
handling. Buffers appended during compression are automatically indexed,
and quantization parameters are accessed through a wrapper object.

Update BUILD dependencies accordingly.
Remove model_facade module and its tests as they are superseded by
model_editor.
…ess_test

Replace dictionary-based test_models.build() with model_editor's
declarative API. Add _build_test_model() function that uses model_editor
to create the same test model more cleanly.
Remove test_models module and its tests as they are superseded by
model_editor.
Add a DecodeType class to replace the raw integer decode_type field with
named constants and factory methods. Provides predefined constants for
built-in types (LUT, HUFFMAN, PRUNING) and a factory method for custom
types (128-255).

Custom types are automatically named with a CUSTOM_{code} prefix for
clarity in debugging. The class supports serialization via __int__()
and comparison with both DecodeType objects and integers.

Update DecodeCommonMetadata to use DecodeType and update tests to use
named constants.
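
A sketch consistent with that description; the numeric codes for the built-in constants are placeholders, and the real class layout may differ:

```python
class DecodeType:
    def __init__(self, code: int, name: str):
        self._code = code
        self._name = name

    @classmethod
    def custom(cls, code: int) -> "DecodeType":
        # Custom types occupy 128-255 and are auto-named for debugging.
        if not 128 <= code <= 255:
            raise ValueError("custom decode types must be in 128-255")
        return cls(code, f"CUSTOM_{code}")

    def __int__(self) -> int:
        # Serialization: the flatbuffer stores the raw integer code.
        return self._code

    def __eq__(self, other) -> bool:
        # Comparable with both DecodeType objects and plain integers.
        if isinstance(other, DecodeType):
            return self._code == other._code
        if isinstance(other, int):
            return self._code == other
        return NotImplemented

    def __repr__(self) -> str:
        return self._name

# Built-in constants; the codes 0-2 are placeholders, not real values.
DecodeType.LUT = DecodeType(0, "LUT")
DecodeType.HUFFMAN = DecodeType(1, "HUFFMAN")
DecodeType.PRUNING = DecodeType(2, "PRUNING")
```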

Add decode module BUILD targets.
Add dataclass placeholders for future compression methods. These will
be used by the plugin architecture to dispatch to compression-specific
implementations.
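
A plausible shape for these placeholders; only the concept comes from the commit, and the class and field names here are guesses (e.g. an index_bitwidth field for LUT):

```python
from dataclasses import dataclass

@dataclass
class LookUpTableCompression:
    index_bitwidth: int  # bits per index into the value table

@dataclass
class HuffmanCompression:
    pass  # placeholder until the method is implemented

@dataclass
class PruningCompression:
    pass  # placeholder
```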
Factor out compression method parsing into a dedicated function that
dispatches on the YAML key. This enables parse_yaml to iterate over
multiple compression methods per tensor and makes adding new
compression types straightforward.
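
A sketch of the dedicated parse function, dispatching on the YAML key and returning the placeholder dataclasses from the previous sketch; the key strings are assumptions:

```python
def _parse_compression_method(key: str, config: dict):
    # One branch per supported method; adding a new compression type
    # means adding a dataclass and a branch here.
    if key == "lut":
        return LookUpTableCompression(index_bitwidth=config["index_bitwidth"])
    if key == "huffman":
        return HuffmanCompression()
    if key == "pruning":
        return PruningCompression()
    raise ValueError(f"unknown compression method: {key}")
```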
Define the plugin interface for compression methods. Each compressor
implements the Compressor protocol with a compress() method that
returns encoded data and ancillary data. CompressionError provides
a common exception type for compression failures.
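
In Python terms, the interface could look like the following; the exact compress() signature is an assumption beyond what the commit states:

```python
from typing import Protocol
import numpy as np

class CompressionError(Exception):
    """Common exception type for compression failures."""

class Compressor(Protocol):
    def compress(self, tensor_data: np.ndarray) -> tuple[bytes, bytes]:
        """Return (encoded_data, ancillary_data) for one tensor."""
        ...
```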
Extract LUT compression logic from compress.py into a dedicated plugin
module. The LutCompressor class implements the Compressor protocol,
producing packed indices and ancillary data in the format expected by
the C++ DECODE kernel.
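
A simplified illustration of the idea (a value table plus bit-packed indices, valid for bit widths up to 8); the actual ancillary-data layout expected by the C++ DECODE kernel is richer than shown here. CompressionError is from the protocol sketch above:

```python
import numpy as np

class LutCompressor:
    def __init__(self, index_bitwidth: int):
        self._bitwidth = index_bitwidth

    def compress(self, tensor_data: np.ndarray) -> tuple[bytes, bytes]:
        # The value table holds each distinct value once; every element
        # is replaced by its index into that table.
        values, indices = np.unique(tensor_data, return_inverse=True)
        if len(values) > (1 << self._bitwidth):
            raise CompressionError(
                f"{len(values)} unique values exceed "
                f"{self._bitwidth}-bit indices")
        # Pack indices MSB-first at the configured bit width.
        bits = np.unpackbits(
            indices.ravel().astype(np.uint8).reshape(-1, 1),
            axis=1, bitorder="big")[:, -self._bitwidth:]
        encoded = np.packbits(bits.ravel(), bitorder="big").tobytes()
        return encoded, values.tobytes()  # value table as ancillary data
```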
Add placeholder implementations that raise CompressionError when
invoked. These validate the plugin architecture and will be replaced
with working implementations later.
Implement graph modification to insert DECODE operators before
consumers of compressed tensors. Each compressed tensor gets a DECODE
operator with two inputs (encoded tensor and ancillary data tensor)
and one output (decompressed tensor). Consumer operators are rewired
to use the DECODE output.
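
A sketch of the rewrite using the hypothetical model_editor objects from earlier; opcode handling and insertion position are simplified:

```python
def insert_decode(subgraph, compressed, ancillary, decompressed):
    decode = model_editor.Operator(
        opcode="DECODE",  # custom opcode registration omitted here
        inputs=[compressed, ancillary],
        outputs=[decompressed])
    # Rewire every consumer of the compressed tensor to the DECODE output.
    for op in subgraph.operators:
        op.inputs = [decompressed if t is compressed else t
                     for t in op.inputs]
    # Simplification: prepend so DECODE runs before its consumers.
    subgraph.operators.insert(0, decode)
```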
Replace monolithic compression logic with a dispatch table that routes
compression requests to plugin modules based on the spec's compression
method type. After compressing tensors, insert DECODE operators into
the model graph.

The old metadata flatbuffer approach is removed in favor of the DECODE
operator format.
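
The dispatch table might look like this; the plugin module names are guesses, and CompressionError comes from the protocol sketch:

```python
_COMPRESSOR_FOR_METHOD = {
    LookUpTableCompression: lut.LutCompressor,
    HuffmanCompression: huffman.HuffmanCompressor,
    PruningCompression: pruning.PruningCompressor,
}

def _compressor_for(method):
    try:
        return _COMPRESSOR_FOR_METHOD[type(method)]
    except KeyError:
        raise CompressionError(
            f"no compressor for {type(method).__name__}") from None
```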
The TFLM interpreter requires subgraph inputs/outputs to be set in the
flatbuffer to know which tensors are model inputs and outputs. Without
these, models built with model_editor cannot be executed.

Add inputs and outputs fields to Subgraph dataclass, populate them in
_compile_subgraph when building, and read them back in read().
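
A sketched shape of the extended dataclass, with the new fields holding tensor objects that _compile_subgraph converts to indices (field types assumed):

```python
from dataclasses import dataclass, field

@dataclass
class Subgraph:
    tensors: list = field(default_factory=list)
    operators: list = field(default_factory=list)
    inputs: list = field(default_factory=list)   # model input tensors
    outputs: list = field(default_factory=list)  # model output tensors
```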
Add tests that compress models with LUT compression, run them through
the TFLM Python interpreter, and verify outputs match uncompressed
originals. Also verify DECODE operators are inserted and that compressed
models are smaller than originals.

Tests only run when compression is enabled (--//:with_compression).
Placeholder tests for Huffman and Pruning are skipped until implemented.
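
A hedged sketch of the round-trip check; the compress entry point and interpreter import path are assumptions, though set_input/invoke/get_output follow the TFLM Python interpreter:

```python
import numpy as np
from tflite_micro.python.tflite_micro import runtime  # assumed path

def _run(model_bytes: bytes, x: np.ndarray) -> np.ndarray:
    interpreter = runtime.Interpreter.from_bytes(model_bytes)
    interpreter.set_input(x, 0)
    interpreter.invoke()
    return interpreter.get_output(0)

def check_roundtrip(original: bytes, spec: str, x: np.ndarray):
    compressed = compress.compress(original, spec)  # assumed entry point
    assert len(compressed) < len(original)
    np.testing.assert_array_equal(_run(original, x), _run(compressed, x))
```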
Add alt_decompression_memory_size parameter to the Python interpreter
API. When non-zero, allocates a separate memory region for DECODE
operator outputs and calls SetDecompressionMemory before AllocateTensors.

SetDecompressionMemory stores a pointer to its initializer_list
argument, requiring the list to outlive the interpreter. Per the C++
standard, the lifetime of an initializer_list's backing array is
extended to match the list's only when the list is initialized in a
declaration, not when it is assigned. This makes the API difficult to
use correctly.
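
Usage sketch; the keyword name comes from the commit message, the surrounding call is assumed:

```python
interpreter = runtime.Interpreter.from_bytes(
    compressed_model,
    # Non-zero size reserves a separate region for DECODE outputs and
    # triggers SetDecompressionMemory before AllocateTensors.
    alt_decompression_memory_size=8 * 1024)
```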
Add test for shared compressed tensors with alternate decompression
memory. The test is marked expectedFailure to document the current
mismatch between interpreter and DECODE insertion: the interpreter's
alt decompression memory resets allocations for each DECODE, but the
insertion code shares one DECODE output among all consumers.

The workaround is to insert a separate DECODE before each consumer.
The expectedFailure decorator should be removed once this is
implemented.
…mory

Insert a separate DECODE immediately before each consumer of a
compressed tensor, rather than sharing one DECODE output among all
consumers.

The interpreter's alternate decompression memory resets its allocation
offset for each DECODE's Prepare, causing all DECODE outputs to be
allocated at the same address. If two consumers share one DECODE and
another DECODE runs between them, the intervening DECODE overwrites the
shared output, corrupting data for the second consumer.

Update test expectations to reflect the new DECODE-per-consumer
behavior and change the integration test from expected-failure to
expected-pass.
Add tests demonstrating bugs in model_editor.read() when parsing models
with None values for tensor shape, operator inputs, or operator outputs.
These edge cases can occur in real models from the TFLite converter but
cause TypeError crashes in the current implementation.

Tests construct models using the low-level TFLite schema to reproduce
these conditions. Marked as expectedFailure until the fix is applied.
The TFLite flatbuffer schema allows None values for tensor shape
(representing scalars) and operator inputs/outputs (for certain ops).
Handle these cases in read() to avoid TypeError when iterating.

Remove expectedFailure decorators from edge case tests now that the
fix is applied.
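
The gist of the fix, assuming read() walks the TFLite flatbuffer object API, where absent vectors come back as None:

```python
def _or_empty(vec):
    # None means a scalar shape or an operator with no inputs/outputs.
    return [] if vec is None else list(vec)

shape = _or_empty(tensor_t.shape)
inputs = [tensors[i] for i in _or_empty(operator_t.inputs)]
outputs = [tensors[i] for i in _or_empty(operator_t.outputs)]
```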