feature: model compression #2636

Closed
rkuester opened this issue Jul 24, 2024 · 8 comments
@rkuester
Contributor

rkuester commented Jul 24, 2024

An issue to track the implementation of model compression.

Merge Queue

  1. PRs from rkuester/feat-compression are open for approval and merging. Because of our one-commit-per-PR policy, there is typically only one PR open at a time.
  2. Commits along the branch rkuester/feat-compression-next are queued, so to speak, for submission as PRs. Looking at this branch might give context for the open PR from feat-compression (see item 1). Be aware that this branch is rebased often.
  3. The branch rkuester/compress-testing typically contains the final result once all PRs for the model compression feature are merged. This is the Oort cloud from which commits along feat-compression-next, which turn into PRs from feat-compression, are formed. Explore this branch or compare it to main to see the full model compression feature implementation and understand the queued PRs in context.
mergify bot pushed a commit that referenced this issue Jul 24, 2024
Add the Python distribution package `hexdump`, to be used in tests
and utilities which display raw memory.

BUG=#2636
mergify bot pushed a commit that referenced this issue Jul 24, 2024
Hoist the universally-useful tflite::Span out of
codegen/runtime/micro_codegen_context.h and into an independent header
usable from elsewhere in the project.

BUG=#2636
mergify bot pushed a commit that referenced this issue Aug 7, 2024
…tor (#2642)

Add a type, tflite::StaticVector, which behaves like std::vector, but
which avoids heap memory allocation.

BUG=#2636
Contributor

github-actions bot commented Oct 3, 2024

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

@github-actions github-actions bot added the Stale label Oct 3, 2024
@rkuester
Contributor Author

rkuester commented Oct 3, 2024

This task remains open; PRs and issues regularly link to it for tracking purposes.

@github-actions github-actions bot removed the Stale label Oct 4, 2024
Contributor

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

@github-actions github-actions bot added the Stale label Oct 29, 2024
@rkuester rkuester removed the Stale label Oct 29, 2024
mergify bot pushed a commit that referenced this issue Nov 5, 2024
chore: remove obsolete ci/temp_patches

Remove ci/temp_patches, which was obsoleted in 23f608f once it
was no longer used by the sync script. It should have been
deleted then.

Remove it not only to clean up dead code, but because it contains
a reference to `micro_copts`, which is about to be refactored
away, and we don't want to leave stray references to it in the
tree.

BUG=#2636
rkuester added a commit to rkuester/tflite-micro that referenced this issue Nov 13, 2024
Remove micro_copts() by replacing every cc_* target that used
them with a tflm_cc_* equivalent, and setting those common copts
in one place, inside the tflm_cc_* macro.

This is the first of several commits introducing tflm_cc_* macros
in place of cc_binary, cc_library, and cc_test. Motivated by the
upcoming need to support conditional compilation, the objective
is to centralize build configuration rather than requiring (and
remembering that) each cc_* target in the project add the same
common attributes such as compiler options and select()ed

Alternatives such as setting global options on the command line
or in .bazelrc, even if simplified with a --config option, fail
to preserve flags and hooks for configuration in the case TFLM is
used as an external repository by an application project. Nor is
it easy in that case for individual targets to override an
otherwise global setting.

BUG=tensorflow#2636
rkuester added a commit to rkuester/tflite-micro that referenced this issue Nov 13, 2024
Replace cc_* targets remaining in TFLM code with tflm_cc_*
targets. These are targets which did not formerly use the common
copts. Avoid changing imported TFLite code, if for no other
reason than to avoid merge conflicts during the automatic sync
with upstream TFLite.

BUG=tensorflow#2636
rkuester added a commit to rkuester/tflite-micro that referenced this issue Nov 14, 2024
Replace cc_* targets remaining in TFLM code with tflm_cc_*
targets. These are targets which did not formerly use the common
copts. Avoid changing imported TFLite code, if for no other
reason than to avoid merge conflicts during the automatic sync
with upstream TFLite.

BUG=tensorflow#2636
mergify bot pushed a commit that referenced this issue Nov 14, 2024
#2765)

Remove micro_copts() by replacing every cc_* target that used
them with a tflm_cc_* equivalent, and setting those common copts
in one place, inside the tflm_cc_* macro.

This is the first of several commits introducing tflm_cc_* macros
in place of cc_binary, cc_library, and cc_test. Motivated by the
upcoming need to support conditional compilation, the objective
is to centralize build configuration rather than requiring (and
remembering that) each cc_* target in the project add the same
common attributes such as compiler options and select()ed

Alternatives such as setting global options on the command line
or in .bazelrc, even if simplified with a --config option, fail
to preserve flags and hooks for configuration in the case TFLM is
used as an external repository by an application project. Nor is
it easy in that case for individual targets to override an
otherwise global setting.

BUG=#2636
rkuester added a commit to rkuester/tflite-micro that referenced this issue Nov 15, 2024
Replace cc_* targets remaining in TFLM code with tflm_cc_*
targets. These are targets which did not formerly use the common
copts. Avoid changing imported TFLite code, if for no other
reason than to avoid merge conflicts during the automatic sync
with upstream TFLite.

BUG=tensorflow#2636
mergify bot pushed a commit that referenced this issue Nov 15, 2024
Replace cc_* targets remaining in TFLM code with tflm_cc_*
targets. These are targets which did not formerly use the common
copts. Avoid changing imported TFLite code, if for no other
reason than to avoid merge conflicts during the automatic sync
with upstream TFLite.

BUG=#2636
rkuester added a commit to rkuester/tflite-micro that referenced this issue Nov 15, 2024
Add tflite::hexdump() for printing raw memory to output streams.
Copy the output format of Python's hexdump module.

BUG=tensorflow#2636
rkuester added a commit to rkuester/tflite-micro that referenced this issue Nov 15, 2024
Add a flatbuffer schema for describing compressed models.
Flatbuffers with this schema are to be used as the value in a
.tflite model flatbuffer metadata field, and contain the extra
information necessary to describe a compressed model.

Include tests to ensure basic functionality and demonstrate
integration with C++, Python, and Bazel.

BUG=tensorflow#2636
mergify bot pushed a commit that referenced this issue Nov 15, 2024
Add tflite::hexdump() for printing raw memory to output streams.
Copy the output format of Python's hexdump module.

BUG=#2636
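
For readers unfamiliar with the output format mentioned above, here is a minimal Python sketch of a hexdump-style layout: an offset column, sixteen uppercase hex bytes split into two groups of eight, and a printable-ASCII column. The function name `hexdump_lines` is hypothetical, and the exact column layout is an assumption based on typical output of Python's `hexdump` package; it is not taken from the `tflite::hexdump()` implementation.

```python
from typing import Iterator

BYTES_PER_LINE = 16  # assumption: sixteen bytes per output line
HEX_COLUMN_WIDTH = 48  # two groups of "XX XX ... XX" (23 chars each) plus a 2-space gap


def hexdump_lines(data: bytes) -> Iterator[str]:
    """Yield one formatted line for each 16-byte chunk of `data` (hypothetical helper)."""
    for offset in range(0, len(data), BYTES_PER_LINE):
        chunk = data[offset:offset + BYTES_PER_LINE]
        # Hex column: uppercase byte values, with a wider gap after the eighth byte.
        left = " ".join(f"{b:02X}" for b in chunk[:8])
        right = " ".join(f"{b:02X}" for b in chunk[8:])
        # Pad short final lines so the ASCII column stays aligned.
        hex_column = f"{left}  {right}".ljust(HEX_COLUMN_WIDTH)
        # ASCII column: printable characters as-is, everything else as a dot.
        ascii_column = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        yield f"{offset:08X}: {hex_column}  {ascii_column}"


if __name__ == "__main__":
    for line in hexdump_lines(bytes(range(16)) + b"AB"):
        print(line)
```

A generator keeps the formatting separate from the output stream, so the same lines can be printed, logged, or compared in a test.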
rkuester pushed a commit to rkuester/tflite-micro that referenced this issue Dec 15, 2024
Implement tensor decompression in op conv. Extend tests to
validate operation on compressed tensors.

BUG=part of tensorflow#2636
mergify bot pushed a commit that referenced this issue Dec 16, 2024
…3013)

Allocate resource variables in a persistent buffer when the input
tensor is compressed. Extend tests to validate operation.

BUG=part of #2636
mergify bot pushed a commit that referenced this issue Dec 16, 2024
…#3014)

Implement tensor decompression in op concatenation. Extend
tests to validate operation on compressed tensors.

BUG=part of #2636
mergify bot pushed a commit that referenced this issue Dec 16, 2024
Implement tensor decompression in op conv. Extend tests to
validate operation on compressed tensors.

BUG=part of #2636
rkuester pushed a commit to rkuester/tflite-micro that referenced this issue Dec 16, 2024
Implement tensor decompression in op depthwise conv. Extend tests
to validate operation on compressed tensors.

BUG=part of tensorflow#2636
rkuester pushed a commit to rkuester/tflite-micro that referenced this issue Dec 16, 2024
Implement tensor decompression in op transpose conv. Extend tests
to validate operation on compressed tensors.

BUG=part of tensorflow#2636
mergify bot pushed a commit that referenced this issue Dec 16, 2024
#3018)

Implement tensor decompression in op transpose conv. Extend tests
to validate operation on compressed tensors.

BUG=part of #2636
mergify bot pushed a commit that referenced this issue Dec 16, 2024
#3017)

Implement tensor decompression in op depthwise conv. Extend tests
to validate operation on compressed tensors.

BUG=part of #2636
rkuester pushed a commit to rkuester/tflite-micro that referenced this issue Dec 17, 2024
Clarify the usage of
`MicroContext::AllocateDecompressionScratchBuffer` and
`tflite::micro::GetTensorData` for handling decompressed tensor
data.

Add a section on alternate decompression memory regions,
explaining how to specify and use specialized memory for
decompression.

Update instructions for compressing models using a YAML
specification.

Simplify the model compression and alignment command examples.

Spin off new Generic Benchmark Application documentation.

BUG=part of tensorflow#2636
rkuester pushed a commit to rkuester/tflite-micro that referenced this issue Dec 17, 2024
Add instructions for using the tool with compressed models,
including profiling timing for decompression and alternate memory
regions. Update the tested targets list to include additional
Xtensa architectures. Provide example build and run commands for
compressed models with alternate decompression memory. Correct
typos and improve clarity in build instructions and example
outputs. Update compiler flags and example output to reflect
recent changes.

BUG=part of tensorflow#2636
rkuester pushed a commit to rkuester/tflite-micro that referenced this issue Dec 17, 2024
Add a check in `micro_allocator.cc` to verify the compression
metadata schema version. If the schema version in the metadata is
greater than the expected version, log a schema version mismatch
error and return a `nullptr`. This prevents potential issues
arising from using a newer, unsupported schema version.

BUG=part of tensorflow#2636
mergify bot pushed a commit that referenced this issue Dec 17, 2024
Clarify the usage of `MicroContext::AllocateDecompressionScratchBuffer`
and `tflite::micro::GetTensorData` for handling decompressed tensor
data.

Add a section on alternate decompression memory regions,
explaining how to specify and use specialized memory for
decompression.

Update instructions for compressing models using a YAML
specification.

Simplify the model compression and alignment command examples.

Spin off new Generic Benchmark Application documentation.

BUG=part of #2636
mergify bot pushed a commit that referenced this issue Dec 18, 2024
Add a check in `micro_allocator.cc` to verify the compression
metadata schema version. If the schema version in the metadata is
greater than the expected version, log a schema version mismatch
error and return a `nullptr`. This prevents potential issues
arising from using a newer, unsupported schema version.

BUG=part of #2636
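
The version check described in this commit message can be illustrated with a small Python sketch of the logic. The real check lives in C++ in `micro_allocator.cc`; everything named here (`SUPPORTED_SCHEMA_VERSION`, `CompressionMetadata`, `accept_metadata`) is hypothetical and exists only to show the "reject if newer than expected" rule.

```python
import logging
from dataclasses import dataclass
from typing import Optional

# Hypothetical: the newest metadata schema version this runtime understands.
SUPPORTED_SCHEMA_VERSION = 1


@dataclass
class CompressionMetadata:
    """Stand-in for the parsed compression metadata (hypothetical type)."""
    schema_version: int


def accept_metadata(metadata: CompressionMetadata) -> Optional[CompressionMetadata]:
    """Return the metadata, or None when its schema is newer than supported."""
    if metadata.schema_version > SUPPORTED_SCHEMA_VERSION:
        # Mirrors the described behavior: log a mismatch error and return
        # a null result instead of trying to interpret an unknown layout.
        logging.error(
            "schema version mismatch: expected at most %d, got %d",
            SUPPORTED_SCHEMA_VERSION,
            metadata.schema_version,
        )
        return None
    return metadata
```

Note that the comparison is strictly "greater than", so in this sketch metadata with an older or equal schema version is still accepted.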
mergify bot pushed a commit that referenced this issue Dec 18, 2024
Add instructions for using the tool with compressed models,
including profiling timing for decompression and alternate memory
regions. Update the tested targets list to include additional
Xtensa architectures. Provide example build and run commands for
compressed models with alternate decompression memory. Correct
typos and improve clarity in build instructions and example
outputs. Update compiler flags and example output to reflect
recent changes.

BUG=part of #2636
@tinskip

tinskip commented Dec 18, 2024

Hi. Was this change intended for whether TFLM compression is on or off, or should it have been made conditional as well?

https://github.com/tensorflow/tflite-micro/blame/main/tensorflow/lite/micro/kernels/concatenation.cc#L189

@ddavis-2015
Member

ddavis-2015 commented Dec 18, 2024

Hi. Was this change intended for whether TFLM compression is on or off, or should it have been made conditional as well?

https://github.com/tensorflow/tflite-micro/blame/main/tensorflow/lite/micro/kernels/concatenation.cc#L189

@tinskip This change is intentional and applies regardless of whether LRTM compression is conditionally compiled in. The change updates the LRTM (TFLM) code to match the LiteRT (TfLite) reference implementation.

rkuester added a commit to rkuester/tflite-micro that referenced this issue Dec 19, 2024
…sors

Compress using a single value table when a tensor is per-tensor
quantized, as indicated by the presence of only one quantization
scale and zero point. Update unit tests accordingly and augment
`test_models` to accommodate additional quantization fields.

Abandon the logic that a tensor should be compressed along the
NHWC channel dimension if the quantization parameters do not
specify an axis. Instead, fail with an error if the compression
axis cannot be inferred from the quantization parameters.

The interpreter already expects a single value table when a
tensor is per-tensor quantized.

BUG=part of tensorflow#2636
mergify bot pushed a commit that referenced this issue Dec 19, 2024
…sors (#3025)

Compress using a single value table when a tensor is per-tensor
quantized, as indicated by the presence of only one quantization
scale and zero point. Update unit tests accordingly and augment
`test_models` to accommodate additional quantization fields.

Abandon the logic that a tensor should be compressed along the
NHWC channel dimension if the quantization parameters do not
specify an axis. Instead, fail with an error if the compression
axis cannot be inferred from the quantization parameters.

The interpreter already expects a single value table when a
tensor is per-tensor quantized.

BUG=part of #2636
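
The table-selection rule in this commit message can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's compression tool: it assumes that compression replaces each element with an index into a table of unique values, it handles only 2-D tensors given as nested lists, and the names `infer_table_axis`, `compress_2d`, and `decompress` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def infer_table_axis(scales: Sequence[float], zero_points: Sequence[int],
                     quantized_dimension: Optional[int]) -> Optional[int]:
    """Return None for a single value table, else the axis that gets one table per slice."""
    # Per-tensor quantization: exactly one scale and one zero point.
    if len(scales) == 1 and len(zero_points) == 1:
        return None
    # Per-channel quantization must name its axis; no NHWC fallback is applied.
    if quantized_dimension is None:
        raise ValueError("cannot infer compression axis from quantization parameters")
    return quantized_dimension


def compress_2d(rows: List[List[int]], scales: Sequence[float],
                zero_points: Sequence[int],
                quantized_dimension: Optional[int] = None
                ) -> Tuple[List[List[int]], List[List[int]]]:
    """Return (value_tables, indices), one entry per table."""
    axis = infer_table_axis(scales, zero_points, quantized_dimension)
    if axis is None:
        groups = [[v for row in rows for v in row]]  # whole tensor, one table
    elif axis == 0:
        groups = [list(row) for row in rows]         # one table per row
    elif axis == 1:
        groups = [list(col) for col in zip(*rows)]   # one table per column
    else:
        raise ValueError("this sketch only handles 2-D tensors")
    tables = [sorted(set(group)) for group in groups]
    indices = [[table.index(v) for v in group]
               for table, group in zip(tables, groups)]
    return tables, indices


def decompress(tables: List[List[int]], indices: List[List[int]]) -> List[List[int]]:
    """Look each index up in its table; returns values grouped per table."""
    return [[table[i] for i in idx] for table, idx in zip(tables, indices)]


if __name__ == "__main__":
    weights = [[1, 1, 5], [7, 7, 7]]
    print(compress_2d(weights, [0.5], [0]))               # per-tensor
    print(compress_2d(weights, [0.5, 0.25], [0, 0], 0))   # per-channel, axis 0
```

Raising an error when the axis is missing, rather than guessing a channel dimension, keeps the compressor's output consistent with what the interpreter expects to read back.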
Contributor

"This issue is being marked as stale due to inactivity. Remove label or comment to prevent closure in 5 days."

@github-actions github-actions bot added the Stale label Jan 13, 2025
@rkuester
Contributor Author

Closing, as the initial implementation has been merged.
