Squashed commit of the following:

commit df29a4c
Author: ddavis-2015 <[email protected]>
Date:   Wed Oct 23 03:47:27 2024 -0700

    header file cleanup.

commit 7dc34a9
Author: ddavis-2015 <[email protected]>
Date:   Mon Oct 21 17:16:24 2024 -0700

    update to latest Cadence decompression code.

commit b43c16c
Author: ddavis-2015 <[email protected]>
Date:   Mon Oct 21 05:05:15 2024 -0700

    fix code style errors.

commit 2d825e3
Author: ddavis-2015 <[email protected]>
Date:   Mon Oct 21 04:09:39 2024 -0700

    fix code style errors.
    fix BUILD file style errors.

commit 459569a
Author: ddavis-2015 <[email protected]>
Date:   Sun Oct 20 19:17:43 2024 -0700

    use kernel optimizer level -O3 and -LNO:simd for Xtensa HIFI5

commit 122db20
Author: ddavis-2015 <[email protected]>
Date:   Sun Oct 20 19:07:19 2024 -0700

    add compression build/test to bazel default test script

commit d96b614
Author: ddavis-2015 <[email protected]>
Date:   Sun Oct 20 18:56:31 2024 -0700

    fix CI code style errors
    fix CI BUILD file style errors

commit 821dfdf
Author: ddavis-2015 <[email protected]>
Date:   Sun Oct 20 12:19:34 2024 -0700

    Cleanup header file usage.
    Add decompression code to Bazel BUILD files

commit 4a02b22
Author: ddavis-2015 <[email protected]>
Date:   Sat Oct 19 13:31:11 2024 -0700

    cleanup

commit 3d765e6
Author: ddavis-2015 <[email protected]>
Date:   Fri Oct 18 18:01:21 2024 -0700

    Squashed commit of the following:

    commit eaee851
    Author: ddavis-2015 <[email protected]>
    Date:   Fri Oct 18 17:48:48 2024 -0700

        Squashed commit of the following:

        commit 4894265
        Author: ddavis-2015 <[email protected]>
        Date:   Fri Oct 18 17:48:05 2024 -0700

            pre-merge empty commit

        commit a110e41
        Author: ddavis-2015 <[email protected]>
        Date:   Fri Oct 18 16:17:13 2024 -0700

            fix C++ bitwidth 6 & 7 decompression

        commit efedcc2
        Author: ddavis-2015 <[email protected]>
        Date:   Fri Oct 18 10:18:50 2024 -0700

            working decompression unit test

        commit 81ecf2e
        Author: ddavis-2015 <[email protected]>
        Date:   Thu Oct 17 18:17:06 2024 -0700

            decompression unit test improvements

        commit b318421
        Author: ddavis-2015 <[email protected]>
        Date:   Wed Oct 16 17:34:09 2024 -0700

            add decompression unit test

        commit 9bb2b63
        Author: ddavis-2015 <[email protected]>
        Date:   Sun Oct 13 18:34:01 2024 -0700

            cleanup

        commit 77bb05d
        Author: ddavis-2015 <[email protected]>
        Date:   Sun Oct 13 18:29:33 2024 -0700

            align compressed tensor data as per schema

        commit ad2b1c3
        Author: ddavis-2015 <[email protected]>
        Date:   Sat Oct 12 22:35:54 2024 -0700

            reduce HIFI5 decompression code size

        commit 99c6e35
        Author: ddavis-2015 <[email protected]>
        Date:   Fri Oct 11 14:02:58 2024 -0700

            revert to original Cadence bit width 4 code

        commit 2388549
        Author: ddavis-2015 <[email protected]>
        Date:   Thu Oct 10 17:50:29 2024 -0700

            refactor decompression code into reference and platform specific
            Apply some Xtensa acceleration code changes

        commit b84853c
        Author: ddavis-2015 <[email protected]>
        Date:   Tue Oct 8 16:08:55 2024 -0700

            testing

    commit c107f42
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 17 14:31:03 2024 -0500

        refactor: move misplaced TF_LITE_REMOVE_VIRTUAL_DELETEs to private:

        Move several TF_LITE_REMOVE_VIRTUAL_DELETE declarations that are
        wrongly in a public section of their classes. To have the intended
        effect, as documented in t/l/m/compatibility.h, these must be in a
        private section.
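
        A minimal sketch of the intended placement, assuming the usual
        definition of the macro in t/l/m/compatibility.h (the class name is
        illustrative, not one of the classes touched by this commit):

        #include "tensorflow/lite/micro/compatibility.h"

        class ExampleHandler {
         public:
          virtual ~ExampleHandler() = default;

         private:
          // Private placement, as t/l/m/compatibility.h requires; callers
          // outside the class then cannot reach the no-op operator delete.
          TF_LITE_REMOVE_VIRTUAL_DELETE
        };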

    commit 7b3a2bd
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 17 12:36:46 2024 -0500

        build(bazel): always build with TF_LITE_STATIC_MEMORY

        Add TF_LITE_STATIC_MEMORY to the defines set globally for TFLM builds in
        Bazel. TFLM always builds with this set in Make, and it appears to have
        been an oversight that it wasn't set during Bazel builds. Not having it
        set in Bazel caused some unit tests to pass under Bazel that failed
        under Make.

        At the same time, add -fno-exceptions. This flag is also always set in
        Make builds. Without it, setting TF_LITE_STATIC_MEMORY breaks the build.
        TF_LITE_STATIC_MEMORY triggers TF_LITE_REMOVE_VIRTUAL_DELETE in
        t/l/m/compatibility.h, which makes operator delete private in certain
        classes. When exceptions are enabled, a placement new with those classes
        is allowed to throw an exception, and operator delete is implicitly
        called during the unwind. The build breaks because operator delete can't
        be called if it's private. Disabling exceptions eliminates the unwind
        code that calls operator delete implicitly, and thus the build succeeds.

        In any case, -fno-exceptions should have been used in Bazel builds,
        matching the flags used in Make and the no-exceptions design requirement
        of the TFLM project.
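
        A compact illustration of the failure mode described above, with an
        illustrative class standing in for one whose
        TF_LITE_REMOVE_VIRTUAL_DELETE expands to a private operator delete
        (a sketch of the reasoning, not TFLM code):

        #include <new>

        class Widget {
         public:
          Widget() {}  // not noexcept, so construction may formally throw

         private:
          void operator delete(void*) {}  // what the macro adds under TF_LITE_STATIC_MEMORY
        };

        Widget* Emplace(void* arena) {
          // As described above: with exceptions enabled, the unwind path of
          // this placement new needs the (private, hence inaccessible)
          // operator delete and the build fails; with -fno-exceptions no
          // unwind cleanup is emitted and the expression compiles.
          return new (arena) Widget();
        }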

    commit 1eb4e0d
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 17 11:05:45 2024 -0500

        feat(python): don't check .sparsity in interpreter

        Remove the check for sparse tensors in the Python interpreter wrapper.
        This fixes a broken build when TF_LITE_STATIC_MEMORY is set, which
        should always be the case in TFLM. TfLiteTensor objects don't have a
        .sparsity member when TF_LITE_STATIC_MEMORY is set.

        This prepares for an upcoming commit setting TF_LITE_STATIC_MEMORY
        during Bazel builds. This hasn't caused build failures in Make builds,
        which have always set TF_LITE_STATIC_MEMORY, because Make builds don't
        build the Python interpreter wrapper.
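
        Roughly what this avoids, sketched with stand-in types (the real
        TfLiteTensor definitions live in the TFLite headers and differ in
        detail):

        // Under TF_LITE_STATIC_MEMORY the tensor struct omits the sparsity
        // member, so wrapper code reading tensor->sparsity cannot compile.
        struct FakeSparsity {};

        struct FakeTensor {
        #ifndef TF_LITE_STATIC_MEMORY
          const FakeSparsity* sparsity;  // present only in the full struct
        #endif
          // ... other members elided ...
        };

        bool HasSparsity(const FakeTensor* tensor) {
        #ifndef TF_LITE_STATIC_MEMORY
          return tensor->sparsity != nullptr;  // the kind of check removed here
        #else
          return false;  // TFLM: no sparse tensors, nothing to check
        #endif
        }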

    commit 7217095
    Author: Ryan Kuester <[email protected]>
    Date:   Wed Oct 16 14:03:25 2024 -0500

        fix(memory_arena_threshold): with TF_LITE_STATIC_MEMORY

        Fix the broken build due to redefinition of the threshold when
        TF_LITE_STATIC_MEMORY is set. Apparently this case isn't triggered in
        any Bazel test, only in Make.

        Simplify the threshold specification by only depending on whether
        compression is enabled and not also on whether TF_LITE_STATIC_MEMORY is
        in use.
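
        The shape of the simplified specification, with placeholder names and
        values (USE_TFLM_COMPRESSION is the define visible elsewhere in this
        change; the test's actual constants are its own measured numbers):

        #include <cstddef>

        // Placeholders standing in for the measured allocation sizes.
        constexpr std::size_t kMeasuredWithCompression = 1;
        constexpr std::size_t kMeasuredWithoutCompression = 1;

        // One axis of variation only: whether compression is compiled in.
        // There is no longer a second set of #ifdefs on TF_LITE_STATIC_MEMORY
        // that could redefine the threshold.
        #ifdef USE_TFLM_COMPRESSION
        constexpr std::size_t kExpectedPersistentSize = kMeasuredWithCompression;
        #else
        constexpr std::size_t kExpectedPersistentSize = kMeasuredWithoutCompression;
        #endif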

    commit 8e4e55e
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 10 12:38:03 2024 -0500

        build(bazel): disable codegen when building --//:with_compression

        The codegen prototype code is not compatible with the changes which
        implement model compression made to the core TFLM components. For now,
        disable codegen targets when building with compression enabled.

    commit 884a234
    Author: Ryan Kuester <[email protected]>
    Date:   Tue Oct 15 18:31:01 2024 -0500

        build(bazel): compile in compression when --//:with_compression

        Conditionally compile in support for compressed tensors when the option
        --//:with_compression is given.

    commit a1d459b
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 10 12:28:39 2024 -0500

        build(bazel): add --//with_compression build setting

        Add a --//with_compression user-defined build setting and a
        corresponding configuration setting.

    commit 4edc564
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 10 12:24:53 2024 -0500

        build(bazel): fix compression-related dependencies of micro_allocator

    commit a52f97f
    Author: Ryan Kuester <[email protected]>
    Date:   Tue Oct 15 17:28:09 2024 -0500

        build(bazel): replace cc_* with tflm_cc_* in remaining TFLM code

        Replace cc_* targets remaining in TFLM code with tflm_cc_* targets.
        These are targets which did not formerly use the common copts. Avoid
        changing imported TFLite code, if for no other reason than to avoid
        merge conflicts during the automatic sync with upstream TFLite.

    commit a6368f4
    Author: Ryan Kuester <[email protected]>
    Date:   Fri Oct 11 16:08:34 2024 -0500

        build(bazel): introduce tflm_cc_* macros, refactoring away micro_copts

        Remove micro_copts() by replacing every cc_* target that used
        them with a tflm_cc_* equivalent, and setting those common copts in one
        place, inside the tflm_cc_* macro.

        This is the first of several commits introducing tflm_cc_* macros in
        place of cc_binary, cc_library, and cc_test. Motivated by the upcoming
        need to support conditional compilation, the objective is to centralize
        build configuration rather than requiring (and remembering) that each
        cc_* target in the project add the same common attributes such as
        compiler options and select()ed #defines.

        Alternatives such as setting global options on the command line or in
        .bazelrc, even if simplified with a --config option, fail to preserve
        flags and hooks for configuration in the case where TFLM is used as an
        external repository by an application project. Nor is it easy in that
        case for individual targets to override an otherwise global setting.

    commit 1518422
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 10 23:56:49 2024 -0500

        chore: remove obsolete ci/temp_patches

        Remove ci/temp_patches, which was obsoleted in 23f608f once it
        was no longer used by the sync script. It should have been
        deleted then.

        Remove it not only to clean up dead code, but because it contains
        a reference to `micro_copts`, which is about to be refactored
        away, and we don't want to leave stray references to it in the
        tree.

    commit 18ef080
    Author: Ryan Kuester <[email protected]>
    Date:   Tue Oct 8 17:58:12 2024 -0500

        refactor: use metadata_saved.h instead of metadata_generated.h

        Use the generated file metadata_saved.h instead of metadata_generated.h
        for the reasons explained in t/l/m/compression/BUILD:metadata_saved.
        Delete metadata_generated.h from the source tree as it is not
        maintained.

    commit 5a02e30
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 10 13:46:46 2024 -0500

        test(memory_arena_threshold): adjust expected value with compression

        Fix a test failure by setting a different expected value for the
        persistent buffer allocation when compression is configured in. The
        allocation was allowed to vary by 3%; however, compression adds ~10%.
        Set the expected value to the measured value for that configuration.

    commit 01bc582
    Author: Ryan Kuester <[email protected]>
    Date:   Thu Oct 10 13:35:10 2024 -0500

        test(memory_arena_threshold): don't expect exact allocation values

        Remove the check that allocation sizes exactly match expected values.
        This check immediately followed a check that sizes are within a certain
        percentage, which seems to be the true intent of the test, and thus
        rendered that percentage check pointless.

    commit e0aae77
    Merge: e328029 e86d97b
    Author: Ryan Kuester <[email protected]>
    Date:   Wed Oct 16 13:39:56 2024 -0500

        Merge branch 'main' into compress-testing

    commit e328029
    Author: Ryan Kuester <[email protected]>
    Date:   Mon Oct 7 12:52:23 2024 -0500

        build(bazel): fix dependencies in work-in-progress compression code

        In the Bazel build, add dependencies needed by the code added to
        t/l/m:micro_context for decompression. The Bazel build with or without
        compression was broken without this.

    commit e86d97b
    Author: RJ Ascani <[email protected]>
    Date:   Mon Oct 7 10:36:26 2024 -0700

        Replace rascani with suleshahid on OWNERS (tensorflow#2715)

        BUG=none

    commit b773428
    Author: Ryan Kuester <[email protected]>
    Date:   Fri Oct 4 09:59:10 2024 -0500

        feat(compression): add work-in-progress compression and viewer tools

    commit f6bd486
    Merge: 487c17a e3f6dc1
    Author: Ryan Kuester <[email protected]>
    Date:   Fri Oct 4 09:36:24 2024 -0500

        Merge branch 'main' into compress-prerelease

    commit e3f6dc1
    Author: David Davis <[email protected]>
    Date:   Thu Oct 3 10:45:00 2024 -0700

        Compression documentation (tensorflow#2711)

        @tensorflow/micro

        Add documentation describing some compression/decompression internals and makefile build procedures.

        bug=tensorflow#2710

    commit b3967a9
    Author: Ryan Kuester <[email protected]>
    Date:   Wed Oct 2 13:36:01 2024 -0500

        style: add .style.yapf to control yapf styling of Python code (tensorflow#2709)

        Add a .style.yapf file so yapf can be used to style Python code without
        passing the project's style via command line option. Remove the
        corresponding patch to pigweed's call to yapf, used by CI, and instead
        let it too rely on .style.yapf. Remove the developer documentation's
        instruction to use the command line option.

        BUG=description

    commit d249577
    Author: Ryan Kuester <[email protected]>
    Date:   Tue Oct 1 16:16:45 2024 -0500

        build(codegen): suppress noise in console output (tensorflow#2708)

        Add a --quiet option to the code_generator binary so that when it's used
        within the build system, it doesn't print unexpected, distracting noise
        to the console. Generally, compiler or generator commands don't print
        output unless there's an error.

        BUG=description

commit 4894265
Author: ddavis-2015 <[email protected]>
Date:   Fri Oct 18 17:48:05 2024 -0700

    pre-merge empty commit

commit a110e41
Author: ddavis-2015 <[email protected]>
Date:   Fri Oct 18 16:17:13 2024 -0700

    fix C++ bitwidth 6 & 7 decompression

commit efedcc2
Author: ddavis-2015 <[email protected]>
Date:   Fri Oct 18 10:18:50 2024 -0700

    working decompression unit test

commit 81ecf2e
Author: ddavis-2015 <[email protected]>
Date:   Thu Oct 17 18:17:06 2024 -0700

    decompression unit test improvements

commit b318421
Author: ddavis-2015 <[email protected]>
Date:   Wed Oct 16 17:34:09 2024 -0700

    add decompression unit test

commit 9bb2b63
Author: ddavis-2015 <[email protected]>
Date:   Sun Oct 13 18:34:01 2024 -0700

    cleanup

commit 77bb05d
Author: ddavis-2015 <[email protected]>
Date:   Sun Oct 13 18:29:33 2024 -0700

    align compressed tensor data as per schema

commit ad2b1c3
Author: ddavis-2015 <[email protected]>
Date:   Sat Oct 12 22:35:54 2024 -0700

    reduce HIFI5 decompression code size

commit 99c6e35
Author: ddavis-2015 <[email protected]>
Date:   Fri Oct 11 14:02:58 2024 -0700

    revert to original Cadence bit width 4 code

commit 2388549
Author: ddavis-2015 <[email protected]>
Date:   Thu Oct 10 17:50:29 2024 -0700

    refactor decompression code into reference and platform specific
    Apply some Xtensa acceleration code changes

commit b84853c
Author: ddavis-2015 <[email protected]>
Date:   Tue Oct 8 16:08:55 2024 -0700

    testing
ddavis-2015 committed Oct 23, 2024
1 parent 052a6b8 commit b7bc438
Showing 4 changed files with 255 additions and 53 deletions.
4 changes: 0 additions & 4 deletions tensorflow/lite/micro/compression/BUILD
@@ -1,7 +1,3 @@
load("//tensorflow/lite/micro:build_def.bzl",
"tflm_cc_library",
"tflm_cc_test",
)
load(
"//tensorflow/lite/micro:build_def.bzl",
"tflm_cc_library",
5 changes: 5 additions & 0 deletions tensorflow/lite/micro/kernels/decompress.h
@@ -13,6 +13,9 @@ See the License for the specific language governing permissions and
limitations under the License.
==============================================================================*/

#ifndef TENSORFLOW_LITE_MICRO_MICRO_KERNELS_DECOMPRESS_H_
#define TENSORFLOW_LITE_MICRO_MICRO_KERNELS_DECOMPRESS_H_

#include <cstdint>

#include "tensorflow/lite/micro/compression.h"
@@ -82,3 +85,5 @@ struct DecompressionState {
#endif // USE_TFLM_COMPRESSION

} // namespace tflite

#endif // TENSORFLOW_LITE_MICRO_MICRO_KERNELS_DECOMPRESS_H_
295 changes: 250 additions & 45 deletions tensorflow/lite/micro/kernels/xtensa/decompress.cc
@@ -43,16 +43,15 @@ struct DecompressionStateXtensa : DecompressionState {
: DecompressionState(other) {}

void DecompressToBufferWidth4_Xtensa(int8_t* buffer);
void DecompressToBufferWidth4_Xtensa_Old(int8_t* buffer);
void DecompressToBufferWidth3_Xtensa(int8_t* buffer);
void DecompressToBufferWidth2_Xtensa(int8_t* buffer);

void DecompressToBufferWidthAnyInt8_Xtensa(int8_t* buffer);
void DecompressToBufferWidthAnyInt16_Xtensa(int16_t* buffer);
void DecompressToBufferWidthAnyInt32_Xtensa(int32_t* buffer);
void DecompressToBufferWidthAnyInt64_Xtensa(int64_t* buffer);
};

// TODO(ddavis-2015): unaligned/stride code has error, method not currently
// used.
void DecompressionStateXtensa::DecompressToBufferWidth4_Xtensa(int8_t* buffer) {
ScopedMicroProfiler scoped_profiler(__func__, micro_profiler_);

@@ -76,6 +75,8 @@ void DecompressionStateXtensa::DecompressToBufferWidth4_Xtensa(int8_t* buffer) {

const uint8_t* __restrict value_table_t = value_table;

ae_valignx2 align_store = AE_ZALIGN128();

for (size_t i = 0; i < num_channels_; i++) {
value_table_t = value_table;
ae_valignx2 align_vtab = AE_LA128_PP(value_table_t);
@@ -84,7 +85,6 @@ void DecompressionStateXtensa::DecompressToBufferWidth4_Xtensa(int8_t* buffer) {
AE_DSEL8X8(d_value_0, d_value_1, d_value_0_t, d_value_1_t,
d_shuffle_value_t);

ae_valignx2 align_store = AE_ZALIGN128();
ae_valign align_load = AE_LA64_PP(pIn_tmp);

for (j = 0; j < elements_per_channel_t_by_4; j++) {
Expand All @@ -95,57 +95,257 @@ void DecompressionStateXtensa::DecompressToBufferWidth4_Xtensa(int8_t* buffer) {
}

value_table += stride;

ae_valignx2 align_index = AE_LA128_PP(pIn_tmp);
AE_LAV8X8X2_XP(d_index, d_dummy, align_index, (ae_int8x16*)pIn_tmp,
(elements_per_channel_t_rem >>
1)); /* Loading 48 bits for decoding 16 weight values */
AE_DSEL8X8(d_out1, d_out2, d_value_0, d_value_1, d_index);
AE_DSEL8X8(d_out1, d_out2, d_out1, d_out2, d_shuffle_t);
AE_SAV8X8X2_XP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp,
elements_per_channel_t_rem);
AE_SA128POS_FP(align_store, (ae_int8x16*)p_out_tmp);
if (elements_per_channel_t_rem) {
ae_valignx2 align_index = AE_LA128_PP(pIn_tmp);
AE_LAV8X8X2_XP(d_index, d_dummy, align_index, (ae_int8x16*)pIn_tmp,
(elements_per_channel_t_rem >>
1)); /* Loading 48 bits for decoding 16 weight values */
AE_DSEL8X8(d_out1, d_out2, d_value_0, d_value_1, d_index);
AE_DSEL8X8(d_out1, d_out2, d_out1, d_out2, d_shuffle_t);
AE_SAV8X8X2_XP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp,
elements_per_channel_t_rem);
}
}
AE_SA128POS_FP(align_store, (ae_int8x16*)p_out_tmp);
}

void DecompressionStateXtensa::DecompressToBufferWidth4_Xtensa_Old(
int8_t* buffer) {
void DecompressionStateXtensa::DecompressToBufferWidth3_Xtensa(int8_t* buffer) {
ScopedMicroProfiler scoped_profiler(__func__, micro_profiler_);

char shuffle_pattern_1[8] = {0x08, 0x19, 0x2A, 0x3B, 0x4C, 0x5D, 0x6E, 0x7F};
ae_int8x8 d_shuffle_t = *(ae_int8x8*)&shuffle_pattern_1[0];
int i, j;
ae_int8* __restrict p_out_tmp = (ae_int8*)buffer;
ae_int8x8* pIn_tmp = (ae_int8x8*)compressed_indices_;
const uint8_t* __restrict value_table =
static_cast<const uint8_t*>(comp_data_.data.lut_data->value_table);

const uint8_t* __restrict value_table_t = value_table;

char shuffle_pattern_2[8] = {0xFB, 0x73, 0xEA, 0x62, 0xD9, 0x51, 0xC8, 0x40};
ae_int8x8 d_d_shuffle_t2 = *(ae_int8x8*)&shuffle_pattern_2[0];
int num_channels_t = num_channels_;
const size_t stride = comp_data_.data.lut_data->value_table_channel_stride;

int elements_per_channel_t_by_4 = elements_per_channel_ >> 4;
int elements_per_channel_t_rem = elements_per_channel_ & 0xF;

ae_int8x8 d_index, d_dummy;
ae_int8x8 d1, d2, d3, d4, d5, d6, d7, d8, d9, d10, d11;
ae_int8x8 d_out1, d_out2;
ae_int8x8 d_value_0, d_value_1;
ae_int8x8 d_index;

int elements_per_channel_t = elements_per_channel_;
int num_channels_t = num_channels_;
ae_int8x8* __restrict pIn_tmp = (ae_int8x8*)compressed_indices_;
ae_int8* __restrict p_out_tmp = (ae_int8*)buffer;
ae_valignx2 align_index = AE_LA128_PP(pIn_tmp);

const size_t stride = comp_data_.data.lut_data->value_table_channel_stride;
ae_int8x8 d_shuffle_value_t = AE_MOVINT8X8_FROMINT64(0x08192A3B4C5D6E7FLL);
ae_int8x8 d_shuffle_t1 = AE_MOVINT8X8_FROMINT64(0x0F00050C00020000LL);
ae_int8x8 d_shuffle_t2 = AE_MOVINT8X8_FROMINT64(0x000E00040B000100LL);
ae_int8x8 d_shuffle_t3 = AE_MOVINT8X8_FROMINT64(0x0F060D040C030A01LL);
ae_int8x8 d_shuffle_t = AE_MOVINT8X8_FROMINT64(0xFB73EA62D951C840LL);

ae_valignx2 align_store = AE_ZALIGN128();

for (i = 0; i < num_channels_t; i++) {
ae_int8x8 d_value_0 = AE_MOVINT8X8_FROMINT64(AE_ZERO());
ae_int8x8 d_value_1 = AE_MOVINT8X8_FROMINT64(AE_ZERO());

value_table_t = value_table;

ae_valign align_vtab = AE_LA64_PP(value_table_t);
AE_LA8X8_IP(d_value_0, align_vtab, (ae_int8x8*)value_table_t);
AE_DSEL8X8(d_value_0, d_value_1, d_value_0, d_value_1, d_shuffle_value_t);

for (j = 0; j < elements_per_channel_t_by_4; j++) {
AE_LAV8X8X2_XP(d_index, d_dummy, align_index, (ae_int8x16*)pIn_tmp,
6); /* Loading 48 bits for decoding 16 weight values */

d1 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 1));
d2 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 2));
d3 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 3));
d4 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 4));

d1 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d1), 0x7007007007000000LL));
d2 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d2), 0x0700700700700000LL));
d3 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d3), 0x0070070070070000LL));
d4 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d4), 0x0007007007007000LL));

d5 = d1 | d2;
d6 = d3 | d4;

d7 = AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d5), 4));
d8 = AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d6), 4));

d9 = AE_SEL8X8(d5, d7, d_shuffle_t1);
d10 = AE_SEL8X8(d6, d8, d_shuffle_t2);
d11 = AE_SEL8X8(d9, d10, d_shuffle_t3);

AE_DSEL8X8(d_out1, d_out2, d_value_0, d_value_1, d11);
AE_DSEL8X8(d_out1, d_out2, d_out1, d_out2, d_shuffle_t);

AE_SA8X8X2_IP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp);
}
if (elements_per_channel_t_rem) {
AE_LAV8X8X2_XP(d_index, d_dummy, align_index, (ae_int8x16*)pIn_tmp,
3); /* Loading 48 bits for decoding 16 weight values */

d1 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 1));
d2 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 2));
d3 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 3));
d4 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 4));

d1 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d1), 0x7007007007000000LL));
d2 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d2), 0x0700700700700000LL));
d3 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d3), 0x0070070070070000LL));
d4 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d4), 0x0007007007007000LL));

d5 = d1 | d2;
d6 = d3 | d4;

d7 = AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d5), 4));
d8 = AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d6), 4));

d9 = AE_SEL8X8(d5, d7, d_shuffle_t1);
d10 = AE_SEL8X8(d6, d8, d_shuffle_t2);
d11 = AE_SEL8X8(d9, d10, d_shuffle_t3);

AE_DSEL8X8(d_out1, d_out2, d_value_0, d_value_1, d11);
AE_DSEL8X8(d_out1, d_out2, d_out1, d_out2, d_shuffle_t);

AE_SAV8X8X2_XP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp,
elements_per_channel_t_rem);
}

value_table = value_table + stride;
}
AE_SA128POS_FP(align_store, (ae_int8x16*)p_out_tmp);
}

void DecompressionStateXtensa::DecompressToBufferWidth2_Xtensa(int8_t* buffer) {
ScopedMicroProfiler scoped_profiler(__func__, micro_profiler_);

int i, j;
ae_int8* __restrict p_out_tmp = (ae_int8*)buffer;
ae_int8x8* pIn_tmp = (ae_int8x8*)compressed_indices_;
const uint8_t* __restrict value_table =
static_cast<const uint8_t*>(comp_data_.data.lut_data->value_table);

for (int i = 0; i < num_channels_t; i++) {
ae_int8x8 d_value_0_t = *(ae_int8x8*)&value_table[0];
ae_int8x8 d_value_1_t = *(ae_int8x8*)&value_table[8];
const uint8_t* __restrict value_table_t = value_table;

AE_DSEL8X8(d_value_0, d_value_1, d_value_0_t, d_value_1_t, d_shuffle_t);
int num_channels_t = num_channels_;
const size_t stride = comp_data_.data.lut_data->value_table_channel_stride;

for (int j = 0; j < elements_per_channel_t; j += 16) {
AE_L8X8_IP(d_index, pIn_tmp, 8);
AE_DSEL8X8(d_out1, d_out2, d_value_0, d_value_1, d_index);
AE_DSEL8X8(d_out1, d_out2, d_out1, d_out2, d_d_shuffle_t2);
AE_S8X8X2_IP(d_out1, d_out2, (ae_int8x16*)p_out_tmp, 16);
int elements_per_channel_t_by_5 = elements_per_channel_ >> 5;
int elements_per_channel_t_rem = elements_per_channel_ & 0x1F;
int elements_per_channel_t_rem_minus_16 = 0;
if (elements_per_channel_t_rem > 16) {
elements_per_channel_t_rem_minus_16 = elements_per_channel_t_rem - 16;
}

ae_int8x8 d_index, d_dummy;
ae_int8x8 d0, d1, d2, d3, d4, d5;
ae_int8x8 q0, q1, q2, q3;
ae_int8x8 d_out1, d_out2;

ae_valignx2 align_index = AE_LA128_PP(pIn_tmp);

ae_int8x8 d_shuffle_value_t = AE_MOVINT8X8_FROMINT64(0x08192A3B4C5D6E7FLL);
ae_int8x8 d_shuffle_t1 = AE_MOVINT8X8_FROMINT64(0xFB73EA62D951C840LL);
ae_int8x8 d_shuffle_t2 = AE_MOVINT8X8_FROMINT64(0xFBEA7362D9C85140LL);

ae_valignx2 align_store = AE_ZALIGN128();

for (i = 0; i < num_channels_t; i++) {
ae_int8x8 d_value_0 = AE_MOVINT8X8_FROMINT64(AE_ZERO());
ae_int8x8 d_value_1 = AE_MOVINT8X8_FROMINT64(AE_ZERO());

value_table_t = value_table;

ae_valign align_vtab = AE_LA64_PP(value_table_t);
AE_LA8X8_IP(d_value_0, align_vtab, (ae_int8x8*)value_table_t);
AE_DSEL8X8(d_value_0, d_value_1, d_value_0, d_value_1, d_shuffle_value_t);

for (j = 0; j < elements_per_channel_t_by_5; j++) {
// AE_LA8X8_IP( d_index, align_index, pIn_tmp ); /* Loading 64 bits
// for decoding 32 weight values */

AE_LAV8X8X2_XP(d_index, d_dummy, align_index, (ae_int8x16*)pIn_tmp,
8); /* Loading 64 bits for decoding 32 weight values */
d0 = d_index;
d1 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 2));

d2 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d0),
0x3333333333333333LL)); // i1,i3,i5, ....
d3 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d1),
0x3333333333333333LL)); // i0,i2,i4, ....

AE_DSEL8X8(d4, d5, d3, d2,
d_shuffle_t1); // d4 = i0,i2,i1,i3,i4,i6,... d5 =
// i16,i18, i17,i19, ....

AE_DSEL8X8(q0, q1, d_value_0, d_value_1,
d4); // q0 = 0,1,4,5,8,9,12,13 q1 = 2,3,6,7,10,11,14,15
AE_DSEL8X8(
q2, q3, d_value_0, d_value_1,
d5); // q2 = 16,17,20,21,24,25,28,29 q3 = 18,19,22,23,26,27,30,31

AE_DSEL8X8(d_out1, d_out2, q0, q1, d_shuffle_t2);
AE_SA8X8X2_IP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp);

AE_DSEL8X8(d_out1, d_out2, q2, q3, d_shuffle_t2);
AE_SA8X8X2_IP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp);
}
if (elements_per_channel_t_rem) {
AE_LAV8X8X2_XP(d_index, d_dummy, align_index, (ae_int8x16*)pIn_tmp,
(elements_per_channel_t_rem >>
2)); /* Loading 48 bits for decoding 16 weight values */
d0 = d_index;
d1 =
AE_MOVINT8X8_FROMINT64(AE_SRLI64(AE_MOVINT64_FROMINT8X8(d_index), 2));
d2 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d0),
0x3333333333333333LL)); // i1,i3,i5, ....
d3 = AE_MOVINT8X8_FROMINT64(
AE_AND64(AE_MOVINT64_FROMINT8X8(d1),
0x3333333333333333LL)); // i0,i2,i4, ....

AE_DSEL8X8(d4, d5, d3, d2,
d_shuffle_t1); // d4 = i0,i2,i1,i3,i4,i6,... d5 =
// i16,i18, i17,i19, ....

AE_DSEL8X8(q0, q1, d_value_0, d_value_1,
d4); // q0 = 0,1,4,5,8,9,12,13 q1 = 2,3,6,7,10,11,14,15
AE_DSEL8X8(
q2, q3, d_value_0, d_value_1,
d5); // q2 = 16,17,20,21,24,25,28,29 q3 = 18,19,22,23,26,27,30,31

AE_DSEL8X8(d_out1, d_out2, q0, q1, d_shuffle_t2);

AE_SAV8X8X2_XP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp,
elements_per_channel_t_rem);

AE_DSEL8X8(d_out1, d_out2, q2, q3, d_shuffle_t2);

AE_SAV8X8X2_XP(d_out1, d_out2, align_store, (ae_int8x16*)p_out_tmp,
elements_per_channel_t_rem_minus_16);
}

value_table += stride;
value_table = value_table + stride;
}
AE_SA128POS_FP(align_store, (ae_int8x16*)p_out_tmp);
}

void DecompressionStateXtensa::DecompressToBufferWidthAnyInt8_Xtensa(
@@ -407,20 +607,25 @@ int8_t* DecompressionState::DecompressToBuffer<int8_t>(void* buffer) {

if (comp_data_.data.lut_data->compressed_bit_width == 4 &&
!comp_data_.data.lut_data->use_alternate_axis) {
if (!(elements_per_channel_ & 0x0F) &&
comp_data_.data.lut_data->value_table_channel_stride == 16) {
dsx.DecompressToBufferWidth4_Xtensa_Old(static_cast<int8_t*>(buffer));
if (!(elements_per_channel_ & 0x01)) {
dsx.DecompressToBufferWidth4_Xtensa(static_cast<int8_t*>(buffer));
} else {
dsx.DecompressToBufferWidth4_16(static_cast<int8_t*>(buffer));
dsx.DecompressToBufferWidthAnyInt8_Xtensa(static_cast<int8_t*>(buffer));
}
} else if (comp_data_.data.lut_data->compressed_bit_width == 3 &&
!comp_data_.data.lut_data->use_alternate_axis) {
// TODO(ddavis-2015): placeholder
dsx.DecompressToBufferWidthAnyInt8_Xtensa(static_cast<int8_t*>(buffer));
if (!(elements_per_channel_ & 0x07)) {
dsx.DecompressToBufferWidth3_Xtensa(static_cast<int8_t*>(buffer));
} else {
dsx.DecompressToBufferWidthAnyInt8_Xtensa(static_cast<int8_t*>(buffer));
}
} else if (comp_data_.data.lut_data->compressed_bit_width == 2 &&
!comp_data_.data.lut_data->use_alternate_axis) {
// TODO(ddavis-2015): placeholder
dsx.DecompressToBufferWidthAnyInt8_Xtensa(static_cast<int8_t*>(buffer));
if (!(elements_per_channel_ & 0x03)) {
dsx.DecompressToBufferWidth2_Xtensa(static_cast<int8_t*>(buffer));
} else {
dsx.DecompressToBufferWidthAnyInt8_Xtensa(static_cast<int8_t*>(buffer));
}
} else {
dsx.DecompressToBufferWidthAnyInt8_Xtensa(static_cast<int8_t*>(buffer));
}
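
For orientation, a plain C++ sketch of the lookup-table decompression these
Xtensa kernels accelerate, reconstructed from the member names above
(value_table, value_table_channel_stride, elements_per_channel_, num_channels_).
The MSB-first packing shown is an assumption, and alternate-axis ordering is
omitted; the actual TFLM reference implementation may differ in detail:

#include <cstddef>
#include <cstdint>

// Each weight is a bit_width-bit index into its channel's slice of the value
// table; the specialized width-4/3/2 kernels above vectorize this inner loop
// and fall back to DecompressToBufferWidthAnyInt8_Xtensa when the channel
// length does not meet a fast path's alignment requirement.
inline void DecompressLutReference(const uint8_t* compressed_indices,
                                   const int8_t* value_table,
                                   size_t value_table_channel_stride,
                                   size_t num_channels,
                                   size_t elements_per_channel, int bit_width,
                                   int8_t* buffer) {
  size_t bit_pos = 0;  // position within the packed index bitstream
  for (size_t channel = 0; channel < num_channels; ++channel) {
    const int8_t* table = value_table + channel * value_table_channel_stride;
    for (size_t i = 0; i < elements_per_channel; ++i) {
      uint32_t index = 0;
      for (int b = 0; b < bit_width; ++b, ++bit_pos) {
        index = (index << 1) |
                ((compressed_indices[bit_pos >> 3] >> (7 - (bit_pos & 7))) & 1u);
      }
      *buffer++ = table[index];
    }
  }
}
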
4 changes: 0 additions & 4 deletions tensorflow/lite/micro/testing/BUILD
@@ -1,9 +1,5 @@
load("@rules_python//python:defs.bzl", "py_binary", "py_library")
load("@tflm_pip_deps//:requirements.bzl", "requirement")
load("//tensorflow/lite/micro:build_def.bzl",
"tflm_cc_library",
"tflm_cc_test",
)
load(
"//tensorflow/lite/micro:build_def.bzl",
"tflm_cc_library",
