kirklandsign and others added 30 commits May 7, 2025 18:24
Differential Revision: D74365586

Pull Request resolved: pytorch#10765
Differential Revision: D74117402

Pull Request resolved: pytorch#10697
Notably, the pinned prelude version includes
facebook/buck2-prelude@958af4f. Also, we're able to simplify our Buck
versioning logic now that Buck has consistent versions across platforms
(facebook/buck2#828 (comment)).
Differential Revision: D74369346

Pull Request resolved: pytorch#10764
…10771)

## Context

When quantizing models with the PT2E quantization flow, quantize/dequantize nodes will be inserted into the graph. However, these quantize/dequantize nodes must be fused with operators such as `aten.linear.default` to produce nodes corresponding to quantized operators (e.g. `weight_int8pack_mm`) in order for quantized operator implementations to be called at runtime.

Currently, the op fusion is done by the `fuse_dequant_linear.py` pass; however, it only handles one specific fusion pattern to generate a `weight_int8pack_mm` operator. As more quantized operators are supported in ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.
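
For illustration, the sketch below shows the kind of pattern matching such a fusion pass performs: find an `aten.linear` node whose weight argument is produced by a dequantize node and swap the pair for a fused quantized op. This is not the actual `FuseQuantizedOpsTransform` implementation; the fused target, the argument mapping, and the omission of bias and shape handling are simplifying assumptions.

```python
import torch

def fuse_dequant_linear_sketch(gm: torch.fx.GraphModule) -> bool:
    """Replace (dequantize -> aten.linear) pairs with a fused quantized op."""
    modified = False
    for node in list(gm.graph.nodes):
        if node.op != "call_function" or node.target != torch.ops.aten.linear.default:
            continue
        weight = node.args[1]
        # Only fuse when the weight is produced by a dequantize node.
        if not isinstance(weight, torch.fx.Node) or "dequantize" not in str(weight.target):
            continue
        with gm.graph.inserting_before(node):
            fused = gm.graph.call_function(
                torch.ops.aten._weight_int8pack_mm.default,  # illustrative fused target
                args=(node.args[0], weight.args[0], weight.args[1]),  # (input, int8 weight, scales)
            )
        node.replace_all_uses_with(fused)
        gm.graph.erase_node(node)
        modified = True
    if modified:
        gm.graph.eliminate_dead_code()
        gm.recompile()
    return modified
```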

## Changes

Introduce the `FuseQuantizedOpsTransform()` pass. I elected to introduce a new pass under the `backends/vulkan/_passes` directory, as opposed to modifying the existing pass, because I anticipate that the majority of the fusion patterns will be specific to ET-VK.

Remove the existing `FuseDequantLinearPass()` and switch to using the `FuseQuantizedOpsTransform` pass in its place.

Add a `test_vulkan_passes` Python test to cover the export passes.

Some small refactors to the `test_vulkan_delegate` Python test to improve code organization.

Differential Revision: [D73794042](https://our.internmc.facebook.com/intern/diff/D73794042/)
## Context

Title says it all!

## Changes

Extended the implementation of `linear_qcsnw` to support packed 4-bit weight tensors.
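
As a rough illustration of what "packed 4-bit weights" means (a sketch of the general idea, not the `linear_qcsnw` packing code; the helper names are hypothetical), two signed 4-bit values are stored per byte:

```python
import torch

def pack_int4(weights: torch.Tensor) -> torch.Tensor:
    """weights: int8 tensor with values in [-8, 7]; last dim must be even."""
    assert weights.shape[-1] % 2 == 0
    nibbles = (weights.to(torch.int32) & 0x0F).to(torch.uint8)  # 4-bit two's complement
    nibbles = nibbles.reshape(*weights.shape[:-1], -1, 2)
    return (nibbles[..., 0] | (nibbles[..., 1] << 4)).contiguous()

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    lo = (packed & 0x0F).to(torch.int8)
    hi = ((packed >> 4) & 0x0F).to(torch.int8)
    # Sign-extend the 4-bit values back to int8.
    lo = torch.where(lo > 7, lo - 16, lo)
    hi = torch.where(hi > 7, hi - 16, hi)
    return torch.stack([lo, hi], dim=-1).flatten(start_dim=-2)
```

Round-tripping an int8 tensor with values in [-8, 7] through `pack_int4` and `unpack_int4` returns the original values.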

Differential Revision: [D73941991](https://our.internmc.facebook.com/intern/diff/D73941991/)
This way, Llama variants other than stories110m can be run.
### Summary
Refactors unit tests to allow testing of TOSA 1.0.
Adds a command-line argument `--arm_run_tosa_version` to run tests against a
particular version.
### Summary

Instead of manually printing all the options in
`tools/cmake/Utils.cmake`, let's just "automatically" print all the
configured options.

### Test plan

```
$ ./scripts/build_apple_frameworks.sh --Debug

-- --- Configurated Options ---

-- EXECUTORCH_ENABLE_LOGGING : ON
-- ---------------------------

```

```
$ ./scripts/build_apple_frameworks.sh --Release

-- --- Configurated Options ---

-- EXECUTORCH_ENABLE_LOGGING : OFF
-- ---------------------------

```


cc @larryliu0820
…orch#10774)

Refactor assertion statements to raise ValueErrors for better error
handling in permutation matrix and vector transformations. Ensure that
conditions are checked and appropriate exceptions are raised to enhance
code robustness and readability.
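
An illustrative before/after of the kind of change described, using a hypothetical helper (the actual function names in the Arm backend differ):

```python
# Before: failures surface as bare AssertionErrors.
def permute_vector(vec, perm):
    assert len(vec) == len(perm), "Permutation length mismatch"
    return [vec[i] for i in perm]

# After: callers get a descriptive, catchable ValueError instead.
def permute_vector(vec, perm):  # redefined here only for comparison
    if len(vec) != len(perm):
        raise ValueError(
            f"Permutation length {len(perm)} does not match vector length {len(vec)}"
        )
    return [vec[i] for i in perm]
```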

Signed-off-by: Sebastian Larsson <[email protected]>
Summary: Minor change to reserve size for VkWriteDescriptorSet and
VkDescriptorSetLayoutBinding vectors.

Differential Revision: D74335276
### Summary

In this diff we create a helper that allows presets to set options.
Again, this is mostly a helper that checks whether the option has already
been defined and, if so, no-ops.

To test it, I also create the first preset `macos-arm64`. I will test it
in upcoming diffs.

### Test plan

pytest for now, manual test in future diffs


cc @larryliu0820
### Summary
This change converts the unit test from Java to Kotlin.

### Test plan
./gradlew :executorch_android:testDebugUnitTest

---------

Co-authored-by: Haiting Pu <[email protected]>
### Summary

* Create the base for a macos-arm64 preset — bigger migration in future
diffs
* Create an Apple CI job to test builds

### Test plan

CI +

```
$ cmake --preset macos-arm64

-- Loading build preset: /Users/jathu/executorch/tools/cmake/preset/macos-arm64.cmake
-- --- Configurated Options ---

-- EXECUTORCH_BUILD_PRESET_FILE : /Users/jathu/executorch/tools/cmake/preset/macos-arm64.cmake
-- EXECUTORCH_ENABLE_LOGGING    : ON
-- EXECUTORCH_BUILD_COREML      : ON
-- ---------------------------

$ cmake --build cmake-out --parallel
```

cc @larryliu0820
Differential Revision: D73440517

Pull Request resolved: pytorch#10493
Differential Revision: D74349918

Pull Request resolved: pytorch#10760
Differential Revision: D74350331

Pull Request resolved: pytorch#10762
…nv op instead of cpu op for shapes not supported by the TIE kernel.

Differential Revision: D74337713

Pull Request resolved: pytorch#10770
Differential Revision: D74420616

Pull Request resolved: pytorch#10778
Differential Revision: D74041198

Pull Request resolved: pytorch#10660
Differential Revision: D74447383

Pull Request resolved: pytorch#10780
…10783)

Don't try to print with colors in the pre-push script if the script is
non-interactive. This avoids broken output in CI, which doesn't support
colors.
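
The pre-push script itself is shell, but the underlying check is the standard one: only emit ANSI color codes when stdout is an interactive terminal. A minimal Python sketch of the same idea:

```python
import sys

def use_color() -> bool:
    # Colorize only when stdout is an interactive terminal; CI pipes are not.
    return sys.stdout.isatty()

GREEN = "\033[32m" if use_color() else ""
RESET = "\033[0m" if use_color() else ""
print(f"{GREEN}pre-push checks passed{RESET}")
```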

Signed-off-by: [email protected]
bloaty told me that we were paying a noticeable size cost for the
::value members of these structs (at least after the PR in this stack
that reapplies pytorch#9841) and now we're not.

Test Plan: bash test/build_optimized_size_test.sh

```
before:
adopt functionref
==========
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 11:08 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 11:08 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5927336 Apr 25 11:08 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4505600	98304	0	4296376320	4300980224	1005bc000	cmake-out/test/size_test_all_optimized_ops

after:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```

(yes it's neutral; improves size results for further diffs)
…ve build is not in use (pytorch#10490)

We duplicate a lot of functions depending on the operator name so that
dtype selective build will work. We can just detect if dtype selective
build is in use and, if not, stop duplicating.

Test Plan: compared results of bash test/build_optimized_size_test.sh
before/after this rev.

Before:
```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```

After:
```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops
```

(This was reverted because the diff it was stacked on was a size
regression. This time around the order is reversed, and the part of the
change that was actually regressing size has been reverted.)
…s with out_dtypes in template arguments (pytorch#10491)

This is necessary to take advantage of pytorch#9388, which
creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with
test/build_optimized_size_test.sh. It does cost us some size:

```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1310720	81920	0	4295458816	4296851456	1001cc000	cmake-out/test/size_test_all_ops
4358144	98304	0	4296212480	4300668928	100570000	cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1376256	81920	0	4295491584	4296949760	1001e4000	cmake-out/test/size_test_all_ops
4423680	98304	0	4296327168	4300849152	10059c000	cmake-out/test/size_test_all_optimized_ops
```

However, on an absolute basis, size is still below where we were two
PRs ago, which was:

```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x  1 swolchok  staff  5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT	__DATA	__OBJC	others	dec	hex
81920	81920	0	4295049216	4295213056	10003c000	cmake-out/test/size_test
1474560	81920	0	4295655424	4297211904	100224000	cmake-out/test/size_test_all_ops
4489216	98304	0	4296359936	4300947456	1005b4000	cmake-out/test/size_test_all_optimized_ops
```
Differential Revision: D74495058

Pull Request resolved: pytorch#10793
Differential Revision: D74226258

Pull Request resolved: pytorch#10708
anzr299 and others added 22 commits May 19, 2025 22:43
Differential Revision: D74833331

Pull Request resolved: pytorch#10921
### Summary
Adds input size validation to `Module.execute` to prevent possible
silent memory corruption when too many EValue inputs are passed.

Fixes pytorch#10510 

### Test plan
- Added unit test `TestExecuteWithTooManyInputs`
- Verified by successfully running all `module_test.cpp` tests, except
`TestPTD` (did not have access to `ModuleLinear.ptd`)
- To run locally:
  - Bypass the `is_fbcode` guard in `targets.bzl` and redirect test file paths
    to use a locally exported `ModuleAdd.pte` file
  - Build and run tests via:

    ```
    buck2 build //extension/module/test:test
    buck2 run //extension/module/test:test
    ```

---------

Co-authored-by: Anthony Shoumikhin <[email protected]>
Differential Revision: D75006941

Pull Request resolved: pytorch#10974
Differential Revision: D74967760

Pull Request resolved: pytorch#10962
…Ethos-U85 (pytorch#10973)

Temporary solution to the problem in
pytorch#10958. The arm_executor_runner.cpp needs to declare the
ethosu_fast_scratch array and pass it on to EthosUBackend.cpp. It is
important that for Shared_Sram, ethosu_fast_scratch is nullptr, and for
Dedicated_Sram it points to the fast memory array.
Summary:
## Context

Fix the third-party `CMakeLists.txt` to allow `flatcc` to build for Windows.
Some CMake configuration settings need to be adjusted for Windows
platforms.

Test Plan:
## Test Plan

```
python install_executorch.py
```
### Summary
- Use `fold_quantize=False` in `convert_pt2e` to prevent overwriting the
state_dict during lowering (see the sketch below)
- Change `_get_updated_graph_signature` so that the signature is detected
correctly
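
For context, here is a minimal sketch of where the flag applies in the PT2E flow. The toy model, the XNNPACKQuantizer stand-in, and the export entry point are illustrative assumptions (the exact export API varies across recent PyTorch releases):

```python
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)

model = torch.nn.Sequential(torch.nn.Linear(8, 8)).eval()
example_inputs = (torch.randn(1, 8),)

exported = torch.export.export_for_training(model, example_inputs).module()
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(exported, quantizer)
prepared(*example_inputs)  # calibrate
# fold_quantize=False keeps quantize ops on the weights in the graph instead of
# folding them into frozen quantized weight tensors, so the original state_dict
# is not overwritten during lowering.
converted = convert_pt2e(prepared, fold_quantize=False)
```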
Differential Revision: D75024936

Pull Request resolved: pytorch#10889
Pull Request resolved: pytorch#10877

So we can use them in codegen.bzl later (can't pull in definitions from targets.bzl files).
ghstack-source-id: 284862879

Differential Revision: [D74741846](https://our.internmc.facebook.com/intern/diff/D74741846/)
Differential Revision: D74865527

Pull Request resolved: pytorch#10938
Pull Request resolved: pytorch#10878

Add dtype selective build for optimized ops. Follows the same process as portable, where we copy the source files and rebuild the library.

1. Generalize copy genrule for portable/optimized/source/header.
2. Copy optimized source files + headers.
3. Build optimized ops using source files, dependencies, portable header.
4. Add a test to confirm that we can run addmul with float dtypes (when we remove them, the test fails).
ghstack-source-id: 284862896
@exported-using-ghexport

Differential Revision: [D74688554](https://our.internmc.facebook.com/intern/diff/D74688554/)
Makes it possible to annotate patterns with more than two operators.
This allows us to annotate the patterns conv -> bn and conv -> bn -> relu,
so that BN can be folded away after training in QAT. Also adds support for
QAT in the Tester class.
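
As a loose illustration of what multi-operator pattern matching involves (not the Arm quantizer's code; the aten targets are assumptions about how the pattern appears after export), one can walk the FX graph for conv -> bn -> optional relu chains and treat each chain as a single unit to annotate:

```python
import torch

def find_conv_bn_relu(gm: torch.fx.GraphModule):
    """Collect [conv, bn] and [conv, bn, relu] chains for joint annotation."""
    chains = []
    for conv in gm.graph.nodes:
        if conv.op != "call_function" or conv.target != torch.ops.aten.conv2d.default:
            continue
        users = list(conv.users)
        if len(users) != 1 or users[0].target != torch.ops.aten.batch_norm.default:
            continue
        chain = [conv, users[0]]
        bn_users = list(users[0].users)
        # relu is optional: both conv -> bn and conv -> bn -> relu match.
        if len(bn_users) == 1 and bn_users[0].target == torch.ops.aten.relu.default:
            chain.append(bn_users[0])
        chains.append(chain)
    return chains
```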

Signed-off-by: Oscar Andersson <[email protected]>
### Summary
Update model unit tests to use the new test infrastructure pipeline.
… stride (pytorch#10972)

* AvgPool2dVisitor will adjust the padding so the pooling window is
divisible by the stride
* Improve tests in test_max_pool.py

Signed-off-by: Tom Allsop <[email protected]>
- Removes duplicated matmul tests.
- Replaces pytest.mark_flaky with qtol for quantized test cases of
mm/bmm.

Signed-off-by: Oscar Andersson <[email protected]>
ortExport llama executorch
cavusmustafa pushed a commit that referenced this pull request Jun 20, 2025
Differential Revision: D75104487

Pull Request resolved: pytorch#11021
cavusmustafa pushed a commit that referenced this pull request Jun 20, 2025
Differential Revision: D75718888

Pull Request resolved: pytorch#11444
cavusmustafa pushed a commit that referenced this pull request Jun 20, 2025
Differential Revision: D76157744

Pull Request resolved: pytorch#11501
cavusmustafa pushed a commit that referenced this pull request Aug 19, 2025
BNNS copy crashes the process when the dtypes differ
(pytorch#11714).

With the example in this PR
(pytorch#11714), we crash the
process on main. Here is the stack trace from LLDB:

```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp 
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```


With this PR, the process succeeds.