forked from pytorch/executorch
    
        
Llama nncf test #3
     Draft
      
      
cavusmustafa wants to merge 180 commits into export_llama_test_1 from llama_nncf_test
          Conversation
  
    
    
  
  
    
    Differential Revision: D74365586 Pull Request resolved: pytorch#10765
Differential Revision: D74117402 Pull Request resolved: pytorch#10697
Notably, the pinned prelude version includes facebook/buck2-prelude@958af4f. Also, we're able to simplify our Buck versioning logic now that Buck has consistent versions across platforms (facebook/buck2#828 (comment)).
Differential Revision: D74369346 Pull Request resolved: pytorch#10764
…10771)

## Context
When quantizing models with the PT2E quantization flow, quantize/dequantize nodes will be inserted into the graph. However, these quantize/dequantize nodes must be fused with operators such as `aten.linear.default` to produce nodes corresponding to quantized operators (e.g. `weight_int8pack_mm`) in order for quantized operator implementations to be called at runtime. Currently, the op fusion is done by the `fuse_dequant_linear.py` pass; however, this only handles one specific fusion pattern to generate a `weight_int8pack_mm` operator. As more quantized operators are to be supported in ET-VK via the PT2E quantization flow, a more generic fusion pass is needed that can handle a variety of fusion patterns.

## Changes
- Introduce the `FuseQuantizedOpsTransform()` pass. I elected to introduce a new pass under the `backends/vulkan/_passes` directory, as opposed to modifying the existing pass, because I anticipate the majority of the fusion patterns to be specific to ET-VK.
- Remove the existing `FuseDequantLinearPass()`.
- Switch to using the `FuseQuantizedOpsTransform` pass instead of the old `FuseDequantLinear` pass.
- Add a `test_vulkan_passes` Python test to test export passes.
- Some small refactors to the `test_vulkan_delegate` Python test to improve code organization.

Differential Revision: [D73794042](https://our.internmc.facebook.com/intern/diff/D73794042/)
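For readers unfamiliar with this kind of pass, here is a minimal, illustrative sketch of a dequantize-to-linear fusion over a torch.fx graph. It is not the actual `FuseQuantizedOpsTransform` implementation; the op names are assumptions based on the description above (PT2E's `quantized_decomposed` namespace and aten's `_weight_int8pack_mm`), and the real pass handles more patterns and layout details.

```python
# Illustrative only: a simplified dequant -> linear fusion over a torch.fx graph.
# The real FuseQuantizedOpsTransform (backends/vulkan/_passes) covers more
# patterns and handles weight layout, bias, and zero points properly.
import torch
import torch.ao.quantization.fx._decomposed  # noqa: F401  registers quantized_decomposed ops
from torch.fx import GraphModule, Node

DEQUANT = torch.ops.quantized_decomposed.dequantize_per_channel.default
LINEAR = torch.ops.aten.linear.default

def fuse_dequant_linear(gm: GraphModule) -> GraphModule:
    for node in list(gm.graph.nodes):
        if node.op != "call_function" or node.target != LINEAR:
            continue
        weight = node.args[1]
        if not isinstance(weight, Node) or weight.target != DEQUANT:
            continue
        # dequantize_per_channel(q_weight, scales, zero_points, axis, qmin, qmax, dtype)
        q_weight, scales = weight.args[0], weight.args[1]
        with gm.graph.inserting_before(node):
            # Placeholder rewrite: the real pass emits the quantized operator
            # (e.g. weight_int8pack_mm) with whatever layout it expects.
            fused = gm.graph.call_function(
                torch.ops.aten._weight_int8pack_mm.default,
                args=(node.args[0], q_weight, scales),
            )
        node.replace_all_uses_with(fused)
        gm.graph.erase_node(node)
        if not weight.users:
            gm.graph.erase_node(weight)
    gm.graph.eliminate_dead_code()
    gm.recompile()
    return gm
```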
## Context
Title says it all!

## Changes
Extended the implementation of `linear_qcsnw` to support packed 4-bit weight tensors.

Differential Revision: [D73941991](https://our.internmc.facebook.com/intern/diff/D73941991/)
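As a rough illustration of what packed 4-bit weights look like in general (the actual `linear_qcsnw` packing layout is defined by the ET-VK implementation, not by this snippet), two 4-bit values share a single byte:

```python
# Illustrative only: the actual linear_qcsnw packing/layout is defined by the
# ET-VK implementation, not by this helper.
import torch

def pack_int4(weights: torch.Tensor) -> torch.Tensor:
    """Pack pairs of signed 4-bit values (range [-8, 7]) into uint8 bytes."""
    assert weights.shape[-1] % 2 == 0
    nibbles = (weights.to(torch.int8) & 0xF).to(torch.uint8)  # two's-complement nibbles
    lo, hi = nibbles[..., 0::2], nibbles[..., 1::2]
    return lo | (hi << 4)

def unpack_int4(packed: torch.Tensor) -> torch.Tensor:
    lo = (packed & 0xF).to(torch.int8)
    hi = (packed >> 4).to(torch.int8)
    # Sign-extend the 4-bit values back to int8.
    lo = torch.where(lo > 7, lo - 16, lo)
    hi = torch.where(hi > 7, hi - 16, hi)
    return torch.stack((lo, hi), dim=-1).flatten(-2)
```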
This way, Llama variants other than stories110m can be run.
### Summary
Refactoring of unit tests to allow for testing of TOSA 1.0. Adds the command-line argument `--arm_run_tosa_version` to run tests on a particular version.
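As a sketch of how such a flag is commonly wired into pytest (the Arm test infrastructure may plumb `--arm_run_tosa_version` differently; the default value below is just a placeholder):

```python
# conftest.py -- illustrative only; the real option handling lives in the Arm
# backend's test infrastructure, and the default value here is a placeholder.
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--arm_run_tosa_version",
        action="store",
        default="0.80",
        help="TOSA specification version to run the Arm tests against",
    )

@pytest.fixture
def tosa_version(request) -> str:
    return request.config.getoption("--arm_run_tosa_version")
```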
### Summary
Instead of manually printing all the options in `tools/cmake/Utils.cmake`, let's just "automatically" print all the configured options.

### Test plan
```
$ ./scripts/build_apple_frameworks.sh --Debug
-- --- Configurated Options ---
-- EXECUTORCH_ENABLE_LOGGING : ON
-- ---------------------------
```
```
$ ./scripts/build_apple_frameworks.sh --Release
-- --- Configurated Options ---
-- EXECUTORCH_ENABLE_LOGGING : OFF
-- ---------------------------
```
cc @larryliu0820
…10773) Signed-off-by: Sebastian Larsson <[email protected]>
…orch#10774) Refactor assertion statements to raise ValueErrors for better error handling in permutation matrix and vector transformations. Ensure that conditions are checked and appropriate exceptions are raised to enhance code robustness and readability. Signed-off-by: Sebastian Larsson <[email protected]>
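The pattern being applied, in miniature; the function names below are made up for illustration, and only the assert-to-ValueError shape mirrors the change:

```python
# Illustrative only: the shape of the refactor, not the Arm backend's actual code.

def permute_vector_before(v, permutation):
    # Old style: the assert is stripped under `python -O` and raises a bare AssertionError.
    assert len(v) == len(permutation)
    return [v[i] for i in permutation]

def permute_vector_after(v, permutation):
    # New style: the check always runs and the error explains what went wrong.
    if len(v) != len(permutation):
        raise ValueError(
            f"Permutation length {len(permutation)} does not match vector length {len(v)}"
        )
    return [v[i] for i in permutation]
```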
Summary: Minor change to reserve size for VkWriteDescriptorSet and VkDescriptorSetLayoutBinding vectors. Differential Revision: D74335276
### Summary
In this diff we create a helper that allows presets to set options. This is mostly a helper that checks whether the option has already been defined and, if so, no-ops. To test it, I also create the first preset, `macos-arm64`. I will test it in upcoming diffs.

### Test plan
pytest for now, manual test in future diffs

cc @larryliu0820
### Summary
This change converts the unit test from Java to Kotlin.

### Test plan
./gradlew :executorch_android:testDebugUnitTest

---------

Co-authored-by: Haiting Pu <[email protected]>
### Summary
* Create the base for a macos-arm64 preset — bigger migration in future diffs
* Create an Apple CI job to test builds

### Test plan
CI +
```
$ cmake --preset macos-arm64
-- Loading build preset: /Users/jathu/executorch/tools/cmake/preset/macos-arm64.cmake
-- --- Configurated Options ---
-- EXECUTORCH_BUILD_PRESET_FILE : /Users/jathu/executorch/tools/cmake/preset/macos-arm64.cmake
-- EXECUTORCH_ENABLE_LOGGING : ON
-- EXECUTORCH_BUILD_COREML : ON
-- ---------------------------
$ cmake --build cmake-out --parallel
```
cc @larryliu0820
Differential Revision: D73440517 Pull Request resolved: pytorch#10493
Differential Revision: D74349918 Pull Request resolved: pytorch#10760
Differential Revision: D74350331 Pull Request resolved: pytorch#10762
…nv op instead of cpu op for shapes not supported by the TIE kernel. Differential Revision: D74337713 Pull Request resolved: pytorch#10770
Differential Revision: D74420616 Pull Request resolved: pytorch#10778
Differential Revision: D74041198 Pull Request resolved: pytorch#10660
Differential Revision: D74447383 Pull Request resolved: pytorch#10780
…10783) Don't try to print with colors in the pre-push script if the script is non-interactive. This avoids broken output in CI, which doesn't support colors. Signed-off-by: [email protected]
bloaty told me that we were paying a noticeable size cost for the ::value members of these structs (at least after the PR in this stack that reapplies pytorch#9841) and now we're not.

Test Plan: bash test/build_optimized_size_test.sh
```
before: adopt functionref
==========
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 25 11:08 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 2150960 Apr 25 11:08 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5927336 Apr 25 11:08 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT __DATA __OBJC others dec hex
81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test
1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops
4505600 98304 0 4296376320 4300980224 1005bc000 cmake-out/test/size_test_all_optimized_ops

after:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT __DATA __OBJC others dec hex
81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test
1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops
4489216 98304 0 4296359936 4300947456 1005b4000 cmake-out/test/size_test_all_optimized_ops
```
(yes it's neutral; improves size results for further diffs)
…ve build is not in use (pytorch#10490)

We duplicate a lot of functions depending on the operator name so that dtype selective build will work. We can just detect if dtype selective build is in use and, if not, stop duplicating.

Test Plan: compared results of bash test/build_optimized_size_test.sh before/after this rev.

Before:
```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT __DATA __OBJC others dec hex
81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test
1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops
4489216 98304 0 4296359936 4300947456 1005b4000 cmake-out/test/size_test_all_optimized_ops
```
After:
```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT __DATA __OBJC others dec hex
81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test
1310720 81920 0 4295458816 4296851456 1001cc000 cmake-out/test/size_test_all_ops
4358144 98304 0 4296212480 4300668928 100570000 cmake-out/test/size_test_all_optimized_ops
```
(This was reverted because the diff it was stacked on was a size regression. Reversing the order instead this time around, and reverted part of the change that was actually regressing size.)
…s with out_dtypes in template arguments (pytorch#10491)

This is necessary to take advantage of pytorch#9388, which creates dtype-specialized implementations for the non-mixed dtype case.

Measured the size cost of this approach with test/build_optimized_size_test.sh. It does cost us some size:
```
Before:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:51 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 1796928 Apr 25 12:51 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5605176 Apr 25 12:51 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT __DATA __OBJC others dec hex
81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test
1310720 81920 0 4295458816 4296851456 1001cc000 cmake-out/test/size_test_all_ops
4358144 98304 0 4296212480 4300668928 100570000 cmake-out/test/size_test_all_optimized_ops

After:
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:57 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 1889792 Apr 25 12:57 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5799704 Apr 25 12:57 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT __DATA __OBJC others dec hex
81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test
1376256 81920 0 4295491584 4296949760 1001e4000 cmake-out/test/size_test_all_ops
4423680 98304 0 4296327168 4300849152 10059c000 cmake-out/test/size_test_all_optimized_ops
```
However, on an absolute basis, size is still below where we were two PRs ago, which was:
```
ExecuTorch with no ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 153928 Apr 25 12:24 cmake-out/test/size_test
ExecuTorch with portable ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 2150960 Apr 25 12:24 cmake-out/test/size_test_all_ops
ExecuTorch with optimized ops binary size, unstripped:
-rwxr-xr-x 1 swolchok staff 5887368 Apr 25 12:24 cmake-out/test/size_test_all_optimized_ops
(.venv) swolchok@swolchok-mac ~/src/executorch> size cmake-out/test/size_test*
__TEXT __DATA __OBJC others dec hex
81920 81920 0 4295049216 4295213056 10003c000 cmake-out/test/size_test
1474560 81920 0 4295655424 4297211904 100224000 cmake-out/test/size_test_all_ops
4489216 98304 0 4296359936 4300947456 1005b4000 cmake-out/test/size_test_all_optimized_ops
```
Differential Revision: D74495058 Pull Request resolved: pytorch#10793
Differential Revision: D74226258 Pull Request resolved: pytorch#10708
Differential Revision: D74833331 Pull Request resolved: pytorch#10921
### Summary
Adds input size validation to `Module.execute` to prevent possible silent memory corruption when too many EValue inputs are passed.

Fixes pytorch#10510

### Test plan
- Added unit test `TestExecuteWithTooManyInputs`
- Verified by successfully running all `module_test.cpp` tests, except `TestPTD` (did not have access to `ModuleLinear.ptd`)
- To run locally:
  - Bypass the `is_fbcode` guard in `targets.bzl` and redirect test file paths to use a locally exported `ModuleAdd.pte` file
  - Build and run tests via:
    ```
    buck2 build //extension/module/test:test
    buck2 run //extension/module/test:test
    ```

---------

Co-authored-by: Anthony Shoumikhin <[email protected]>
Differential Revision: D75006941 Pull Request resolved: pytorch#10974
Differential Revision: D74967760 Pull Request resolved: pytorch#10962
…Ethos-U85 (pytorch#10973)

Temporary solution to the problem in pytorch#10958. The arm_executor_runner.cpp needs to declare the ethosu_fast_scratch array and pass it on to EthosUBackend.cpp. It is important that for Shared_Sram the ethosu_fast_scratch is nullptr, and for Dedicated_Sram it points to the fast memory array.
Summary:
## Context
Fix the third-party `CMakeLists.txt` to allow `flatcc` to build for Windows. Some CMake configuration settings need to be adjusted for Windows platforms.

## Test Plan
```
python install_executorch.py
```
### Summary
- Use `fold_quantize=False` in convert_pt2e to prevent overwriting the state_dict during lowering
- Change in `_get_updated_graph_signature` to have the signature detected correctly
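For context, `fold_quantize` is an argument of `torch.ao.quantization.quantize_pt2e.convert_pt2e`. A rough sketch of where it sits in a PT2E flow (the model, quantizer, and capture step are placeholders and vary by PyTorch version):

```python
# Sketch of a PT2E quantization flow using fold_quantize=False.
# The quantizer and calibration data are placeholders; the capture step
# (torch.export.export vs. export_for_training) varies across PyTorch versions.
import torch
from torch.export import export
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e

def quantize_for_lowering(model: torch.nn.Module, example_inputs, quantizer):
    exported = export(model, example_inputs).module()
    prepared = prepare_pt2e(exported, quantizer)
    prepared(*example_inputs)  # calibration (placeholder: a single batch)
    # fold_quantize=False keeps quantize ops in the graph instead of folding
    # them into frozen int parameters, so the original state_dict is not
    # overwritten during lowering (per the change above).
    return convert_pt2e(prepared, fold_quantize=False)
```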
Differential Revision: D75024936 Pull Request resolved: pytorch#10889
Pull Request resolved: pytorch#10877 So we can use them in codegen.bzl later (can't pull in definitions from targets.bzl files). ghstack-source-id: 284862879 Differential Revision: [D74741846](https://our.internmc.facebook.com/intern/diff/D74741846/)
Differential Revision: D74865527 Pull Request resolved: pytorch#10938
Pull Request resolved: pytorch#10878

Add dtype selective build for optimized ops. Follows the same process as portable, where we copy the source files and rebuild the library.
1. Generalize copy genrule for portable/optimized/source/header.
2. Copy optimized source files + headers.
3. Build optimized ops using source files, dependencies, portable header.
4. Add test, confirm that we can run addmul with float dtypes (when we remove, the test fails).

ghstack-source-id: 284862896
@exported-using-ghexport

Differential Revision: [D74688554](https://our.internmc.facebook.com/intern/diff/D74688554/)
Makes it possible to annotate patterns with more than two operators. This allows us to annotate the patterns conv -> bn and conv -> bn -> relu, so that BN can be folded away after training in QAT. Also adds support for QAT in the Tester class. Signed-off-by: Oscar Andersson <[email protected]>
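A minimal sketch of what matching operator chains longer than two nodes can look like over a torch.fx graph; the real annotator lives in the Arm quantizer and attaches quantization annotations, and the aten targets below are illustrative (exported graphs may use decomposed variants):

```python
# Illustrative only: walking single-consumer chains such as conv -> bn or
# conv -> bn -> relu in a torch.fx graph.
import torch
from torch.fx import GraphModule

CONV = torch.ops.aten.conv2d.default
BN = torch.ops.aten.batch_norm.default
RELU = torch.ops.aten.relu.default

def find_chains(gm: GraphModule, pattern):
    """Yield lists of nodes matching `pattern`, e.g. [CONV, BN] or [CONV, BN, RELU]."""
    for node in gm.graph.nodes:
        chain, current = [], node
        for target in pattern:
            if (
                current is None
                or current.op != "call_function"
                or current.target != target
            ):
                chain = []
                break
            chain.append(current)
            users = list(current.users)
            # Only follow single-consumer edges so the chain can be folded later.
            current = users[0] if len(users) == 1 else None
        if len(chain) == len(pattern):
            yield chain
```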
### Summary
Update model unit tests to use the new test infrastructure pipeline.
… stride (pytorch#10972)

* AvgPool2dVisitor will adjust the padding so the pooling window is divisible by the stride
* Improve tests in test_max_pool.py

Signed-off-by: Tom Allsop <[email protected]>
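A back-of-the-envelope version of the kind of padding adjustment described; the exact rule is defined by `AvgPool2dVisitor`, so treat the formula here as an assumption:

```python
# Assumption for this sketch: trailing padding is grown until the padded extent
# minus the kernel is a multiple of the stride, so no partial window remains.
# The actual AvgPool2dVisitor rule in the Arm backend may differ in detail.
def adjust_pad_after(in_size: int, kernel: int, stride: int,
                     pad_before: int, pad_after: int) -> int:
    remainder = (in_size + pad_before + pad_after - kernel) % stride
    return pad_after if remainder == 0 else pad_after + (stride - remainder)

# e.g. in_size=8, kernel=3, stride=2, pad 0/0 -> pad_after becomes 1, so the
# windows start at 0, 2, 4, 6 and each one fits inside the padded input.
```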
- Removes duplicated matmul tests.
- Replaces pytest.mark_flaky with qtol for quantized test cases of mm/bmm.

Signed-off-by: Oscar Andersson <[email protected]>
ortExport llama executorch
    
cavusmustafa pushed a commit that referenced this pull request on Jun 20, 2025:
Differential Revision: D75104487 Pull Request resolved: pytorch#11021
    
cavusmustafa pushed a commit that referenced this pull request on Jun 20, 2025:
Differential Revision: D75718888 Pull Request resolved: pytorch#11444
    
cavusmustafa pushed a commit that referenced this pull request on Jun 20, 2025:
Differential Revision: D76157744 Pull Request resolved: pytorch#11501
    
cavusmustafa pushed a commit that referenced this pull request on Aug 19, 2025:
BNNS copy crashes the process when the dtypes differ (pytorch#11714). With the example in this PR (pytorch#11714), we crash the process on main. Here is the stack trace from LLDB:
```
Process 19234 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
    frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
libsystem_kernel.dylib`__pthread_kill:
->  0x190ac9388 <+8>:  b.lo   0x190ac93a8    ; <+40>
    0x190ac938c <+12>: pacibsp
    0x190ac9390 <+16>: stp    x29, x30, [sp, #-0x10]!
    0x190ac9394 <+20>: mov    x29, sp
(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGABRT
  * frame #0: 0x0000000190ac9388 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x0000000190b0288c libsystem_pthread.dylib`pthread_kill + 296
    frame #2: 0x0000000190a0bc60 libsystem_c.dylib`abort + 124
    frame #3: 0x0000000190910174 libsystem_malloc.dylib`malloc_vreport + 892
    frame #4: 0x0000000190913c90 libsystem_malloc.dylib`malloc_report + 64
    frame #5: 0x000000019091821c libsystem_malloc.dylib`___BUG_IN_CLIENT_OF_LIBMALLOC_POINTER_BEING_FREED_WAS_NOT_ALLOCATED + 32
    frame #6: 0x000000019d2f4084 libBNNS.dylib`___lldb_unnamed_symbol1620 + 564
    frame #7: 0x000000019d2f5bac libBNNS.dylib`___lldb_unnamed_symbol1628 + 680
    frame #8: 0x000000019d69ce48 libBNNS.dylib`BNNSCopy + 616
    frame #9: 0x000000030c74d950 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy_using_bnns(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&) + 188
    frame #10: 0x000000030c74cfdc _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(executorchcoreml::MultiArray const&, executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) + 72
    frame #11: 0x000000030c74ceec _portable_lib.cpython-310-darwin.so`executorchcoreml::MultiArray::copy(executorchcoreml::MultiArray&, executorchcoreml::MultiArray::CopyOptions) const + 148
    frame #12: 0x000000030c7488d4 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 376
    frame #13: 0x000000030c748ac8 _portable_lib.cpython-310-darwin.so`invocation function for block in (anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 52
    frame #14: 0x000000019ad33f4c CoreML`CoreML::MultiArrayBuffer::getBytesWithHandler(void (void const*, unsigned long) block_pointer) const + 340
    frame #15: 0x000000019ad34138 CoreML`-[MLMultiArray(ScopedBufferAccess) getBytesWithHandler:] + 152
    frame #16: 0x000000030c7485ec _portable_lib.cpython-310-darwin.so`(anonymous namespace)::copy(MLMultiArray*, executorchcoreml::MultiArray&) + 296
    frame #17: 0x000000030c744f68 _portable_lib.cpython-310-darwin.so`(anonymous namespace)::set_outputs(std::__1::vector<executorchcoreml::MultiArray, std::__1::allocator<executorchcoreml::MultiArray>>&, NSArray<MLMultiArray*>*) + 180
```
With this PR, the process succeeds.
  
No description provided.