@guangy10 guangy10 commented Apr 4, 2025

  1. Add latency measurement for text models
  • Port the ExecuTorch C++ runner measurements (stats.h) to optimum-executorch in Python. Eventually we could remove the Python implementation and use the one from extensions/llm via pybind.
  • Measure the latency of ExecuTorchModelForSeq2SeqLM and ExecuTorchModelForCausalLM
  2. Refactor modeling.py to reduce duplicate code:
  • Create an ExecuTorchModelBase that represents the ExecuTorch inference model and implements the common methods (export, load from cache/hub, etc.). It defines abstract methods (forward, generate, etc.) that each concrete derived class must implement
  • Make all supported ExecuTorch inference classes (ExecuTorchModelForSeq2SeqLM, ExecuTorchModelForCausalLM, etc.) lightweight and easy to scale
  3. Add the efficientnet model. It runs only when executorch >= 0.6 (so it will run on the pinned nightly). It was previously broken due to a bug in XNNPACK, which has been fixed and picked into the upcoming executorch 0.6 release.
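
For readers who want a feel for the latency measurement in the first item, here is a minimal sketch of per-generation timing stats in Python. It is not the code from this PR: `GenerationStats`, `timed_generate`, the field names, and the `step` callback are all hypothetical names chosen for illustration, and the injectable `clock` parameter is an assumption made so the timing logic can be tested deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class GenerationStats:
    """Hypothetical timing record for one generation call (illustrative names)."""

    inference_start: float = 0.0
    first_token: float = 0.0
    inference_end: float = 0.0
    num_prompt_tokens: int = 0
    num_generated_tokens: int = 0

    def time_to_first_token(self) -> float:
        # Covers prompt processing plus the first decode step.
        return self.first_token - self.inference_start

    def tokens_per_second(self) -> float:
        # Decode throughput, excluding the first token (it is counted
        # under time_to_first_token instead).
        decode_time = self.inference_end - self.first_token
        decoded = self.num_generated_tokens - 1
        if decode_time <= 0 or decoded <= 0:
            return 0.0
        return decoded / decode_time


def timed_generate(
    step: Callable[[List[int]], int],
    prompt_tokens: Sequence[int],
    max_new_tokens: int,
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[List[int], GenerationStats]:
    """Run a greedy loop around `step` and record timestamps along the way.

    `step` is a hypothetical callback that maps the tokens so far to the
    next token id; a real model's forward pass would go here.
    """
    stats = GenerationStats(num_prompt_tokens=len(prompt_tokens))
    tokens = list(prompt_tokens)
    stats.inference_start = clock()
    for i in range(max_new_tokens):
        tokens.append(step(tokens))
        if i == 0:
            stats.first_token = clock()
        stats.num_generated_tokens += 1
    stats.inference_end = clock()
    if stats.num_generated_tokens == 0:
        # Nothing was generated, so there is no first-token timestamp.
        stats.first_token = stats.inference_end
    return tokens[len(prompt_tokens):], stats
```

Calling `timed_generate(step, prompt_tokens, max_new_tokens)` returns the new tokens together with a stats object, so time to first token and decode throughput can be reported separately.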
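
The refactor in the second item follows the usual abstract-base-class pattern, which might look roughly like the sketch below. This is an illustration of the pattern only, not the contents of modeling.py: `InferenceModelBase`, `ToyCausalLM`, `from_path`, and the toy `forward` logic are hypothetical, and no ExecuTorch API is called.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class InferenceModelBase(ABC):
    """Hypothetical stand-in for a shared base class (not the PR's code)."""

    def __init__(self, program_path: str):
        # Shared state lives in the base class so subclasses stay small.
        self.program_path = program_path

    @classmethod
    def from_path(cls, program_path: str) -> "InferenceModelBase":
        # Common loading logic is written once and inherited by every subclass.
        return cls(program_path)

    @abstractmethod
    def forward(self, input_ids: Sequence[int]) -> int:
        """Return the next token id for the given input."""

    @abstractmethod
    def generate(self, prompt_tokens: Sequence[int], max_new_tokens: int) -> List[int]:
        """Return the prompt followed by the generated tokens."""


class ToyCausalLM(InferenceModelBase):
    """Concrete subclass that only supplies the task-specific methods."""

    def forward(self, input_ids: Sequence[int]) -> int:
        # Toy rule standing in for a real forward pass.
        return (input_ids[-1] + 1) % 100

    def generate(self, prompt_tokens: Sequence[int], max_new_tokens: int) -> List[int]:
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            tokens.append(self.forward(tokens))
        return tokens
```

Because `forward` and `generate` are abstract, instantiating the base class directly raises `TypeError`, while a new task class only needs to implement those two methods to inherit the loading logic.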

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@guangy10 guangy10 force-pushed the et_modeling_refactor branch 2 times, most recently from f0df2f8 to 16c238c Compare April 4, 2025 17:53
@guangy10 guangy10 mentioned this pull request Apr 4, 2025
@guangy10 guangy10 force-pushed the et_modeling_refactor branch from 16c238c to 4d2dc1f Compare April 4, 2025 21:22
@guangy10 guangy10 force-pushed the et_modeling_refactor branch from 4d2dc1f to f78a4a1 Compare April 4, 2025 21:26

@michaelbenayoun michaelbenayoun left a comment

LGTM from what I can understand.
This is great work, but could you please try to split changes into separate PRs by topic? Here, for example, we could have:

  • A PR for the refactoring
  • A PR for efficientnet
  • A PR for stats computation

It is no big deal at all, but it makes things easier to review.

@michaelbenayoun michaelbenayoun merged commit 2f917c3 into huggingface:main Apr 7, 2025
139 of 140 checks passed