Add latency measurement for text models #44

guangy10 · 2025-04-04T04:21:00Z

Add latency measurement for text models

Bring the ExecuTorch C++ runner measurements (stats.h) to optimum-executorch in Python. Eventually we could remove the Python implementation and use the one from extensions/llm via pybind.
Measure the latency of ExecuTorchModelForSeq2SeqLM and ExecuTorchModelForCausalLM.
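
For readers unfamiliar with this kind of measurement, here is a minimal sketch of a per-generation timing record in Python. It is not the code from this PR or from stats.h: the class name `LatencyStats`, its fields, and its methods are hypothetical and chosen only for illustration. The clock is injectable so the arithmetic can be checked deterministically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LatencyStats:
    """Hypothetical timing record; names are illustrative, not the PR's API."""

    clock: Callable[[], float] = time.perf_counter  # monotonic clock, in seconds
    inference_start: Optional[float] = None
    first_token: Optional[float] = None
    inference_end: Optional[float] = None
    num_generated_tokens: int = 0

    def on_inference_start(self) -> None:
        # Call once, right before the first forward pass.
        self.inference_start = self.clock()

    def on_token(self) -> None:
        # Call after each generated token; the first call marks first-token time.
        now = self.clock()
        if self.first_token is None:
            self.first_token = now
        self.inference_end = now
        self.num_generated_tokens += 1

    def time_to_first_token(self) -> float:
        return self.first_token - self.inference_start

    def tokens_per_second(self) -> float:
        elapsed = self.inference_end - self.inference_start
        return self.num_generated_tokens / elapsed if elapsed > 0 else 0.0
```

In a generation loop, `on_inference_start()` would be called before the prompt is processed and `on_token()` after each decoded token. The real implementation may track additional events (for example model load time or prompt evaluation time).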

modeling.py refactor to reduce duplicate code:

Create an ExecuTorchModelBase that represents the ExecuTorch inference model and implements the common methods (export, load from cache/hub, etc.). It defines abstract methods (forward, generate, etc.) that each concrete subclass must implement.
Make all supported ExecuTorch inference classes (ExecuTorchModelForSeq2SeqLM, ExecuTorchModelForCausalLM, etc.) lightweight and easy to scale.
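
The shape of this refactor can be sketched with Python's `abc` module. This is only an illustration of the pattern under stated assumptions: `InferenceModelBase` and `ToyCausalLM` are hypothetical names, the method signatures are simplified, and the toy "model" just increments a token id instead of running an ExecuTorch program.

```python
from abc import ABC, abstractmethod
from typing import List


class InferenceModelBase(ABC):
    """Hypothetical stand-in for a shared base class holding common logic."""

    def __init__(self, name: str) -> None:
        # Shared state and helpers (export, loading, etc.) would live here.
        self.name = name

    @abstractmethod
    def forward(self, token: int) -> int:
        """Run one inference step; each subclass must implement this."""

    @abstractmethod
    def generate(self, prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
        """Produce new tokens; each subclass must implement this."""


class ToyCausalLM(InferenceModelBase):
    """Toy subclass: only the task-specific methods are defined here."""

    def forward(self, token: int) -> int:
        # Placeholder for a real forward pass: the next token is token + 1.
        return token + 1

    def generate(self, prompt_tokens: List[int], max_new_tokens: int) -> List[int]:
        tokens = list(prompt_tokens)
        for _ in range(max_new_tokens):
            tokens.append(self.forward(tokens[-1]))
        return tokens
```

With this layout, instantiating the base class directly raises `TypeError`, and a new task-specific class only needs to supply `forward` and `generate`.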

Added the efficientnet model. It runs conditionally, only when executorch >= 0.6 (so it will run on the pinned nightly). It was previously broken due to a bug in XNNPACK, which has been fixed and picked into the upcoming executorch 0.6 release.
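
One way to express such a version gate using only the standard library is sketched below. The helper names (`release_tuple`, `installed_version`, `meets_minimum`) and the test class name are hypothetical, and the check in this PR may be written differently. The sketch compares only the leading numeric components, so a nightly such as `0.6.0.dev20250401` is assumed to count as meeting the 0.6 minimum.

```python
import re
import unittest
from importlib import metadata
from typing import Optional, Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Leading numeric components, e.g. '0.6.0.dev20250401' -> (0, 6, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def installed_version(package: str) -> Optional[str]:
    """Installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version: Optional[str], minimum: Tuple[int, ...]) -> bool:
    """True if a version is installed and its release tuple is >= minimum."""
    return version is not None and release_tuple(version) >= minimum


@unittest.skipUnless(
    meets_minimum(installed_version("executorch"), (0, 6)),
    "requires executorch >= 0.6",
)
class EfficientNetSmokeTest(unittest.TestCase):  # hypothetical test class
    def test_runs_only_on_new_enough_executorch(self) -> None:
        # A real test would export and run the model here.
        self.assertTrue(meets_minimum(installed_version("executorch"), (0, 6)))
```

On a machine without executorch, or with a version below 0.6, the test class is reported as skipped rather than failed.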

HuggingFaceDocBuilderDev · 2025-04-04T04:24:21Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

michaelbenayoun

LGTM from what I can understand.
This is great work, but could you please try to split PRs by topic? Here, for example, we could have:

A PR for the refactoring
A PR for efficientnet
A PR for stats computation

It is no big deal at all, but it makes things easier to review.

guangy10 requested review from echarlaix and michaelbenayoun April 4, 2025 04:21

guangy10 force-pushed the et_modeling_refactor branch 2 times, most recently from f0df2f8 to 16c238c Compare April 4, 2025 17:53

guangy10 mentioned this pull request Apr 4, 2025

Support Whisper #45

Merged

guangy10 force-pushed the et_modeling_refactor branch from 16c238c to 4d2dc1f Compare April 4, 2025 21:22

Add latency measurement for text models

f78a4a1

guangy10 force-pushed the et_modeling_refactor branch from 4d2dc1f to f78a4a1 Compare April 4, 2025 21:26

michaelbenayoun approved these changes Apr 7, 2025

michaelbenayoun merged commit 2f917c3 into huggingface:main Apr 7, 2025
139 of 140 checks passed
