[torchbench] Training benchmarks failing with: OOM #6003

Open
11 of 14 tasks
ysiraichi opened this issue Dec 3, 2023 · 0 comments
ysiraichi commented Dec 3, 2023

This issue contains two lists of training benchmarks that fail with OOM on an NVIDIA A100 40GB GPU:

  • Eager-mode
  • Dynamo+openxla

These lists were put together by running the upstreamed benchmarking scripts, specifically with the following command:

python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT --xla None \
       --dynamo openxla --dynamo None \
       --test train \
       --repeat 30 --iterations-per-run 5 \
       --print-subprocess \
       --no-resume

Eager-mode

  • demucs
  • densenet121
  • hf_GPT2_large
  • hf_T5_base
  • llama_v2_7b_16h (skipped -- torchbench.yaml)
  • stable_diffusion_unet
  • timm_nfnet
  • timm_vision_transformer_large

Dynamo+openxla

  • demucs
  • densenet121
  • llama_v2_7b_16h (skipped -- torchbench.yaml)
  • stable_diffusion_unet
  • timm_vision_transformer
  • timm_vision_transformer_large
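
To reproduce one of these failures in isolation, the same runner can be narrowed down to a single benchmark. The sketch below assumes experiment_runner.py accepts a --filter flag that selects models by name/regex (as the upstream PyTorch benchmark runners do); if the flag is named differently, the equivalent model-selection option should be used instead. For example, for densenet121 under Dynamo+openxla:

# Reproduce only the densenet121 training OOM under Dynamo+openxla
# (assumes a --filter model-selection flag is available)
python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT \
       --dynamo openxla \
       --test train \
       --filter densenet121 \
       --repeat 30 --iterations-per-run 5 \
       --print-subprocess \
       --no-resume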