[benchmarks] PyTorch HUD runs on different data-types. #6483

Closed
ysiraichi opened this issue Feb 6, 2024 · 5 comments · Fixed by #6518

@ysiraichi
Collaborator

Problem

Currently, we only change the data-type of the models on CUDA, and only if they have DEFAULT_CUDA_<test>_PRECISION specified. Otherwise, the models run in float32. Meanwhile, PyTorch HUD runs inference in bfloat16 and training with AMP.
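To make the current behavior concrete, here is a minimal sketch of the selection logic as described above. It is not the actual benchmark code: `current_precision`, `ModelWithDefault`, `PlainModel`, and the precision strings are hypothetical, and the attribute name simply follows the DEFAULT_CUDA_<test>_PRECISION pattern from this issue.

```python
class ModelWithDefault:
    # Hypothetical stand-in for a benchmark model that declares a
    # CUDA precision for the "eval" test only.
    DEFAULT_CUDA_EVAL_PRECISION = "fp16"


class PlainModel:
    # Hypothetical stand-in for a model that declares nothing.
    pass


def current_precision(model_cls, device: str, test: str) -> str:
    """Pick a precision the way the issue describes the current behavior."""
    if device == "cuda":
        # Attribute name assumed from the DEFAULT_CUDA_<test>_PRECISION pattern.
        attr = f"DEFAULT_CUDA_{test.upper()}_PRECISION"
        declared = getattr(model_cls, attr, None)
        if declared is not None:
            return declared
    # Every other case falls back to float32.
    return "fp32"
```

Under these assumptions, only `current_precision(ModelWithDefault, "cuda", "eval")` returns something other than `"fp32"`, which is why the results diverge from PyTorch HUD.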

The data-type of the models is relevant not only for performance, but also for coverage. That is because the amount of memory a model uses differs depending on its data-type, so a model that fits in memory in one data-type may not fit in another.
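As a rough illustration of the memory point, the sketch below counts bytes for the weights alone (float32 uses 4 bytes per element, bfloat16 uses 2). The helper name `weight_bytes` is made up for this example, and activations, gradients, and optimizer state are ignored.

```python
# Bytes per element for the two data-types discussed in this issue.
BYTES_PER_ELEMENT = {"float32": 4, "bfloat16": 2}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Memory taken by the parameters alone, in bytes."""
    return num_params * BYTES_PER_ELEMENT[dtype]


# A hypothetical model with one billion parameters.
params = 1_000_000_000
fp32_bytes = weight_bytes(params, "float32")
bf16_bytes = weight_bytes(params, "bfloat16")
```

Here the float32 weights take twice as many bytes as the bfloat16 ones, so a model can run out of memory in one setup and pass in the other.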

Possible Solutions

To make the results on PyTorch HUD easier to compare with the results we get from running the scripts in the PyTorch/XLA repository, I think there are a few options:

  • Introduce --bfloat16 and --amp arguments for forcing the models to run with a specific precision
  • Introduce a --pytorch-hud argument that matches PyTorch HUD (bfloat16 for inference, AMP for training)
  • Make the data-types used by PyTorch HUD the default, and add --default-data-type for the old behavior
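
For reference, the third option could look roughly like the sketch below. It only shows the argument handling: `build_parser`, `select_precision`, the `--test` argument, and the returned strings are assumptions made for this example, not the existing runner interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical test selector for this sketch.
    parser.add_argument("--test", choices=["eval", "train"], default="eval")
    # Flag proposed in this issue to get the old behavior back.
    parser.add_argument("--default-data-type", action="store_true")
    return parser


def select_precision(args: argparse.Namespace) -> str:
    """Default to PyTorch HUD data-types unless the old behavior is requested."""
    if args.default_data_type:
        return "model-default"
    # PyTorch HUD: bfloat16 for inference, AMP for training.
    return "bfloat16" if args.test == "eval" else "amp"
```

With no arguments this selects bfloat16 for inference; passing --default-data-type switches back to whatever the model declares.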

cc @miladm @golechwierowicz @cota @frgossen @zpcore @vanbasten23

@lezcano
Collaborator

lezcano commented Feb 8, 2024

I think defaulting to the PyTorch HUD behaviour and moving the current behaviour under a flag makes sense. cc @miladm

@lezcano
Collaborator

lezcano commented Feb 8, 2024

Perhaps we may not even want to keep the previous behaviour. I'll defer to google engs to decide this.

@frgossen
Collaborator

frgossen commented Feb 9, 2024

Agreed, resembling PyTorch HUD behaviour as closely as possible makes sense.

@ManfeiBai
Collaborator

Hi, @lsy323, is it ok to assign this ticket to you?

@ysiraichi
Collaborator Author

@ManfeiBai @lsy323 Ah, sorry. I should have mentioned it. I'm working on this one.
