
Failing Torchbench Models: tracking issue #5932

Open · ysiraichi opened this issue Nov 28, 2023 · 38 comments

ysiraichi commented Nov 28, 2023

Summary of Contributions (9th Feb)

  1. Improve the number of TorchBench models that work with Dynamo as a tracer: pass rates are now comparable to those of torch.compile using Inductor. Some of the fixes also improved the previous (non-Dynamo) tracer that PyTorch/XLA used.

                 Inference    Training
    Inductor     87           63
    Dynamo       60 to 82     41 to 53
    Non-Dynamo   79 to 82     54 to 56
  2. Improve the benchmarking tools used by Google: Google's initial runs benchmarking these models disagreed with the reported results on about 15 models. We identified and fixed 10+ issues, reconciling Google's benchmarks with the reported ones and, in turn, with the PyTorch HUD.

Current State

This post has two lists:

  • Failing inference models
  • Failing training models

Each list shows the models that fail when:

  • Tracing without Dynamo (Eager-mode)
  • Tracing with Dynamo into openxla (Dynamo+openxla)

These lists were created using the benchmarking scripts that currently live in the upstream repository. The following command was executed (the two configurations are sketched after it):

python xla/benchmarks/experiment_runner.py \
       --suite-name torchbench \
       --accelerator cuda \
       --xla PJRT --xla None \
       --dynamo openxla --dynamo inductor --dynamo None \
       --test eval --test train \
       --repeat 30 --iterations-per-run 5 \
       --print-subprocess \
       --no-resume
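
The two tracing configurations above map to the following, shown here as a minimal sketch assuming a working torch_xla installation (the model and input are stand-ins, not TorchBench models):

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
model = torch.nn.Linear(16, 16).to(device)
x = torch.randn(8, 16, device=device)

# Non-Dynamo (Eager-mode): operations are recorded lazily and compiled
# by XLA when mark_step() cuts the graph.
out = model(x)
xm.mark_step()

# Dynamo+openxla: torch.compile captures the graph ahead of time and
# hands it to the openxla backend.
compiled = torch.compile(model, backend="openxla")
out = compiled(x)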

Environment

  • GPU: A100 40GB

Inference

Non-Dynamo. Pass rate: 78/81 - 96% (against inductor)

Dynamo+openxla. Pass rate: 78/81 - 96% (against inductor)

Models also Failing on Inductor

Inference Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

  • hf_clip
    • 'str' object has no attribute 'shape'
  • mobilenet_v2_quantized_qat
  • resnet50_quantized_qat

Inference Failing on Inductor CUDA with Different Errors

Training

Non-Dynamo. Pass rate: 64/66 - 96% (against inductor)

Dynamo+openxla. Pass rate: 55/66 - 83% (against inductor)

Models also Failing on Inductor

No Training Support on Inductor CUDA

Benchmarks that raise the error "Model's DEFAULT_TRAIN_BSIZE is not implemented" (see the sketch after this list):

  • cm3leon_generate
  • detectron2_fcos_r_50_fpn
  • doctr_det_predictor
  • doctr_reco_predictor
  • hf_T5_generate
  • llama
  • phi_1_5
  • pyhpc_equation_of_state
  • pyhpc_isoneutral_mixing
  • pyhpc_turbulent_kinetic_energy
  • sam
  • simple_gpt
  • simple_gpt_tp_manual
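
For context, a minimal sketch of where this error comes from, following torchbench's conventions (class and attribute names may differ per model):

class BenchmarkModel:
    DEFAULT_TRAIN_BSIZE = None  # subclasses override this to enable training

    def __init__(self, test, batch_size=None):
        if test == "train" and self.DEFAULT_TRAIN_BSIZE is None:
            raise NotImplementedError(
                "Model's DEFAULT_TRAIN_BSIZE is not implemented."
            )
        self.batch_size = batch_size or self.DEFAULT_TRAIN_BSIZE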

Training Failing on Inductor CUDA with the Same Error

Benchmarks that raise the same error on inductor:

Training Failing on Inductor CUDA with Different Errors

cc @JackCaoG @miladm

@lezcano lezcano changed the title Torchbench benchmarks: tracking issue Failing Torchbench benchmarks: tracking issue Nov 28, 2023
@miladm miladm added the xla:gpu label Dec 1, 2023
@lezcano lezcano changed the title Failing Torchbench benchmarks: tracking issue Failing Torchbench Models: tracking issue Dec 1, 2023

lezcano commented Dec 1, 2023

State after 7 weeks of work:

Models fixed so far:

  • pyhpc_isoneutral_mixing
  • pyhpc_turbulent_kinetic_energy
  • dlrm
  • Super_SloMo
  • speech_transformer

PRs to fix the models. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


ysiraichi commented Dec 11, 2023

Weekly update (Dec 1~Dec 10):

Models fixed:

  • DALLE2_pytorch
    • training is now failing with the same error as inductor
  • stable_diffusion_unet
    • training is still failing with OOM
  • stable_diffusion_text_encoder
  • hf_GPT2
  • hf_GPT2_large
    • training without dynamo is still failing
  • yolov3
    • Failing possibly due to a cuDNN error, which is likely an OOM, on an RTX 2060. Haven't tested it on an A100 yet, though

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


ysiraichi commented Dec 15, 2023


miladm commented Jan 10, 2024

Can we please add a pass rate table in the weekly report that includes:

Inference

  • Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU

Training

  • Inductor, Dynamo+PyTorch/XLA:GPU, Non-Dynamo+PyTorch/XLA:GPU


ysiraichi commented Jan 16, 2024

Weekly update (Jan 8 ~ Jan 12):

Pass rate (out of 99 benchmarks):

             Inference    Training
Inductor     91           64
Non-Dynamo   87           67
Dynamo       86           57

Models fixed:

  • detectron2 models (inference with dynamo)
  • hf_BigBird (inference and training with dynamo)
  • torch_multimodal_clip (training with dynamo)
  • timm_vision_transformer (training with dynamo)
  • Likely not due to the merged PRs below:
    • detectron2 models: all but detectron2_fcos_r_50_fpn (training without dynamo)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Jan 15 ~ Jan 19):

Pass rate (out of 99 benchmarks):

             Inference    Training
Inductor     85           62
Non-Dynamo   70           57
Dynamo       71           55

Models that started failing:

  • After Re-land: Fix model initialization. #6296:
    • detectron2_fasterrcnn_r_101_c4
    • detectron2_fasterrcnn_r_101_dc5
    • detectron2_fasterrcnn_r_101_fpn
    • detectron2_fasterrcnn_r_50_c4
    • detectron2_fasterrcnn_r_50_dc5
    • detectron2_fasterrcnn_r_50_fpn
    • detectron2_fcos_r_50_fpn
    • detectron2_maskrcnn_r_101_c4
    • detectron2_maskrcnn_r_101_fpn
    • detectron2_maskrcnn_r_50_c4
    • detectron2_maskrcnn_r_50_fpn
    • mobilenet_v3_large
    • timm_regnet
    • hf_Bart
  • Started being skipped:
    • pytorch_CycleGAN_and_pix2pix
    • pytorch_unet
  • Unsupported precision:
    • pytorch_unet
    • yolov3
  • cuDNN error:
    • Super_SloMo (inductor)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


miladm commented Jan 23, 2024

Can we track separate pass-rate tables for L4 and A100 GPUs going forward @ysiraichi?

cc @frgossen @golechwierowicz @cota

ysiraichi commented:

Weekly update (Jan 22 ~ Jan 26):

Pass rate (out of 99 benchmarks):

             Inference    Training
Inductor     88           63
Non-Dynamo   69           57
Dynamo       72           55

Models fixed:

  • (inductor) moco
  • (inductor) Super_SloMo
    • Failed when executed with all other benchmarks
    • Passed when executed alone (by specifying --filter argument)
  • (inference) llama_v2_7b_16h

Models that started failing:

  • (inference + non-dynamo) timm_efficientnet (to be fixed by: #6389)
  • (inference + non-dynamo) timm_nfnet (to be fixed by: #6389)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Jan 29 ~ Feb 2):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     87 (last: 88)      63
Non-Dynamo   82 (last: 69)      56 (last: 57)
Dynamo       82 (last: 72)      53 (last: 55)

L4

             Inference    Training
Inductor     86           60
Non-Dynamo   81           53
Dynamo       82           49

Models Summary (for A100)

  • Inductor: Inference (-4, +3)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
      • maml
    • (pass) Remove outdated skip:
      • vision_maskrcnn
    • (pass) AMP supported:
      • pytorch_unet
      • yolov3
  • Inductor: Training (-3, +3)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large
    • (fail) Failing due to sparse error:
      • dlrm
    • (pass) AMP supported:
      • pytorch_unet
    • (pass) No OOM:
      • demucs
      • opacus_cifar10
  • XLA:GPU (non-dynamo): Inference (-3, +16)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
    • (pass) Forcing fp32 precision (while setting XLA_USE_FP16; see the note after this list):
      • detectron2 benchmarks (11)
      • mobilenet_v3_large
      • timm_efficientnet
      • timm_nfnet
      • timm_regnet
    • (pass) AMP supported:
      • yolov3
  • XLA:GPU (non-dynamo): Training (-2, +1)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large
    • (pass) No OOM:
      • hf_GPT2_large
  • XLA:GPU (dynamo): Inference (-4, +14)
    • (fail) New skips by PyTorch's torchbench skip list:
      • detectron2_maskrcnn
      • hf_Bert
      • hf_Bert_large
      • maml
    • (pass) Remove outdated skip:
      • vision_maskrcnn
    • (pass) Forcing fp32 precision (while setting XLA_USE_FP16):
      • detectron2 benchmarks (11)
      • hf_Bart
    • (pass) AMP supported:
      • yolov3
  • XLA:GPU (dynamo): Training (-2, +0)
    • (fail) New skips by PyTorch's torchbench skip list:
      • hf_Bert
      • hf_Bert_large
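
Regarding the XLA_USE_FP16 items above: a minimal sketch of how that downcast is toggled. Assumptions: the variable must be set before torch_xla initializes the device, and "forcing fp32" means leaving it unset for models that break under fp16.

import os
os.environ["XLA_USE_FP16"] = "1"  # downcast fp32 tensors to fp16 on XLA devices

import torch
import torch_xla.core.xla_model as xm

t = torch.randn(4, 4).to(xm.xla_device())  # stored as fp16 on the device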

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

@pytorch pytorch deleted a comment from ysiraichi Feb 9, 2024

ysiraichi commented Feb 12, 2024

Weekly update (Feb 5 ~ Feb 9):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     87 (last: 87)      63
Non-Dynamo   82 (last: 82)      57 (last: 56)
Dynamo       84 (last: 82)      53 (last: 53)

L4

             Inference    Training
Inductor     86           60
Non-Dynamo   81           53
Dynamo       84           49

Models Summary

  • XLA:GPU (non-dynamo): Training (0, +1)
    • (pass) No OOM:
      • densenet121
  • XLA:GPU (dynamo): Inference (0, +2)
    • (pass) Increased compilation cache (see the sketch after this list):
      • cm3leon_generate
      • hf_T5_generate
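
On the compilation-cache item above, a sketch under a loud assumption: "increased compilation cache" is read here as raising Dynamo's recompile limit, which generate-style benchmarks (many distinct sequence lengths) can exhaust; the exact knob is not confirmed by this thread.

import torch._dynamo.config as dynamo_config

# Allow more compiled graph variants per function before Dynamo
# gives up and falls back to eager.
dynamo_config.cache_size_limit = 64  # hypothetical value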

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


ysiraichi commented Feb 26, 2024


ysiraichi commented Feb 27, 2024

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 87)      65 (last: 63)
Non-Dynamo   72 (last: 82)      61 (last: 57)
Dynamo       73 (last: 84)      54 (last: 53)

L4

             Inference          Training
Inductor     81 (last: 86)      62 (last: 60)
Non-Dynamo   71 (last: 81)      57 (last: 53)
Dynamo       73 (last: 84)      52 (last: 49)

Models Summary

  • Inductor: Inference (-10, +4)

    • (fail) "roi_align_forward_kernel" not implemented for 'BFloat16' (after: #6518)
      • detectron2 benchmarks (10)
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • maml
      • pytorch_CycleGAN_and_pix2pix
  • Inductor: Training (-3, +5)

    • (fail) Running on AMP (after: #6518)
      • mobilenet_v2_quantized_qat
      • resnet50_quantized_qat
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
  • XLA:GPU (non-dynamo): Inference (-15, +5)

    • (fail) Error while lowering: aten::upsample_bilinear2d (after: #6518) (issue: #6520)
      • Background_Matting
    • (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
      • detectron2 benchmarks (11)
    • (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
      • hf_GPT2 and hf_GPT2_large
    • (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
      • llama
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • maml
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
  • XLA:GPU (non-dynamo): Training (0, +4)

    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
  • XLA:GPU (dynamo): Inference (-16, +5)

    • (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
      • Super_SloMo
    • (fail) CPU fallback does not work with mixed dtypes (issue: #6336)
      • detectron2 benchmarks (11)
    • (fail) Seen floating point types of different precisions in HLO (after: #6518) (issue: #6521)
      • hf_GPT2 and hf_GPT2_large
    • (fail) Indices types are not Long (they are Int) (after: #6518) (issue: #6648)
      • llama
    • (fail) Slice size at index 0 in gather op is out of range, must be within [0, 1), got 1. (issue: #6557)
      • vision_maskrcnn
  • XLA:GPU (dynamo): Training (-4, +5)

    • (fail) expected scalar type Float but found Half (after: #6518) (issue: #6556)
      • Super_SloMo
    • (fail) Seen floating point types of different precisions in HLO (after: #6518)
      • hf_GPT2 and hf_GPT2_large (issue: #6521)
      • timm_nfnet (issue: #6649)
    • (pass) Remove outdated skips
      • hf_Bert and hf_Bert_large
      • pytorch_CycleGAN_and_pix2pix
      • pytorch_unet
    • (pass) No OOM
      • stable_diffusion_unet

ysiraichi commented:

Weekly update (Feb 26 ~ Mar 01):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      65 (last: 65)
Non-Dynamo   72 (last: 72)      61 (last: 61)
Dynamo       73 (last: 73)      56 (last: 54)

L4

             Inference          Training
Inductor     81 (last: 81)      63 (last: 62)
Non-Dynamo   72 (last: 71)      58 (last: 57)
Dynamo       71 (last: 73)      54 (last: 52)

Models Summary

  • XLA:GPU (non-dynamo): Training (-1, +1)

    • (fail) Timeout:
      • timm_efficientdet
    • (pass) Smaller batch size
      • demucs
  • XLA:GPU (dynamo): Inference (-2, 0)

    • (fail) Timeout:
      • cm3leon_generate
      • hf_T5_generate
  • XLA:GPU (dynamo): Training (0, +2)

    • (pass) Smaller batch size
      • densenet121
      • timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Mar 04 ~ Mar 08):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 65)
Non-Dynamo   72 (last: 72)      61 (last: 61)
Dynamo       71 (last: 71)      57 (last: 56)

L4

             Inference          Training
Inductor     81 (last: 81)      64 (last: 63)
Non-Dynamo   72 (last: 72)      58 (last: 58)
Dynamo       71 (last: 71)      55 (last: 54)

Models Summary (A100)

  • Inductor: Training (0, +1)

    • (pass) Reason unknown
      • dlrm
  • XLA:GPU (dynamo): Training (0, +1)

    • (pass) Tensor.new dynamo support (see the snippet after this list)
      • hf_Reformer
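
An illustrative snippet of the Tensor.new pattern now traceable (the function here is hypothetical; hf_Reformer uses the pattern internally):

import torch

def make_buffer(x):
    # Tensor.new allocates a tensor with the same dtype/device as `x`;
    # this previously caused a graph break under Dynamo.
    return x.new(x.size(0), 4).zero_()

compiled = torch.compile(make_buffer)
out = compiled(torch.randn(3, 2))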

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Mar 11 ~ Mar 15):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   37 (last: 72)      28 (last: 61)
Dynamo       31 (last: 71)      18 (last: 57)

L4

             Inference          Training
Inductor     81 (last: 81)      64 (last: 63)
Non-Dynamo   45 (last: 72)      38 (last: 58)
Dynamo       44 (last: 71)      22 (last: 55)

Models Summary (A100)

No summary this week because:

  • Diff is too big
  • It might be due to a pin update

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

vanbasten23 commented:

@ysiraichi The regression you saw might be due to #6677 (open xla pin update). Our team is looking into this issue.


ysiraichi commented Mar 25, 2024

Weekly update (Mar 18 ~ Mar 21):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   76 (last: 72)      64 (last: 61)
Dynamo       73 (last: 71)      58 (last: 57)

L4

             Inference          Training
Inductor     80 (last: 81)      64 (last: 64)
Non-Dynamo   76 (last: 72)      61 (last: 58)
Dynamo       74 (last: 71)      56 (last: 55)

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


miladm commented Apr 1, 2024

Last week, the results were unchanged.
We are preparing for performance optimizations.
cc @ysiraichi

ysiraichi commented:

Weekly update (Apr 1 ~ Apr 5):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   75 (last: 76)      63 (last: 64)
Dynamo       73 (last: 73)      53 (last: 58)

L4

             Inference          Training
Inductor     82 (last: 80)      65 (last: 64)
Non-Dynamo   75 (last: 76)      61 (last: 61)
Dynamo       74 (last: 74)      51 (last: 56)

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Apr 8 ~ Apr 12):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   74 (last: 75)      64 (last: 63)
Dynamo       74 (last: 73)      53 (last: 53)

L4

             Inference          Training
Inductor     82 (last: 82)      65 (last: 65)
Non-Dynamo   75 (last: 75)      61 (last: 61)
Dynamo       75 (last: 74)      51 (last: 51)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (-1, 0)

    • (fail) doctr_reco_predictor: TIMEOUT
  • XLA:GPU (non-dynamo): Training (0, +1)

    • (pass) timm_efficientdet
  • XLA:GPU (dynamo): Inference (0, +1)

    • (pass) hf_Reformer

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Apr 15 ~ Apr 19):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     ? (last: 81)       ? (last: 66)
Non-Dynamo   ? (last: 74)       ? (last: 64)
Dynamo       ? (last: 74)       ? (last: 53)

L4

             Inference          Training
Inductor     82 (last: 82)      65 (last: 65)
Non-Dynamo   76 (last: 75)      61 (last: 61)
Dynamo       76 (last: 75)      51 (last: 51)

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Apr 22 ~ Apr 26):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   75 (last: 74)      64 (last: 64)
Dynamo       75 (last: 74)      53 (last: 53)

L4

             Inference          Training
Inductor     81 (last: 82)      65 (last: 65)
Non-Dynamo   76 (last: 76)      61 (last: 61)
Dynamo       76 (last: 76)      51 (last: 51)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +1)

    • (pass) timm_efficientdet
  • XLA:GPU (dynamo): Inference (0, +1)

    • (pass) timm_efficientdet

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Apr 29 ~ May 3):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   76 (last: 75)      64 (last: 64)
Dynamo       75 (last: 75)      53 (last: 53)

L4

             Inference          Training
Inductor     82 (last: 81)      65 (last: 65)
Non-Dynamo   76 (last: 76)      61 (last: 61)
Dynamo       76 (last: 76)      51 (last: 51)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (0, +1)
    • (pass) doctr_reco_predictor

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (May 6 ~ May 10):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     82 (last: 81)      66 (last: 66)
Non-Dynamo   76 (last: 75)      64 (last: 64)
Dynamo       75 (last: 75)      53 (last: 53)

L4

             Inference          Training
Inductor     82 (last: 82)      65 (last: 65)
Non-Dynamo   76 (last: 76)      61 (last: 61)
Dynamo       76 (last: 76)      51 (last: 51)

Notes

  • Inductor on L4 started failing with: SyntaxError: unterminated string literal
    • Oddly enough, A100 didn't have the same error
    • The L4 results were therefore not updated

Models Summary (A100)

  • Inductor: Inference (0, +1)
    • (pass) maml

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


ysiraichi commented May 20, 2024

Weekly update (May 13 ~ May 17):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     82 (last: 82)      66 (last: 66)
Non-Dynamo   77 (last: 76)      61 (last: 64)
Dynamo       78 (last: 75)      55 (last: 53)

L4

             Inference          Training
Inductor     82 (last: 82)      65 (last: 65)
Non-Dynamo   77 (last: 76)      59 (last: 61)
Dynamo       78 (last: 76)      52 (last: 51)

Models Summary (A100)

All the differences shown below are likely the result of #7067, which fixes AMP. Reason: (i) training benchmarks use AMP by default; and (ii) some inference benchmarks use AMP instead of bfloat16. (A sketch of AMP on XLA follows the list.)

  • XLA:GPU (non-dynamo): Inference (0, +1)

    • (pass) detectron2_fcos_r_50_fpn
  • XLA:GPU (non-dynamo): Training (-5, +2)

    • (fail) Super_SloMo
    • (fail) mobilenet_v2_quantized_qat
    • (fail) resnet50_quantized_qat
    • (fail) timm_efficientdet
    • (fail) timm_nfnet
    • (pass) stable_diffusion_unet
    • (pass) timm_vision_transformer_large
  • XLA:GPU (dynamo): Inference (0, +3)

    • (pass) Super_SloMo
    • (pass) detectron2_fcos_r_50_fpn
    • (pass) doctr_reco_predictor
  • XLA:GPU (dynamo): Training (0, +2)

    • (pass) Super_SloMo
    • (pass) timm_nfnet
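
A minimal sketch of AMP on an XLA device, the path #7067 fixed (assuming torch_xla.amp; the model and optimizer are stand-ins):

import torch
import torch_xla.core.xla_model as xm
from torch_xla.amp import autocast

device = xm.xla_device()
model = torch.nn.Linear(16, 16).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(8, 16, device=device)

with autocast(device):  # mixed-precision region
    loss = model(x).sum()
loss.backward()
optimizer.step()
xm.mark_step()  # materialize the step on the XLA device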

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (May 20 ~ May 24):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     82 (last: 82)      66 (last: 66)
Non-Dynamo   77 (last: 77)      63 (last: 61)
Dynamo       78 (last: 78)      55 (last: 55)

L4

             Inference          Training
Inductor     82 (last: 82)      65 (last: 65)
Non-Dynamo   77 (last: 77)      61 (last: 59)
Dynamo       78 (last: 78)      52 (last: 52)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Training (-0, +2)
    • (pass) Super_SloMo #7067
    • (pass) timm_efficientdet #7091

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


ysiraichi commented Jun 10, 2024

Weekly update (June 3 ~ June 6):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     82 (last: 82)      65 (last: 66)
Non-Dynamo   79 (last: 77)      61 (last: 63)
Dynamo       79 (last: 78)      55 (last: 55)

L4

             Inference          Training
Inductor     82 (last: 82)      64 (last: 65)
Non-Dynamo   79 (last: 77)      60 (last: 61)
Dynamo       79 (last: 78)      52 (last: 52)

Models Summary (A100)

  • Inductor: Training (-1, +0)

    • (fail) dlrm
  • XLA:GPU (non-dynamo): Inference (-0, +2)

  • XLA:GPU (non-dynamo): Training (-3, +1)

    • (pass) timm_nfnet #7130
    • (fail) drq #7247
    • (fail) stable_diffusion_unet: OOM
    • (fail) timm_vision_transformer_large: OOM
  • XLA:GPU (dynamo): Inference (-0, +1)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]


ysiraichi commented Jun 17, 2024

Weekly update (June 10 ~ June 14):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     82 (last: 82)      65 (last: 65)
Non-Dynamo   79 (last: 79)      63 (last: 61)
Dynamo       79 (last: 79)      55 (last: 55)

L4

             Inference          Training
Inductor     82 (last: 82)      64 (last: 64)
Non-Dynamo   79 (last: 79)      61 (last: 60)
Dynamo       79 (last: 79)      52 (last: 52)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Training (-1, +3)
    • (pass) drq
    • (pass) stable_diffusion_unet
    • (pass) timm_vision_transformer_large
    • (fail) timm_nfnet #7271

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (June 17 ~ June 21):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 82)      65 (last: 65)
Non-Dynamo   78 (last: 79)      63 (last: 63)
Dynamo       78 (last: 79)      55 (last: 55)

L4

             Inference          Training
Inductor     81 (last: 82)      64 (last: 64)
Non-Dynamo   78 (last: 79)      61 (last: 61)
Dynamo       78 (last: 79)      52 (last: 52)

Models Summary (A100)


PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (June 24 ~ June 28):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     74 (last: 81)      60 (last: 65)
Non-Dynamo   73 (last: 78)      60 (last: 63)
Dynamo       72 (last: 78)      54 (last: 55)

L4

             Inference          Training
Inductor     74 (last: 81)      59 (last: 64)
Non-Dynamo   73 (last: 78)      58 (last: 61)
Dynamo       72 (last: 78)      51 (last: 52)

Models Summary (A100)

  • Inductor: Inference (-7, +0)

    • (fail) doctr_det_predictor (likely due to newer PyTorch/benchmark commit)
    • (fail) doctr_reco_predictor (likely due to newer PyTorch/benchmark commit)
    • (fail) hf_T5 (likely due to newer PyTorch/benchmark commit)
    • (fail) hf_T5_base (likely due to newer PyTorch/benchmark commit)
    • (fail) hf_T5_large (likely due to newer PyTorch/benchmark commit)
    • (fail) moco (caused by #7321)
    • (fail) soft_actor_critic (likely NumPy 2.0 issue)
  • Inductor: Training (-5, +0)

    • (fail) hf_T5
    • (fail) hf_T5_base
    • (fail) hf_T5_large
    • (fail) moco
    • (fail) soft_actor_critic
  • XLA:GPU (non-dynamo): Inference (-6, +1)

    • (fail) doctr_det_predictor
    • (fail) doctr_reco_predictor
    • (fail) hf_T5
    • (fail) hf_T5_base
    • (fail) hf_T5_large
    • (fail) soft_actor_critic
    • (pass) moco
  • XLA:GPU (non-dynamo): Training (-4, +1)

    • (fail) hf_T5
    • (fail) hf_T5_base
    • (fail) hf_T5_large
    • (fail) soft_actor_critic
    • (pass) moco
  • XLA:GPU (dynamo): Inference (-6, +0)

    • (fail) doctr_det_predictor
    • (fail) doctr_reco_predictor
    • (fail) hf_T5
    • (fail) hf_T5_base
    • (fail) hf_T5_large
    • (fail) soft_actor_critic
  • XLA:GPU (dynamo): Training (-1, +0)

    • (fail) soft_actor_critic

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (July 1 ~ July 5):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 74)      66 (last: 60)
Non-Dynamo   78 (last: 73)      64 (last: 60)
Dynamo       78 (last: 72)      55 (last: 54)

L4

             Inference          Training
Inductor     81 (last: 74)      65 (last: 59)
Non-Dynamo   78 (last: 73)      62 (last: 58)
Dynamo       78 (last: 72)      52 (last: 51)

Models Summary (A100)

  • Inductor: Inference (-0, +7)

    • (pass) doctr_det_predictor (likely due to old PyTorch/benchmark commit)
    • (pass) doctr_reco_predictor (likely due to old PyTorch/benchmark commit)
    • (pass) hf_T5 (likely due to old PyTorch/benchmark commit)
    • (pass) hf_T5_base (likely due to old PyTorch/benchmark commit)
    • (pass) hf_T5_large (likely due to old PyTorch/benchmark commit)
    • (pass) moco (caused by #7598)
    • (pass) soft_actor_critic (likely due to old PyTorch/benchmark commit)
  • Inductor: Training (-0, +6)

    • (pass) dlrm
    • (pass) hf_T5
    • (pass) hf_T5_base
    • (pass) hf_T5_large
    • (pass) moco
    • (pass) soft_actor_critic
  • XLA:GPU (non-dynamo): Inference (-1, +6)

    • (pass) doctr_det_predictor
    • (pass) doctr_reco_predictor
    • (pass) hf_T5
    • (pass) hf_T5_base
    • (pass) hf_T5_large
    • (pass) soft_actor_critic
    • (fail) moco (needs newer torchbench)
  • XLA:GPU (non-dynamo): Training (-1, +5)

    • (pass) hf_T5
    • (pass) hf_T5_base
    • (pass) hf_T5_large
    • (pass) soft_actor_critic
    • (pass) timm_nfnet (fixed in #7602)
    • (fail) moco (needs newer torchbench)
  • XLA:GPU (dynamo): Inference (-0, +6)

    • (pass) doctr_det_predictor
    • (pass) doctr_reco_predictor
    • (pass) hf_T5
    • (pass) hf_T5_base
    • (pass) hf_T5_large
    • (pass) soft_actor_critic
  • XLA:GPU (dynamo): Training (-0, +1)

    • (pass) soft_actor_critic

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (July 8 ~ July 12):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   75 (last: 78)      61 (last: 64)
Dynamo       75 (last: 78)      52 (last: 55)

L4

             Inference          Training
Inductor     81 (last: 81)      65 (last: 65)
Non-Dynamo   75 (last: 78)      59 (last: 62)
Dynamo       75 (last: 78)      49 (last: 52)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (-3, +0)

    • (fail) hf_Bart
    • (fail) nanogpt
    • (fail) torch_multimodal_clip
  • XLA:GPU (non-dynamo): Training (-3, +0)

    • (fail) hf_Bart
    • (fail) nanogpt
    • (fail) torch_multimodal_clip
  • XLA:GPU (dynamo): Inference (-3, +0)

    • (fail) hf_Bart
    • (fail) nanogpt
    • (fail) torch_multimodal_clip
  • XLA:GPU (dynamo): Training (-3, +0)

    • (fail) hf_Bart
    • (fail) nanogpt
    • (fail) torch_multimodal_clip

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (July 15 ~ July 19):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   78 (last: 75)      64 (last: 61)
Dynamo       78 (last: 75)      55 (last: 52)

L4

             Inference          Training
Inductor     81 (last: 81)      65 (last: 65)
Non-Dynamo   78 (last: 75)      62 (last: 59)
Dynamo       78 (last: 75)      52 (last: 49)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (-0, +3)

    • (pass) hf_Bart
    • (pass) nanogpt
    • (pass) torch_multimodal_clip
  • XLA:GPU (non-dynamo): Training (-0, +3)

    • (pass) hf_Bart
    • (pass) nanogpt
    • (pass) torch_multimodal_clip
  • XLA:GPU (dynamo): Inference (-0, +3)

    • (pass) hf_Bart
    • (pass) nanogpt
    • (pass) torch_multimodal_clip
  • XLA:GPU (dynamo): Training (-0, +3)

    • (pass) hf_Bart
    • (pass) nanogpt
    • (pass) torch_multimodal_clip

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (July 22 ~ July 26):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     81 (last: 81)      66 (last: 66)
Non-Dynamo   77 (last: 78)      64 (last: 64)
Dynamo       78 (last: 78)      55 (last: 55)

L4

             Inference          Training
Inductor     81 (last: 81)      65 (last: 65)
Non-Dynamo   78 (last: 78)      62 (last: 62)
Dynamo       78 (last: 78)      52 (last: 52)

Models Summary (A100)

  • XLA:GPU (non-dynamo): Inference (-1, +0)
    • (fail) doctr_reco_predictor: timeout

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (July 29 ~ Aug 9):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     77 (last: 81)      66 (last: 66)
Non-Dynamo   78 (last: 77)      63 (last: 64)
Dynamo       77 (last: 78)      52 (last: 55)

L4

             Inference          Training
Inductor     77 (last: 81)      65 (last: 65)
Non-Dynamo   78 (last: 78)      62 (last: 62)
Dynamo       77 (last: 78)      45 (last: 52)

Models Summary (A100)

  • Inductor: Inference (-4, +0)

    • (fail) cm3leon_generate (likely due to CUDAGraphs introduction #7749; see the sketch after this list)
    • (fail) hf_T5_generate (likely due to CUDAGraphs introduction #7749)
    • (fail) llama (likely due to CUDAGraphs introduction #7749)
    • (fail) maml (likely due to CUDAGraphs introduction #7749)
  • XLA:GPU (dynamo): Inference (-1, +0)

    • (fail) hf_BigBird
  • XLA:GPU (dynamo): Training (-3, +0)

    • (fail) Background_Matting: OOM
    • (fail) hf_BigBird
    • (fail) timm_nfnet: OOM
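
On the CUDAGraphs attribution above, a hedged sketch: Inductor's CUDA Graphs capture is exposed through torch.compile's "reduce-overhead" mode; whether #7749 enabled exactly this knob is an assumption. Generate-style models with data-dependent control flow are known to be sensitive to graph capture.

import torch

model = torch.nn.Linear(16, 16).cuda()
compiled = torch.compile(model, mode="reduce-overhead")  # CUDA Graphs capture
out = compiled(torch.randn(8, 16, device="cuda"))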

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]

ysiraichi commented:

Weekly update (Aug 12 ~ Aug 16):

Pass rate (out of 99 benchmarks):

A100

             Inference          Training
Inductor     77 (last: 77)      66 (last: 66)
Non-Dynamo   78 (last: 78)      63 (last: 63)
Dynamo       77 (last: 77)      52 (last: 52)

L4

             Inference          Training
Inductor     77 (last: 77)      65 (last: 65)
Non-Dynamo   78 (last: 78)      62 (last: 62)
Dynamo       77 (last: 77)      44 (last: 45)

PRs merged. For an updated list see [XLA, pytorch/benchmarks, pytorch/pytorch]

PRs in flight. For an updated list see [XLA, pytorch/pytorch, pytorch/benchmarks]

Issues identified that the PRs in flight do not fix. For an updated list see [XLA, pytorch/pytorch]
