Add a 2-slice pallas training test in pre-submit CI #8850

tengyifei · 2025-03-18T22:05:48Z

We should have a test that trains a very simple model with a Pallas kernel across two TPU v4 slices and checks that training doesn't hang.
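One possible shape for the "doesn't hang" check, as a minimal sketch only: run the training command in a subprocess with a wall-clock timeout and treat a timeout as a hang. The helper name `run_with_hang_check`, the timeout values, and the stand-in commands are all hypothetical; the issue does not specify the actual multi-slice launch command, so a trivial Python child process is used in its place.

```python
import subprocess
import sys


def run_with_hang_check(cmd, timeout_s):
    """Run cmd and return (ok, reason); a timeout is reported as a hang.

    Hypothetical helper: in a real CI job, cmd would be the (unspecified)
    two-slice training launch command.
    """
    try:
        proc = subprocess.run(
            cmd, timeout=timeout_s, capture_output=True, text=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, f"hang: no exit within {timeout_s}s"
    if proc.returncode != 0:
        return False, f"failed: exit code {proc.returncode}"
    return True, "ok"


# Stand-ins for the real training launch (assumptions for illustration only).
quick = [sys.executable, "-c", "print('step 0 done')"]
stuck = [sys.executable, "-c", "import time; time.sleep(60)"]

if __name__ == "__main__":
    print(run_with_hang_check(quick, timeout_s=30))
    print(run_with_hang_check(stuck, timeout_s=1))
```

A hard timeout keeps a hung collective from blocking the pre-submit queue indefinitely, at the cost of needing a limit generous enough for compilation and slice startup.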

Currently, our pre-submit CI runs on only one TPU v4 slice, so it doesn't cover cases like multi-slice training.

Post-submit CI relies on human diligence to monitor results and revert breaking changes, which has proven ineffective. As long as we can afford it, we should test things in pre-submit rather than post-submit.

ysiraichi added the testing and xla:tpu labels · Mar 20, 2025
