Commit #1115 broke this nightly test:
`tests/test_suites/llm/dpo-mistral-nemo-instruct-2407-1n8g-fsdp2tp8-actckpt-long.sh`
The issue surfaced as an OOM, and I narrowed it down to the transformers version. I believe this is the same regression identified here, where the KV cache was unexpectedly treated as trainable:
huggingface/transformers#39795
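
For context, a minimal sketch of how one can check whether the cache participates in autograd. The tiny model name and the `key_cache` attribute access are assumptions (any small causal LM and the cache layout of the affected releases will do), not taken from the failing run:

```python
import torch
from transformers import AutoModelForCausalLM

# Any tiny causal LM exercises the same code path; the failing run used
# Mistral-NeMo-Instruct-2407 under FSDP2 + TP8 + activation checkpointing.
model = AutoModelForCausalLM.from_pretrained("sshleifer/tiny-gpt2")
model.train()  # training mode, as in the DPO step

ids = torch.randint(0, model.config.vocab_size, (1, 16))
out = model(input_ids=ids, use_cache=True)

# DynamicCache exposes key_cache/value_cache lists in recent releases;
# older versions returned a tuple of (key, value) pairs per layer.
cache = out.past_key_values
keys = getattr(cache, "key_cache", None)
if keys is None:
    keys = [kv[0] for kv in cache]

# If the cache tensors carry a grad_fn, they sit in the autograd graph and
# retain their upstream activations -- the behavior the linked issue describes.
print("cache attached to autograd:", any(k.grad_fn is not None for k in keys))
```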
The memory pressure on either dtensor path (v1, v2) is exacerbated by higher TP degrees (this test used 8) and long sequence lengths; in some settings I saw roughly 4x more memory being used.
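
To give a sense of scale, a back-of-the-envelope sketch. The config values (40 layers, 8 KV heads, head_dim 128) match the published Mistral-NeMo-2407 config as I recall it; the 32k sequence length, batch size, and bf16 dtype are assumptions for illustration, not measurements from the failing run:

```python
# Rough KV-cache footprint per GPU for a Mistral-NeMo-style config.
layers, kv_heads, head_dim = 40, 8, 128
seq_len, batch, dtype_bytes, tp = 32_768, 1, 2, 8

# K and V per layer; KV heads are sharded across TP ranks.
per_gpu = 2 * layers * (kv_heads // tp) * head_dim * seq_len * batch * dtype_bytes
print(f"raw KV cache per GPU: {per_gpu / 2**30:.2f} GiB")

# The raw cache is small; the blowup comes from autograd additionally
# retaining every activation upstream of the cached K/V at long seq_len,
# defeating the savings the activation-checkpointing test relies on.
```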
I confirmed that manually upgrading transformers to 4.56 brings memory back to normal, but Automodel is not ready to upgrade yet, so RL has to disable this test for now.