[CI][NIXL] Split DPEP tests #31491
Code Review
This pull request effectively separates the Tensor Parallelism (TP) and Data Parallelism/Expert Parallelism (DPEP) tests into distinct CI jobs. This is achieved by modifying the .buildkite/test-pipeline.yaml to include a new job for DPEP tests and refactoring the config_sweep_accuracy_test.sh script to select the appropriate test configurations based on the DP_EP environment variable. The changes are well-structured and correctly implement the separation of tests, which should improve CI parallelism and resource utilization. The implementation looks solid.
if [[ -n "${DP_EP:-}" ]]; then
  configs=("${dp_ep_configs[@]}")
  echo "DP_EP is set, using dp_ep_configs"
else
  configs=("${tp_configs[@]}")
fi
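For illustration, the selection above can be exercised on its own. This is only a minimal sketch: the array contents are hypothetical placeholders rather than the values in config_sweep_accuracy_test.sh, and select_configs is a helper invented here. The `${DP_EP:-}` expansion substitutes an empty string when the variable is unset, so the check stays safe under `set -u`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical placeholder configs; the real script defines its own values.
tp_configs=("CFG=tp_a" "CFG=tp_b")
dp_ep_configs=("CFG=dp_ep_a")

# Hypothetical helper: fills the global `configs` array based on DP_EP.
select_configs() {
  if [[ -n "${DP_EP:-}" ]]; then
    configs=("${dp_ep_configs[@]}")
  else
    configs=("${tp_configs[@]}")
  fi
}

# DP_EP unset: the TP configurations are selected.
unset DP_EP
select_configs
echo "unset: ${#configs[@]} config(s): ${configs[*]}"

# DP_EP set to any non-empty value: the DPEP configurations are selected.
DP_EP=1
select_configs
echo "set:   ${#configs[@]} config(s): ${configs[*]}"
```

Note that an empty value (`DP_EP=`) is treated the same as unset, because `-n` tests for a non-empty string.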
While the logic for selecting test configurations is correct, there is a potential robustness issue in the run_tests function that follows this block (outside the diff). The function passes extra_args to the test script without quotes:
if ! env ${cfg} bash "${SCRIPT}" ${extra_args}; then
This will cause issues if extra_args ever contains arguments with spaces, due to shell word splitting. This is a latent bug that could cause future test failures.
To make the script more robust, I recommend modifying run_tests to handle arguments as an array. For example:
run_tests() {
  local label=$1
  shift
  local extra_args=("$@")
  # ...
  if ! env ${cfg} bash "${SCRIPT}" "${extra_args[@]}"; then
  # ...
}
# ...
run_tests "FLASHINFER backend" --attention-backend FLASHINFER
Since this is outside the diff, I'm adding this comment here for visibility. Addressing this would improve the script's maintainability.
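The word-splitting concern can be reproduced in a few lines. The sketch below is independent of the real script: count_args is a hypothetical stand-in for the test script, and the argument values are made up. It compares an unquoted flat string against an array expanded as "${arr[@]}".

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for the test script: prints how many arguments it received.
count_args() {
  echo "$#"
}

# Flat string expanded unquoted: the shell splits on whitespace,
# so the value "two words" is broken into two separate arguments.
extra_args='--label two words'
n_string=$(count_args ${extra_args})

# Array expanded as "${arr[@]}": each element stays a single argument.
extra_args_arr=(--label "two words")
n_array=$(count_args "${extra_args_arr[@]}")

echo "string: ${n_string} args, array: ${n_array} args"
```

With the default IFS this prints "string: 3 args, array: 2 args". Quoting the flat string instead ("${extra_args}") would not help either, since it would collapse everything into one argument; the array form is what preserves argument boundaries.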
ab1af6a into vllm-project:main
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: shixingliang.sxl <shixingliang.sxl@antgroup.com>
Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>
Separate the TP and DPEP end-to-end tests to make better use of parallel resources (when available), and to let the DPEP tests scale independently in num_gpus for future test scenarios.