[ROCm][CI] Fix AMD pipeline generator command wrapping, gating, and git diff behavior#347
AndreasKaratzas wants to merge 6 commits into
…it diff behavior Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
AndreasKaratzas
left a comment
@khluu These changes also affect the upstream CI logic, so please take a look and let me know whether you're happy with them or whether I should remove them.
 merge_base = merge_base_commit
 if not merge_base:
-    pass
+    raise RuntimeError("Failed to determine merge base commit for git diff.")
@khluu Let me know if I should revert this part. I added it because I don't think we should silently pass when the merge base can't be determined.
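To make the behavior change concrete, here is a minimal sketch of failing loudly instead of continuing with an empty merge base. The function name `require_merge_base` is hypothetical and not part of this PR; only the error message is taken from the diff above.

```python
from typing import Optional


def require_merge_base(merge_base_commit: Optional[str]) -> str:
    """Return the merge base, or raise instead of silently continuing.

    Hypothetical helper for illustration; the PR inlines this check.
    """
    # None and the empty string both mean the merge base was not resolved.
    if not merge_base_commit:
        raise RuntimeError("Failed to determine merge base commit for git diff.")
    return merge_base_commit
```

With the old `pass`, a missing merge base would flow into the later `git diff` call; with the raise, the generator stops at the point where the problem occurs.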
def _get_setup_commands(step: Step, setup_profile: SetupProfile) -> List[str]:
    if step.label.startswith(":docker:") or step.no_plugin or setup_profile == "none":
        return []

    if setup_profile == "nvidia":
        return [
            "echo '--- :nvidia: GPU Info'",
            "(command nvidia-smi || true)",
            "echo '--- :gear: CUDA Coredump Setup'",
            "export CUDA_ENABLE_COREDUMP_ON_EXCEPTION=1 && export CUDA_COREDUMP_SHOW_PROGRESS=1 && export CUDA_COREDUMP_GENERATION_FLAGS='skip_nonrelocated_elf_images,skip_global_memory,skip_shared_memory,skip_local_memory,skip_constbank_memory'",
        ]

    if setup_profile == "amd":
        return [
            "echo '--- :amd: GPU Info'",
            "(command amd-smi || true)",
        ]

    raise ValueError(f"Unsupported setup profile: {setup_profile}")
Added a separate setup profile for AMD. Keeping the profiles in one helper also makes it easy for anyone else to add their own.
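As a rough illustration of how the profiles behave, here is a self-contained sketch. The `Step` dataclass is a hypothetical stand-in that carries only the two fields the helper reads, `SetupProfile` is assumed to be a `Literal` type, and the CUDA coredump commands are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Literal

# Assumption: the real SetupProfile type may be defined differently.
SetupProfile = Literal["nvidia", "amd", "none"]


@dataclass
class Step:
    # Hypothetical stand-in for the generator's Step model.
    label: str
    no_plugin: bool = False


def get_setup_commands(step: Step, setup_profile: SetupProfile) -> List[str]:
    # Docker build steps, plugin-less steps, and the "none" profile get no setup.
    if step.label.startswith(":docker:") or step.no_plugin or setup_profile == "none":
        return []
    if setup_profile == "nvidia":
        return ["echo '--- :nvidia: GPU Info'", "(command nvidia-smi || true)"]
    if setup_profile == "amd":
        return ["echo '--- :amd: GPU Info'", "(command amd-smi || true)"]
    # Unknown profiles fail loudly rather than falling back to a default.
    raise ValueError(f"Unsupported setup profile: {setup_profile}")


amd_commands = get_setup_commands(Step(label="Basic Correctness"), "amd")
docker_commands = get_setup_commands(Step(label=":docker: build image"), "amd")
```

Here `amd_commands` contains the `amd-smi` call and no `nvidia-smi` call, while `docker_commands` is empty because the label starts with `:docker:`.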
def _matches_source_dependency(source_file: str, diff_file: str) -> bool:
    normalized = source_file.rstrip("/")
    if not normalized:
        return False
    return diff_file == normalized or diff_file.startswith(f"{normalized}/")


def _step_is_blocked(step: Step, list_file_diff: List[str]) -> bool:
    global_config = get_global_config()
    return (not _step_should_run(step, list_file_diff) or (
        step.optional and global_config["nightly"] != "1"
    ))
The generator used raw substring matching:
if source_file in diff_file
That is too loose. Examples:
vllm/ would match notvllm/foo.py
src/file would match src/file_extra.py
The helper changes that to proper exact-or-prefix path matching:
exact file match: diff_file == normalized
directory prefix match: diff_file.startswith(f"{normalized}/")
So this one exists to make gating correct.
_step_is_blocked() is mostly a cleanup/helper.
I added it because the “should this step be behind a manual block?” logic was being repeated.
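The difference between the two matching rules can be checked with a short sketch. `matches_substring` is a hypothetical name for the old behavior; `matches_source_dependency` mirrors the helper from the diff above without the leading underscore.

```python
def matches_substring(source_file: str, diff_file: str) -> bool:
    # Old behavior: raw substring test, which matches anywhere in the path.
    return source_file in diff_file


def matches_source_dependency(source_file: str, diff_file: str) -> bool:
    # New behavior: exact file match, or match on a whole directory prefix.
    normalized = source_file.rstrip("/")
    if not normalized:
        return False
    return diff_file == normalized or diff_file.startswith(f"{normalized}/")


cases = [
    ("vllm/", "notvllm/foo.py"),
    ("src/file", "src/file_extra.py"),
    ("vllm/", "vllm/engine/core.py"),
    ("src/file", "src/file"),
]
# Each entry is (old result, new result) for the corresponding case.
results = [
    (matches_substring(s, d), matches_source_dependency(s, d)) for s, d in cases
]
```

The first two cases are the false positives described above: the substring test accepts them and the path-aware test rejects them. The last two are genuine matches that both rules accept.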
…s_fix # Conflicts: # buildkite/pipeline_generator/buildkite_step.py
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
…tream vllm Signed-off-by: Andreas Karatzas <akaratza@amd.com>
khluu
left a comment
Can you also break this down into smaller PRs?
 if branch == "main":
     output = subprocess.check_output(
-        ["git", "diff", "--name-only", "--diff-filter=ACMDR", "HEAD~1"],
+        ["git", "diff", "--name-only", "--diff-filter=ACMDR", "HEAD~1", "HEAD"],
why are we adding HEAD at the end?
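For context on this question: `git diff <commit>` with a single commit compares that commit against the working tree, so uncommitted or generated files in the checkout can show up in the output, while `git diff <a> <b>` compares the two commits only. A minimal sketch of building the command for each case follows. The helper name `build_diff_command` is hypothetical, and passing the merge base and `HEAD` as two separate arguments on non-main branches is an assumption about how the range is expressed.

```python
from typing import List


def build_diff_command(branch: str, merge_base: str) -> List[str]:
    """Build a commit-to-commit `git diff` argv (hypothetical helper)."""
    base = ["git", "diff", "--name-only", "--diff-filter=ACMDR"]
    if branch == "main":
        # Compare the previous commit with HEAD, ignoring the working tree.
        return base + ["HEAD~1", "HEAD"]
    if not merge_base:
        raise RuntimeError("Failed to determine merge base commit for git diff.")
    # Assumption: other branches diff from the merge base to HEAD.
    return base + [merge_base, "HEAD"]


main_cmd = build_diff_command("main", merge_base="")
feature_cmd = build_diff_command("my-feature", merge_base="abc123")
```

Naming both endpoints keeps the file list independent of whatever state the CI checkout happens to be in.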
This PR hardens AMD support in the Python pipeline generator and addresses a few related generator issues.
buildkite/bootstrap-amd.sh
  AMD_USE_PIPELINE_GENERATOR=0.
  bootstrap-amd.sh still uses the dedicated Jinja path.
buildkite/pipeline_generator/buildkite_step.py
  bash .buildkite/scripts/hardware_ci/run-amd-test.sh
  VLLM_TEST_COMMANDS
  setup_profile="nvidia" | "amd" | "none"
  _get_setup_commands(...) so setup logic lives in one place.
  nvidia-smi
  setup_profile="amd"
  amd-smi
  nvidia-smi
buildkite/pipeline_generator/utils_lib/git_utils.py
  git add . from the diff helper.
  HEAD~1..HEAD on main
  merge-base..HEAD on other branches
buildkite/pipeline_generator/pyproject.toml
  utils module entry from py-modules.
buildkite/tests/pipeline_generator/test_step.py
  step.py API.
  sys.path setup.
buildkite/tests/pipeline_generator/test_amd_mirror.py
  nvidia
  amd
  none
  amd-smi is present
  nvidia-smi is absent
cc @kenroche