@ammallya
Contributor

Updating submodules for rocm-libraries

@ScottTodd
Member

RCCL test failures? Are these preexisting or new?

https://github.com/ROCm/TheRock/actions/runs/19459545971/job/55706501212?pr=2184#step:9:1284

=================================== FAILURES ===================================
______________ TestRCCL.test_rccl_correctness_tests[gather_perf] _______________

self = <test_rccl.TestRCCL object at 0x7fbf2f4550d0>, executable = 'gather_perf'

    @pytest.mark.parametrize(
        "executable",
        [
            "all_gather_perf",
            "alltoallv_perf",
            "broadcast_perf",
            "alltoall_perf",
            "all_reduce_perf",
            "reduce_perf",
            "hypercube_perf",
            "gather_perf",
            "scatter_perf",
            "sendrecv_perf",
            "reduce_scatter_perf",
        ],
    )
    def test_rccl_correctness_tests(self, executable):
        cmd = [f"{THEROCK_BIN_DIR}/{executable}"]
        logging.info(f"++ Exec [{THEROCK_DIR}]$ {shlex.join(cmd)}")
        result = subprocess.run(
            cmd,
            cwd=THEROCK_DIR,
            check=False,
        )
>       assert result.returncode == 0
E       AssertionError: assert -6 == 0
E        +  where -6 = CompletedProcess(args=['./build/bin/gather_perf'], returncode=-6).returncode

build_tools/github_actions/test_executable_scripts/test_rccl.py:51: AssertionError
------------------------------ Captured log call -------------------------------
INFO     root:test_rccl.py:45 ++ Exec [/__w/TheRock/TheRock]$ ./build/bin/gather_perf
=========================== short test summary info ============================
FAILED build_tools/github_actions/test_executable_scripts/test_rccl.py::TestRCCL::test_rccl_correctness_tests[gather_perf] - AssertionError: assert -6 == 0
 +  where -6 = CompletedProcess(args=['./build/bin/gather_perf'], returncode=-6).returncode
=================== 1 failed, 11 passed in 286.10s (0:04:46) ===================

Some prior runs on main:

I guess we can re-run that job once the other jobs complete to check.
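For reading the log above: Python's `subprocess` reports a negative `returncode` when the child was terminated by a signal on POSIX, so `-6` means `gather_perf` died from signal 6, which is `SIGABRT` on Linux (typically an `abort()` or a failed assertion), rather than exiting with an error status. A minimal sketch of decoding that value follows; `describe_returncode` is a hypothetical helper for illustration and is not part of `test_rccl.py`.

```python
import signal


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a human-readable description.

    Hypothetical helper, not part of TheRock's test scripts.
    On POSIX, subprocess sets returncode to -N when the child was
    terminated by signal N.
    """
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {-returncode}"
        return f"killed by {name}"
    if returncode == 0:
        return "exited cleanly"
    return f"exited with status {returncode}"


# The value from the failing gather_perf run in the log above.
print(describe_returncode(-6))
```

Logging this description alongside the assertion would make it clearer in CI output that the executable crashed, though whether the crash is flaky or a regression still needs a re-run to determine.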

@ScottTodd
Member

This now has merge conflicts with 366cb76

@ammallya ammallya closed this Nov 19, 2025
@github-project-automation github-project-automation bot moved this from TODO to Done in TheRock Triage Nov 19, 2025
@ammallya ammallya reopened this Nov 19, 2025
@ammallya ammallya closed this Nov 19, 2025
ammallya added a commit that referenced this pull request Nov 19, 2025
Rebasing PR #2184 to resolve the merge conflict.

All CI tests passed, including a re-run of the previously failing RCCL test job.