Skip to content

[shared/mxdatagenerator] Use a saner way to select number of threads#673

Merged
jaopaulolc merged 1 commit into
developfrom
joao/mxdatagen_missed_commit
Jul 16, 2025
Merged

[shared/mxdatagenerator] Use a saner way to select number of threads#673
jaopaulolc merged 1 commit into
developfrom
joao/mxdatagen_missed_commit

Conversation

@jaopaulolc
Copy link
Copy Markdown
Contributor

This change was missed during the migration.

@jaopaulolc jaopaulolc requested a review from a team as a code owner July 15, 2025 16:27
@jaopaulolc jaopaulolc changed the title Use a saner way to select number of threads [shared/mxdatagenerator] Use a saner way to select number of threads Jul 15, 2025
@jaopaulolc jaopaulolc merged commit 02df0ea into develop Jul 16, 2025
8 of 9 checks passed
@jaopaulolc jaopaulolc deleted the joao/mxdatagen_missed_commit branch July 16, 2025 16:26
assistant-librarian Bot pushed a commit to ROCm/mxDataGenerator that referenced this pull request Jul 16, 2025
[shared/mxdatagenerator] Use a saner way to select number of threads
ammallya pushed a commit that referenced this pull request Jul 24, 2025
[shared/mxdatagenerator] Use a saner way to select number of threads


[ROCm/mxDataGenerator commit: 31407ef]
ammallya pushed a commit to ROCm/mxDataGenerator that referenced this pull request Jan 13, 2026
[shared/mxdatagenerator] Use a saner way to select number of threads


[ROCm/mxDataGenerator commit: 31407ef]
DDEle added a commit that referenced this pull request Apr 21, 2026
## Motivation

The AITER and FA test dockers (`Dockerfile.aiter`, `Dockerfile.fa`)
inherit from the `rocm/pytorch` base image. Recent updates to that base
image dropped the `render` group from `/etc/group`, so every parallel
test stage now fails on the test agents with:

```
docker: Error response from daemon: Unable to find group render:
no matching entries in group file.
```

Jenkins resolves `--group-add render` against the **container's**
`/etc/group`, not the host's, so even though the test agents have render
in their `/etc/group` (GID 109), the container lookup fails.

This pattern affects every recent develop build
([#673](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/673),
[#674](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/674),
[#686](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/686),
[#688](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/688),
[#699](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/699),
[#708](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/708)
— 6 days in a row), where AITER tests fail in seconds and the cascading
failure aborts all downstream Build/FMHA/TILE_ENGINE stages.

## Technical Details

Add `groupadd -f render` to both `Dockerfile.aiter` and `Dockerfile.fa`,
mirroring what the main `Dockerfile` already does (`Dockerfile:96`) and
what `Dockerfile.pytorch` does (`Dockerfile.pytorch:4`). The `-f` flag
makes it idempotent — silently succeeds if the group already exists.

This guarantees the `render` group is always present in the container,
regardless of whether the base image happens to ship it.

## Test Plan
Triggering AITER CI job: 

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at

https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
aledudek pushed a commit that referenced this pull request May 20, 2026
## Motivation

The AITER and FA test dockers (`Dockerfile.aiter`, `Dockerfile.fa`)
inherit from the `rocm/pytorch` base image. Recent updates to that base
image dropped the `render` group from `/etc/group`, so every parallel
test stage now fails on the test agents with:

```
docker: Error response from daemon: Unable to find group render:
no matching entries in group file.
```

Jenkins resolves `--group-add render` against the **container's**
`/etc/group`, not the host's, so even though the test agents have render
in their `/etc/group` (GID 109), the container lookup fails.

This pattern affects every recent develop build
([#673](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/673),
[#674](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/674),
[#686](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/686),
[#688](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/688),
[#699](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/699),
[#708](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/708)
— 6 days in a row), where AITER tests fail in seconds and the cascading
failure aborts all downstream Build/FMHA/TILE_ENGINE stages.

## Technical Details

Add `groupadd -f render` to both `Dockerfile.aiter` and `Dockerfile.fa`,
mirroring what the main `Dockerfile` already does (`Dockerfile:96`) and
what `Dockerfile.pytorch` does (`Dockerfile.pytorch:4`). The `-f` flag
makes it idempotent — silently succeeds if the group already exists.

This guarantees the `render` group is always present in the container,
regardless of whether the base image happens to ship it.

## Test Plan
Triggering AITER CI job: 

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at

https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants