FP8 enablement - add a pseudorandom number generator, add conversion methods #708

Merged: geyyer merged 42 commits into develop from lwpck-726 on Jun 19, 2023

@geyyer
Contributor

@geyyer geyyer commented May 11, 2023

Add a PRNG

@geyyer
Contributor Author

geyyer commented May 11, 2023

Add fp8<->float conversion with standard rounding.

@geyyer geyyer changed the title from "FP8 enablement - add a pseudorandom number generator" to "FP8 enablement - add a pseudorandom number generator, add conversion methods" May 12, 2023
@geyyer
Contributor Author

geyyer commented May 24, 2023

This PR adds:

  • fp8 type definition
  • pseudorandom number generator - the rocBLAS implementation
  • conversion functions fp8<->fp32 with standard rounding
  • conversion functions fp8<->fp32 with stochastic rounding
  • conversion functions fp8<->fp16 with standard rounding
  • conversion functions fp8<->fp16 with stochastic rounding
  • elementwise ops for fp8<->fp32, fp8<->fp16 with standard rounding (PassThrough)
  • elementwise ops for fp8<->fp32, fp8<->fp16 with stochastic rounding (ConvertF8SR)
  • tests for both conversions using both rounding modes
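
To illustrate the difference between the two rounding modes listed above, here is a minimal sketch that reduces an fp32 value to a 3-bit mantissa. It is not the code from this PR: the helper names, the 3-bit mantissa width, and the `rng` parameter are assumptions made for illustration, and the sketch skips exponent clamping, NaN/Inf handling, and subnormals, which a real fp8 conversion has to deal with.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; they are not part of CK.
constexpr int kMantBits           = 3;              // assumed fp8 mantissa width
constexpr int kDropBits           = 23 - kMantBits; // fp32 mantissa bits to discard
constexpr std::uint32_t kDropMask = (1u << kDropBits) - 1u;

inline std::uint32_t float_to_bits(float x)
{
    std::uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    return u;
}

inline float bits_to_float(std::uint32_t u)
{
    float x;
    std::memcpy(&x, &u, sizeof x);
    return x;
}

// Standard rounding: round to nearest, ties to even.
inline float round_mantissa_nearest(float x)
{
    std::uint32_t u         = float_to_bits(x);
    const std::uint32_t lsb = (u >> kDropBits) & 1u; // lowest bit that is kept
    u += (kDropMask >> 1) + lsb;                     // just under half, plus 1 if the kept bit is odd
    u &= ~kDropMask;                                 // truncate the discarded bits
    return bits_to_float(u);
}

// Stochastic rounding: add random bits below the kept mantissa, then truncate.
// The magnitude rounds away from zero with probability equal to the discarded fraction.
inline float round_mantissa_stochastic(float x, std::uint32_t rng)
{
    std::uint32_t u = float_to_bits(x);
    u += rng & kDropMask;
    u &= ~kDropMask;
    return bits_to_float(u);
}
```

For example, 1.0625 lies exactly halfway between the representable values 1.0 and 1.125. Standard rounding always returns 1.0 (the even neighbor), while stochastic rounding returns 1.125 for half of all possible `rng` values, so the rounding error averages out over many conversions. A value that is already representable is returned unchanged in both modes.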

@geyyer geyyer marked this pull request as ready for review May 24, 2023 16:52
@geyyer geyyer requested review from aosewski, asroy and zjing14 May 24, 2023 16:52
Collaborator

@aosewski aosewski left a comment

Looks very good. I just have a few observations regarding the overall code structure. I'm not familiar with those bit-wise algorithms, so I'm not even touching them :P

Comment thread include/ck/utility/f8_utils.hpp
Comment thread include/ck/utility/f8_utils.hpp Outdated
Comment thread include/ck/utility/f8_utils.hpp
Comment thread include/ck/utility/data_type.hpp Outdated
Comment thread include/ck/utility/data_type.hpp Outdated
@zjing14 zjing14 requested a review from aosewski May 30, 2023 12:19
@geyyer
Contributor Author

geyyer commented Jun 9, 2023

The casting functions are still far from optimal. I'll work on their performance, since we have a test case to track conversion performance.

Collaborator

@aosewski aosewski left a comment

Thanks for refactoring the files! The only thing I don't feel is OK is the relationship between the type_convert.hpp and data_type.hpp headers. Please see the comments.

Comment thread include/ck/utility/type_convert.hpp Outdated
Comment thread include/ck/utility/data_type.hpp Outdated
@geyyer
Contributor Author

geyyer commented Jun 15, 2023

@aosewski, thanks for your review, I've refactored the type_convert.hpp, could you review the change?

@geyyer geyyer requested a review from aosewski June 15, 2023 18:35
Comment thread include/ck/utility/f8_utils.hpp Outdated
Comment on lines +8 to +9
using f8_t = uint8_t;
using half_t = _Float16;
Collaborator

Oh, I overlooked this. Please move the fp8 type alias to data_type.hpp and remove the half_t type alias. As I said, for such simple type alias definitions I think we should keep all the data type definitions in the data_type.hpp header. I can imagine an exception to this rule only if we defined our own class for a specific data type.
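
As a side note on the alias-versus-class distinction mentioned above, the following sketch shows one way to see the difference. Only the `f8_t` alias comes from the diff; `f8_wrapped` is a hypothetical name used for illustration and is not a type defined in this PR.

```cpp
#include <cstdint>
#include <type_traits>

// Alias as it appears in the diff: f8_t is just another name for uint8_t.
using f8_t = std::uint8_t;

// Hypothetical wrapper class (not part of this PR) with the same storage.
struct f8_wrapped
{
    std::uint8_t data;
};

// The alias introduces no new type, so templates and overloads cannot
// tell it apart from a plain byte.
static_assert(std::is_same<f8_t, std::uint8_t>::value,
              "alias is the same type as uint8_t");

// A class is a distinct type with the same size.
static_assert(!std::is_same<f8_wrapped, std::uint8_t>::value,
              "wrapper is a distinct type");
static_assert(sizeof(f8_wrapped) == sizeof(std::uint8_t),
              "wrapper keeps the 1-byte storage");
```

A plain alias therefore fits naturally next to the other aliases in data_type.hpp, whereas a dedicated class would carry its own definition and could reasonably live in its own header.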

Contributor Author

Fixed in 2ee1c0a

aosewski previously approved these changes Jun 16, 2023
Collaborator

@aosewski aosewski left a comment

I overlooked the placement of the f8_t type definition. Other than that, it looks good. I don't want to prolong the review process, so I approve. But please move this type definition, and then everything should be fine. Thanks!

@geyyer
Contributor Author

geyyer commented Jun 16, 2023

@aosewski, looks like I've overridden your approval, could you approve one more time?

@geyyer geyyer requested a review from aosewski June 16, 2023 19:09
@geyyer geyyer merged commit f0c620c into develop Jun 19, 2023
@geyyer geyyer deleted the lwpck-726 branch December 14, 2023 15:41
assistant-librarian Bot pushed a commit that referenced this pull request Apr 21, 2026
[CK] Add render group to AITER and FA dockers

## Motivation

The AITER and FA test dockers (`Dockerfile.aiter`, `Dockerfile.fa`)
inherit from the `rocm/pytorch` base image. Recent updates to that base
image dropped the `render` group from `/etc/group`, so every parallel
test stage now fails on the test agents with:

```
docker: Error response from daemon: Unable to find group render:
no matching entries in group file.
```

Jenkins resolves `--group-add render` against the **container's**
`/etc/group`, not the host's, so even though the test agents have render
in their `/etc/group` (GID 109), the container lookup fails.

This pattern affects every recent develop build
([#673](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/673),
[#674](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/674),
[#686](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/686),
[#688](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/688),
[#699](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/699),
[#708](http://micimaster.amd.com/blue/organizations/jenkins/rocm-libraries-folder%2FComposable%20Kernel/detail/develop/708)
— 6 days in a row), where AITER tests fail in seconds and the cascading
failure aborts all downstream Build/FMHA/TILE_ENGINE stages.

## Technical Details

Add `groupadd -f render` to both `Dockerfile.aiter` and `Dockerfile.fa`,
mirroring what the main `Dockerfile` already does (`Dockerfile:96`) and
what `Dockerfile.pytorch` does (`Dockerfile.pytorch:4`). The `-f` flag
makes it idempotent — silently succeeds if the group already exists.

This guarantees the `render` group is always present in the container,
regardless of whether the base image happens to ship it.

## Test Plan
Triggering AITER CI job:

## Test Result

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.