@ggouaillardet
Contributor

In order to work around a bug in older gcc versions on x86_64,
__atomic_thread_fence (__ATOMIC_SEQ_CST)
was replaced with
__atomic_thread_fence (__ATOMIC_ACQUIRE)
based on the assumption that this did not introduce performance regressions.

It was recently found that this did introduce some performance regression,
mainly at scale on fat nodes.

So simply use an asm memory clobber to both work around the older gcc bugs
and fix the performance regression.

Thanks S. Biplab Raut for bringing this issue to our attention.

Refs. #8603

Signed-off-by: Gilles Gouaillardet [email protected]

(cherry picked from commit d7e3f87)
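
For readers unfamiliar with the technique, the asm memory clobber mentioned in the commit message can be sketched roughly as below. This is an illustration only, not the patch itself: the names `sketch_read_barrier` and `sketch_consume` are hypothetical, and the `__x86_64__` check is an assumed way of selecting the architecture. The sketch relies on x86_64 hardware not reordering loads with other loads, so that only the compiler needs to be constrained there.

```c
#include <stdint.h>

/* Hypothetical helper name; not the identifier used in the actual patch. */
static inline void sketch_read_barrier(void)
{
#if defined(__x86_64__)
    /* Compiler-only barrier: the empty asm emits no instruction, but the
     * "memory" clobber stops the compiler from reordering or caching
     * memory accesses across this point. */
    __asm__ __volatile__("" : : : "memory");
#else
    /* Other architectures keep the builtin acquire fence. */
    __atomic_thread_fence(__ATOMIC_ACQUIRE);
#endif
}

/* Hypothetical consumer: spin until the flag is set, then read the payload.
 * The barrier keeps the payload load from being hoisted above the flag load. */
static inline int32_t sketch_consume(volatile int32_t *flag, int32_t *payload)
{
    while (0 == *flag) {
        /* busy-wait until a producer sets the flag */
    }
    sketch_read_barrier();
    return *payload;
}
```

Because the x86_64 branch emits no fence instruction, it avoids the cost of a hardware fence while still preventing the compiler from moving loads across the barrier.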

@ggouaillardet force-pushed the topic/v4.1.x/gcc_builtin_workaround branch from 0ca05eb to 0c02983 on March 16, 2021 04:44
@jsquyres changed the title from "gcc_builtin: fix performance regression on x86_64" to "v4.1.x: gcc_builtin: fix performance regression on x86_64" on Mar 16, 2021
@jsquyres
Member

bot:aws:retest

Member

@jsquyres left a comment

Merge by proxy from the other release branches, and from discussion on the 16 Mar 2021 Webex.
