Fixes for SSE detection and propagation #24

Merged
merged 1 commit into from
Oct 10, 2018


pietern
Contributor

@pietern pietern commented Oct 10, 2018

On a machine without SSE4.1 the HAVE_SSE flag would still be set.
Because CFLAGS includes -msse4.2, the compiler happily generates
SSE4.2 instructions. Running any resulting SSE-enabled binary would
then fail with an illegal instruction error.

The HAVE_SSE check now tests for one of the SSE4.1 instructions used
in the SSE-enabled convolutional decoder. The check must run with
-march=native so that it tests against the capabilities of the host
machine.
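
To illustrate the mechanism, here is a minimal sketch (not the code from
this PR). It assumes `_mm_min_epu16` as a representative SSE4.1 intrinsic;
the description above does not name the instruction the real check uses.
GCC and Clang define `__SSE4_1__` only when SSE4.1 code generation is
enabled, which with -march=native means only when the host supports it.
A configure-time probe amounts to compiling the SSE branch alone, without
the guard, so it fails to build on a host without SSE4.1. The names
`built_with_sse41` and `min_u16x8` are made up for this example.

```c
#include <stdint.h>

#if defined(__SSE4_1__)
#include <smmintrin.h> /* SSE4.1 intrinsics (also pulls in SSE2 loads/stores) */
#endif

/* Returns 1 if this translation unit was compiled with SSE4.1 enabled. */
int built_with_sse41(void) {
#if defined(__SSE4_1__)
    return 1;
#else
    return 0;
#endif
}

/* Lane-wise unsigned minimum of two arrays of eight uint16_t values. */
void min_u16x8(const uint16_t *a, const uint16_t *b, uint16_t *out) {
#if defined(__SSE4_1__)
    /* SSE4.1 path: _mm_min_epu16 compiles to PMINUW. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_min_epu16(va, vb));
#else
    /* Portable fallback used when SSE4.1 is not enabled at compile time. */
    for (int i = 0; i < 8; i++) {
        out[i] = a[i] < b[i] ? a[i] : b[i];
    }
#endif
}
```

Both paths produce the same result; only the first requires the CPU to
support SSE4.1 when the binary runs.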

The HAVE_SSE definition is now propagated to downstream targets that
depend on libcorrect. They can now #ifdef on HAVE_SSE to decide
whether to include libcorrect's SSE-specific header.
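
In downstream code, the guard could look roughly like the sketch below.
The header name `correct-sse.h` is an assumption about libcorrect's
SSE-specific header, and `DECODER_BACKEND` and `decoder_backend` are
hypothetical names for this example; none of them is taken from this PR.

```c
/* HAVE_SSE is expected to arrive from the build system, not from this file. */
#ifdef HAVE_SSE
#include <correct-sse.h> /* assumed name of the SSE-specific header */
#define DECODER_BACKEND "sse"
#else
#define DECODER_BACKEND "portable"
#endif

/* Reports which decoder backend this translation unit was built against. */
const char *decoder_backend(void) {
    return DECODER_BACKEND;
}
```

Without the propagated definition, a downstream target would have to
repeat the detection itself to know whether the SSE header is usable.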

Confirmed that the HAVE_SSE check now fails on an old machine without
SSE4.1 (but with SSE3 and SSSE3).

@pietern
Contributor Author

pietern commented Oct 10, 2018

Please hold off on merging... this is broken.

@pietern changed the title from "Check for the correct SSE version" to "@pietern Fixes for SSE detection and propagation" on Oct 10, 2018
@pietern changed the title from "@pietern Fixes for SSE detection and propagation" to "Fixes for SSE detection and propagation" on Oct 10, 2018
@pietern
Contributor Author

pietern commented Oct 10, 2018

No longer broken :-)

@brian-armstrong
Member

thank you, this is great. i have also considered putting runtime checks for SSE into the code itself, and i do think that's still worth doing, but i'm glad to merge this now.
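
A runtime check of the kind mentioned above might look roughly like this.
It is only a sketch, not libcorrect code: it assumes the GCC/Clang x86
builtins `__builtin_cpu_init` and `__builtin_cpu_supports`, and the names
`cpu_has_sse41`, `decoder_kind` and `choose_decoder` are hypothetical.

```c
/* Returns 1 if the CPU running this code reports SSE4.1 support, else 0. */
int cpu_has_sse41(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); /* safe to call more than once */
    return __builtin_cpu_supports("sse4.1") ? 1 : 0;
#else
    return 0; /* other compiler or non-x86 target: assume no SSE4.1 */
#endif
}

typedef enum { DECODER_PORTABLE = 0, DECODER_SSE = 1 } decoder_kind;

/* Use the SSE decoder only if it was compiled in AND the CPU supports it. */
decoder_kind choose_decoder(int compiled_with_sse, int cpu_ok) {
    return (compiled_with_sse && cpu_ok) ? DECODER_SSE : DECODER_PORTABLE;
}
```

A caller would pass the compile-time HAVE_SSE state as the first argument
and `cpu_has_sse41()` as the second, so a binary built on one machine
falls back to the portable decoder on an older one.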

@brian-armstrong merged commit 0ba9330 into quiet:master on Oct 10, 2018
@pietern deleted the sse41-check branch on October 11, 2018 03:35
@pietern
Contributor Author

pietern commented Oct 11, 2018

Thanks @brian-armstrong!

2 participants