
SIMD integration #217

Closed
hadronized opened this issue Jan 31, 2017 · 10 comments
Comments

@hadronized
Contributor

Ohai!

Are you interested in having SIMD support? Because I truly am, and I might work on it and push a PR if you feel it’s an interesting feature to add to nalgebra.
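To give a rough idea of what I mean by explicit SIMD, here is a minimal sketch of an explicitly vectorized 4-lane dot product using the x86_64 SSE intrinsics from `std::arch` (illustrative only, not nalgebra code; an actual integration would presumably hide this behind the existing vector types):

```rust
// Minimal sketch: explicitly vectorized 4-lane f32 dot product with SSE
// intrinsics from std::arch (x86_64 only). Illustrative, not nalgebra code.
#[cfg(target_arch = "x86_64")]
unsafe fn dot4_sse(a: [f32; 4], b: [f32; 4]) -> f32 {
    use std::arch::x86_64::*;

    let va = _mm_loadu_ps(a.as_ptr());
    let vb = _mm_loadu_ps(b.as_ptr());
    let prod = _mm_mul_ps(va, vb); // (a0*b0, a1*b1, a2*b2, a3*b3)

    // Horizontal sum of the four lanes.
    let shuf = _mm_shuffle_ps(prod, prod, 0b10_11_00_01); // (p1, p0, p3, p2)
    let sums = _mm_add_ps(prod, shuf);                    // (p0+p1, _, p2+p3, _)
    let high = _mm_movehl_ps(sums, sums);                 // (p2+p3, _, _, _)
    _mm_cvtss_f32(_mm_add_ss(sums, high))                 // p0+p1+p2+p3
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        let d = unsafe { dot4_sse([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]) };
        assert_eq!(d, 70.0); // 1*5 + 2*6 + 3*7 + 4*8
    }
}
```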

@milibopp
Collaborator

@phaazon are you still interested in doing this? I think @sebcrozet has been busy lately, but I would not mind looking at your code and eventually merging it.

I assume this is not going to change any public API, but only implementation details to make use of SIMD, right? It would be nice to get into a bit more detail about what you have in mind.

@hadronized
Contributor Author

hadronized commented Jul 28, 2017

Hi.

No, I’m not. I should have updated the issue. nalgebra has changed too much and I don’t like the new design (the infinite type aliasing based on Row or I don’t remember what). I migrated my codebase to cgmath.

@sebcrozet
Member

I've actually been working a lot on the performance of nalgebra lately (so much that I have not been maintaining existing issues/PRs very actively). My observation is that explicit SIMD integration is currently not worth the effort, so I'd prefer to postpone this until SIMD becomes stable in Rust. I'll communicate more about this next week, but the next major version of nalgebra will be as fast as the SIMD version of cgmath (except that we don't need to use SIMD intrinsics or the simd crate).

@hadronized
Contributor Author

hadronized commented Jul 29, 2017

I’ll be waiting for the benchmarks ;)

@brendanzab
Contributor

What is the un-optimized performance like, though? The advantage of explicit SIMD might be that debug builds are faster.

@sebcrozet
Member

@brendanzab You're right, the performance difference is significant for debug builds (SIMD cgmath is faster):

test lowdim::inverse::mat2_inverse_cgmath                     ... bench:         618 ns/iter (+/- 24)
test lowdim::inverse::mat2_inverse_na                         ... bench:       2,293 ns/iter (+/- 1,027)
test lowdim::inverse::mat3_inverse_cgmath                     ... bench:       2,282 ns/iter (+/- 768)
test lowdim::inverse::mat3_inverse_na                         ... bench:       4,793 ns/iter (+/- 853)
test lowdim::inverse::mat4_inverse_cgmath                     ... bench:      13,300 ns/iter (+/- 404)
test lowdim::inverse::mat4_inverse_na                         ... bench:      17,899 ns/iter (+/- 2,758)
test lowdim::product::mat2_mul_mat2_cgmath                    ... bench:         848 ns/iter (+/- 156)
test lowdim::product::mat2_mul_mat2_na                        ... bench:       5,468 ns/iter (+/- 426)
test lowdim::product::mat2_mul_vec2_cgmath                    ... bench:         499 ns/iter (+/- 111)
test lowdim::product::mat2_mul_vec2_na                        ... bench:       2,924 ns/iter (+/- 318)
test lowdim::product::mat3_mul_mat3_cgmath                    ... bench:       1,857 ns/iter (+/- 27)
test lowdim::product::mat3_mul_mat3_na                        ... bench:      12,486 ns/iter (+/- 1,639)
test lowdim::product::mat3_mul_vec3_cgmath                    ... bench:         636 ns/iter (+/- 12)
test lowdim::product::mat3_mul_vec3_na                        ... bench:       4,378 ns/iter (+/- 165)
test lowdim::product::mat4_mul_mat4_cgmath                    ... bench:       2,126 ns/iter (+/- 639)
test lowdim::product::mat4_mul_mat4_na                        ... bench:      26,624 ns/iter (+/- 5,408)
test lowdim::product::mat4_mul_vec4_cgmath                    ... bench:       1,123 ns/iter (+/- 108)
test lowdim::product::mat4_mul_vec4_na                        ... bench:       7,282 ns/iter (+/- 1,195)
test lowdim::product::vec2_dot_vec2_cgmath                    ... bench:         155 ns/iter (+/- 27)
test lowdim::product::vec2_dot_vec2_na                        ... bench:         982 ns/iter (+/- 113)
test lowdim::product::vec3_dot_vec3_cgmath                    ... bench:         121 ns/iter (+/- 29)
test lowdim::product::vec3_dot_vec3_na                        ... bench:       1,439 ns/iter (+/- 133)
test lowdim::product::vec4_dot_vec4_cgmath                    ... bench:         119 ns/iter (+/- 17)
test lowdim::product::vec4_dot_vec4_na                        ... bench:       1,898 ns/iter (+/- 183)

Here is the same benchmark with optimizations turned on:

test lowdim::inverse::mat2_inverse_cgmath                     ... bench:          15 ns/iter (+/- 0)
test lowdim::inverse::mat2_inverse_na                         ... bench:          15 ns/iter (+/- 0)
test lowdim::inverse::mat3_inverse_cgmath                     ... bench:          38 ns/iter (+/- 0)
test lowdim::inverse::mat3_inverse_na                         ... bench:          37 ns/iter (+/- 0)
test lowdim::inverse::mat4_inverse_cgmath                     ... bench:          84 ns/iter (+/- 1)
test lowdim::inverse::mat4_inverse_na                         ... bench:          70 ns/iter (+/- 1)
test lowdim::product::mat2_mul_mat2_cgmath                    ... bench:           4 ns/iter (+/- 0)
test lowdim::product::mat2_mul_mat2_na                        ... bench:           4 ns/iter (+/- 0)
test lowdim::product::mat2_mul_vec2_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::mat2_mul_vec2_na                        ... bench:           1 ns/iter (+/- 0)
test lowdim::product::mat3_mul_mat3_cgmath                    ... bench:          13 ns/iter (+/- 0)
test lowdim::product::mat3_mul_mat3_na                        ... bench:          13 ns/iter (+/- 0)
test lowdim::product::mat3_mul_vec3_cgmath                    ... bench:           5 ns/iter (+/- 0)
test lowdim::product::mat3_mul_vec3_na                        ... bench:           5 ns/iter (+/- 0)
test lowdim::product::mat4_mul_mat4_cgmath                    ... bench:          21 ns/iter (+/- 0)
test lowdim::product::mat4_mul_mat4_na                        ... bench:          20 ns/iter (+/- 0)
test lowdim::product::mat4_mul_vec4_cgmath                    ... bench:           6 ns/iter (+/- 0)
test lowdim::product::mat4_mul_vec4_na                        ... bench:           6 ns/iter (+/- 0)
test lowdim::product::vec2_dot_vec2_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec2_dot_vec2_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec3_dot_vec3_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec3_dot_vec3_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec4_dot_vec4_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec4_dot_vec4_na                        ... bench:           0 ns/iter (+/- 0)

I will post the source code of those benchmarks at the same time as my communication next week.
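In the meantime, for anyone who wants to reproduce the numbers, these are ordinary `#[bench]` micro-benchmarks using the nightly-only `test` harness; a minimal sketch of one of them (the names and setup here are illustrative, not the exact source) could look like this:

```rust
// Sketch of a #[bench] micro-benchmark in the style of the numbers above.
// Requires nightly Rust; the input setup and iteration count are illustrative.
#![feature(test)]
extern crate test;

use nalgebra::Vector4;
use test::{black_box, Bencher};

#[bench]
fn vec4_dot_vec4_na(bh: &mut Bencher) {
    // Pre-build the inputs so only the dot products are timed.
    let a: Vec<Vector4<f32>> =
        (0..1000).map(|i| Vector4::new(i as f32, 2.0, 3.0, 4.0)).collect();
    let b: Vec<Vector4<f32>> =
        (0..1000).map(|i| Vector4::new(5.0, 6.0, 7.0, i as f32)).collect();

    bh.iter(|| {
        for (x, y) in a.iter().zip(&b) {
            // black_box keeps the result from being optimized away.
            black_box(x.dot(y));
        }
    });
}
```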

@milibopp
Collaborator

Okay, given that optimizations take care of this anyway, I would question whether the additional code complexity of explicit SIMD is worth it.

@sebcrozet
Member

@aepsil0n I don't think the additional code complexity is worth it for now, so I'm closing this. We might want to re-explore the question when SIMD becomes stable in Rust.

Here is the communication I was mentioning earlier in this thread: #274.

@happenslol

Hey, not sure if it's alright to revive this issue or if I should open a new one, but I wanted to poke this topic now that so much time has passed. I've recently re-run some of the benchmarks that appeared earlier in this thread, and the performance difference between this crate and cgmath is still staggering:
(image: benchmark results comparing nalgebra and cgmath)

The difference is especially large at opt-level 0. I've been working with the amethyst engine a bit, and we've been seeing significant performance drops in debug mode, which is probably related to this.

SIMD hasn't fully stabilized yet, but it has come a long way, and I feel it's time to take another look at this.
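For reference, a dot product with the portable SIMD API currently available on nightly (the `portable_simd` feature) looks roughly like this; the API is still unstable and has changed between nightlies, so treat this as a sketch only:

```rust
// Rough sketch of a dot product using the nightly portable SIMD API
// (std::simd). Illustrative only; the API is subject to change.
#![feature(portable_simd)]
use std::simd::prelude::*;

fn dot4(a: [f32; 4], b: [f32; 4]) -> f32 {
    let va = f32x4::from_array(a);
    let vb = f32x4::from_array(b);
    (va * vb).reduce_sum() // horizontal sum of the four lanes
}

fn main() {
    assert_eq!(dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]), 70.0);
}
```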

@sebcrozet
Member

Thank you for sharing your concerns @happenslol! I have created a new issue to discuss this.
