
SIMD integration #217

Closed
hadronized opened this issue Jan 31, 2017 · 10 comments
Comments

@hadronized
Contributor

Ohai!

Are you interested in having SIMD support? Because I truly am, and I might work on it and push a PR if you feel it’s an interesting feature to add to nalgebra.
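To give a rough idea of what I mean by explicit SIMD, here is a minimal sketch of an explicitly vectorized 4-lane dot product using the x86_64 SSE intrinsics from `std::arch` (illustrative only, not nalgebra code; an actual integration would presumably hide this behind the existing vector types):

```rust
// Minimal sketch: explicitly vectorized 4-lane f32 dot product with SSE
// intrinsics from std::arch (x86_64 only). Illustrative, not nalgebra code.
#[cfg(target_arch = "x86_64")]
unsafe fn dot4_sse(a: [f32; 4], b: [f32; 4]) -> f32 {
    use std::arch::x86_64::*;

    let va = _mm_loadu_ps(a.as_ptr());
    let vb = _mm_loadu_ps(b.as_ptr());
    let prod = _mm_mul_ps(va, vb); // (a0*b0, a1*b1, a2*b2, a3*b3)

    // Horizontal sum of the four lanes.
    let shuf = _mm_shuffle_ps(prod, prod, 0b10_11_00_01); // (p1, p0, p3, p2)
    let sums = _mm_add_ps(prod, shuf);                    // (p0+p1, _, p2+p3, _)
    let high = _mm_movehl_ps(sums, sums);                 // (p2+p3, _, _, _)
    _mm_cvtss_f32(_mm_add_ss(sums, high))                 // p0+p1+p2+p3
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    {
        let d = unsafe { dot4_sse([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]) };
        assert_eq!(d, 70.0); // 1*5 + 2*6 + 3*7 + 4*8
    }
}
```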

@milibopp
Collaborator

@phaazon are you still interested in doing this? I think @sebcrozet has been busy lately, but I would not mind looking at your code and eventually merging it.

I assume this is not going to change any public API, but only implementation details to make use of SIMD, right? It would be nice to get into a bit more detail about what you have in mind.

@hadronized
Contributor Author

hadronized commented Jul 28, 2017

Hi.

No, I’m not. I should have updated the issue. nalgebra has changed too much and I don’t like the new design (the infinite type aliasing based on Row or I don’t remember what). I migrated my codebase to cgmath.

@sebcrozet
Member

I've actually been working a lot on the performance of nalgebra lately (so much that I have not been maintaining existing issues/PRs very actively). My observation is that explicit SIMD integration is currently not worth the effort, so I'd prefer to postpone this until SIMD becomes stable in Rust. I'll communicate more about this next week, but the next major version of nalgebra will be as fast as the SIMD version of cgmath (except that we don't need to use SIMD intrinsics or the simd crate).

@hadronized
Contributor Author

hadronized commented Jul 29, 2017

I’ll be waiting for the benchmarks ;)

@brendanzab
Contributor

What is the un-optimized performance like, though? The advantage of explicit SIMD might be that debug builds are faster.

@sebcrozet
Member

@brendanzab You're right, the performance difference is significant for debug builds (SIMD cgmath is faster):

test lowdim::inverse::mat2_inverse_cgmath                     ... bench:         618 ns/iter (+/- 24)
test lowdim::inverse::mat2_inverse_na                         ... bench:       2,293 ns/iter (+/- 1,027)
test lowdim::inverse::mat3_inverse_cgmath                     ... bench:       2,282 ns/iter (+/- 768)
test lowdim::inverse::mat3_inverse_na                         ... bench:       4,793 ns/iter (+/- 853)
test lowdim::inverse::mat4_inverse_cgmath                     ... bench:      13,300 ns/iter (+/- 404)
test lowdim::inverse::mat4_inverse_na                         ... bench:      17,899 ns/iter (+/- 2,758)
test lowdim::product::mat2_mul_mat2_cgmath                    ... bench:         848 ns/iter (+/- 156)
test lowdim::product::mat2_mul_mat2_na                        ... bench:       5,468 ns/iter (+/- 426)
test lowdim::product::mat2_mul_vec2_cgmath                    ... bench:         499 ns/iter (+/- 111)
test lowdim::product::mat2_mul_vec2_na                        ... bench:       2,924 ns/iter (+/- 318)
test lowdim::product::mat3_mul_mat3_cgmath                    ... bench:       1,857 ns/iter (+/- 27)
test lowdim::product::mat3_mul_mat3_na                        ... bench:      12,486 ns/iter (+/- 1,639)
test lowdim::product::mat3_mul_vec3_cgmath                    ... bench:         636 ns/iter (+/- 12)
test lowdim::product::mat3_mul_vec3_na                        ... bench:       4,378 ns/iter (+/- 165)
test lowdim::product::mat4_mul_mat4_cgmath                    ... bench:       2,126 ns/iter (+/- 639)
test lowdim::product::mat4_mul_mat4_na                        ... bench:      26,624 ns/iter (+/- 5,408)
test lowdim::product::mat4_mul_vec4_cgmath                    ... bench:       1,123 ns/iter (+/- 108)
test lowdim::product::mat4_mul_vec4_na                        ... bench:       7,282 ns/iter (+/- 1,195)
test lowdim::product::vec2_dot_vec2_cgmath                    ... bench:         155 ns/iter (+/- 27)
test lowdim::product::vec2_dot_vec2_na                        ... bench:         982 ns/iter (+/- 113)
test lowdim::product::vec3_dot_vec3_cgmath                    ... bench:         121 ns/iter (+/- 29)
test lowdim::product::vec3_dot_vec3_na                        ... bench:       1,439 ns/iter (+/- 133)
test lowdim::product::vec4_dot_vec4_cgmath                    ... bench:         119 ns/iter (+/- 17)
test lowdim::product::vec4_dot_vec4_na                        ... bench:       1,898 ns/iter (+/- 183)

Here is the same benchmark with optimizations turned on:

test lowdim::inverse::mat2_inverse_cgmath                     ... bench:          15 ns/iter (+/- 0)
test lowdim::inverse::mat2_inverse_na                         ... bench:          15 ns/iter (+/- 0)
test lowdim::inverse::mat3_inverse_cgmath                     ... bench:          38 ns/iter (+/- 0)
test lowdim::inverse::mat3_inverse_na                         ... bench:          37 ns/iter (+/- 0)
test lowdim::inverse::mat4_inverse_cgmath                     ... bench:          84 ns/iter (+/- 1)
test lowdim::inverse::mat4_inverse_na                         ... bench:          70 ns/iter (+/- 1)
test lowdim::product::mat2_mul_mat2_cgmath                    ... bench:           4 ns/iter (+/- 0)
test lowdim::product::mat2_mul_mat2_na                        ... bench:           4 ns/iter (+/- 0)
test lowdim::product::mat2_mul_vec2_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::mat2_mul_vec2_na                        ... bench:           1 ns/iter (+/- 0)
test lowdim::product::mat3_mul_mat3_cgmath                    ... bench:          13 ns/iter (+/- 0)
test lowdim::product::mat3_mul_mat3_na                        ... bench:          13 ns/iter (+/- 0)
test lowdim::product::mat3_mul_vec3_cgmath                    ... bench:           5 ns/iter (+/- 0)
test lowdim::product::mat3_mul_vec3_na                        ... bench:           5 ns/iter (+/- 0)
test lowdim::product::mat4_mul_mat4_cgmath                    ... bench:          21 ns/iter (+/- 0)
test lowdim::product::mat4_mul_mat4_na                        ... bench:          20 ns/iter (+/- 0)
test lowdim::product::mat4_mul_vec4_cgmath                    ... bench:           6 ns/iter (+/- 0)
test lowdim::product::mat4_mul_vec4_na                        ... bench:           6 ns/iter (+/- 0)
test lowdim::product::vec2_dot_vec2_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec2_dot_vec2_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec3_dot_vec3_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec3_dot_vec3_na                        ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec4_dot_vec4_cgmath                    ... bench:           0 ns/iter (+/- 0)
test lowdim::product::vec4_dot_vec4_na                        ... bench:           0 ns/iter (+/- 0)

I will post the source code of those benchmarks at the same time as my communication next week.
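In the meantime, for anyone who wants to reproduce the numbers, these are ordinary `#[bench]` micro-benchmarks using the nightly-only `test` harness; a minimal sketch of one of them (the names and setup here are illustrative, not the exact source) could look like this:

```rust
// Sketch of a #[bench] micro-benchmark in the style of the numbers above.
// Requires nightly Rust; the input setup and iteration count are illustrative.
#![feature(test)]
extern crate test;

use nalgebra::Vector4;
use test::{black_box, Bencher};

#[bench]
fn vec4_dot_vec4_na(bh: &mut Bencher) {
    // Pre-build the inputs so only the dot products are timed.
    let a: Vec<Vector4<f32>> =
        (0..1000).map(|i| Vector4::new(i as f32, 2.0, 3.0, 4.0)).collect();
    let b: Vec<Vector4<f32>> =
        (0..1000).map(|i| Vector4::new(5.0, 6.0, 7.0, i as f32)).collect();

    bh.iter(|| {
        for (x, y) in a.iter().zip(&b) {
            // black_box keeps the result from being optimized away.
            black_box(x.dot(y));
        }
    });
}
```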

@milibopp
Collaborator

Okay, given that optimizations take care of this anyway, I would question whether the additional code complexity of explicit SIMD is worth it.

@sebcrozet
Member

@aepsil0n I don't think the additional code complexity is worth it for now, so I'm closing this. We might want to re-explore the question when SIMD becomes stable in Rust.

Here is the communication I was mentioning earlier in this thread: #274.

@happenslol

Hey, not sure if it's alright to revive this issue or if I should open a new one, but I wanted to poke this topic now that so much time has passed. I've recently re-run some of the benchmarks that appeared earlier in this thread, and the performance difference between this crate and cgmath is still staggering:
(image: benchmark results comparing nalgebra and cgmath)

The difference is especially large at opt-level 0. I've been working with the amethyst engine a bit, and we've been seeing significant performance drops in debug mode, which is probably related to this.

SIMD hasn't fully stabilized yet, but it has come a long way, and I feel it's time to take another look at this.
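For reference, a dot product with the portable SIMD API currently available on nightly (the `portable_simd` feature) looks roughly like this; the API is still unstable and has changed between nightlies, so treat this as a sketch only:

```rust
// Rough sketch of a dot product using the nightly portable SIMD API
// (std::simd). Illustrative only; the API is subject to change.
#![feature(portable_simd)]
use std::simd::prelude::*;

fn dot4(a: [f32; 4], b: [f32; 4]) -> f32 {
    let va = f32x4::from_array(a);
    let vb = f32x4::from_array(b);
    (va * vb).reduce_sum() // horizontal sum of the four lanes
}

fn main() {
    assert_eq!(dot4([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]), 70.0);
}
```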

@sebcrozet
Member

Thank you for sharing your concerns @happenslol! I have created a new issue to discuss this.
