Deprecate LV for Julia >= 1.11-DEV #519

Merged: chriselrod merged 4 commits into main on Jan 3, 2024.


chriselrod
Member

No description provided.


codecov bot commented Jan 3, 2024

Codecov Report

Attention: 14 lines in your changes are missing coverage. Please review.

Comparison is base (d2f749d) 88.64% compared to head (ad2139e) 80.29%.

| File | Patch % | Lines |
|---|---|---|
| src/LoopVectorization.jl | 16.66% | 10 missing ⚠️ |
| src/condense_loopset.jl | 50.00% | 4 missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #519      +/-   ##
==========================================
- Coverage   88.64%   80.29%   -8.35%     
==========================================
  Files          39       39              
  Lines        9600     9608       +8     
==========================================
- Hits         8510     7715     -795     
- Misses       1090     1893     +803     


chriselrod merged commit a4a160f into main on Jan 3, 2024
26 of 59 checks passed
chriselrod deleted the deprecateforjulia1dot11 branch on January 3, 2024 at 17:44
chriselrod
Member Author

Fixes #518

stillyslalom

It's sad to see this project get mothballed, but I understand not wanting to deal with the maintenance burden when you're trying to get LoopModels up to parity. Do you think LoopModels will be usable in the v1.11-release timeframe, or is it still too early in its lifecycle to guess on a release date?

maleadt added a commit to maleadt/LoopVectorization.jl that referenced this pull request Jan 6, 2024
willow-ahrens

I agree, this package will be sorely missed. Thank you Chris for your work on this package, it is quite an achievement.

For future implementers, would you share any ideas towards a more minimal version of the package with less maintenance burden and less compile time? Something that e.g. faithfully unrolls the inner loop, manually applies simd instructions, and tries different loop permutations without getting too serious about it?

chriselrod
Member Author

chriselrod commented Jan 13, 2024

> Something that e.g. faithfully unrolls the inner loop

LV will often unroll and SIMD one of the outer loops, not the innermost one!
This is an important point to emphasize when trying to replicate its performance, as vectorizing outer loops is often much more profitable (e.g. "unroll and jam").
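A minimal sketch of what vectorizing an outer loop with unroll and jam can look like, for y = A*x with a column-major A (as in Julia). It is written in Python only to show the loop structure, not the performance. The width W = 4, the unroll factor U = 2, and the function name are illustrative assumptions; this is not LV's actual generated code. The inner j loop is a reduction for each row, so vectorizing it would need a horizontal sum at the end, and walking along a row of a column-major matrix is strided. Vectorizing the outer i loop instead gives contiguous loads down each column, and unrolling it by U and jamming the copies into one j loop lets each x[j] be loaded once and reused by U*W lanes.

```python
W = 4  # assumed SIMD width (e.g. 4 x Float64 in a 256-bit AVX2 register)
U = 2  # assumed unroll factor for the vectorized outer (i) loop


def matvec_outer_simd(A, x, m, n):
    """y = A*x for an m-by-n column-major A stored flat as A[i + j*m].

    Illustrative only: the outer i loop is vectorized (W lanes) and unrolled
    by U, and the inner j loop is jammed into that block ("unroll and jam").
    Each x[j] is loaded once and broadcast to all U*W lanes, and the A loads
    within a column are contiguous, so no horizontal reduction is needed.
    """
    y = [0.0] * m
    block = U * W
    i0 = 0
    while i0 + block <= m:
        # U "vector registers", each holding W independent accumulators
        acc = [[0.0] * W for _ in range(U)]
        for j in range(n):
            xj = x[j]  # one load, reused by every lane in the block
            base = i0 + j * m
            for u in range(U):
                for lane in range(W):
                    acc[u][lane] += A[base + u * W + lane] * xj
        for u in range(U):
            for lane in range(W):
                y[i0 + u * W + lane] = acc[u][lane]
        i0 += block
    # Leftover rows use a plain scalar loop here to keep the sketch short.
    for i in range(i0, m):
        s = 0.0
        for j in range(n):
            s += A[i + j * m] * x[j]
        y[i] = s
    return y
```

A naive loop with i outer and j inner, summing A[i + j*m] * x[j], computes the same y and is a convenient correctness check; the only difference is how the i iterations are grouped into lanes and registers.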

Another major component for getting good performance is code generation.

Relatively little of LV's code is dedicated to deciding what it should do, and a fairly substantial amount to actual code generation. I recently saw an example where someone reproduced what LV did, but performance was over 2x worse, simply because LV takes a lot of care to generate good code that follows the execution plan it lays out.
LLVM does a surprisingly terrible job optimizing indexing behavior, and can introduce huge amounts of overhead if one isn't careful.

Another point: if you care about architectures with wide vectors (especially AVX-512), don't handle the loop remainder with scalar clean-up loops; use predicates (masked operations) instead.
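To illustrate the difference, here is a sketch of an axpy loop (y = a*x + y) whose remainder is handled by a lane mask in the final iteration instead of a separate scalar loop. It is again written in Python only to show the control flow. masked_load and masked_store are hypothetical helpers standing in for hardware masked loads and stores (such as AVX-512's, where masked-off lanes are not accessed), and W = 8 is an assumed width. With W = 8, a scalar clean-up loop could run up to 7 elements one at a time, while the masked version always finishes in ceil(n / W) vector iterations.

```python
W = 8  # assumed vector width, e.g. 8 x Float64 in an AVX-512 register


def masked_load(buf, start, mask):
    """Read W lanes starting at buf[start]; masked-off lanes read as 0.0
    and never touch buf, so we never index past the end of the array."""
    return [buf[start + k] if on else 0.0 for k, on in enumerate(mask)]


def masked_store(buf, start, vec, mask):
    """Write only the active lanes of vec back to buf."""
    for k, on in enumerate(mask):
        if on:
            buf[start + k] = vec[k]


def axpy_masked(a, x, y):
    """y <- a*x + y with a single vector loop and no scalar clean-up loop.

    Every iteration handles W lanes; the mask is all-True except in the
    final iteration when len(x) is not a multiple of W.
    """
    n = len(x)
    for i0 in range(0, n, W):
        mask = [i0 + k < n for k in range(W)]
        vx = masked_load(x, i0, mask)
        vy = masked_load(y, i0, mask)
        vr = [a * xi + yi for xi, yi in zip(vx, vy)]
        masked_store(y, i0, vr, mask)
    return y
```

The wider the vector, the larger the possible remainder, so a scalar tail costs proportionally more on AVX-512 than on narrower SIMD units.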

Unfortunately, llvmcalls are slow to compile.
If possible, do not generate Julia Exprs; try to work at the LLVM level instead. This can help you avoid a host of other problems, such as LV not working with Julia's function multiversioning.
If you really do want to stick with Julia, I'd suggest PRing Base to add tfunc support for SIMD vectors. Oscar and I (mostly Oscar) got a prototype working for the interpreter in a few minutes. That is, instead of needing llvmcall, we got Base.add_float, Base.mul_float, etc working on SIMD types like NTuple{N,Core.VecElement{Float64}}, running through the interpreter.
Mostly, all we did was delete a bunch of asserts, and it "just worked". Of course, we'd need it working for code gen, too, but that shouldn't be hard.

If SIMD code can be written to use add_float, add_int, etc, instead of llvmcall, I think that could improve its compile times fairly substantially.
I'd like add_float_fast, etc, working too.
But LV doesn't actually apply all of the fast-math flags, so finer granularity would be great; that's an orthogonal issue, though (the no-NaNs flag makes it difficult to check for NaNs, which makes it a nonstarter; LLVM propagates no-NaNs more aggressively than I'd like).

In terms of maintenance burden, my suggestions would be to avoid anything that isn't standard, boring Julia code. Of course, that tends to be at odds with getting good performance. So your best bet would probably be to have a close, open dialogue with the core compiler team on getting standard and stable ways of doing everything you need that they approve of.

chriselrod
Member Author

> It's sad to see this project get mothballed, but I understand not wanting to deal with the maintenance burden when you're trying to get LoopModels up to parity. Do you think LoopModels will be usable in the v1.11-release timeframe, or is it still too early in its lifecycle to guess on a release date?

It's too early to guess on a release date, but I would not expect it by Julia 1.11.
