
feat: cache efficient CPU kernel #23

Merged
ryan-berger merged 27 commits into main from rberger/cpu-clean on Jan 17, 2026

Conversation

ryan-berger (Contributor) commented Dec 21, 2025

The benchmark viewshed difference is almost imperceptible. This PR's viewshed is in green:

[image: benchmark viewshed comparison]

ryan-berger requested a review from tombh on December 21, 2025 02:53
ryan-berger (Contributor, Author) commented:

@tombh instead of a command line option, I am just conditionally compiling the vector length into the binary.

I don't think it actually makes sense to add an option that depends on the architecture; I think we just want to pick one.

I always want the fastest version supported by my architecture. Adding an option just confuses things and doesn't actually help me test, since I can conditionally compile these very easily anyway by adjusting Rust flags.

Let me know if you have any issues. I've also fixed a lot of the clippy lints, although many of those fixes just disable the lint and provide a reason.
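
A minimal sketch of that compile-time selection, assuming an illustrative `LANES` constant and feature gates (not necessarily this PR's actual code):

```rust
// Illustrative only: pick a lane count from whichever target features are
// enabled at compile time (e.g. by building with
// `RUSTFLAGS="-C target-cpu=native"`). The name `LANES` and the exact widths
// are assumptions for this sketch.
#[cfg(target_feature = "avx512f")]
pub const LANES: usize = 16; // 512-bit registers: 16 x f32

#[cfg(all(target_feature = "avx2", not(target_feature = "avx512f")))]
pub const LANES: usize = 8; // 256-bit registers: 8 x f32

#[cfg(not(any(target_feature = "avx512f", target_feature = "avx2")))]
pub const LANES: usize = 4; // SSE/NEON fallback: 4 x f32
```

Because the choice happens at compile time, there is no runtime flag: the binary simply gets the widest kernel the build target supports.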

tombh (Collaborator) commented Dec 21, 2025

There was just one tiny change I needed to get it to compile. But the heatmap seems to be messed up:

[image: total_surfaces heatmap]

That's for the Cardiff benchmark, but other .bt files suffer similarly.

I made PR #24 to add ARM building/testing, but it suddenly seems unhappy about the lints. I don't understand why your PR is fine but that one isn't. The lint messages seem correct, but it's as if Clippy is only now seeing all the avx512f-gated code.

ryan-berger force-pushed the rberger/cpu-clean branch 2 times, most recently from f5b0c4b to 91a580e on January 9, 2026 06:43
tombh (Collaborator) commented:

What do you think about ignoring this file in .gitignore? Are world runs harder without it? Or can we just add a line in Atlas that does the same thing?

ryan-berger (Contributor, Author) commented:

I personally prefer keeping it, only because we are "JIT"ing the program, so to speak (not in the traditional sense of the word), on the native machines before running it. If we were releasing a binary tool for anyone to use (i.e. publishing build artifacts), then we would want to disable it, since the binary would be built for whatever architecture the build machine happens to have, which would blow up in our faces.

ryan-berger (Contributor, Author) commented:

And to answer your question, world runs are a bit harder without it, but it could just be a line we add to Atlas with the correct RUSTFLAGS when we provision.

tombh (Collaborator) commented:

Or another approach is to both gitignore it and put a copy elsewhere in the repo. Then all Atlas has to do is move it to the right place.


ryan-berger commented Jan 10, 2026 via email


ryan-berger commented Jan 10, 2026 via email


tombh commented Jan 10, 2026

> I couldn't get main to recognize the module, and the examples I've read/some docs show it as a sibling:
> https://doc.rust-lang.org/rust-by-example/mod/split.html

Oh, I never knew that. And what was the issue with putting the definitions in main.rs?

> Yes... Unfortunately it's quite a few features. Generic const exprs and portable simd

It's not a problem at all, it's just good to comment in rust-toolchain.toml which features it is enabling.
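
For context, a minimal sketch of what those two nightly features enable, using a hypothetical lane-generic helper rather than the project's actual code:

```rust
// The crate-level attributes below are what the nightly toolchain makes
// available; `sum_lanes` is a made-up example of why they are wanted.
#![feature(portable_simd)]       // std::simd: vector types generic over lane count
#![feature(generic_const_exprs)] // const expressions such as `N * UNROLL` in types

use std::simd::{LaneCount, Simd, SupportedLaneCount};

// One kernel, compiled for any supported vector length N.
fn sum_lanes<const N: usize>(v: Simd<f32, N>) -> f32
where
    LaneCount<N>: SupportedLaneCount,
{
    v.to_array().iter().sum()
}

fn main() {
    let v = Simd::<f32, 8>::splat(1.0);
    assert_eq!(sum_lanes(v), 8.0);
}
```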

ryan-berger (Contributor, Author) commented:

> Oh, I never knew that. And what was the issue with putting the definitions in main.rs?

I bet that's probably the issue, yes. I'm not terribly concerned about that for this PR, but we could clean it up for sure.

ryan-berger requested a review from tombh on January 12, 2026 01:27
ryan-berger force-pushed the rberger/cpu-clean branch 2 times, most recently from aed1763 to 81cdf51 on January 17, 2026 22:54
ryan-berger and others added 9 commits January 17, 2026 15:24
* add inclusive prefix sum code

adds an inclusive prefix sum kernel that is generic over unroll factor and vector length

* only calculate data for items within the TVS' radius

* add filling in of elevations into kernel
This reduces the distance searched in every line of sight by one elevation. This is more accurate and consistent with other approaches.

No changes to viewsheds in tests or benchmarks. But total surfaces are
reduced.
* tests: Integrate viewshed tests into CPU kernel

Search for TODO@ryan for remaining tasks.

* inline both calls

* fix more rotation issues

* enable ring data feature for benchmark

* fmt

* fix some lints, reformat

* pass all unit tests, add in refraction

* put test above all on default vector length

* fix cfg blocks

* fix prefix max carry through

* fix formatting

* reignore tests

* fix non-sse build

* remove unnecessary carry through

---------

Co-authored-by: Ryan Berger <ryanbberger@gmail.com>
Because the TVS is only valid within a certain distance from the center, this
adds a better distance calculation which also helps with rasterization
ryan-berger and others added 17 commits January 17, 2026 15:29
Remove the old _CMP_GE which was used for the exclusive prefix sum code, as it was causing quite a few bugs _only_ in the AVX-512 kernel
The Vulkan kernel tallies total surface area against itself, creating a quadratic surface area (a nested sum: for each i < n it re-adds everything from j < i) rather than a linear one
Everything is just accumulated in `self.total_surfaces`.
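
A rough illustration of the two patterns, with made-up names (`ring_surfaces`); it sketches the shape of the bug, not the actual Vulkan or CPU code:

```rust
// Buggy pattern (quadratic): the running total is folded back into the tally
// on every step, so earlier surfaces get counted again and again.
fn tally_quadratic(ring_surfaces: &[f64]) -> f64 {
    let mut running = 0.0;
    let mut total = 0.0;
    for &s in ring_surfaces {
        running += s;
        total += running; // re-adds everything accumulated so far
    }
    total
}

// Intended pattern (linear): a single accumulator, as in `self.total_surfaces`.
fn tally_linear(ring_surfaces: &[f64]) -> f64 {
    ring_surfaces.iter().sum()
}
```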
They are so similar now that we should expect them to always produce
the benchmark viewshed within 1% difference.
Just an excuse to bump the CI cache for the Rust tests.
This is just about the edge case of handling equally long lines from
different angles. We always want the first angle to find the line to
win. The CPU kernel does this already. But the Vulkan kernel had
problems because of how forward and backward lines take it in turns per
sector.
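
A tiny sketch of that tie-break rule, with hypothetical names; the point is only that a strictly-greater comparison lets the first angle that found the line win:

```rust
// (angle_index, line_length) pairs, with candidates arriving in angle order.
// Using `>` rather than `>=` means an equally long line found at a later
// angle never replaces the one found first.
fn keep_longest(current: (usize, f32), candidate: (usize, f32)) -> (usize, f32) {
    if candidate.1 > current.1 {
        candidate
    } else {
        current
    }
}
```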
Currently we use the cargo toml option to guarantee good performance
on x86 machines, which is especially needed for world runs. We will
recommend it elsewhere, but make sure it is enabled on all Turin workers
After some thought about how the L1 cache behaves in our line of sight algorithm, this commit calculates all the angles and writes them into a buffer, and then an unrolled prefix max is calculated on top of that.

It ends up being much quicker on my i9-9900K, offering about a 20% speedup, and machines with larger L1 caches are expected to do even better.
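
A scalar sketch of that two-pass shape, assuming made-up names (`angles`, `prefix_max`) and a simplified angle formula; the real kernel is SIMD and also handles curvature and refraction:

```rust
// Pass 1: write every angle along the line of sight into one contiguous buffer.
// Pass 2: run an unrolled inclusive prefix max over that buffer.
// Both passes touch the same small buffer, so the working set stays in L1.
fn line_of_sight_prefix_max(elevations: &[f32], distances: &[f32]) -> Vec<f32> {
    let angles: Vec<f32> = elevations
        .iter()
        .zip(distances)
        .map(|(&e, &d)| e / d) // simplified stand-in for the real angle calculation
        .collect();

    let mut prefix_max = vec![f32::NEG_INFINITY; angles.len()];
    let mut carry = f32::NEG_INFINITY;
    let mut i = 0;

    let mut chunks = angles.chunks_exact(4);
    for chunk in &mut chunks {
        // Manually unrolled by 4; a SIMD version does the same per vector.
        carry = carry.max(chunk[0]); prefix_max[i] = carry;
        carry = carry.max(chunk[1]); prefix_max[i + 1] = carry;
        carry = carry.max(chunk[2]); prefix_max[i + 2] = carry;
        carry = carry.max(chunk[3]); prefix_max[i + 3] = carry;
        i += 4;
    }
    for &a in chunks.remainder() {
        carry = carry.max(a);
        prefix_max[i] = carry;
        i += 1;
    }
    prefix_max
}
```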
ryan-berger merged commit d326e9a into main on Jan 17, 2026
7 checks passed
ryan-berger deleted the rberger/cpu-clean branch on January 17, 2026 23:36