Conversation
Force-pushed from 86e27cd to c142b9d.
@tombh instead of a command line option, I am just conditionally compiling the vector length into the binary. I don't think it actually makes sense to add an option depending on the architecture; I think we just want to pick one, and I always want the fastest version supported by my architecture. Adding an option just confuses things and doesn't actually help me test, since I can conditionally compile these very easily anyway by messing with Rust flags. Let me know if you have any issues. I've also fixed a lot of the Clippy lints, although a lot of the fixes are just disabling them and providing a reason.
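For illustration, a minimal sketch of what conditionally compiling the vector length could look like; the constant name, lane counts and feature checks below are assumptions for the sake of the example, not the crate's actual code:

```rust
// Hypothetical sketch: pick the widest supported f32 vector length at compile
// time via target-feature checks. The constant name and lane counts here are
// illustrative, not the crate's actual code.
#[cfg(target_feature = "avx512f")]
pub const VECTOR_LENGTH: usize = 16;

#[cfg(all(target_feature = "avx2", not(target_feature = "avx512f")))]
pub const VECTOR_LENGTH: usize = 8;

#[cfg(not(any(target_feature = "avx2", target_feature = "avx512f")))]
pub const VECTOR_LENGTH: usize = 4;

fn main() {
    println!("compiled vector length: {VECTOR_LENGTH}");
}
```

Building with `RUSTFLAGS="-C target-cpu=native"` (or the `.cargo/config.toml` entry discussed below) is what turns on the corresponding `target_feature`s.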
There was just one tiny change I needed to get it to compile. But the heatmap seems to be messed up:
That's for the Cardiff benchmark. Also, I made PR #24 to add ARM building/testing, but it seems to suddenly not be happy about the lints. I don't understand why your PR is fine but that one isn't. The lint messages seem correct, but it's like only now is Clippy suddenly seeing all of them.
Force-pushed from c142b9d to 5262c20.
Force-pushed from f5b0c4b to 91a580e.
.cargo/config.toml
What do you think about ignoring this file in .gitignore? Are world runs harder without it? Or can we just add a line in Atlas that does the same thing?
I personally prefer it, only because we are "JIT"ing the program, so to speak (not in the traditional sense of the word), on the native machines before running it. If we were releasing a binary tool for anyone to make use of (i.e. build artifacts), then we would want to disable it, since it would end up building for whatever architecture the build machine is on, which would blow up in our faces.
And to answer your question, world runs are a bit harder without it, but it could just be a line we add to Atlas with the correct RUSTFLAGS when we provision.
Or another approach is to both gitignore it and put it in the repo elsewhere. Then all Atlas has to do is move it to the right place.
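For reference, a sketch of the kind of `.cargo/config.toml` entry being discussed; the actual file's contents in this PR are an assumption here, and the commented one-liner is the sort of thing Atlas could run instead:

```toml
# Hypothetical sketch of the .cargo/config.toml entry under discussion.
# The provisioning equivalent would be something like:
#   RUSTFLAGS="-C target-cpu=native" cargo build --release
[build]
rustflags = ["-C", "target-cpu=native"]
```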
I couldn't get main to recognize the module, and the examples I've read/some docs show it as a sibling:
https://doc.rust-lang.org/rust-by-example/mod/split.html
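For what it's worth, a minimal sketch of the two layouts being discussed, assuming the module is called `cpu` as in this PR; the placeholder function is illustrative only:

```rust
// Option A (what the linked page shows): cpu.rs as a sibling of main.rs,
// declared from main.rs with `mod cpu;` (or alternatively src/cpu/mod.rs).
// Option B, shown inline here: the module body lives in main.rs itself.
mod cpu {
    // Hypothetical placeholder item, not the crate's actual code.
    pub fn kernel_name() -> &'static str {
        "cpu"
    }
}

fn main() {
    println!("running the {} kernel", cpu::kernel_name());
}
```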
-------- Original Message --------
On Saturday, 01/10/26 at 10:00, Thomas Buckley-Houston (@tombh) commented on this pull request, on crates/total-viewsheds/src/cpu.rs (#23):
Oh you've made cpu.rs a sibling to /cpu. It's supposed to be cpu/cpu.rs. Or just put the contents of the current cpu.rs into main.rs inside mod cpu {...}.
Yes... Unfortunately it's quite a few features: generic const exprs and portable SIMD.
-------- Original Message --------
On Saturday, 01/10/26 at 10:01, Thomas Buckley-Houston (@tombh) commented on this pull request, on rust-toolchain.toml (#23):
Oh yeah, do you know why we need this? It'd be good to leave a comment in the file, so we know when whatever the needed feature is makes it to stable.
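For context, a hedged sketch of the crate-level attributes implied by the reply above; both are real nightly features, but the crate's actual attribute list is an assumption here, and a comment in `rust-toolchain.toml` pointing at the tracking issues for these features would answer the question just quoted:

```rust
// Hypothetical sketch of the nightly feature gates mentioned above; the
// crate's actual attribute list may differ.
#![feature(generic_const_exprs)]
#![feature(portable_simd)]

fn main() {}
```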
Oh I never knew that. And what was the issue with putting the definitions in
It's not a problem at all, just good to leave a comment in the file.
I bet that probably is the issue, yes. I'm not terribly concerned about that for this PR, but we could clean it up for sure.
Force-pushed from aed1763 to 81cdf51.
Force-pushed from 81cdf51 to b99fdac.
* add inclusive prefix sum code: adds an inclusive prefix sum kernel which is generic over unroll factor and vector length (see the scalar sketch below)
* only calculate data for items within the TVS' radius
* add filling in of elevations into the kernel
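A hedged scalar sketch of the inclusive prefix sum shape described in the first bullet; the real kernel is vectorised and generic over vector length as well, and all names here are illustrative:

```rust
// Hypothetical scalar equivalent of an inclusive prefix sum that carries the
// running total across fixed-size chunks. The actual kernel is vectorised and
// also generic over vector length.
fn inclusive_prefix_sum<const CHUNK: usize>(data: &mut [f32]) {
    let mut carry = 0.0_f32;
    for chunk in data.chunks_mut(CHUNK) {
        for value in chunk.iter_mut() {
            carry += *value;
            *value = carry;
        }
    }
}

fn main() {
    let mut data = [1.0, 2.0, 3.0, 4.0, 5.0];
    inclusive_prefix_sum::<4>(&mut data);
    assert_eq!(data, [1.0, 3.0, 6.0, 10.0, 15.0]);
}
```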
This reduces the distance searched in every line of sight by one elevation, which is more accurate and consistent with other approaches. There are no changes to the viewsheds in tests or benchmarks, but total surfaces are reduced.
* tests: integrate viewshed tests into the CPU kernel (search for TODO@ryan for remaining tasks)
* inline both calls
* fix more rotation issues
* enable ring data feature for benchmark
* fmt
* fix some lints, reformat
* pass all unit tests, add in refraction
* put test above all on default vector length
* fix cfg blocks
* fix prefix max carry through
* fix formatting
* reignore tests
* fix non-SSE build
* remove unnecessary carry through

Co-authored-by: Ryan Berger <ryanbberger@gmail.com>
Because the TVS is only valid within a certain distance from the center, this adds a better distance calculation, which also helps with rasterization.
Remove the old `_CMP_GE`, which was used for the exclusive prefix sum code, as it was causing quite a few bugs _only_ in the AVX-512 kernel.
The Vulkan kernel tallies total surface area against itself, creating a quadratic sum of surface area (effectively `sum(i = 0..n, sum(j = 0..i, surface_j))`) rather than a linear one.
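Roughly speaking (reading the sum above as re-adding partial totals), with per-step surfaces `a`, `b` and `c` the buggy tally accumulates `a + (a + b) + (a + b + c)` instead of `a + b + c`.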
Everything is just accumulated in `self.total_surfaces`.
They are so similar now that we should expect them to always produce the benchmark viewshed within a 1% difference.
Just an excuse to bump the CI cache for the Rust tests.
This is just about the edge case of handling equally long lines from different angles. We always want the first angle that finds the line to win. The CPU kernel already does this, but the Vulkan kernel had problems because of how forward and backward lines take it in turns per sector.
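A hedged sketch of that tie-break rule; the names and shape here are illustrative, not the kernel's actual code:

```rust
// Hypothetical illustration of "the first angle to find the line wins": a
// strict `>` comparison keeps the earliest angle when two lengths are equal.
fn winning_angle(lengths_by_angle: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (angle, &length) in lengths_by_angle.iter().enumerate() {
        let beats_best = match best {
            Some((_, best_length)) => length > best_length, // ties lose
            None => true,
        };
        if beats_best {
            best = Some((angle, length));
        }
    }
    best.map(|(angle, _)| angle)
}

fn main() {
    // Angles 1 and 3 are equally long; the first one (index 1) wins.
    assert_eq!(winning_angle(&[2.0, 5.0, 3.0, 5.0]), Some(1));
}
```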
Currently we use the Cargo TOML option to guarantee good performance on x86 machines, which is especially needed for world runs. We will recommend it elsewhere, but will make sure it is enabled on all Turin workers.
After some thought about how the L1 cache behaves in our line-of-sight algorithm, this commit calculates all the angles into a buffer first, and then an unrolled prefix max is calculated on top of that. It ends up being much quicker on my i9-9900K, offering about a 20% speedup, and machines with larger L1 caches are expected to do even better.
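A hedged scalar sketch of the buffered pass described above: the angles are already in a buffer, then a running (prefix) max is taken over it in place; the real version is unrolled and vectorised, and the names here are illustrative:

```rust
// Hypothetical scalar equivalent of the "angles into a buffer, then prefix max"
// pass. The actual kernel unrolls and vectorises this loop.
fn prefix_max_in_place(angles: &mut [f32]) {
    let mut running_max = f32::NEG_INFINITY;
    for angle in angles.iter_mut() {
        running_max = running_max.max(*angle);
        *angle = running_max;
    }
}

fn main() {
    let mut angles = [0.3, 0.1, 0.5, 0.2, 0.7];
    prefix_max_in_place(&mut angles);
    assert_eq!(angles, [0.3, 0.3, 0.5, 0.5, 0.7]);
}
```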
Force-pushed from b99fdac to 263977e.

The benchmark viewshed difference is almost imperceptible. This PR's viewshed is in green:
