Improve CUDA support #612

dot-asm · 2021-07-25T16:51:44Z

It's rather tricky to use the CUDA compiler with cc-rs, because as it is now you would customarily get a bunch of unresolved symbols linking errors. Answer is a) perform CUDA-specific --device-link step, b) link with CUDA run-time. ~~The latter is solved as a feature.~~

For reference, this is similar to #508, but things are done in a more Windows-friendly manner and a CI test is added.

dot-asm · 2021-07-25T20:10:30Z

Just in case, reference to "Windows friendliness" means that I can confirm that it works on Windows. Well, cargo test --manifest-path cc-test/Cargo.toml --features cudart doesn't as is, but it can be rectified by moving the new which(nvcc) thing up in cc-test/build.rs, more specifically prior to Test that the 'windows_registry' module ..., because it wipes %PATH%. Arranging CI test on Windows is problematic because of the lack of usable installation instructions.

alexcrichton

Seems reasonable to me! I don't know anything about cuda really but I'm happy to defer!

Cargo.toml

src/lib.rs

dot-asm · 2021-07-27T16:40:22Z

I apologize for the noise from CI. Rust is not my language of 1st choice:-)

alexcrichton

Ok sounds good to me about the which business. I'll take on the responsibility of dealing with those bugs if they ever arise in the future :)

src/lib.rs

alexcrichton · 2021-07-28T15:48:06Z

src/lib.rs

+    env::var_os("PATH").as_ref().and_then(|path_entries| {
+        env::split_paths(path_entries).find_map(|path_entry| {
+            let mut exe = path_entry.join(tool);
+            return if check_exe(&mut exe) { Some(exe) } else { None };


The return here can be elided (the last expression in a closure is an implicit return)

It says "you might have meant to ... return Some(exe);"...

Oh that probably means you need to remove the semicolon at the end of the line as well, changing it to:

if check_exe(&mut exe) { Some(exe) } else { None }

Right, that's what I tried, and it did complain, 1.53.

src/lib.rs

alexcrichton · 2021-07-28T15:49:29Z

src/lib.rs

+                    // it's on user to specify one by passing it through
+                    // RUSTFLAGS environment variable.
+                    let mut libtst = false;
+                    let mut libdir = nvcc.canonicalize().unwrap();


I've found historically that canonicalize() can fail for weird reasons, but is that necesary to do here? We already find nvcc in PATH so I don't think we may need to canonicalize here?

Note that PATH is evaluated only in the case nvcc here is a "single-word" command. Otherwise one can set the NVCC environment variable to explicit path to whatever you want to call nvcc. The idea behind canonicalize() is to handle the case when the environment variable is set to relative path. One can argue that it's an edge case that is not worth covering. If you want to say "scrap it," then just say the word:-)

If nvcc is set to a relative path is there an issue with passing a relative path as a -L argument? I would naively expect that to work ok, which I think means that the canonicalize() probably isn't necessary here.

It's surely just me, but I tend to think of unlike cases. Let's say you want to compile with NVCC set to './nvcc'. Of course nobody in their right mind would actually do it, but it's not really an excuse to disregard the possibility. Without canonicalize() the current code will break. Alternative can be to not rely on libdir.pop(), libdir.pop(), but do libdir.pop(), libdir.push("..")... This way most unlike case is covered without canonicalize(). Let me test how it works...

canonicalize() is gone.

src/lib.rs

alexcrichton · 2021-07-28T15:50:33Z

src/lib.rs

+            match which(&self.get_compiler().path) {
+                None => (),
+                Some(nvcc) => {
+                    // Try to figure out the -L search path. If it fails,


Out of curiosity, do you know why nvcc doesn't do this logic automatically? Is there documentation we can point to somewhere recommending this -L path and such?

It does do this automatically, but the problem is that cargo doesn't call nvcc to link the final binary. Instead is calls the system linker, and then you have to craft -L and -l yourself. Of course ideal would be if one could query nvcc and ask "what's your library path," but there is no such option:-(

Hmmm, when I think about it... If there is a way to dynamically override the target linker, and set it to nvcc, it might work too... In such case one wouldn't need this -L thing, but then cross-compilation might get tricky. [Just in case, reference to "cross-compilation" means that it does work, I can cross-compile CUDA program for aarch64.]

The final linker is actually invoked by rustc rather than Cargo, and the target's default can be overridden with -Clinker=/path/to/nvcc (which can also be configured in cargo either via RUSTFLAGS or as a string with CARGO_TARGET_..._LINKER=...)

Is this a case where the Rust target should be defaulting to nvcc rather than cc?

(I should point out again I'm very naive about CUDA/nvcc/etc, I naively assume right now that there's one Rust target that everyone uses for GPU things (and that nvcc is for GPU things) and that you're compiling for that target)

Is this a case where the Rust target should be defaulting to nvcc rather than cc?

Yes. Or rather that's what I had in mind, it remains to be seen if it actually works all the way.

I naively assume right now that there's one Rust target that everyone uses for GPU things (and that nvcc is for GPU things) and that you're compiling for that target.

nvcc is available for x86_64, arm64 and ppc64le, And it can do cross-compilations. It's most common scenario when you run eclipse on x86_64 and target remote arm64.

Setting LINKER to nvcc doesn't work, because it doesn't recognize all system's linker flags.

Oh it might be the case that rustc needs more invasive knowledge about nvcc if it runs nvcc, there's a few flavors of linkers implemented in rustc and nvcc may need its own flavor (unless it looks exactly like gcc, which sounds like that's not the case)

Cargo.toml

dot-asm · 2021-07-29T09:42:34Z

src/lib.rs

@@ -1205,6 +1245,9 @@ impl Build {
        if !msvc || !is_asm || !is_arm {
            cmd.arg("-c");
        }
+        if self.cuda {
+            cmd.arg("--device-c");
+        }


Additional thought. And I suppose it's rather a question to @YoshikawaMasashi. I've effectively taken this from #508, but the flag is not generally required, only when you compile multiple modules. For this reason it might be more appropriate to shift the burden to application and suggest to add .flag("--devce-c") when constructing the compilation. Alternatively one can make the [flag] addition contingent on amount of files being compiled.

I'll have to defer to your judgement on this, I don't think I know enough about cuda to know what the best option is here.

I went for additional self.files.len() > 1.

This allows to perform the final link with system linker.

alexcrichton · 2021-07-29T18:07:52Z

src/lib.rs

+            Some(opt) => opt.as_str(),
+            None => "none",
+        };
+        if self.cuda && cudart != "none" {


I think here you can probably just do if let Some(cudart) = &self.cudart { ... } because if someone specifies cudart then self.cuda is automatically set.

If I let Some(cudart) = &self.cudart, I'll still have to check for "none". Because cudart being None and Some("none") are equivalent cases. But yes, I can omit self.cuda...

Oh how come Some("none") is equivalent to None? (I thought that was accidental, didn't realize it was intentional)

self.cuda is removed, added a commentary to remind that Some("none") is an option. I've also spotted error, I recognized "dynamic" instead of "shared."

how come Some("none") is equivalent to None?

Since .cudart() aims to mimic --cudart, it recognizes all --cudart's options, {none|shared|static}. So that nvcc's users feel "home."

Cargo.toml

dot-asm · 2021-07-29T22:20:42Z

As for mimicking the --cudart compiler option. Currently not calling .cudart() method is equivalent to --cudart none, which is not nvcc's default. The rationale behind this suggestion is following. I would assume that if somebody is using the cc-rs with CUDA now, they have to do something about linking. And making sudden moves like forcing equivalent of --cudart static can interfere with whatever current users do. Now, one can make case for not worrying about the said interference, enforce the nvcc's default, and tell users to adjust. I personally would prefer the nvcc's default, but I'm merely starting with Rust-CUDA combo, so my opinion doesn't weigh much:-)

Try to locate the library in standard location relative to nvcc command. If it fails, user is held responsible for specifying one in RUSTFLAGS.

Execution is bound to fail without card, but the failure is ignored. It's rather a compile-n-link test. The test is suppressed if 'nvcc' is not found on the $PATH.

This can interfere with current deployments in the wild, in which case some adjustments might be required. Most notably one might have to add .cuda("none") to the corresponding Builder instantiation to restore the original behaviour.

dot-asm · 2021-07-30T10:55:39Z

one can make case for not worrying about the said interference, enforce the nvcc's default, and tell users to adjust.

The top-most commit, "Harmonize CUDA support with NVCC default --cudart static," 1bc17a0, is to show how enforcing the nvcc's default looks like in practice. Whether or not it's accepted is up to the maintainer. I for one am obviously in favour:-) I'm pinging earlier CUDA support contributors, @peterhj, @trolleyman, @YoshikawaMasashi, for possible feedback.

alexcrichton · 2021-07-30T15:05:54Z

Ok that all sounds good to me, and I think that this is good to go from at least my end in terms of idioms and such. Are you comfortable with the implemented semantics here? If so I can hit the button

dot-asm · 2021-07-30T15:13:37Z

It might be appropriate to give some time for other contributors to weigh in. Naturally if they choose to... To recap, the question is if it would be appropriate to start linking with cudart by default. As opposed to ~~having~~ making user add an additional method call to the Builder.

alexcrichton · 2021-08-02T15:45:26Z

Ok well in any case this doesn't change the defaults as-is but I think it's fine to consider changing the defaults in the future as well. I'm gonna go ahead and merge this.

dot-asm · 2021-08-02T19:59:40Z

Thank you for being patient! :-) If there are any, even remotely related problems, don't hesitate to tag my handle. I'll be happy to help. Cheers.

alexcrichton reviewed Jul 27, 2021

View reviewed changes

Cargo.toml Outdated Show resolved Hide resolved

src/lib.rs Outdated Show resolved Hide resolved

dot-asm force-pushed the cuda-dlink branch from 16d0df3 to e4f212c Compare July 28, 2021 08:27

alexcrichton reviewed Jul 28, 2021

View reviewed changes

Cargo.toml Outdated Show resolved Hide resolved

dot-asm commented Jul 29, 2021

View reviewed changes

Perform CUDA --device-link.

ad6e216

This allows to perform the final link with system linker.

dot-asm force-pushed the cuda-dlink branch from a8f3e27 to d283e5e Compare July 29, 2021 13:52

alexcrichton reviewed Jul 29, 2021

View reviewed changes

dot-asm added 4 commits July 30, 2021 12:34

Add 'cudart' method mimicking the '--cudart' nvcc command-line option.

d1eb3c5

Try to locate the library in standard location relative to nvcc command. If it fails, user is held responsible for specifying one in RUSTFLAGS.

Add dummy CUDA test to cc-test.

f8a269a

Execution is bound to fail without card, but the failure is ignored. It's rather a compile-n-link test. The test is suppressed if 'nvcc' is not found on the $PATH.

Add dummy CUDA CI test.

e57b2ca

dot-asm force-pushed the cuda-dlink branch from 6d74226 to 1bc17a0 Compare July 30, 2021 10:42

alexcrichton merged commit 4a6e8b7 into rust-lang:master Aug 2, 2021

nazar-pc mentioned this pull request Aug 30, 2021

New release, please #616

Closed

YoshikawaMasashi mentioned this pull request Oct 19, 2021

Support for Cuda Separate Compilation #508

Closed

dot-asm mentioned this pull request Jun 1, 2022

Add API to Record Compile Invocation #682

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve CUDA support #612

Improve CUDA support #612

dot-asm commented Jul 25, 2021 •

edited

Loading

dot-asm commented Jul 25, 2021 •

edited

Loading

alexcrichton left a comment

dot-asm commented Jul 27, 2021

alexcrichton left a comment

alexcrichton Jul 28, 2021

dot-asm Jul 28, 2021

alexcrichton Jul 28, 2021

dot-asm Jul 28, 2021

alexcrichton Jul 28, 2021

dot-asm Jul 28, 2021

alexcrichton Jul 28, 2021

dot-asm Jul 28, 2021

dot-asm Jul 28, 2021

alexcrichton Jul 28, 2021

dot-asm Jul 28, 2021

alexcrichton Jul 28, 2021

dot-asm Jul 28, 2021

dot-asm Jul 28, 2021

dot-asm Jul 28, 2021

alexcrichton Jul 29, 2021

dot-asm Jul 29, 2021 •

edited

Loading

alexcrichton Jul 29, 2021

dot-asm Jul 29, 2021

alexcrichton Jul 29, 2021

dot-asm Jul 29, 2021

alexcrichton Jul 29, 2021

dot-asm Jul 29, 2021

dot-asm Jul 29, 2021 •

edited

Loading

dot-asm commented Jul 29, 2021

dot-asm commented Jul 30, 2021 •

edited

Loading

alexcrichton commented Jul 30, 2021

dot-asm commented Jul 30, 2021 •

edited

Loading

alexcrichton commented Aug 2, 2021

dot-asm commented Aug 2, 2021

Improve CUDA support #612

Improve CUDA support #612

Conversation

dot-asm commented Jul 25, 2021 • edited Loading

dot-asm commented Jul 25, 2021 • edited Loading

alexcrichton left a comment

Choose a reason for hiding this comment

dot-asm commented Jul 27, 2021

alexcrichton left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dot-asm Jul 29, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dot-asm Jul 29, 2021 • edited Loading

Choose a reason for hiding this comment

dot-asm commented Jul 29, 2021

dot-asm commented Jul 30, 2021 • edited Loading

alexcrichton commented Jul 30, 2021

dot-asm commented Jul 30, 2021 • edited Loading

alexcrichton commented Aug 2, 2021

dot-asm commented Aug 2, 2021

dot-asm commented Jul 25, 2021 •

edited

Loading

dot-asm commented Jul 25, 2021 •

edited

Loading

dot-asm Jul 29, 2021 •

edited

Loading

dot-asm Jul 29, 2021 •

edited

Loading

dot-asm commented Jul 30, 2021 •

edited

Loading

dot-asm commented Jul 30, 2021 •

edited

Loading