Improve CUDA support #612
Conversation
Just in case, reference to "Windows friendliness" means that I can confirm that it works on Windows.
Seems reasonable to me! I don't know anything about cuda really but I'm happy to defer!
I apologize for the noise from CI. Rust is not my language of first choice :-)
Ok sounds good to me about the `which` business. I'll take on the responsibility of dealing with those bugs if they ever arise in the future :)
src/lib.rs
Outdated
env::var_os("PATH").as_ref().and_then(|path_entries| {
    env::split_paths(path_entries).find_map(|path_entry| {
        let mut exe = path_entry.join(tool);
        return if check_exe(&mut exe) { Some(exe) } else { None };
The `return` here can be elided (the last expression in a closure is an implicit return)
It says "you might have meant to ... `return Some(exe);`"...
Oh that probably means you need to remove the semicolon at the end of the line as well, changing it to:
if check_exe(&mut exe) { Some(exe) } else { None }
Right, that's what I tried, and it did complain on 1.53.
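For context, here is a minimal standalone sketch of the PATH lookup being discussed, with the `return` elided as suggested; `check_exe` is a simplified stand-in for the crate's actual executable check, not the real implementation:

```rust
use std::env;
use std::path::PathBuf;

// Simplified stand-in for cc-rs's internal check: accept the candidate
// if it points at an existing file, trying an .exe suffix for Windows.
fn check_exe(exe: &mut PathBuf) -> bool {
    if !exe.is_file() {
        exe.set_extension("exe");
    }
    exe.is_file()
}

// Resolve `tool` against PATH, returning the first matching entry.
// The closure's last expression is its implicit return value, so no
// `return` keyword (and no trailing semicolon) is needed.
fn which(tool: &str) -> Option<PathBuf> {
    env::var_os("PATH").as_ref().and_then(|path_entries| {
        env::split_paths(path_entries).find_map(|path_entry| {
            let mut exe = path_entry.join(tool);
            if check_exe(&mut exe) { Some(exe) } else { None }
        })
    })
}

fn main() {
    // Tools that are not installed simply yield None.
    println!("{:?}", which("nvcc"));
}
```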
src/lib.rs
Outdated
// it's on user to specify one by passing it through
// RUSTFLAGS environment variable.
let mut libtst = false;
let mut libdir = nvcc.canonicalize().unwrap();
I've found historically that `canonicalize()` can fail for weird reasons, but is it necessary to do here? We already find `nvcc` in `PATH`, so I don't think we need to canonicalize here?
Note that `PATH` is evaluated only in the case `nvcc` here is a "single-word" command. Otherwise one can set the NVCC environment variable to an explicit path to whatever you want to call `nvcc`. The idea behind `canonicalize()` is to handle the case when the environment variable is set to a relative path. One can argue that it's an edge case that is not worth covering. If you want to say "scrap it," then just say the word :-)
If `nvcc` is set to a relative path, is there an issue with passing a relative path as a `-L` argument? I would naively expect that to work ok, which I think means that the `canonicalize()` probably isn't necessary here.
It's surely just me, but I tend to think of unlikely cases. Let's say you want to compile with NVCC set to './nvcc'. Of course nobody in their right mind would actually do it, but it's not really an excuse to disregard the possibility. Without `canonicalize()` the current code will break. An alternative would be to not rely on `libdir.pop(), libdir.pop()`, but do `libdir.pop(), libdir.push("..")` instead... This way the most unlikely case is covered without `canonicalize()`. Let me test how it works...
`canonicalize()` is gone.
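A sketch of the surviving approach, assuming the conventional `<cuda_root>/bin/nvcc` layout with libraries under `<cuda_root>/lib64`; the function name and paths are illustrative, not the crate's actual code:

```rust
use std::path::{Path, PathBuf};

// Derive a candidate library directory from the nvcc path by dropping
// the executable name and the `bin` component, then appending `lib64`.
// No canonicalize() involved: a relative nvcc path simply produces a
// relative -L argument.
fn cuda_libdir(nvcc: &Path) -> PathBuf {
    let mut libdir = nvcc.to_path_buf();
    libdir.pop(); // drop "nvcc"
    libdir.pop(); // drop "bin"
    libdir.join("lib64")
}

fn main() {
    let libdir = cuda_libdir(Path::new("/usr/local/cuda/bin/nvcc"));
    println!("{}", libdir.display()); // prints /usr/local/cuda/lib64
}
```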
src/lib.rs
Outdated
match which(&self.get_compiler().path) {
    None => (),
    Some(nvcc) => {
        // Try to figure out the -L search path. If it fails,
Out of curiosity, do you know why `nvcc` doesn't do this logic automatically? Is there documentation we can point to somewhere recommending this `-L` path and such?
It does do this automatically, but the problem is that `cargo` doesn't call `nvcc` to link the final binary. Instead it calls the system linker, and then you have to craft -L and -l yourself. Of course it would be ideal if one could query `nvcc` and ask "what's your library path," but there is no such option :-(
Hmmm, when I think about it... If there is a way to dynamically override the target `linker`, and set it to `nvcc`, it might work too... In that case one wouldn't need this -L thing, but then cross-compilation might get tricky. [Just in case, reference to "cross-compilation" means that it does work; I can cross-compile a CUDA program for aarch64.]
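In build-script terms, "crafting -L and -l yourself" amounts to emitting Cargo link directives; a minimal sketch follows, where `link_directives` is a hypothetical helper and the path is illustrative:

```rust
// Minimal build.rs sketch: since cargo hands linking to the system
// linker rather than nvcc, the build script must point the linker at
// the CUDA runtime explicitly. A helper keeps the directives testable.
fn link_directives(libdir: &str) -> Vec<String> {
    vec![
        // Equivalent of -L: where to search for libraries.
        format!("cargo:rustc-link-search=native={}", libdir),
        // Equivalent of -l: which library to link.
        "cargo:rustc-link-lib=cudart".to_string(),
    ]
}

fn main() {
    for directive in link_directives("/usr/local/cuda/lib64") {
        println!("{}", directive);
    }
}
```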
The final linker is actually invoked by rustc rather than Cargo, and the target's default can be overridden with `-Clinker=/path/to/nvcc` (which can also be configured in cargo either via `RUSTFLAGS` or as a string with `CARGO_TARGET_..._LINKER=...`)
Is this a case where the Rust target should be defaulting to `nvcc` rather than `cc`?
(I should point out again I'm very naive about CUDA/nvcc/etc, I naively assume right now that there's one Rust target that everyone uses for GPU things (and that nvcc is for GPU things) and that you're compiling for that target)
Is this a case where the Rust target should be defaulting to `nvcc` rather than `cc`?
Yes. Or rather that's what I had in mind; it remains to be seen if it actually works all the way.
I naively assume right now that there's one Rust target that everyone uses for GPU things (and that nvcc is for GPU things) and that you're compiling for that target.
nvcc is available for x86_64, arm64 and ppc64le, and it can do cross-compilation. The most common scenario is when you run Eclipse on x86_64 and target a remote arm64.
Setting LINKER to nvcc doesn't work, because nvcc doesn't recognize all of the system linker's flags.
Oh, it might be the case that rustc needs more invasive knowledge about nvcc if it runs nvcc; there are a few flavors of linkers implemented in rustc, and nvcc may need its own flavor (unless it looks exactly like gcc, which it sounds like is not the case)
@@ -1205,6 +1245,9 @@ impl Build {
    if !msvc || !is_asm || !is_arm {
        cmd.arg("-c");
    }
    if self.cuda {
        cmd.arg("--device-c");
    }
Additional thought. And I suppose it's rather a question to @YoshikawaMasashi. I've effectively taken this from #508, but the flag is not generally required, only when you compile multiple modules. For this reason it might be more appropriate to shift the burden to the application and suggest adding `.flag("--device-c")` when constructing the compilation. Alternatively one can make the flag addition contingent on the number of files being compiled.
I'll have to defer to your judgement on this; I don't think I know enough about cuda to know what the best option is here.
I went for an additional `self.files.len() > 1` check.
This allows the final link to be performed with the system linker.
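A standalone sketch of the resulting condition; the function and its parameters are illustrative, not the crate's actual API:

```rust
// Sketch: pass --device-c only when compiling more than one CUDA
// source file, since relocatable device code is only needed when
// device symbols are referenced across translation units.
fn cuda_compile_flags(cuda: bool, n_files: usize) -> Vec<&'static str> {
    let mut flags = vec!["-c"];
    if cuda && n_files > 1 {
        flags.push("--device-c");
    }
    flags
}

fn main() {
    println!("{:?}", cuda_compile_flags(true, 2)); // ["-c", "--device-c"]
    println!("{:?}", cuda_compile_flags(true, 1)); // ["-c"]
}
```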
src/lib.rs
Outdated
    Some(opt) => opt.as_str(),
    None => "none",
};
if self.cuda && cudart != "none" {
I think here you can probably just do `if let Some(cudart) = &self.cudart { ... }` because if someone specifies cudart then `self.cuda` is automatically set.
If I `let Some(cudart) = &self.cudart`, I'll still have to check for "none", because cudart being `None` and `Some("none")` are equivalent cases. But yes, I can omit `self.cuda`...
Oh, how come `Some("none")` is equivalent to `None`? (I thought that was accidental, didn't realize it was intentional)
`self.cuda` is removed; I added a comment as a reminder that `Some("none")` is an option. I've also spotted an error: I recognized "dynamic" instead of "shared."
how come Some("none") is equivalent to None?
Since `.cudart()` aims to mimic `--cudart`, it recognizes all of `--cudart`'s options, {none|shared|static}, so that nvcc's users feel at "home." As for mimicking the ...
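A standalone sketch of that option mapping, treating `None` and `Some("none")` alike; the helper name is illustrative, and the library names follow nvcc's shared (`cudart`) and static (`cudart_static`) runtime libraries:

```rust
// Map the cudart setting to the runtime library to link, mirroring
// nvcc's --cudart={none|shared|static}. None and Some("none") are
// treated alike: no runtime is linked.
fn cudart_lib(cudart: Option<&str>) -> Option<&'static str> {
    match cudart {
        Some("shared") => Some("cudart"),
        Some("static") => Some("cudart_static"),
        _ => None, // covers both None and Some("none")
    }
}

fn main() {
    println!("{:?}", cudart_lib(Some("static"))); // Some("cudart_static")
    println!("{:?}", cudart_lib(None));           // None
}
```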
Try to locate the library in a standard location relative to the nvcc command. If it fails, the user is held responsible for specifying one in RUSTFLAGS.
Execution is bound to fail without a card, but the failure is ignored. It's rather a compile-and-link test. The test is suppressed if 'nvcc' is not found on the $PATH.
This can interfere with current deployments in the wild, in which case some adjustments might be required. Most notably, one might have to add `.cudart("none")` to the corresponding Builder instantiation to restore the original behaviour.
The top-most commit, "Harmonize CUDA support with NVCC default --cudart static," 1bc17a0, is to show what enforcing nvcc's default looks like in practice. Whether or not it's accepted is up to the maintainer. I for one am obviously in favour :-) I'm pinging earlier CUDA support contributors, @peterhj, @trolleyman, @YoshikawaMasashi, for possible feedback.
Ok, that all sounds good to me, and I think that this is good to go from at least my end in terms of idioms and such. Are you comfortable with the implemented semantics here? If so I can hit the button.
It might be appropriate to give some time for other contributors to weigh in. Naturally, if they choose to... To recap, the question is whether it would be appropriate to start linking with cudart by default. As opposed to ...
Ok, well in any case this doesn't change the defaults as-is, but I think it's fine to consider changing the defaults in the future as well. I'm gonna go ahead and merge this.
Thank you for being patient! :-) If there are any, even remotely related, problems, don't hesitate to tag my handle. I'll be happy to help. Cheers.
It's rather tricky to use the CUDA compiler with cc-rs, because as it is now you would customarily get a bunch of unresolved-symbol linking errors. The answer is to a) perform the CUDA-specific `--device-link` step, and b) link with the CUDA run-time. The latter is solved as a feature. For reference, this is similar to #508, but things are done in a more Windows-friendly manner and a CI test is added.