ACP: Add nul-terminated version of core::panic::Location::file
#466
Comments
IIRC, string constants are always null-terminated in the binary anyway.
I don't think they are.
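It's a constant with a known length, so I wouldn't consider "computing the length" to be a problem.

```rust
use core::ffi::CStr;
use core::str;

/// Sketch of a `Location` that stores its file name with a trailing nul.
pub struct Location<'a> {
    /// File name bytes: valid UTF-8, ending in exactly one `\0`.
    file_with_nul: &'a [u8],
    line: u32,
    column: u32,
}

impl<'a> Location<'a> {
    /// Returns the file name without the trailing nul.
    pub fn file(&self) -> &'a str {
        // SAFETY: the slice is non-empty (it ends in `\0`) and its bytes
        // before the terminator are valid UTF-8.
        unsafe { str::from_utf8_unchecked(self.file_with_nul.get_unchecked(..self.file_with_nul.len() - 1)) }
    }

    /// Returns the file name as a C string, terminator included.
    pub fn file_cstr(&self) -> &'a CStr {
        // SAFETY: the slice ends in `\0` and has no interior nul bytes.
        unsafe { CStr::from_bytes_with_nul_unchecked(self.file_with_nul) }
    }
}
```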
One thing that changes, though, is that if we get an API to set the implicit
But it becomes a problem if we later go through with the size optimization from rust-lang/rust#117431. Then the length is no longer known, so it really does have to be computed by calling
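To make this trade-off concrete, here is a minimal sketch of a pointer-only location, assuming the layout that the size optimization in rust-lang/rust#117431 would use: a thin pointer to a nul-terminated file name and no stored length. `PtrOnlyLocation` and the `cstr` helper are hypothetical names for illustration, not the real `core::panic::Location`. Handing the pointer to C costs nothing, but recovering a `&str` means scanning for the terminator, which `CStr::from_ptr` does in time proportional to the path length.

```rust
use std::ffi::{c_char, CStr};

/// Hypothetical location type with no length field: only a thin pointer
/// to a nul-terminated file name. Not the real `core::panic::Location`.
struct PtrOnlyLocation {
    file: *const c_char,
    line: u32,
    column: u32,
}

impl PtrOnlyLocation {
    /// Only `'static` C strings are accepted, so the pointer never dangles.
    const fn new(file: &'static CStr, line: u32, column: u32) -> Self {
        Self { file: file.as_ptr(), line, column }
    }

    /// Passing the name to C is free: the pointer is already nul-terminated.
    fn file_ptr(&self) -> *const c_char {
        self.file
    }

    /// Recovering a `&str` requires scanning for the nul terminator.
    fn file(&self) -> &'static str {
        // SAFETY: `new` stored the pointer of a `&'static CStr`, so it is
        // valid for reads, nul-terminated, and lives for the whole program.
        let c_str: &'static CStr = unsafe { CStr::from_ptr(self.file) };
        // `from_ptr` has just walked the bytes to find the length.
        c_str.to_str().expect("file names in this sketch are UTF-8")
    }
}

/// Hypothetical helper: builds a `&'static CStr` from a byte literal.
fn cstr(bytes: &'static [u8]) -> &'static CStr {
    CStr::from_bytes_with_nul(bytes).expect("must end in exactly one nul")
}

fn main() {
    let loc = PtrOnlyLocation::new(cstr(b"src/lib.rs\0"), 42, 7);
    // What a C function would receive: no length computation needed.
    assert!(!loc.file_ptr().is_null());
    // What `Location::file` would have to do under this layout.
    println!("{}:{}:{}", loc.file(), loc.line, loc.column);
}
```

The design point is that the cost moves: the C-facing path becomes free, while every Rust caller of `file()` pays for a length scan.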
It seems unfortunate if all the Could we have
If combined with the optimization in
Using null-terminated strings may also unlock linker string-merging optimizations, which could further decrease binary size.
It would make a lot of things that could otherwise be function calls into macros. :(
The libs-api team talked about this today on a short-staffed call. Those on the call had a question: looking at the motivating example, why not write a version of
As context, the feeling on the call was that this represents a trade-off between making the C codebase more Rust-like and making Rust more C-like, and people weren't sure it was worth making Rust more C-like, and paying any costs here for all users, in this case.
It was noted on the call that this PR ... had been closed as not being worth it. Though, reading the comments here more closely, such as the one from @Noratrieb (who was the author of that PR), I gather that perhaps there is some interest in trying this again. If there is a way to do this that does in fact result in a worthwhile improvement for all Rust users, then my own feeling is that this probably would have affected the mood on the call about this proposal.
I am not sure that there is in fact a "cost" here. Every file path already has a de facto terminator: it is suffixed by ".rs", and this causes it to be "prefix-free": https://en.wikipedia.org/wiki/Prefix_code
This is the same property possessed by NUL-terminated CStr. Each has "\0" at the end, which means no CStr can be a prefix of another CStr. Thus the argument about the cost seems wildly speculative, unless we wish to introduce a very curious new state of affairs, like not providing the ".rs" suffix! Meanwhile, these file paths could also benefit from linker-driven deduplication (which relies on the fact that CStrs can share a suffix).
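The two properties this argument relies on can be checked with a small sketch. The helpers `can_tail_merge`, `is_proper_prefix`, and `cstr` are hypothetical names for illustration; they model suffix (tail) sharing and prefix-freeness on byte strings, not what any particular linker actually does.

```rust
use std::ffi::CStr;

/// Hypothetical helper: true if `short` (including its nul) is the tail of
/// `long`, so one stored copy of `long` could also serve `short`.
fn can_tail_merge(short: &CStr, long: &CStr) -> bool {
    long.to_bytes_with_nul().ends_with(short.to_bytes_with_nul())
}

/// Hypothetical helper: true if `a` (including its nul) is a proper prefix of `b`.
/// For valid C strings this is always false, because `b` would then need an
/// interior nul at position `a.len() - 1`.
fn is_proper_prefix(a: &CStr, b: &CStr) -> bool {
    let (a, b) = (a.to_bytes_with_nul(), b.to_bytes_with_nul());
    a.len() < b.len() && b.starts_with(a)
}

/// Hypothetical helper: builds a `&CStr` from a byte literal.
fn cstr(bytes: &[u8]) -> &CStr {
    CStr::from_bytes_with_nul(bytes).expect("exactly one trailing nul")
}

fn main() {
    let full = cstr(b"src/main.rs\0");
    // "main.rs\0" is the tail of "src/main.rs\0": one copy can serve both.
    assert!(can_tail_merge(cstr(b"main.rs\0"), full));
    // The terminator makes the encoding prefix-free: "src\0" is not a prefix.
    assert!(!is_proper_prefix(cstr(b"src\0"), full));
    // Without a terminator, the raw bytes of "src" are a prefix of the path.
    assert!("src/main.rs".starts_with("src"));
    println!("ok");
}
```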
main.rs

```rust
#[path = "foo.js"]
mod bar;

fn main() {
    bar::bang();
}
```

foo.js

```rust
pub fn bang() {
    panic!("Where am I?");
}
```

Alas, I regret not picking the file extension
@shepmaster This is true! There is actually, AFAIK, nothing preventing you from making all your files have the which I think would be the most interesting comparison point for this proposal: it would be a "hard left turn" in diagnostics for us to not report the exact file path, but we could compare this against something that does strip file extension suffixes and thus allows prefix-based coalescing of file paths into strings. This would work especially well with "Rust 2018"-style import paths.
I think if people are going to claim that this feature has a cost to all users of Rust, they should be able to demonstrate that cost as measured by our benchmark suite. I have previously measured the overhead of the
There's another way out of this, depending on the vibes people are willing to tolerate. For interop with C generally, we could add a compiler flag that null-terminates
Of course this can't happen on Linux, but has anyone thought about what happens when a user of this null-terminated API encounters an interior null byte?
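On the interior-nul question, Rust's `CStr` constructors make the possible outcomes explicit. A minimal sketch, using a made-up path and a hypothetical helper `as_seen_by_c`: the checked constructor rejects bytes with an interior nul, while C code reading the same pointer would silently stop at the first nul and see a truncated name.

```rust
use std::ffi::CStr;

/// Hypothetical helper: the string a C function would see if handed a
/// pointer to `bytes`, i.e. everything up to the first nul byte.
fn as_seen_by_c(bytes: &[u8]) -> Option<&CStr> {
    CStr::from_bytes_until_nul(bytes).ok()
}

fn main() {
    // A made-up path containing an interior nul byte.
    let bytes: &[u8] = b"dir\0name.rs\0";

    // The checked constructor refuses it: exactly one trailing nul is required.
    assert!(CStr::from_bytes_with_nul(bytes).is_err());

    // A C consumer would stop at the first nul and see only "dir".
    assert_eq!(as_seen_by_c(bytes).unwrap().to_bytes(), b"dir");

    // `CStr::from_bytes_with_nul_unchecked` forbids interior nuls, so calling
    // it on these bytes would be undefined behavior (deliberately not shown).
    println!("ok");
}
```

Any unchecked construction of a `&CStr` from a file name therefore depends on the compiler guaranteeing that the name contains no interior nul.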
Well, considering these are always full paths, the only coalescing that I think is possible here is merging together a module root with its children, so these two files:
I'd expect this to be rather low-yield, because it doesn't work at all in projects that use
Does the formatting machinery preassemble strings? I.e.
The maximum binary size overhead measured in our benchmark suite is 0.04%, and for most benchmarks the size change was below their significance threshold for binary size changes (rust-lang/rust#132081). The average binary size increase is 0.01%.
Did you also try removing the length field from
There is already a perf run for that in the PR linked above, which I can link again here: rust-lang/rust#117431
Here's the thing. We don't truly need this for the kernel; it would be annoying to work around it, but we can change the C code and have it accept a length instead of a nul-terminated string. However, I think it's an unfortunate direction, because adding Rust to existing C/C++ codebases is the future, and if we want Rust to work well in such codebases, we have to admit that they are not just going to give up their nul-terminated strings. I'm certain that this feature (or something equivalent) is not only useful in the kernel, and that any other codebase that does C/C++ interop will require it in order to have good error messages when something goes wrong as Rust calls into C/C++. After all, when Rust is in the minority in a codebase, we want the experience of using Rust there to be good. Having the Rust side give worse error messages than you get in C/C++ is not a good look. But that is the current state in the kernel right now, and fixing it would be a lot easier with this feature.
Proposal
Problem statement
When using `#[track_caller]` in codebases that mix C and Rust, you often wish to pass the caller's filename to a C API. However, such APIs usually require a nul-terminated string.
Motivating examples or use cases
I would like to use this in the Linux kernel to implement a Rust equivalent of the following utility:
It's essentially an assertion that crashes the kernel if a function is used in the wrong context. The filename and line number are used in the error message when it fails. Unfortunately, the `__might_sleep` function requires the filename to be a nul-terminated string.
Note that, unlike with the `file!()` macro (where `concat!(file!(), "\0")` produces a nul-terminated constant at compile time), it's impossible for us to do this ourselves statically for a `#[track_caller]` location. Copying the filename at runtime into another string to nul-terminate it is also not a great solution, because we would need to create the string even if the assertion doesn't fail, as the assertion is checked on the C side.
Solution sketch
Add a new function, `core::panic::Location::file_with_nul`, that returns a `&CStr` instead of a `&str`.
This has the implication that the compiler must now always store a nul byte in the filename when generating the string constants.
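For comparison with the proposed API, here is a minimal sketch of the runtime-copy workaround that the motivation rules out. `caller_file_cstring` is a hypothetical helper, and the sketch uses `std`'s `CString` for brevity (kernel code would need its own allocation path). It shows why the approach is unattractive: the allocation and copy happen on every call, before the C side has decided whether the assertion fails.

```rust
use std::ffi::CString;
use std::panic::Location;

/// Hypothetical helper: returns the caller's file name as an owned,
/// nul-terminated string. This allocates and copies on every call,
/// even when the C side never ends up using the name.
#[track_caller]
fn caller_file_cstring() -> CString {
    // With `#[track_caller]`, this is the location of our caller.
    let file = Location::caller().file();
    // `CString::new` fails only if the path contains an interior nul byte.
    CString::new(file).expect("file name contains an interior nul byte")
}

fn main() {
    let name = caller_file_cstring();
    // Same file name as `file!()` at this call site, plus a trailing nul.
    assert_eq!(name.to_str().unwrap(), file!());
    assert_eq!(name.as_bytes_with_nul().last(), Some(&0u8));
    println!("{:?}", name);
}
```

With `file_with_nul`, the same call site could hand C a pointer into the existing static string without any allocation.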
Alternatives
It could make sense to return `*const c_char` instead of `&CStr` to avoid having to compute the length when all you need is a pointer you can pass into C code. This could be important because possible future work involves reducing the size of `Location` by removing the length. In that case, the existing `core::panic::Location::file` function would be updated to compute the length using the nul terminator. Right now, the `&CStr` return value forces us to compute the length even when we don't need it.
Links and related work
An implementation can be found at rust-lang/rust#131828.
For more context, please see Zulip and the Linux kernel mailing list. This is one of the features Rust for Linux (RfL) wants in `core`.
Adding a nul terminator to the `Location` string has been tried before in rust-lang/rust#117431. However, back then, it was motivated by reducing the size of `Location`, and the previous PR did not actually expose the C string in the API.
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution:
cc @ojeda @Noratrieb