-
Notifications
You must be signed in to change notification settings - Fork 75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Faster and better location data #165
Conversation
@@ -718,9 +814,23 @@ macro_rules! profile_function { | |||
}; | |||
($data:expr) => { | |||
let _profiler_scope = if $crate::are_scopes_on() { | |||
static mut _FUNCTION_NAME: &'static str = ""; | |||
static mut _LOCATION: &'static str = ""; | |||
static _INITITIALIZED: ::std::sync::Once = ::std::sync::Once::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to use thread local here to avoid the synchronization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using thread_local
, and performance seems to be 1-2% worse, but maybe within noise, and code is much nicer. However, it means duplicating the strings for each thread, and also it won't work once we implement #167, so I think this slightly more verbose version is better for now.
This is the code I used:
thread_local! {
pub static _FUNCTION_NAME: &'static str = $crate::current_function_name!().leak();
pub static _LOCATION: &'static str = format!("{}:{}", $crate::current_file_name!(), line!()).leak();
}
Some($crate::ProfilerScope::new(
_FUNCTION_NAME.with(|x| *x),
_LOCATION.with(|x| *x),
$data,
))
Nice! I have a branch here also with some other optimizations I want to finish up ASAP https://github.com/EmbarkStudios/puffin/tree/perf-improvement-test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be nice longer term if the function name processing functions would be able to be const fn
, then no need for caching nor dynamic allocations and string creation at runtime for it.
but likely are quite far from that being supported in Rust. so this does seem to make sense here.
I have also been toying with an opposite idea of not doing any function parsing in here, just interning the static strings in the stream (#167), and having the client/viewer deal with prettifying the rust strings instead. that feels tighter and much simpler (once one has implemented the string interning for the format) and with no impact on the code being profiled.
but could still start with this approach!
Co-authored-by: Johan Andersson <[email protected]>
Co-authored-by: Johan Andersson <[email protected]>
Co-authored-by: Johan Andersson <[email protected]>
pub fn short_file_name(name: &str) -> &str { | ||
// TODO: "foo/bar/src/lib.rs" -> "bar/src/lib.rs" | ||
#[inline(never)] | ||
pub fn short_file_name(path: &'static str) -> String { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this breaking API change has been published as non-semver-breaking 0.17.1
release?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, we should hide those functions with #[doc(hidden)]
as don't see a reason to have them part of the public API.
hiding them in patch release is still a breaking change, though we are pre v1 and could give ourselves some flexibility and say they were considered internal details just exposed for the macros.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We rely on them to reimplement the macros while also calling our own implementation with it next to puffin
. Any way we could keep this around?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd take it for that use case the static
lifetime also doesn't work?
we could revert that specific change, it was more of style/semantics, not for any concrete benefit. didn't think of that these functions were part of the public interface.
@emilk we could revert it and publish another patch version without these to get back to the previous public API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd take it for that use case the
static
lifetime also doesn't work?
I think that's fine because we were calling the puffin::current_file_name!()
macro. The expression it emits suddenly returns String
while ProfilerScope::new()
expects &str
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that also means we need to switch over to std::once::Once
and format!(...).leak()
here. Perhaps it is useful to see for us how much we can reuse, maybe I can actually call the puffin
macro within our own macro.
Really the only thing we're doing extra in our macro is calling into https://crates.io/crates/superluminal-perf, from what I can tell (so that we don't have to annotate our functions with profiling markers for multiple tools).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emilk can you yank the breaking release? I'm going through dependency upgrades and keep forgetting to run cargo update -p puffin --precise 0.17.0
to opt out of semver-breaking changes 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok - 0.17.1 is yanked; 0.18.0 published
This went in a bit too quickly and missed being able to review it this morning. For those types of changes let's try to do at least two reviews. I'll have a look at removing unsafe block and perhaps just going for the stream-info-once solution as proposed in #167. |
### Checklist * [x] I have read the [Contributor Guide](../CONTRIBUTING.md) * [x] I have read and agree to the [Code of Conduct](../CODE_OF_CONDUCT.md) * [x] I have added a description of my changes and why I'd like them included in the section below ### Summary Before: <img width="409" alt="puffin-before" src="https://github.com/EmbarkStudios/puffin/assets/1148717/e5722ab6-3865-4f5d-a953-672e4c1e5c86"> After: <img width="665" alt="puffin-after" src="https://github.com/EmbarkStudios/puffin/assets/1148717/dc829830-d2cf-400d-b5d5-9f92a892ff60"> ### Description of Changes This PR: * Improve the function names for `trait` calls * Adds crate name to the file path * Adds line number to the location * Optimize `profile_function` On my M1 MacBook Pro, `profile_function` goes from 86ns->58ns, about 50% faster. `profile_scope` goes from 60ns to 58ns, which is within noise levels. The optimization is done by caching the results of the location gathering functions. This also allows us to make them more expensive (since they are now only run once), and therefor more powerful. So where puffin used to output the location `mod.rs`, it now outputs `cratename/…/modulename/mod.rs:124`. I also improved the function name for when calling a trait method on a type. Previously you'd only see `TraitName>::function_name` but now you'll see `<ConreteTypeName as TraitName>::function_name`. The downside is more data being sent, but I think the improves usefulness of the location is worth it! This is _not_ a breaking change, so this can be released in a patch-release of `puffin` --------- Co-authored-by: Johan Andersson <[email protected]>
…scopes I inadvertantly added this in #165, somehow not noticing it until now 🤦
…scopes I inadvertantly added this in #165, somehow not noticing it until now 🤦
…scopes (#175) ### Checklist * [x] I have read the [Contributor Guide](../CONTRIBUTING.md) * [x] I have read and agree to the [Code of Conduct](../CODE_OF_CONDUCT.md) * [x] I have added a description of my changes and why I'd like them included in the section below ### Description of Changes In #165 I inadvertently added a `::{{closure}}` suffix to the name of all profile scopes created with `profile_function!`, somehow not noticing it until now 🤦
Checklist
Summary
Before:
After:
Description of Changes
This PR:
trait
callsprofile_function
On my M1 MacBook Pro,
profile_function
goes from 86ns->58ns, about 50% faster.profile_scope
goes from 60ns to 58ns, which is within noise levels.The optimization is done by caching the results of the location gathering functions. This also allows us to make them more expensive (since they are now only run once), and therefor more powerful.
So where puffin used to output the location
mod.rs
, it now outputscratename/…/modulename/mod.rs:124
.I also improved the function name for when calling a trait method on a type. Previously you'd only see
TraitName>::function_name
but now you'll see<ConreteTypeName as TraitName>::function_name
.The downside is more data being sent, but I think the improves usefulness of the location is worth it!
This is not a breaking change, so this can be released in a patch-release of
puffin