Mark phase prefetching. #73375
Tagging subscribers to this area: @dotnet/gc

Issue Details:
This adds prefetching to the mark phase.
The idea is that once we have established that an object is in one of the generations we want to collect, we prefetch its memory before we determine whether we have marked it already. This is because the mark bit is in the object itself, and thus requires accessing the object's memory.
As the prefetching will take some time to take effect, we park the object in a queue (see type mark_queue_t below). We then retrieve an older object from the queue and test whether it has been marked. This should be faster, because we issued a prefetch for this older object's memory a while back.
In quite a few places we now need to drain the queue to ensure correctness; see the calls to drain_mark_queue().
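A minimal sketch of the "prefetch, park, then test an older object" idea described above, written in C++ to match gc.cpp. Everything in it is an assumption for illustration, not the actual mark_queue_t: Obj with a bool mark bit stands in for a managed object (the real mark bit lives inside the object), and MarkQueue, queue_mark and prefetch_obj are hypothetical names. The slot count is a template parameter rather than the real queue size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a managed object. The real GC keeps the mark bit
// inside the object, which is why testing it touches the object's memory.
struct Obj {
    bool marked = false;
};

// Hypothetical prefetch helper: GCC/Clang builtin, no-op elsewhere.
static inline void prefetch_obj(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p);
#else
    (void)p;
#endif
}

// Sketch of a prefetching mark queue (assumed design, not CoreCLR's code).
// Each new object is prefetched and parked in a ring of SLOTS slots; the
// object it displaces was prefetched SLOTS calls ago, so its memory should
// be in cache by the time we test its mark bit.
template <std::size_t SLOTS>
class MarkQueue {
    Obj* slot_table[SLOTS] = {};
    std::size_t curr = 0;

public:
    // Prefetch o and park it. If the displaced older object exists and is not
    // yet marked, mark it and return it so the caller can scan its
    // references; otherwise return nullptr.
    Obj* queue_mark(Obj* o) {
        prefetch_obj(o);
        Obj* old = slot_table[curr];
        slot_table[curr] = o;
        curr = (curr + 1) % SLOTS;
        if (old != nullptr && !old->marked) {
            old->marked = true;
            return old;
        }
        return nullptr;  // slot was empty, or the object was already marked
    }
};
```

With two slots, queuing a, b, c returns nullptr, nullptr and then a (now marked). An object that was queued twice before being marked is returned only once, because the second time it is displaced its mark bit is already set. Objects still sitting in the slots are not marked yet, which is why the queue has to be drained before marking can be considered complete.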
src/coreclr/gc/gc.cpp
// retrieve a newly marked object from the queue
// returns nullptr if there is no such object
uint8_t* mark_queue_t::drain()
nit: Should this be renamed to something that better implies that it just marks one object and returns it (maybe mark_next or something)? drain implies completely emptying it out, IMO.
I think of this method as draining; it just needs to return if there are still objects to mark. It does drain the slot_table at the end, when all slots become null (and that's the end goal: to have all slots become null).
How about get_next_marked, a slight variation on Aditya's suggestion?
Then I would probably go with get_next_to_mark, since you are getting an object to do the mark work on.
Well, the object returned is already marked, so get_next_to_mark doesn't seem entirely right either. I can't think of anything better than get_next_marked.
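The semantics debated in this thread (each call marks and returns at most one object, and the slot table is fully drained once every slot is null) can be sketched as follows. Only the name get_next_marked comes from the thread; Obj, DrainableQueue and the signature of drain_mark_queue shown here are hypothetical. Returning one object per call presumably lets the caller scan that object's references (which in the real GC may park further objects in the queue) before asking for the next one.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical object with an in-object mark bit (illustration only).
struct Obj {
    bool marked = false;
};

// Sketch of the draining side of a prefetching mark queue (assumed design).
template <std::size_t SLOTS>
struct DrainableQueue {
    Obj* slot_table[SLOTS] = {};

    // Clears slots until it finds an object this call newly marks, and
    // returns it. Returns nullptr once every slot is null (fully drained).
    Obj* get_next_marked() {
        for (std::size_t i = 0; i < SLOTS; i++) {
            Obj* o = slot_table[i];
            if (o == nullptr)
                continue;
            slot_table[i] = nullptr;  // slot is now empty
            if (!o->marked) {
                o->marked = true;
                return o;             // caller does the mark work on o
            }
            // already marked through another path: keep scanning
        }
        return nullptr;
    }
};

// Hypothetical drain loop: must run wherever correctness requires every
// parked object to be marked, e.g. before deciding that marking is done.
template <std::size_t SLOTS, typename Visit>
std::size_t drain_mark_queue(DrainableQueue<SLOTS>& q, Visit visit) {
    std::size_t n = 0;
    while (Obj* o = q.get_next_marked()) {
        visit(o);  // in a real GC this would scan o's references
        n++;
    }
    return n;
}
```

If some caller skips the drain, objects left in slot_table are never marked even though they are reachable, which matches the kind of bug the later commit message describes for scan_dependent_handles.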
src/coreclr/gc/gc.cpp
#endif
_mm_prefetch((const char*)addr, _MM_HINT_T0);
#elif defined(TARGET_ARM64) && defined(TARGET_WINDOWS)
__prefetch((const char*)addr);
__builtin_prefetch should work on non-Windows platforms.
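A sketch of how the platform dispatch could look once __builtin_prefetch covers the non-Windows case. It is an assumption, not the merged code: prefetch_for_read is a hypothetical name, and it keys off compiler-predefined macros (_MSC_VER, _M_X64, _M_IX86, _M_ARM64, __GNUC__, __clang__) instead of CoreCLR's TARGET_* build macros so that it compiles on its own.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#elif defined(_MSC_VER) && defined(_M_ARM64)
#include <intrin.h>     // __prefetch
#endif

// Hypothetical helper: issue a read prefetch for addr with whatever the
// compiler provides. A prefetch is only a hint, so the no-op fallback is
// still correct, just without the speedup.
static inline void prefetch_for_read(const void* addr) {
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
    _mm_prefetch((const char*)addr, _MM_HINT_T0);
#elif defined(_MSC_VER) && defined(_M_ARM64)
    __prefetch((const char*)addr);
#elif defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(addr);  // default arguments: read, high locality
#else
    (void)addr;
#endif
}
```

Keeping the MSVC branches first matters because clang-cl also defines _MSC_VER; with this ordering it takes the same intrinsic path as MSVC.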
https://clang.llvm.org/docs/LanguageExtensions.html describes the arguments.
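For reference, __builtin_prefetch takes an address plus two optional compile-time constants: rw (0 for a read, the default; 1 for a write) and locality (0 to 3, default 3, meaning keep the data in as many cache levels as possible). The example below is a hypothetical illustration of those arguments, not GC code. It gathers values through an index array and prefetches the element needed PREFETCH_DISTANCE iterations later, which is the same "issue the prefetch early, use the data later" pattern as the mark queue. The distance of 8 is an arbitrary assumption, and the code is GCC/Clang only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed prefetch distance; the best value depends on hardware and workload.
constexpr std::size_t PREFETCH_DISTANCE = 8;

// Sum data[idx[i]] for i in [0, n), prefetching ahead of the scattered reads.
std::uint64_t sum_indirect(const std::uint64_t* data, const std::size_t* idx,
                           std::size_t n) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            // rw = 0 (read), locality = 3 (high temporal locality); this is
            // the same as calling __builtin_prefetch with only the address.
            __builtin_prefetch(&data[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
        total += data[idx[i]];
    }
    return total;
}
```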
Ah right, on Linux we should use __builtin_prefetch.
Thanks Jan - it looks like calling it with the default arguments should be just fine for our purposes.
Some results I measured on a first-party production workload (I had 3 machines):
…ating systems. Check-in tests showed an issue, traced to a missing drain_mark_queue() call in the WKS version of scan_dependent_handles.
…_t::get_next_marked.