-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor gc to only need to allocate virtual addresses as needed #21135
Conversation
Could you explain what region0, region1, and region2 mean? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM in general. The page_metadata
fast path can be tweaked a little bit. Other slow paths can probably be tweaked later.
@@ -590,7 +590,7 @@ STATIC_INLINE void gc_setmark_pool_(jl_ptls_t ptls, jl_taggedvalue_t *o, | |||
STATIC_INLINE void gc_setmark_pool(jl_ptls_t ptls, jl_taggedvalue_t *o, | |||
uint8_t mark_mode) | |||
{ | |||
gc_setmark_pool_(ptls, o, mark_mode, page_metadata(o)); | |||
gc_setmark_pool_(ptls, o, mark_mode, page_metadata_ext(o).meta); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does this need to be page_metadata_ext(o).meta
and not page_metadata(o)
? (Also a few cases below.)
Edit: I guess this is for NULL check? Seems that these can be replaced with jl_assume(page_metadata(o))
. Or at least this one and the one in the pool allocator.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, main difference is that _ext
requires it to be a valid page, while page_metadata
did not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, _ext
is the fast path
src/gc-pages.c
Outdated
info.region0->freemap[info.region0_i32] |= msk; // is free | ||
pmeta = &info.region0->meta[i]; | ||
info.meta = (*pmeta = &page_meta[pg]); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The allocation here can probably be cleaned up later..... (we can have a version of perm_alloc that guarantees zero filling).
src/gc.h
Outdated
uintptr_t data = ((uintptr_t)_data); | ||
unsigned i; | ||
i = REGION_INDEX(data); | ||
region1_t *r1 = memory_map.meta1[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can probably use a compile time optimization too. (Though I doubt 32bits matters too much.......)
I believe the "regions" forms a 2-3 level "page table". |
@nanosoldier |
src/gc-pages.c
Outdated
#ifdef _OS_WINDOWS_ | ||
VirtualAlloc(ptr, GC_PAGE_SZ, MEM_COMMIT, PAGE_READWRITE); | ||
VirtualAlloc(meta->data, GC_PAGE_SZ, MEM_COMMIT, PAGE_READWRITE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is using meta
outside of its scope (defined on line 178 inside a block) which is causing the Windows failures.
@@ -96,113 +84,184 @@ static void jl_gc_alloc_region(region_t *region) | |||
jl_throw(jl_memory_exception); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is getting hit on 32-bit Linux
947699b
to
d0d8f4c
Compare
@@ -9,32 +9,21 @@ | |||
extern "C" { | |||
#endif | |||
|
|||
// A region is contiguous storage for up to DEFAULT_REGION_PG_COUNT naturally aligned GC_PAGE_SZ pages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't just delete this, update it if no longer true
Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels |
d0d8f4c
to
ff4ede1
Compare
the difference between pagetable, pagetable0, and pagetable1 is still not described very clearly |
There is no difference, just level h=0, h=1, and h=2 of the table. C is just really encouraging of copy-pasting. |
ff4ede1
to
609f5bb
Compare
these look copy pasted but in several cases aren't quite, the 0 function refers to 1, or similar. is this intentional? if so, it should be explained. |
@vtjnash: I don't really understand why you can't just say what they are for. If they're the same, why are there three of them? Why not just one? |
Exactly. This is why people don't like C. Three data-structures that differ only by their distance from the root of the tree each have their own independently unique type.
Of course, it's a table of height 3, so level 2 points at 1 and level 2 points at level 1 which contains the leaf nodes. That means I can hand-optimize away some of the conditional checks at each level, so that the copies of the code ends up littered with special cases. |
Then describe that in comments, it's not obvious without any explanation. |
609f5bb
to
975812c
Compare
|
We don't usually have descriptive comments on helper functions (unless the work they do is non-intuitive). In this case, I've just split out the existing iterators into their own functions. |
These aren't intuitive - the problem isn't that they're repetitive, it's that they look copy pasted but are each slightly different in ways that aren't at all obvious if you didn't write the code. For maintainability of the code base, please explain the scheme better. We need many many more comments throughout, and confusing aspects are the best place to fix that first. |
975812c
to
e5af499
Compare
src/gc.h
Outdated
// - pagetable0_t: the top/root level | ||
// - pagetable1_t: the middle level | ||
// - pagetable2_t: the bottom/leaf level | ||
// Corresponding to these simlar structures is a large amount of repetetive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
repetetive -> repetitive (here and below)
src/gc.h
Outdated
// various conditions at each level. | ||
|
||
// The following constants define the branching factors at each level. | ||
// They constants and GC_PAGE_LG2 must therefore sum to sizeof(void*). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they -> the
b322044
to
599d6e2
Compare
the gc.h comment helps, but I meant inline on the functions I referred to - the pieces that differ, when a function called 0 refers to a variable called 1, or a variable called 1 has a field named 0, etc, is what's confusing and could use point-of-use indications of intent |
599d6e2
to
a1f8ff9
Compare
src/gc.h
Outdated
// - pagetable0_t: the bottom/leaf level (covers the contiguous addresses) | ||
// - pagetable1_t: the middle level | ||
// - pagetable2_t: the top/leaf level (covers the entire virtual address space) | ||
// Corresponding to these simlar structures is a large amount of repetitive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similar
gc_clear_mark_*
, gc_verify_tags_*
, gc_stats_pagetable*
, and gc_count_pool_pagetable*
still look misleading and could use comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those are debugging functions not actually part of he gc. They are usually commented out, so I wouldn't worry about them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course you wouldn't, you wrote them
rather than using multiple enormous flat regions, this divides all memory into a 2-3 level page map
a1f8ff9
to
6baa6c8
Compare
edit: wasn't #21308, since https://build.julialang.org/builders/package_tarballppc64le/builds/143 worked fine |
From local testing, this doesn't seem to have any performance impact. To get virtual address space to line up with physical allocation a bit more would require also adjusting openblas to avoid allocation of it's 512 MB buffer (32 MB * 16 threads) during initialization.
fix #17987
alterate fix for #10390