
jl_gc_calloc_aligned and friends: obtaining aligned zero initialized memory efficiently #42704

Open: wants to merge 4 commits into master

mkitti (Contributor) commented Oct 19, 2021

This is a rebase, revision, and squash of #22953. The main objective is an aligned, GC-tracked calloc that obtains zero-initialized memory efficiently. I took out the tagged bit and the renaming of public API functions, and simplified the change so that this PR mainly adds new code rather than modifying existing code.

My main motivation is that a calloc-based implementation of zeros improves performance on Windows:
https://discourse.julialang.org/t/faster-zeros-with-calloc/69860

The main new contribution here is an offset in front of the aligned pointer that makes room to store the original pointer and the alignment. This hopefully addresses the issue raised in #22953 (comment) by @JeffBezanson. In retrospect, I suspect better space usage is possible.
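A minimal C sketch of that layout follows. It assumes plain calloc in place of jl_calloc, and sketch_calloc_aligned is a hypothetical name, not the function added by this PR. It over-allocates by the two header words plus align - 1 bytes of worst-case padding, rounds up to the alignment, and stores p0 and the alignment in the two words just below the returned pointer, which is the arrangement the demonstration below reads back with unsafe_load.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The two header words below rely on size_t and pointers having equal size. */
static_assert(sizeof(size_t) == sizeof(void *), "header layout assumes equal sizes");

/* Hypothetical sketch, not the PR's implementation.
 * Layout (as read back in the demonstration):
 *   ((void **)p)[-1]  holds p0, the original pointer returned by calloc
 *   ((size_t *)p)[-2] holds align, the requested alignment */
void *sketch_calloc_aligned(size_t nmemb, size_t size, size_t align)
{
    const size_t header = sizeof(void *) + sizeof(size_t);
    /* Assumed precondition: a power of two, at least pointer-sized. */
    assert(align >= sizeof(void *) && (align & (align - 1)) == 0);

    /* Refuse requests whose total byte count would wrap around in size_t. */
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
    size_t nbytes = nmemb * size;
    if (nbytes > SIZE_MAX - header - (align - 1))
        return NULL;

    /* Data + header + worst-case padding of align - 1 bytes, zero-filled. */
    char *p0 = calloc(nbytes + header + align - 1, 1);
    if (p0 == NULL)
        return NULL;

    /* First multiple of align that leaves room for the header below it. */
    uintptr_t addr = ((uintptr_t)p0 + header + align - 1) & ~(uintptr_t)(align - 1);
    void *p = (void *)addr;
    ((void **)p)[-1] = p0;     /* original pointer, needed to free later */
    ((size_t *)p)[-2] = align; /* alignment */
    return p;
}
```

Freeing such a buffer means passing ((void **)p)[-1] to free, never p itself; the data region is untouched by the header writes, so it stays zero.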

This could be used to address #130 or #9147. At the moment, though, no behavior changes except that a potential unsigned integer overflow in jl_calloc is now caught.
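The overflow comes from computing nmemb * size, and then adding a header, in size_t, where the result silently wraps: on a 64-bit system (SIZE_MAX / 2 + 1) * 2 wraps to 0, so an unchecked allocator would return a tiny buffer for a huge request. The hedged sketch below checks both steps. sketch_counted_calloc, sketch_counted_free and the 16-byte header constant are hypothetical; the header only mirrors the byte count that the demonstration reads 16 bytes below the pointer returned by jl_calloc. The actual jl_calloc fix may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header size holding the requested byte count. */
#define SKETCH_HEADER_BYTES 16
static_assert(SKETCH_HEADER_BYTES >= sizeof(size_t), "header must hold a size_t");

/* Zero-filled allocation that records the requested size in a header and
 * refuses requests whose byte count would wrap around in size_t. */
void *sketch_counted_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                          /* nmemb * size would wrap */
    size_t nbytes = nmemb * size;
    if (nbytes > SIZE_MAX - SKETCH_HEADER_BYTES)
        return NULL;                          /* nbytes + header would wrap */
    size_t *base = calloc(nbytes + SKETCH_HEADER_BYTES, 1);
    if (base == NULL)
        return NULL;
    base[0] = nbytes;                         /* remember the requested size */
    return (char *)base + SKETCH_HEADER_BYTES;
}

void sketch_counted_free(void *p)
{
    if (p != NULL)
        free((char *)p - SKETCH_HEADER_BYTES);
}
```

Dividing SIZE_MAX by size before multiplying detects the wrap without ever performing it.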

An important open question is where to put a tag bit so that arrays created with these new functions are correctly freed by jl_gc_free_aligned from jl_gc_free_array. Previously, it was proposed that the isaligned bit in jl_array_flags_t could be removed and possibly reused for this purpose.
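The tag bit is needed because the two kinds of buffer must be released differently: a plain calloc buffer is passed to free as-is, while an aligned buffer that stores its original pointer one word below the data must free that stored pointer instead. A minimal sketch of the dispatch, where the bool isaligned parameter and both function names are hypothetical stand-ins and not the real jl_array_flags_t field or the Julia free path:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Original pointer stored by an aligned allocator one word below the data. */
void *sketch_original_pointer(void *p)
{
    return ((void **)p)[-1];
}

/* isaligned stands in for the tag bit; where such a bit lives in the array
 * flags is the open question, so this parameter is illustrative only. */
void sketch_free_buffer(void *data, bool isaligned)
{
    if (data == NULL)
        return;
    if (isaligned) {
        void *p0 = sketch_original_pointer(data);
        assert(p0 != NULL); /* a live aligned buffer always records p0 */
        free(p0);           /* free(data) here would be undefined behavior */
    } else {
        free(data);
    }
}
```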

An alternative approach, suggested by stevengj@7c9ab1d#r58133599, is to develop internal calloc and/or posix_memalign implementations based on musl (MIT License). However, after studying those implementations, I am not sure whether they take full advantage of newly zeroed pages from the operating system.

I would appreciate any comments and thoughts about future directions.

Demonstration

julia> p = ccall(:jl_gc_calloc_aligned, Ptr{Int64}, (Csize_t, Csize_t, Csize_t), 2048, sizeof(Int64), 64);

julia> A = unsafe_wrap(Array, p, 2048; own = false)
2048-element Vector{Int64}:
 0
 0
 ⋮
 0
 0

julia> sum(A)
0

julia> UInt(p) % 64 # pointer is aligned
0x0000000000000000

julia> p
Ptr{Int64} @0x000000000ace2100

julia> Int(p)
181281024

julia> unsafe_load(p, 0) #p0, original pointer from jl_calloc
181280992

julia> unsafe_load(p, -1) # alignment
64

julia> p - unsafe_load(p, 0)
Ptr{Int64} @0x0000000000000020

julia> unsafe_load(Ptr{Int}(unsafe_load(p,0)),-1) # Number of bytes allocated by jl_calloc / _unchecked_calloc
16463

julia> sizeof(A) + 64 + 8 + 8 - 1  # data + alignment + sizeof(Csize_t) + sizeof(Ptr{Nothing}) - 1
16463

julia> p2 = ccall(:jl_gc_calloc_aligned, Ptr{Int64}, (Csize_t, Csize_t, Csize_t), 8192*8192, sizeof(Int64), 64)
Ptr{Int64} @0x000000009fff4080

julia> A2 = unsafe_wrap(Array, p2, 8192*8192; own = false);

julia> @btime fill!($A2, 0);
  31.866 ms (0 allocations: 0 bytes)

julia> @btime unsafe_wrap(Array, ccall(:jl_gc_calloc_aligned, Ptr{Int64}, (Csize_t, Csize_t, Csize_t), 8192*8192, sizeof(Int64), 64), (8192,8192); own = false)
  12.414 ms (2 allocations: 512.00 MiB)

julia> @btime zeros(8192, 8192);
  128.934 ms (2 allocations: 512.00 MiB)

julia> versioninfo()
Julia Version 1.8.0-DEV.756
Commit ce011a82cb* (2021-10-18 17:13 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i9-9880H CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
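The pointer arithmetic in the demonstration can be checked by hand. Assuming 8-byte size_t and pointers (as on this 64-bit build) and that the aligned pointer is the first multiple of 64 at least 16 bytes past p0 (an inference from the printed values), p0 + 16 = 181281008 rounds up to 181281024, which is 0xace2100 and leaves the 0x20-byte gap printed by p - unsafe_load(p, 0). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of a power-of-two alignment. */
uint64_t sketch_round_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Bytes requested from jl_calloc per the formula in the demonstration:
 * data + alignment + header words (sizeof(Csize_t) + sizeof(Ptr{Nothing})) - 1. */
uint64_t sketch_request_bytes(uint64_t data_bytes, uint64_t align, uint64_t header_bytes)
{
    return data_bytes + align + header_bytes - 1;
}
```

With 2048 Int64 elements, the request is 16384 + 64 + 16 - 1 = 16463 bytes, matching the size word read back from jl_calloc's header.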

Squashed Commit Log (edited for what remains)

commit 927da5f067f51700c1d5ffb22304382710b1643b
Author: Mark Kittisopikul <[email protected]>
Date:   Tue Oct 19 05:52:50 2021 -0400

    Export jl_gc_*_aligned, jl_gc_managed_calloc

    ...

    Fix #42673 by checking for unsigned integer wrapping for jl_*calloc*
    Reused former isaligned tagged bit for howtofree tagged bit
    Retained all existing functions
    jl_gc_*_aligned defaults to jl_* since macOS guarantees page alignment
    offset alignment to hold original (void *) p0, and size_t align

commit 842bf06e785d3be97558083abe181505c6840549
Author: Mark Kittisopikul <[email protected]>
Date:   Sun Oct 17 02:20:11 2021 -0400

    Undo renaming of jl_calloc, jl_free, and jl_realloc

commit 4bb083a5f6b44766632dd394397257ebafc91726
Author: Daniel Matz <[email protected]>
Date:   Fri Nov 4 08:59:43 2016 -0500

    Disambiguate jl_malloc from jl_malloc_aligned

    Add gc counted, aligned malloc

commit 927da5f067f51700c1d5ffb22304382710b1643b
Author: Mark Kittisopikul <[email protected]>
Date:   Tue Oct 19 05:52:50 2021 -0400

    Export jl_gc_*_aligned, jl_gc_managed_calloc

commit ce011a82cba044f88a962b6da24a02bd105fed3e
Author: Mark Kittisopikul <[email protected]>
Date:   Mon Oct 18 13:13:09 2021 -0400

    Simplify, reduce diff versus master

commit af4cf94520cc2c9fd5008b929c6e4ef2cd55bca3
Author: Mark Kittisopikul <[email protected]>
Date:   Mon Oct 18 03:40:07 2021 -0400

    Mixed and working

Fix JuliaLang#42673 by checking for unsigned integer wrapping for jl_*calloc*
Reused former isaligned tagged bit for howtofree tagged bit
Retained all existing functions
jl_gc_*_aligned defaults to jl_* since macOS guarantees page alignment
offset alignment to hold original (void *) p0, and size_t align

commit 842bf06e785d3be97558083abe181505c6840549
Author: Mark Kittisopikul <[email protected]>
Date:   Sun Oct 17 02:20:11 2021 -0400

    Undo renaming of jl_calloc, jl_free, and jl_realloc

commit 9594400d207a9f8baac832af5dd0830165d1192e
Author: Daniel Matz <[email protected]>
Date:   Fri Nov 11 18:31:54 2016 -0600

    Remove the isaligned array flag

commit 4bb083a5f6b44766632dd394397257ebafc91726
Author: Daniel Matz <[email protected]>
Date:   Fri Nov 4 08:59:43 2016 -0500

    Disambiguate jl_malloc from jl_malloc_aligned

    Add gc counted, aligned malloc
mkitti (Contributor, Author) commented Oct 19, 2021

Perhaps @yuyichao could also review?

Review thread on src/gc.c (outdated, resolved)
Comment on lines +3558 to +3562
// TODO add special casing for macOS, where there is guarantee of alignment
// - DONE, just use jl_malloc, jl_calloc, jl_realloc, jl_free since always aligned
// TODO add special casing for 64 bit systems when 16 byte alignment is requested
// TODO add checks on align?
// - Enforce posix_memalign reqs of power of 2 multiple of sizeof(void *)?
A member suggested a change: remove these five TODO lines.

mkitti (Contributor, Author):

I think this may have come from previous rounds of review on prior iterations. If you don't mind, I'll keep this around for a bit longer until we have more eyes on this.
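Two of the TODO comments quoted in this review thread (lines +3558 to +3562 of src/gc.c) can be sketched as small predicates. This assumes the posix_memalign rule (align must be a power of two multiple of sizeof(void *)) and the C11 guarantee that calloc returns memory aligned for max_align_t, which on many 64-bit targets is 16 bytes; both function names are hypothetical, not Julia API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Treating "multiple of sizeof(void *)" as "at least sizeof(void *)" is only
 * valid because the pointer size is itself a power of two. */
static_assert((sizeof(void *) & (sizeof(void *) - 1)) == 0,
              "pointer size must be a power of two");

/* posix_memalign's rule: a power of two multiple of sizeof(void *). */
bool sketch_valid_alignment(size_t align)
{
    return align >= sizeof(void *) && (align & (align - 1)) == 0;
}

/* calloc already returns memory aligned for max_align_t, so for alignments
 * this small the aligned path and its header could be skipped. */
bool sketch_calloc_suffices(size_t align)
{
    return align <= _Alignof(max_align_t);
}
```

An aligned allocator could reject invalid alignments with the first predicate and fall back to plain calloc when the second returns true, avoiding the header overhead for small alignments.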

mkitti and others added 2 commits October 20, 2021 18:24
Adjust overflow detection condition

Co-authored-by: Jameson Nash <[email protected]>
mkitti (Contributor, Author) commented Oct 22, 2021

I split the jl_calloc change out into #42761, since the rest of this may take longer to review.

Development: successfully merging this pull request may close the issue "jl_calloc is subject to unchecked unsigned integer wrapping".

2 participants