-
-
Notifications
You must be signed in to change notification settings - Fork 5.7k
Array resizing fixes and clean up #16893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
It's a little sad to see more stuff move into C, though of course a lot of that code was basically C in disguise (lots of |
|
The The |
|
And the write barrier concern above only applies to pointer array, it is perfectly safe to use |
| return a | ||
| end | ||
| _deleteat!(a::Vector, i::Integer, delta::Integer) = | ||
| ccall(:jl_array_del_at, Void, (Any, Int, UInt), a, i - 1, delta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this still return a ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's an internal function so it doesn't have to.
| } | ||
| memcpy(newdata + offsnb, (char*)a->data, oldnbytes); | ||
| } | ||
| (void)oldlen; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this accomplish?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suppress compiler warning with assertion off.
ebe9d6a to
b5e6f0d
Compare
base/c.jl
Outdated
| cconvert(::Type{Cstring}, s::String) = | ||
| ccall(:jl_array_cconvert_cstring, Ref{Vector{UInt8}}, | ||
| (Vector{UInt8},), s.data) | ||
| cconvert(::Type{Cstring}, s::AbstractString) = String(s)::String |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= cconvert(Cstring, String(s)::String)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I guess custom string type could use the unsafe constructor.....
|
-1 Moving the code to C does not make it safer. We just need the appropriate checks to only use memcpy/memmove when safe. I think this is too much code churn. I thought we were moving away from the guaranteed 0 byte at the end of byte arrays? The direction we want to go is to make the built-in type a simpler buffer type, and move more of the array code to Julia. |
Move code to C does make it safer since we can guarantee that there's no allocation in between and the array cannot get old because of an unexpected allocation. Another way to solve this is to not use
That means we'll have to do a copy every time someone does a
I don't really think that's the direction either. The buffer type needs to be strongly typed or it can't be used in julia code safely. The buffer type will also kill the inline data optimization for small arrays. The more regression on array we want to avoid, the more we'll need features from the current array implementation and the closer the new buffer type to the current array. I don't really see any problem of using the array type as a buffer. I agree that there are certain features that are useful in other types too (c99 va array member for example) but replace the array implementation with a different one and take the performance hit doesn't feel like the right way to go. |
src/array.c
Outdated
| return new_ary; | ||
| } | ||
|
|
||
| JL_DLLEXPORT jl_array_t *jl_array_vcat_vectors(jl_value_t *arrays) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too high-level a function to put in the runtime. It also might be slower in some cases, since it requires allocating a tuple for the arguments, while the julia version can leave the arguments on the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this should be the only case that there can be (one) more allocation compare to the original version. The allocation of a tuple is much cheaper than an array though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use the copy!/unsafe_copy! API more here? We can use repeated calls to that to implement vcat in julia, and we can use it to implement copy as well. unsafe_copy! is a low-level enough function to move to C if necessary, but it doesn't seem necessary fortunately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really for vcat and copy. The problem is not about the memcpy/memmove functions itself but what we do before it. It is not safe to use unsafe_copy! on a pointer array as long as we have any allocation after the array itself is allocated.
I agree that having a gc-unsafe region concept in julia is useful. However, for it to be remotely useful, we need to be able to statically analyze what can allocate and what can't (also what can iteract with GC and what can't). The vcat and copy case can also be more tricky since we will need a way to express that no allocation or safe points are allowed when we return from jl_new_array but we don't want to disable the GC before calling jl_new_array (or Array{T}(n)) (the C code is written to make sure the array is young when jl_new_array returns).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be that expensive to simply dynamically check that it is still young (almost all cases) ? basically exactly what gc_wb_back does anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but isnt what's missing just a couple manual wb_back/wb_fwd intrinsics ? That's a fundamental basic block that I don't think anybody would object to it being implemented in codegen. I'm actually surprised we never needed those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to have some function like copy! that is safe to use, and can be used to implement vcat. It's a very useful function, since it supports destination and source offsets (and could support strides as well if needed). unsafe_copy! already does an isbits check, so it should be ok to use. The extra tuple allocation is also unacceptable; we have too many performance regressions as it is. So please move vcat back to julia and just use unsafe_copy! instead of memcpy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but isnt what's missing just a couple manual wb_back/wb_fwd intrinsics
How should it be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yep, well maybe for this restricted case the easiest way to do so would be a function-wide (meta nogc) that has a short whitelist of whatever is allowed and implement copy! with that. In that case if it's only a ccall to memcpy (or even a bunch of unsafe_load/stores in a loop if llvm does a good enough job) followed by a wb intrinsic the whitelist should be short enough.
Or maybe implement copy! in C but since both those little features seems like something that we could use/extend in the future...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess I can implement unsafe_copy! in C. That should solve these issues.
FWIW, using the current unsafe_copy! isn't the right solution since it'll be much more expensive than the current memcpy based implementation for pointer arrays.
When benchmarking this, it seems that the store_unboxed check might have a significant performance effect. I'll try to verify this more carefully and if that is the case I may move some of the similar implementation to C (they call C to allocate the array anyway)
The goal would be to do it without a performance hit. |
Sure, there are a few of the optimizations and features that I don't think can be implemented without a lot of the features of the current optimization. (e.g. that high dim arrays can't be resized and In any case, that discussion is unrelated to this PR. The functions in this PR will be equally useful on the buffer type if we have that. Assuming we don't want to loose the ability to use |
|
Can this be handled by manually introducing gc-unsafe regions in the julia code? |
I believe it's much harder (needs more special cases in the codegen to know what code can or cannot allocate/trigger GC), more likely to have performance regression (we need to disable GC, which is a thread synchronization), and will encourage people to disable GC (since that's how we can implement it). This also won't help |
14f1ccc to
16fb745
Compare
src/array.c
Outdated
| } | ||
|
|
||
| // Copy element by element until we hit a young object, at which point | ||
| // we can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can what? don't leave me hanging!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
continue using memmove.
Apparently got interrupted while writing this comment....
bd535d7 to
23c75fa
Compare
|
CI passed a few times. Local GC stress tests passed. Local performance tests checks out. AFAICT, almost all of the functions touched by this PR should have the same or better performance (especially fast path). The only exception is Some additional notes about the implicit NUL byte. The guarantee we had before and relies on is that, a byte array that has only be resized at the end always has the implicit NUL byte. I kept this behavior since it seems to be way too breaking if that is changed. A macro In the long term, I believe we should at least keep the guarantee to allocate one more implicit byte for the byte array since otherwise the Another issue about using the |
dcf3413 to
d463b8e
Compare
* Make sure that newly allocated arrays are always young * Micro optimize `sizehint!` * Implement `copy(::Array)` in C to avoid calling `memcpy` that bypasses the write barrier.
* Always call `cconvert` before calling `unsafe_convert`. * Optimize parse functions for non-NUL terminated input. * Use `Cstring` for `ccall`s that's expecting a NUL terminated string
* Use it in `jl_load_` * Remove the length parameter from `jl_load`, `jl_load_file_string` and `jl_parse_eval_all` to NOT pretend they support non-NUL-terminated strings.
Add tests for implicit extra byte check.
* Move some `memcpy` and `memmove` from julia to C This is dangerous especially for `ptrarray` since it bypasses the write barrier. * Make sure resizing a ptrarray always have the new memory cleared * Add more test for shared array resizing
* Use `unsafe_copy!` instead of `memcpy` in `vcat` to avoid bypassing the write barrier. * Add test for `copy!` on `#undef` and `unsafe_copy!` with memory alias.
|
Will merge tomorrow if no comments. |
|
|
||
| include("test_sourcepath.jl") | ||
| thefname = "the fname!//\\&\0\1*" | ||
| thefname = "the fname!//\\&\1*" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this being removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We never actually support filenames (either real or fake ones) that are not NUL-terminated strings. E.g., for real filenames, the jl_stats call assumes NUL termination, for both real and fake ones, every functions that uses jl_filename assumes it is NUL terminated. It might be possible to go through everything and fix them but it would be much more work and I highly doubt supporting embeded NUL or non-NUL terminating strings as filename is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how were we previously passing this test then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The value was accepted but is basically treated as a C string up to the NUL character. It is never used anywhere or checked against anything. You can see this by doing something that actually uses the filename.
julia> thefname = "the fname!\0and this part is missing"
"the fname!\0and this part is missing"
julia> include_string("include_string_test() = error()", thefname)()
ERROR:
in include_string_test() at ./the fname!:1
in eval(::Module, ::Any) at ./boot.jl:234
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46Note that the filename for include_string_test doesn't have anything after the \0 byte.
The commit remove the support for non NUL-terminating string explicitly so passing a malformed string like that throws an error.
|
lgtm. |
| // No need to explicitly unshare. | ||
| // Shared arrays are guaranteed to trigger the slow path for growing. | ||
| size_t n = jl_array_nrows(a); | ||
| if (idx < 0 || idx > n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be >= ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is allowed to grow at the end of the array (jl_array_grow_end below also calls jl_array_grow_at_end with inc == n). In general, there are n + 1 positions one can insert elements in an n elements array, which are represented as 0 to n here.
|
|
||
| function unsafe_copy!{T}(dest::Ptr{T}, src::Ptr{T}, n) | ||
| # Do not use this to copy data between pointer arrays. | ||
| # It can't be made safe no matter how carefully you checked. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For someone like me not in this business, this comment is a bit frightening. Basically when I use this function, dest and src point to arrays, in the C meaning, so I interpret this comment as "never use this function", because my goal is precisely to copy data between "pointer arrays". So it would be useful to clarify your warning, and to put it in the docstring, which is more visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sorry, you meant arrays of pointer! still, would be worth it to move the warning in the docstring.
memcpyandmemmovefrom julia to C (this is dangerous especiallyfor
ptrarray)byte in the array.
Started as removing unsafe
memcpy/memmovefrom julia files and make sure array resizing clear the memory if necessary and fix the implicit NUL byte and #16499 along the way