-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Always resize input to zero length in String(::Vector{UInt8}) #26093
Conversation
The docstring should not give the misleading implication that |
Indeed. I've rephrased the docstring, let me know what you think. |
base/strings/string.jl
Outdated
|
||
When possible, the data in `v` will be used directly rather than making | ||
a copy. This is the case for `Vector{UInt8}` arrays returned by [`take!`](@ref) | ||
on a writable [`IOBuffer`](@ref) or by [`read(io, nb)`](@ref). Else, a copy is made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Else -> Otherwise
base/strings/string.jl
Outdated
When possible, the data in `v` will be used directly rather than making | ||
a copy. This is the case for `Vector{UInt8}` arrays returned by [`take!`](@ref) | ||
on a writable [`IOBuffer`](@ref) or by [`read(io, nb)`](@ref). Else, a copy is made. | ||
In all cases, one should assume that the function takes "ownership" of the array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"In all cases" seems overly broad here. Why not just say that this is the case for a Vector
and leave it at that?
test/arrayops.jl
Outdated
@testset "check that resize! is a no-op when size does not change" for T in (Int, UInt8, Int8) | ||
x = Vector{T}(uninitialized, 1) | ||
resize!(x, 0) | ||
unsafe_store!(pointer(x), 1, 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this test is valid --- it does not need to be possible to store an element to an empty array. In the future the array's data pointer could be NULL, or tooling could flag this as a memory error, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's pretty hacky. I guess we can leave this one out and just rely on the String
-based test below. I'm not sure whether we can find other cases where this bug would have user-visible consequences we could test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed the test and squashed the commits.
The text about resizing to zero seems a bit misleading to me. If all it did was keep using the same buffer but truncate it to zero length, then you could still mutate the String's memory by pushing elements to the truncated array. The real point is that the memory buffer of the array is "stolen" by the Incidentally, this is one of the reasons that having |
Can you suggest a wording? It's hard to describe the behavior in a simple way while still giving enough details for people who are interested in the performance and copying characteristics. EDIT: feel free to commit changes directly on GitHub if you like. |
88b1248
to
6c58d10
Compare
base/strings/string.jl
Outdated
|
||
If you need to subsequently modify `v`, use `String(copy(v))` instead. | ||
Create a new `String` from a byte vector `v` containing UTF-8 encoded | ||
characters. If `v` is a mutable vector, it will be truncated and any future |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
truncated to zero length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this statement seems false. It only truncates Vector
, not arbitrary mutable subtypes of AbstractVector
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior for other types should be implemented to match, but yes, at the moment this may not be 100% so so I guess this docstring is aspirational.
I agree |
6c58d10
to
369820d
Compare
Maybe we should have
|
I don't like that so much, since (1) there's no type called One place where something like |
e491edc
to
cabe8ad
Compare
NEWS.md
Outdated
* `String(array)` now accepts an arbitrary `AbstractVector{UInt8}` and "steals" the | ||
memory buffer of mutable arrays, leaving the byte vector with an empty buffer which | ||
is guaranteed not to be shared with the `String` object; if the byte vector is | ||
immutable, it simply shares memory with the string and is not truncated ([#26093]). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This promises too much --- we don't have the ability to share arbitrary immutable vector memory with strings. It also doesn't steal the memory of arbitrary mutable arrays, only Vector{UInt8}
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you suggest a wording?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've proposed something, let me know what you think.
test/arrayops.jl
Outdated
@@ -2342,4 +2342,4 @@ Base.view(::T25958, args...) = args | |||
@test t[end,1,end] == @view(t[end,1,end]) == @views t[end,1,end] | |||
@test t[end,end,1] == @view(t[end,end,1]) == @views t[end,end,1] | |||
@test t[end,end,end] == @view(t[end,end,end]) == @views t[end,end,end] | |||
end | |||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be reverted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woops, those damned ending newlines... Fixed.
The basic problem here is that sometimes Another problem is that if you want to make a copy, then |
In theory that seems nice, but in practice 99% of |
9d64ae9
to
2154ea6
Compare
99% of (With |
See also discussion about |
I suspect that a very large portion of cases where people write |
At this point I think this should just be rebased and merged. |
This makes the behavior more predictable than only resizing Vector{UInt8} inputs when they have been allocated via StringVector, as the caller may have obtained them from other code without knowing how they were created. This ensures code will not rely on the fact that a copy is made in many common cases. The behavior is also simpler to document.
Else we set the last element to zero for one-byte element types like UInt8, which means that resizing a vector allocated with StringVector corrupts it. Also add redundant checks in Julia code to avoid a function call.
Well, this led to some fun behavior because MbedTLS was calling string on an input Vector{UInt8} (for no reason really); poor web apps that were just trying to hmac their request bodies..... Hopefully there's not too many cases where this change causes such subtle changes in behavior. |
Currently because `String` steals the buffer allocated, we create a copy before conversion. This may be suboptimal. See JuliaLang/julia#26093
Imagine you are just printing some debug info with |
I think the way to get change here is not to comment on this long-closed issue but to open a new issue to discuss the possibility of migrating away from the current behavior. |
This makes the behavior more predictable than only resizing
Vector{UInt8}
inputs when they have been allocated viaStringVector
, as the caller may have obtained them from other code without knowing how they were created. This ensures code will not rely on the fact that a copy is made in many common cases. The behavior is also simpler to document.This is an alternative to @stevengj's #25846 (I included a few changes from there), as discussed on the #strings Slack channel with @JeffBezanson and @StefanKarpinski.
The second commit fixes a bug I discovered when preparing the first one. It's not actually needed for the first one so better avoid squashing it.