@jakobnissen
Member

jakobnissen commented Apr 9, 2024

This is a breaking change, but it was a bug that CodeUnits subtyped DenseVector previously, as there is no guarantee that CodeUnits are stored densely in memory.

Test failures currently seem unrelated

See #53996
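
For readers unfamiliar with the type in question, here is a minimal sketch of what `codeunits` returns for a plain `String`. Whether the final check prints `true` depends on the Julia version and on the outcome of this PR, so no output is asserted for it.

```julia
# codeunits wraps a string in a vector-like view of its code units (bytes for String).
cu = codeunits("héllo")

println(typeof(cu))        # the CodeUnits wrapper type
println(length(cu))        # counts code units, not characters ('é' takes two bytes in UTF-8)
println(cu[1] == UInt8('h'))

# The supertype under discussion: this is the relation the PR proposes to remove.
println(cu isa DenseVector{UInt8})
```

The wrapper is always an `AbstractVector{UInt8}` here; the question in this thread is only whether it may also claim to be dense.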

@jakobnissen added the "minor change", "needs pkgeval", "needs nanosoldier run", and "bugfix" labels Apr 9, 2024
@jakobnissen
Member Author

As mentioned in #53996, since this is a breaking bugfix, this needs a PkgEval + Nanosoldier run in case someone relied on its supertype.
Depending on the results of these runs, it may be triage worthy.

@mbauman
Member

mbauman commented Apr 9, 2024

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier runtests()

Can we do both in the same comment?

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@jakobnissen
Member Author

Nanosoldier report looks fine - some suspicious numbers for BitSet but I can't see how that could be related.

@Zentrik
Member

Zentrik commented Apr 10, 2024

Those are a couple benchmarks going from 10ns on master to 20ns and the rest of the benchmarks on BitSet seem fine (https://tealquaternion.camdvr.org/compare.html?end=54002&showRawData=true&nonRelevant=true&name=BitSet), so probably just noise.

@nanosoldier
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@jakobnissen
Member Author

There are some legitimate package errors in there: PyCall, StringViews, StrBase, Base58.
Small enough that I could probably make PRs to all the packages to fix them, but then again, this might also mean this change causes issues in packages not in PkgEval.

@jakobnissen removed the "needs nanosoldier run" and "needs pkgeval" labels Apr 10, 2024
@StefanKarpinski added the "triage" label Apr 10, 2024
@LilithHafner
Member

This PR is too breaking.

However, we have a codeunits function. We should, instead, require that strings with non-contiguous memory implement the codeunits function and return something that is not a DenseVector (or if it is, make sure it is a valid DenseVector). This would be a small breaking documentation change.

(In 2.0?, we should probably just? remove DenseVector and make it a trait/interface)
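
A minimal sketch of what that suggestion could look like for a string type whose bytes are not in addressable memory. `TupleString` and `TupleCodeUnits` are hypothetical names invented for illustration, not part of Base or any package, and only the methods needed for this example are defined (a complete `AbstractString` would also need `iterate` and `isvalid`).

```julia
# Hypothetical string type that stores its bytes in an isbits tuple.
struct TupleString{N} <: AbstractString
    bytes::NTuple{N,UInt8}
end

# Hypothetical code-unit view that deliberately subtypes AbstractVector, not DenseVector.
struct TupleCodeUnits{N} <: AbstractVector{UInt8}
    s::TupleString{N}
end

Base.size(::TupleCodeUnits{N}) where {N} = (N,)
Base.getindex(c::TupleCodeUnits, i::Int) = c.s.bytes[i]

Base.ncodeunits(::TupleString{N}) where {N} = N
Base.codeunit(::TupleString) = UInt8
Base.codeunit(s::TupleString, i::Integer) = s.bytes[i]

# The string type opts out of the default wrapper by overloading codeunits.
Base.codeunits(s::TupleString) = TupleCodeUnits(s)

s = TupleString((0x68, 0x69))   # the bytes of "hi"
cu = codeunits(s)
println(cu isa DenseVector)     # false: no pointer-based method should accept it
println(collect(cu))
```

Under this approach, generic code that only indexes and iterates keeps working, while methods restricted to `DenseVector` would no longer dispatch on the result.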

@KristofferC
Member

> There are some legitimate package errors in there:

It is not enough to see that a package's tests start to fail after a change to call it a "legitimate package error". Many times, packages directly test the buggy behavior in the test itself, which causes the error, while the package code itself is completely fine. For example:

The StrBase failure is literally a direct call to strides on a CodeUnits: https://github.com/JuliaString/StrBase.jl/blob/90ac50265de0f356f414b9d493140fc724b7ae9d/test/basic.jl#L882 so that isn't breaking anything, it's just a nonsense test that tests the very thing that is being changed here.

Or StringViews.jl is directly testing the change that is done here:
https://github.com/JuliaStrings/StringViews.jl/blob/f94b09e5f5056ab17a3f50490e500aee4476479d/test/runtests.jl#L35

PyCall is also just a test that isn't testing any real functionality in the package https://github.com/JuliaPy/PyCall.jl/blob/2f600fbebee50ab0672e153455e3c0fda1694fba/test/runtests.jl#L563

I guess the people calling this "too breaking" just looked at the list of PkgEval results and decided based on that, but as someone who has to do this for a lot of releases, that is insufficient to make any decision at all.

I would say that the current behavior is buggy, it should be changed, and the packages just need to tweak their tests a bit (not change any code in their package) to pass with this.

@jakobnissen
Member Author

Some of these failures are significant. The failures you link to, which seem less significant, are just the first to fail in their test suite.

  • For PyCall, pybytes is a user-facing function that no longer works on codeunits. Hence, this will break scripts that call pybytes to pass the result into Python. This is shown as an example in the README.md of PyCall, so I'd call this breakage.
  • For StringViews, a large part of the functionality of the whole package is defined as methods with ::DenseStringView, which relies on codeunits being a subtype of dense vector. I.e. if I remove the test you link to here, then it just fails with another test (concretely, regex matching then breaks, because it's only defined for DenseStringViewAndSub)

StrBase though, does pass its tests when removing that one test for strides.

@gbaraldi
Member

I agree with Kristoffer (I can't make that triage time :( ). This is a bug, not a behaviour change we are doing willy-nilly. If someone expects codeunits to be a dense vector and it's not, that's UB, e.g. passing it to a C function which then might segfault.

@oscardssmith
Member

is there some reason triage's suggested fix wouldn't also fix the bug?

@gbaraldi
Member

gbaraldi commented Apr 25, 2024

IMO triage's change is more breaking and more subtle, because it seems it would basically mean that codeunits doesn't take an AbstractString as its argument but something like an AbstractDenseString.

@brenhinkeller
Contributor

That seems less likely to actually break existing code -- is there even any extant nontrivially used AbstractString type that does not store its contents densely?

@mbauman
Member

mbauman commented Apr 25, 2024

I mean, the point of this PR was precisely to do this experiment. It is one possible fix, but not necessarily the fix.

> is there even any extant nontrivially used AbstractString type that does not store its contents densely

Yes. The issue in #53996 is InlineStrings.jl — those are backed by non-mutable structs that don't necessarily land on the heap with a memory address. They do have support for grabbing a pointer through a Ref{<:InlineString}, but it's not a c-compatible one: it's backwards (stride -1), the pointer should really be offset by N for each InlineN, and it's not necessarily null terminated. It's at that c boundary that's at the crux of where many of these failures are coming from, actually.

It's very much worth noting that CodeUnits{T, <:InlineString} is currently broken (and broken loudly) when trying to use codeunits as a dense vector:

julia> pointer(codeunits(inline"hello"))
ERROR: MethodError: no method matching unsafe_convert(::Type{Ptr{UInt8}}, ::String7)

@oscardssmith
Member

so the triage suggested change would be to define codeunits(s::InlineString) = codeunits(string(s)) or something like that.

@KristofferC
Member

KristofferC commented Sep 28, 2025

> so the triage suggested change would be to define codeunits(s::InlineString) = codeunits(string(s)) or something like that.

What does this even mean? string(x) is a no-op for an InlineString (an InlineString is a string). And if you mean String, then that allocates a whole new string for every codeunits call, which breaks the whole reason for the package.
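
A rough illustration of the trade-off, using `SubString` from Base as a stand-in because InlineStrings is not assumed to be installed; the point is only that converting to `String` first makes a copy, while calling `codeunits` directly wraps the existing string.

```julia
parent = "hello world"
sub = SubString(parent, 1, 5)        # a view into `parent`, no copy of the bytes

# Wrapping directly: the result refers to the SubString itself.
direct = codeunits(sub)

# Converting first: String(sub) builds a new String, and codeunits wraps that copy.
copied = codeunits(String(sub))

println(typeof(direct))
println(typeof(copied))
println(direct == copied)            # same bytes either way
```

Both give the same bytes; the second form pays for a fresh `String` on every call, which is the cost objected to above.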

@jakobnissen
Member Author

The triage was some time ago, but I remember the reasoning was that it's bad to allocate (especially in a package whose whole reason for existence is avoiding allocations), but bugs are worse. Hence the comment:

> This is not a good solution, but it's semantically correct which the status quo isn't.

@KristofferC
Member

> but bugs are worse

If a package named InlineStrings creates String dozens of times for simple operations, that is also a bug... And the bug here is not even in that package as far as I can see.

@KristofferC
Member

KristofferC commented Sep 28, 2025

> It's very much worth noting that CodeUnits{T, <:InlineString} is currently broken (and broken loudly) when trying to use codeunits as a dense vector:
>
> julia> pointer(codeunits(inline"hello"))
> ERROR: MethodError: no method matching unsafe_convert(::Type{Ptr{UInt8}}, ::String7)

Just pointing out, this is at least not the case today:

julia> pointer(codeunits(inline"hello"))
Ptr{UInt8}(0x000000016f420e50)

I think #51764 changed that.

@KristofferC
Member

KristofferC commented Sep 28, 2025

Just a random bug this causes:

julia> using InlineStrings

julia> io = IOBuffer(); write(io, view(codeunits("abcdef"),1:6)); String(take!(io))
"abcdef"

julia> io = IOBuffer(); write(io, view(codeunits(String7("abcdef")),1:6)); String(take!(io))
"\xc0M\xc1m\x01\0"

@KristofferC
Member

KristofferC commented Sep 28, 2025

Also, just want to add:

> but it's not a C-compatible one: it's backwards (stride -1), the pointer should really be offset by N for each InlineN, and it's not necessarily null-terminated. It's at that c boundary that's at the crux of where many of these failures are coming from, actually.

Even if InlineStrings was made C-compatible (ref JuliaStrings/InlineStrings.jl#82) there is still an issue since write on a DenseVector calls pointer:

julia/base/io.jl

Lines 867 to 877 in c4683c4

GC.@preserve A begin
    nb = 0
    iter = CartesianIndices(sz′)
    for I in iter
        p = pointer(A)
        for i in 1:length(sz′)
            p += elsize(A) * st′[i] * (I[i] - 1)
        end
        nb += unsafe_write(s, p, elsize(A) * msz)
    end
    return nb

and the definition of pointer on an AbstractArray is:

pointer(x::AbstractArray{T}) where {T} = unsafe_convert(Ptr{T}, cconvert(Ptr{T}, x))

where the cconvert gets passed along to the inlinestring

cconvert(::Type{Ptr{T}}, s::CodeUnits{T}) where {T} = cconvert(Ptr{T}, s.s)

which returns a Ref (https://github.com/JuliaStrings/InlineStrings.jl/blob/98907bcf790499938372592379c92db65c6dcfc1/src/InlineStrings.jl#L141). However, this Ref is never shielded from GC, so immediately after the pointer(A) call in write, the pointer is invalid even though A itself has been protected from GC. The return value from cconvert is, however, not protected, and that is what we call pointer on. So you get stuff like:

julia> function foo(s)
           GC.@preserve s begin
               p = pointer(s)
               @ccall memchr(p::Ptr{UInt8}, 0x0a::Cchar, 7::Csize_t)::Ptr{Nothing}
           end
       end;

julia> function foo_ptr(s)
           GC.@preserve s begin
               @ccall memchr(s::Ptr{UInt8}, 0x0a::Cchar, 7::Csize_t)::Ptr{Nothing}
           end
       end;

julia> using StaticArrays

julia> s = "abc\nss\n";

julia> s2 = SVector(Tuple(collect(codeunits(s))));

julia> foo(s2)
Ptr{Nothing}(0x0000000000000000)

julia> foo_ptr(s2)
Ptr{Nothing}(0x000000016f5c4e2b)

where the result from foo is nonsense (memchr does not find the newline through the pointer obtained from pointer(s)), while foo_ptr, which uses the ccall directly on the array (and does not go via pointer), works fine.

So I would say the issue is slightly broader than what is discussed. It is that anything that is an AbstractArray and isbits (that defines a Ref for its cconvert) gets a free bogus pointer method defined on it, which can now cause memory corruption if it is used. write on a CodeUnits is one case where this is triggered, but there might be more.

(based on some discussion with @jakobnissen in Slack)
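
For reference, a minimal sketch of the two-step pattern that keeps the intermediate object alive: the value returned by `cconvert` is what gets preserved, and the raw pointer is only used inside that scope. `find_byte` is a hypothetical helper written for this illustration, not a Base function, and the sketch is only exercised on an ordinary `Vector{UInt8}`.

```julia
# Hypothetical helper: 1-based index of the first occurrence of `byte` in `A`, or nothing.
function find_byte(A::AbstractVector{UInt8}, byte::UInt8)
    # Step 1: cconvert returns the object that must stay rooted while the pointer is in use.
    ref = Base.cconvert(Ptr{UInt8}, A)
    GC.@preserve ref begin
        # Step 2: derive the raw pointer from the preserved object, not from `A` via pointer(A).
        p = Base.unsafe_convert(Ptr{UInt8}, ref)
        q = @ccall memchr(p::Ptr{UInt8}, byte::Cint, length(A)::Csize_t)::Ptr{UInt8}
        return q == C_NULL ? nothing : Int(q - p) + 1
    end
end

bytes = collect(codeunits("abc\nss\n"))
println(find_byte(bytes, 0x0a))
```

This is the same rooting that `ccall` performs automatically for its arguments, which is why passing the array itself to `@ccall` behaves differently from calling `pointer` first.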

@adienes
Member

adienes commented Sep 28, 2025

breadcrumb: #51962

@vtjnash added the "backport 1.12" and "forget me not" labels and removed the "needs decision" label Oct 2, 2025
@DilumAluthge
Member

@vtjnash Can I "assign" this PR to you to shepherd, so that it doesn't get forgotten?

@DilumAluthge
Member

I'll mark this as "waiting for author", because it looks like there are some new review comments for @jakobnissen to address.

@adienes removed the "backport 1.12" label Oct 21, 2025
Labels: bugfix, minor change