Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster String allocation #19449

Merged
merged 5 commits into from
Jan 8, 2017
Merged

faster String allocation #19449

merged 5 commits into from
Jan 8, 2017

Conversation

JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Nov 29, 2016

This replaces the Array inside String with a length field, plus directly asking the GC to allocate space for the string data in-line. This is very incomplete so far, and is only meant to demonstrate the basic idea. The following gist has a hacked-up working version that can be used within an existing build: https://gist.github.com/JeffBezanson/6b7f1785bb7f2509cbd5d4ff1380556d

This makes string allocation ~2x faster, use 1 object instead of 2, and use around half the space for short strings. It's not pretty, but I think we should go with something like this to fix the immediate performance problem. Also removing the .data field as soon as possible will make it easier to change the representation in the future if we come up with something nicer.

@JeffBezanson JeffBezanson force-pushed the jb/fasterstring branch 2 times, most recently from 9b881ff to 72ffe2d Compare November 29, 2016 16:14
@JeffBezanson JeffBezanson added breaking This change will break code performance Must go faster strings "Strings!" labels Nov 29, 2016
d = s.data
i = length(d)
@inbounds while i > 0 && is_valid_continuation(d[i])
p = pointer(s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a gcroot intrinsic with a semantic similar to ccall to make this usage gc safe.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, that would be good.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea what would be a good api? I'm mainly not sure about how to express the scope a gcroot applies to.....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now we should make the scope the whole enclosing function.

data::Array{UInt8,1}
# required to make String("foo") work (#15120):
String(d::Array{UInt8,1}) = new(d)
type String <: AbstractString
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not immutable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume so that the compiler would pass it correctly. (It'll be a isbits type otherwise)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct; the presence of the string data is not reflected in the layout of the type, so it would not be passed correctly. Of course that's only the beginning of the problems caused by a hack like this, requiring special cases e.g. in deepcopy.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you addressed the aforementioned "problems" including the special deepcopy implementation?

@@ -57,8 +57,8 @@ function unsafe_string(p::Union{Ptr{UInt8},Ptr{Int8}})
ccall(:jl_cstr_to_string, Ref{String}, (Ptr{UInt8},), p)
end

convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data
convert(::Type{Array{UInt8}}, s::AbstractString) = String(s).data
#convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will there be some mechanism to get a vector of bytes equivalent to a copy of the string after this is done?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that functionality is still pretty useful.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can definitely still have that. It might even be possible for an array to share string data in-place.

Copy link
Contributor

@yuyichao yuyichao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this also needs a special case in the gc push_root before the pointerfree case. so that gc_setmark will see the right size.

@JeffBezanson
Copy link
Member Author

Some interesting issues so far:

  • chomp!: changing the size of a string in place is not safe, since that can confuse the GC about how the object was allocated.
  • We need non-copying vector-to-string in IOBuffer. Some options are (1) keep 16 bytes available at the start of an IOBuffer's vector so its data can be reinterpreted as a String, or (2) use a String as the storage for an IOBuffer. Both probably have the same problem as chomp!. We might need to use a bit of the length field to mark whether the string was pool allocated.

@JeffBezanson
Copy link
Member Author

Next issue: this representation does not support unsafe_wrap. Wrapping an arbitrary pointer inherently requires an extra indirection, which I suspect is not worth it. We could also potentially add a PointerString type somewhere that does this.

@JeffBezanson JeffBezanson force-pushed the jb/fasterstring branch 2 times, most recently from 133d41d to d72ffe2 Compare December 21, 2016 19:10
@JeffBezanson JeffBezanson changed the title WIP/RFH: possible simple approach to faster String allocation RFC: faster String allocation Dec 21, 2016
@JeffBezanson
Copy link
Member Author

This has passed CI and is ready for review.

The last commit implements the following scheme for copy-avoidance when converting between strings and vectors:

  • A String can be wrapped by a Vector, since Arrays already have the ability to wrap pointers owned by other objects. Accessed by convert(Vector{UInt8}, str). This way you can still access the data as a vector without copying the whole string, only paying the cost to allocate the Vector when needed.
  • jl_gc_alloc_buf returns objects with String-compatible layout; there is an extra word to hold the length. IF a Vector contains such a buffer, it can be converted in-place to a String. This does not apply to all arrays. We need some way to decide when to use this mode of allocation. For now, I'm using it for all byte arrays, since they often become Strings. This can be discussed.

I'm seeing >=30% speedups for #19141, and the spellcheck benchmark.

We need to decide what to do about unsafe_wrap(String, p). We can remove it, or deprecate it to e.g. unsafe_wrap(ArrayString, p).

@stevengj
Copy link
Member

Is there a deprecation mechanism for s.data, rather than giving an error? (This would be possible if . could be overloaded, nudge nudge.)

@@ -437,7 +437,7 @@ function splice_buffer!{T<:Integer}(buf::IOBuffer, r::UnitRange{T}, ins::Abstrac
elseif pos > last(r)
seek(buf, pos - length(r))
end
splice!(buf.data, r + 1, ins.data) # position(), etc, are 0-indexed
splice!(buf.data, r + 1, convert(Vector{UInt8},ins)) # position(), etc, are 0-indexed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be more concise if we could just do Vector(ins).

@@ -368,8 +372,8 @@ unsafe_write(io::IO, x::Ptr{UInt8}, nb::Int) =
write(io::IO, x::UInt8) =
(ccall(:jl_uv_putb, Void, (Ptr{Void}, UInt8), io_pointer(io), x); 1)
function write(io::IO, x::String)
nb = sizeof(x.data)
unsafe_write(io, ccall(:jl_array_ptr, Ptr{UInt8}, (Any,), x.data), nb)
nb = x.len
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably sizeof(x) is defined and would be cleaner than accessing the .len field directly?

all_ascii = (D <: UInt8) || (isascii(eol) &&
isascii(dlm) &&
(!allow_quote || isascii(qchar)) &&
(!allow_comments || isascii(cchar)))
if T === String && all_ascii
return dlm_parse(dbuff.data, eol % UInt8, dlm % UInt8, qchar % UInt8, cchar % UInt8,
if all_ascii && !(D <: UInt8)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D != UInt8? Why this additional condition?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. The whole all_ascii business at the top of this function can probably be removed.

@@ -446,7 +448,7 @@ end
# check for pure ASCII-ness

function ascii(s::String)
for (i, b) in enumerate(s.data)
for (i, b) in enumerate(bytes(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is bytes? I thought we were now using convert(Vector{UInt8}, s) or maybe Vector(s) or Vector{UInt8}(s)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an iterator over bytes that I was trying out. I guess it's not really necessary.

@@ -576,4 +576,4 @@ a starting `crc` integer to be mixed in with the checksum.
"""
function crc32c end
crc32c(a::Array{UInt8}, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, a, sizeof(a))
crc32c(s::String, crc::UInt32=0x00000000) = crc32c(s.data, crc)
crc32c(s::String, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, s, sizeof(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use Union{Array{UInt8},String} for the first argument in a single method?

@stevengj
Copy link
Member

Needs NEWS and probably doc updates.

@stevengj stevengj added the needs docs Documentation for this change is required label Dec 21, 2016
tsz = JL_ARRAY_ALIGN(tsz, JL_CACHE_BYTE_ALIGNMENT); // align whole object
a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype);
JL_GC_PUSH1(&a);
data = jl_gc_alloc_buf(ptls, tot);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the how needs to be initialized to 0 first before calling into the GC again so that the marking code won't try to trace a dangling pointer.

#define jl_buff_tag ((uintptr_t)0x4eade800)
STATIC_INLINE void *jl_gc_alloc_buf(jl_ptls_t ptls, size_t sz)
{
return jl_gc_alloc(ptls, sz, (void*)jl_buff_tag);
void **buf = (void**)jl_gc_alloc(ptls, sz + sizeof(void*), (void*)jl_buff_tag);
return (void*)(buf + 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this mis-aligning the buf pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I suppose so. I guess we should only do this for elsz==1.

@tkelman
Copy link
Contributor

tkelman commented Dec 22, 2016

Should accessing the array be a method of convert, reinterpret, or transcode? There was some resistance to using convert between strings and arrays-of-integers during the last string refactoring.

@JeffBezanson
Copy link
Member Author

We've had a conversion from String to Vector{UInt8} that returns .data for quite a while, so I went with that.

@JeffBezanson
Copy link
Member Author

Ah, I see; using convert contradicts the idea that Vector(x) should do what collect does. reinterpret seems like the right function but is pretty verbose.

@stevengj
Copy link
Member

We could define a generic function bytes(x) = reinterpret(UInt8, x), which gives us a concise method for strings and also might be useful elsewhere?

@JeffBezanson JeffBezanson force-pushed the jb/fasterstring branch 2 times, most recently from 81e4251 to 8ea9828 Compare January 4, 2017 19:57
@JeffBezanson
Copy link
Member Author

Does anybody understand the OSX failure here?

@tkelman
Copy link
Contributor

tkelman commented Jan 5, 2017

segfaulting with --precompiled=no

@JeffBezanson
Copy link
Member Author

That's weird. I can't reproduce that on linux.

@JeffBezanson
Copy link
Member Author

Looks like it was a stack overflow loading the system image. I'll try a different approach to serializing the new String type.

@JeffBezanson
Copy link
Member Author

OK, that didn't help.

@tkelman
Copy link
Contributor

tkelman commented Jan 8, 2017

Should be good to merge once appveyor's happy, but since I think it's idle, let's see if anything recent made much difference: @nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@JeffBezanson
Copy link
Member Author

Looking good. Away we go!

@JeffBezanson JeffBezanson merged commit cfac61d into master Jan 8, 2017
@tkelman tkelman deleted the jb/fasterstring branch January 8, 2017 08:14
arraylist_push(&backref_list, v);
jl_deserialize_struct(s, v, 1);
((jl_typemap_entry_t*)v)->next = (jl_typemap_entry_t*)jl_nothing;
*pn = v;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing gc_wb

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the gc is off here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to create a gc stack but it is still necessary to use write barrier since there can still be old and young objects.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't a collection have to happen for an object to become old?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so if you are sure that the parent is always allocated in the same "deserialization session" it should be enough with a comment about it. Looks like it should be fine here since *pn is pointing to either the stack or a v allocated in a previous iteration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@Sacha0 Sacha0 added deprecation This change introduces or involves a deprecation needs news A NEWS entry is required for this change labels May 20, 2017
Sacha0 added a commit to Sacha0/julia that referenced this pull request May 20, 2017
@Sacha0 Sacha0 removed the needs news A NEWS entry is required for this change label May 25, 2017
tkelman pushed a commit that referenced this pull request Jun 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking This change will break code deprecation This change introduces or involves a deprecation needs docs Documentation for this change is required performance Must go faster strings "Strings!"
Projects
None yet
Development

Successfully merging this pull request may close these issues.