restore non-copying String behavior, add unsafe_string*, ... #16731
Another possible name, based on @nalimilan's suggestion, would be
If we're going to be renaming both these things, +1 to using a more general
@@ -159,7 +159,7 @@ function strftime(fmt::AbstractString, tm::TmStruct)
 if n == 0
 return ""
 end
-return String(pointer(timestr), n)
+return unsafe_string(pointer(timestr), n)
I think `String(timestr[1:n])` would be better here.
Looks like the same pattern is used in several other places. Should they also be changed?
Probably. But there are also a lot of `unsafe_string(pointer(buffer))` usages that are not so easily changed, because they use NUL termination to compute the string length.
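A minimal sketch of the three patterns discussed here, assuming Julia 1.x, where `GC.@preserve` keeps a buffer alive while its raw pointer is in use. The buffer `buf` and its contents are hypothetical stand-ins for `timestr`. `String(buf[1:n])` copies a slice and never touches a raw pointer; `unsafe_string(ptr, n)` copies exactly `n` bytes; `unsafe_string(ptr)` scans for a terminating NUL byte to find the length, which is why those call sites are harder to convert.

```julia
# Hypothetical buffer standing in for a C output buffer such as `timestr`:
# 16 zero bytes, with "hello" written at the front.
buf = zeros(UInt8, 16)
copyto!(buf, codeunits("hello"))
n = 5  # number of valid bytes, as a C call like strftime would report

# Pointer-free: buf[1:n] is a fresh Vector, which String then takes over.
s_slice = String(buf[1:n])

# Copies exactly n bytes starting at the pointer; no NUL terminator needed.
s_len = GC.@preserve buf unsafe_string(pointer(buf), n)

# Scans for the first NUL byte to find the length (buf[6] == 0x00 here).
s_nul = GC.@preserve buf unsafe_string(pointer(buf))

@show s_slice s_len s_nul
```

Because the slice is a temporary copy, `buf` itself keeps all 16 bytes after the `String` call.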
The |
 end
 # load file
 if !isempty(ARGS) && !isempty(ARGS[1])
 # program
 repl = false
 # remove filename from ARGS
-global PROGRAM_FILE = String(shift!(ARGS))
+global PROGRAM_FILE = shift!(ARGS)
Maybe add a typeassert, just in case?
`ARGS` is a `Vector{String}`, so a typeassert seems superfluous. The only reason for the `String` call here was that this used to be `UTF8String(shift!(ARGS))`, and in those days the conversion was needed to avoid having the type of `PROGRAM_FILE` depend on the arguments. I think it just got auto search-and-replaced by `String` when @StefanKarpinski made the switch.
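A quick check of the point about `ARGS`, assuming Julia 0.7 or later, where `shift!` was renamed `popfirst!`. Because `ARGS` is a `Vector{String}`, removing its first element already yields a `String`, so neither a `String(...)` call nor a `::String` typeassert changes the result. The vector `args` is a hypothetical stand-in so the example does not mutate the real `ARGS`.

```julia
# ARGS is a global Vector{String} in every Julia session.
@show eltype(ARGS)

# Hypothetical copy of the command-line arguments, so ARGS itself is untouched.
args = String["script.jl", "--verbose"]

# The element type is already String; no conversion or typeassert is needed.
program_file = popfirst!(args)

@show program_file typeof(program_file) args
```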
+1 for the cleanup! I wonder about |
This function is labelled "unsafe" because it will crash if `pointer` is not
a valid memory address to data of the requested length.
"""
unsafe_array_wrapper
Cosmetic point: maybe it would be more logical to document `function unsafe_array_wrapper end` before its method definitions?
You can't document it before it is defined.
You can define it with the zero-method `function ... end` syntax, though, just for the purposes of documenting.
Ah, I see. (It would be good if this were explained in the manual. There's an obscure comment about the `function ... end` syntax being "preferred" with no explanation.)
Yes, it would. We should write down a list of the poorly documented new things you've been noticing.
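A small illustration of the zero-method `function ... end` syntax discussed above, assuming Julia 1.x: it creates a generic function with no methods, so a docstring can be attached before any methods exist. The name `wrap_demo` and its two methods are hypothetical.

```julia
"""
    wrap_demo(x)

Hypothetical function, documented before any of its methods are defined.
"""
function wrap_demo end   # creates the generic function with zero methods

# Methods can be added afterwards; the docstring above is attached to the
# function as a whole rather than to one particular method.
wrap_demo(x::Integer) = x + 1
wrap_demo(x::AbstractString) = uppercase(x)

@show length(methods(wrap_demo)) wrap_demo(1) wrap_demo("a")
```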
Renamed to |
open(fname) do io
Mmap.mmap(io, Vector{UInt8}, (Int(fsz),))
fsz = filesize(input)
if use_mmap && fsz > 0 && fsz < typemax(Int)
This check is from the original code, but I think the `fsz > 0` check is probably there because of #10516, so it can be removed now. And I don't understand the `typemax(Int)` check... Is there any circumstance in which `fsz > typemax(Int)` and `readstring(input)` will succeed? And why would `mmap` fail if `fsz == typemax(Int)`?
The current `mmap` actually requires `len < typemax(Int) - PAGESIZE`.
And why don't we use `mmap` by default on Windows? ... Ah, that is #4664. In general, this code probably needs revisiting.
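A sketch of the read path under discussion, assuming Julia 1.x with the `Mmap` standard library. The helper name `read_input_bytes` and its `use_mmap` keyword are hypothetical simplifications of the code in the diff. Empty files skip `mmap` (the `fsz > 0` guard from the original code); everything else either maps the file or falls back to an ordinary `read`. The `typemax(Int)` bound is deliberately omitted here; see the comment above about the actual `mmap` length limit.

```julia
using Mmap

# Hypothetical helper: return the bytes of `fname`, memory-mapping the file when
# requested and non-empty, otherwise reading it into an ordinary Vector{UInt8}.
function read_input_bytes(fname::AbstractString; use_mmap::Bool = true)
    fsz = filesize(fname)
    if use_mmap && fsz > 0
        # The mapped array stays valid after the IOStream is closed.
        return open(fname) do io
            Mmap.mmap(io, Vector{UInt8}, (Int(fsz),))
        end
    end
    return read(fname)   # copies the whole file into memory
end
```

Both branches return a `Vector{UInt8}`, so callers do not need to know which path was taken.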
Many people want to see this file removed from Base entirely.
Sometimes I think this drive to push things out of Base is venturing too much into purism for its own sake. Julia's core market is technical users, and the "batteries included" philosophy is very appealing there. Also, because Julia is so rapidly evolving, it is a struggle to keep packages up to date, whereas including key functionality in Base makes it much easier to update it along with changes to the core language. (It also makes it easier to evaluate changes, because we immediately see their impact.)
But in any case, not a debate for this PR.
We constantly get complaints that `readcsv` is slow, and independent developments outside of Base are already better. We don't want to get into Python's situation of broken batteries that no one uses because the packages are better and the stdlib is frozen. We need to limit scope if Julia the language is going to reach 1.0 soon enough to succeed outside of academic early adopters.
Things being developed in different repos is just a workflow and automation problem, and one that is easily solvable.
Thank you, @stevengj! This was a wonderful birthday present. 🎂 🎉 🎈
I can't reproduce the Travis method ambiguity error on OSX, and it's weird that this failure doesn't occur on any other platform. Maybe I should just rebase and try again?
Yay, tests are green.
Was the original ambiguity problem a Julia bug? That should probably be reported as a separate issue.
I'm not actually sure, since I couldn't reproduce it. I could see how the original method definitions might be ambiguous, though I didn't think about it too hard since the code needed simplifying anyway.
I don't understand how this is supposed to work. Is this expected?

julia> f{T}(::Union{Type{Array},Type{Array{T}}}, ::Integer) = 1
f (generic function with 1 method)

julia> f(Array, 1)
ERROR: MethodError: no method matching f(::Type{Array}, ::Int64)
Closest candidates are:
f{T}(::Union{Type{Array{T,N}},Type{Array}}, ::Integer)
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

julia> f{T}(::Union{Type{Array},Type{Array{T}}}, ::Int) = 2
f (generic function with 2 methods)

julia> f(Array, 1)
ERROR: MethodError: f(::Type{Array}, ::Int64) is ambiguous. Candidates:
svec(Tuple{#f,Type{Array},Int64},svec(T),f{T}(::Union{Type{Array{T,N<:Any}},Type{Array}}, ::Int64) at REPL[4]:1)
svec(Tuple{#f,Type{Array},Int64},svec(T),f{T}(::Union{Type{Array{T,N<:Any}},Type{Array}}, ::Integer) at REPL[2]:1)
in eval(::Module, ::Any) at ./boot.jl:225
in macro expansion at ./REPL.jl:92 [inlined]
in (::Base.REPL.##1#2{Base.REPL.REPLBackend})() at ./event.jl:46

I think what I don't understand here is:
If I'm not missing something obvious here, I'll report it as a separate issue.
The type parameter cannot be determined.
Ref #13702
Got it. Though the issue referenced there, #3738, has been closed. I suppose that was a special case, and that case has been fixed? If these things don't work, maybe this line:

function unsafe_wrap{T,N}(::Union{Type{Array},Type{Array{T}},Type{Array{T,N}}},
p::Ptr{T}, dims::NTuple{N,Int}, own::Bool=false)

should be changed also? Or does the presence of
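A sketch of why a signature shaped like the `unsafe_wrap` line quoted above avoids the problem, written with Julia 1.x `where` syntax instead of the older `f{T}(...)` form. There, `T` also appears in `p::Ptr{T}`, so it is always determined by the pointer argument, even when the first argument is the bare `Array` type. Both functions below are hypothetical: `elem_type` mirrors that signature shape, and `kind` shows another common fix, splitting the `Union` into one method per case so that no method has an undeterminable parameter.

```julia
# T is fixed by the pointer argument, so both Array and Array{T} are accepted.
elem_type(::Union{Type{Array},Type{Array{T}}}, p::Ptr{T}) where {T} = T

# Alternative: one method per case instead of a Union with an unbound T.
kind(::Type{Array}, n::Integer) = :bare
kind(::Type{Array{T}}, n::Integer) where {T} = T

p = Ptr{Int}(C_NULL)   # a null pointer suffices: nothing is dereferenced
@show elem_type(Array, p) elem_type(Array{Int}, p) kind(Array, 1) kind(Array{Float64}, 1)
```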
Create a string from the address of a C (0-terminated) string encoded as UTF-8.
A copy is made so the pointer can be safely freed. If `length` is specified, the
string does not have to be 0-terminated.
See also [`unsafe_string_wrapper`](:func:`unsafe_string_wrapper`), which takes a pointer
Not called that anymore.
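A short check of the copying behavior described in the `unsafe_string` docstring above, assuming Julia 1.x: `unsafe_string` copies the bytes, so later changes to (or freeing of) the source buffer do not affect the returned `String`. The buffer is hypothetical, and `GC.@preserve` keeps it alive while its raw pointer is in use.

```julia
buf = Vector{UInt8}("abc\0")   # NUL-terminated bytes, as a C function might produce

s_nul = GC.@preserve buf unsafe_string(pointer(buf))     # length found via the NUL byte
s_two = GC.@preserve buf unsafe_string(pointer(buf), 2)  # explicit length, no NUL needed

buf[1] = UInt8('x')   # mutate the source after copying; the strings are unaffected
@show s_nul s_two buf
```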
""" | ||
unsafe_wrap(Array, pointer, dims, own=false)

Wrap a native pointer as a Julia `Array `object. The pointer element type determines the array
Misplaced backtick: this should be `` `Array` object``.
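A minimal illustration of the non-copying wrapper documented in this docstring, assuming Julia 1.x, where `own` has become a keyword argument (default `false`) rather than the positional argument shown in the diff. The wrapped arrays share memory with the hypothetical backing vector `buf`, so `buf` must stay alive (here via `GC.@preserve`) for as long as the wrappers are used.

```julia
buf = [1, 2, 3, 4]   # hypothetical backing storage (a Vector{Int})

sizes, total = GC.@preserve buf begin
    # The element type Int is taken from the pointer type Ptr{Int}.
    v = unsafe_wrap(Array, pointer(buf), 4)        # 4-element Vector{Int}
    m = unsafe_wrap(Array, pointer(buf), (2, 2))   # 2x2 Matrix{Int} over the same memory
    v[1] = 10                                      # writes through to buf: nothing was copied
    ((size(v), size(m)), sum(m))                   # m sees the update as well
end
@show buf sizes total
```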
A big +1 to merging this.
…unsafe_string_wrapper, and unsafe_array_wrapper; restore non-copying behavior of String(::Vector{UInt8}) constructor (closes JuliaLang#16470, closes JuliaLang#16713)
…opefully remove occasional method ambiguity errors: overflow is already checked by convert, and sizes ≤ 0 are already checked in jl_ptr_to_array
Rebased.
I'm guessing that the libgit2 error is unrelated.
#16555. Merge away.
@stevengj This PR in particular introduced https://github.com/JuliaLang/julia/blob/master/stdlib/DelimitedFiles/src/DelimitedFiles.jl#L229 This makes |
@bkamins, …
The problem is that …
Oh, right, I forgot that … It seems a shame to make a copy here, since the input could potentially be quite large.
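A sketch of the ownership behavior that makes the copy necessary, assuming Julia 1.x semantics for the restored constructor: `String(v::Vector{UInt8})` takes over `v`'s data and truncates `v` to zero length, so a caller that still needs the bytes must pass `copy(v)`, paying for a full copy of the input. The byte vectors below are hypothetical.

```julia
v = Vector{UInt8}("a,b\n1,2\n")   # hypothetical input bytes
s = String(v)                     # takes ownership: v is emptied, no copy made
@show s length(v)

w = Vector{UInt8}("a,b\n1,2\n")
t = String(copy(w))               # copies first, so w keeps its contents
@show t length(w)
```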
This PR closes #16470 and #16713. It:

- Restores the non-copying `String(a::Vector{UInt8})` constructor, which once again takes ownership of the array. `bytestring(a)` is now deprecated to `String(copy(a))`.
- Renames `String(ptr, len)` to `unsafe_string(ptr, len)` (this function makes a copy from a pointer).
- Renames `pointer_to_string(ptr, len)` to `unsafe_wrap(String, ptr, len)` (originally `unsafe_string_wrapper(ptr, len)` in this PR), which is hopefully more descriptive and now begins with `unsafe`, as people seem to prefer. (This function does not copy the data, but simply "wraps" a `String` around the data.)
- Renames `pointer_to_array` to `unsafe_wrap(Array, ...)` (originally `unsafe_array_wrapper` in this PR) for consistency.

(Note that I had to work around #16730 to document the `String(a::Vector{UInt8})` constructor by calling `doc!` manually.)

In the course of making this change, I had to manually inspect every call to `String(...)`, and I cleaned up many of the call sites somewhat.