-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add isvalid(Type, value) methods, to replace is_valid_* #11241
Conversation
+1 😃 |
I believe in plagiarizing good ideas wherever I find them! 😀 |
Very nice. Needs docs and tests, but I approve of the direction. |
Noting that this will need to go into Compat.jl as well. |
|
||
export isgraphemebreak | ||
|
||
# also exported by Base: | ||
export normalize_string, graphemes, is_valid_char, is_assigned_char, charwidth, | ||
export normalize_string, graphemes, is_valid_char, is_assigned_char, charwidth, isvalid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I like about this commit is that isvalid
is something that is more general than just strings. For this reason, does it actually make sense to export it here (from UTF8proc
) in addition to from Base
? The other things exported on this line are string-specific.
Also, in addition to the interfaces you provide, what about being able to call e.g. isvalid(s::UTF16String)
(without explicitly providing the type as an argument). This interface would match some current uses of isvalid
in julia (for instance, in base/sparse/cholmod.jl
).
Worth considering whether this relates to #3631. +1. Underscores are generally a sign of missing abstraction! |
Hmm I see |
Shall I add that as another PR? |
I think the name |
@StefanKarpinski maybe so, but I agree with @JeffBezanson , that |
Slightly awkward fit, but conceptually, this is |
Yes, |
@shashi Do you mean I need to modify |
@tkelman I've added all the docs and test cases (much more complete than was there for the |
@garrison I just realized though, in the future, I really want those to simply return true, i.e. if those types only contain valid characters or strings. |
OK, docs and tests added, |
@ScottPJones, see https://github.com/JuliaLang/Compat.jl. Generally, for new features, people are encouraged to add versions of those features to Compat.jl in order to make the feature usable on Julia v0.3. |
OK, but how would I get changes from that, which would be on a separate fork, as part of this PR? Or should it be a separate PR? Also, I think a few of my other PRs then could use additions to Compat.jl, do I need to make a separate PR for each one (so that the Compat.jl changes can be merged as each PR [hopefully soon!] gets merged)? Thanks! |
The additions to Compat.jl happen separately (as a PR to that repository) and after the Julia changes are merged. That way, you can calculate the exact version number that the feature was added, and then you only enable the compatibility definitions before that version number. You'll end up adding something like this to Compat: if VERSION < v"0.4.0-dev+XXXX" # calculated by the linked script
export isvalid
isvalid(…) = …
…
end |
OK, so I really can't make the PRs for Compat.jl changes until somebody has pity on me and starts merging in some of my outstanding PRs... 😢 |
I'm just following on the periphery here, but for this PR it'd probably be good to deprecate the methods this replaces. The @deprecate is_valid_ascii(s::Union(Array{UInt8,1},ByteString)) isvalid(ASCIIString, s) It also exports the old symbol, so you should also remove it from the list of explicit exports. And then make sure that the old usages are all removed from Base. |
Do people really want it deprecated right away? I'm happy to do that, or should it be a following PR? |
You should do everything at once. The Compat.jl stuff can come after this is merged. |
OK, I think everything is all set... |
is_valid_utf16(s::UTF16String) = is_valid_utf16(s.data) | ||
isvalid(str::UTF16String) = isvalid(UTF16String, str.data) | ||
is_valid_utf16(str::UTF16String) = isvalid(UTF16String, str.data) | ||
is_valid_utf16(data::AbstractArray{UInt16}) = isvalid(UTF16String, data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deprecations are done per-method, and the methods are defined by the deprecate
macro. That way when it comes time to remove them, we just need to delete the section in deprecated.jl and don't need to do any archaeology. Take a look at what macroexpand(:(Base.@deprecate f(x, y) g(2x, 2y)))
does as a simple example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to deprecate these methods, all you need to do is delete the =
, put a @deprecate
in front of it, and move it to deprecated.jl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, so all trace of the old method goes away (except in deprecated.jl), right?
I'd already added them to deprecated.jl, just didn't know I could get rid of the definitions in utf16.jl...
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, exactly.
(We really need to have more folks who are qualified to review changes to the unicode/strings codebase. I'm not that person, but I can occasionally make quick glances through change sets like this.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes... if they weren't my own changes, I think I'd be a very good person for reviewing Unicode/strings changes 😀 (I spent 20 years being the gatekeeper to the core code at ISC, so I know whats involved!)
All set, documented and tested... anything else, kind audience? |
I’ve now squashed this, all tests, documentation, and deprecating old is_valid_xxx functions is done... |
@@ -397,6 +411,10 @@ | |||
|
|||
Create a string from the address of a NUL-terminated UTF-32 string. A copy is made; the pointer can be safely freed. If ``length`` is specified, the string does not have to be NUL-terminated. | |||
|
|||
.. function:: is_valid_utf32(s) -> Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is deprecated now, right? I guess we can leave the deprecated functions' docs in, but I'm not sure if we have a doctest to remind us to clear out obsolete docs anywhere.
Was this done in here yet? I don't see it... |
OK, all done, now shows 1 argument |
Anybody up to reviewing this? I can do the Compat.jl changes once it's in... Thanks! |
Looks like this should be okay to merge, but #11241 (comment) wasn't done? |
Yes, it was, days ago... See line 327 of utf32.jl. |
You mean line 102? Then are the methods in |
Oops, sorry, I was looking at my integrated version (that also has #11004)... |
Done! Thanks! |
@@ -95,7 +95,8 @@ sizeof(s::UTF16String) = sizeof(s.data) - sizeof(UInt16) | |||
unsafe_convert{T<:Union(Int16,UInt16)}(::Type{Ptr{T}}, s::UTF16String) = | |||
convert(Ptr{T}, pointer(s)) | |||
|
|||
function is_valid_utf16(data::AbstractArray{UInt16}) | |||
isvalid(::Type{UTF16String}, str::UTF16String) = isvalid(UTF16String, str.data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is covered by the union version in utf32.jl, right?
Do people think this is NEWS-worthy? |
I haven't followed this in any detail, but it seems so to me. |
The single argument form `isvalid(value)` can now be used for values of type `Char`, `ASCIIString`, | ||
`UTF8String`, `UTF16String` and `UTF32String`. | ||
The two argument form `isvalid(type, value)` can be used with the above types, with values | ||
of type `Vector{UInt8}`, `Vector{UInt16}`, `Vector{UInt32}`, and `Vector{Char}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should put ([#11241])
at the end of the NEWS item (to reference the PR) and run julia doc/NEWS-update.jl
(to insert the appropriate URL at the end).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks, didn't know about that... (learning as fast as I can!), all done now.
The single argument form `isvalid(value)` can now be used for values of type `Char`, `ASCIIString`, | ||
`UTF8String`, `UTF16String` and `UTF32String`. | ||
The two argument form `isvalid(type, value)` can be used with the above types, with values | ||
of type `Vector{UInt8}`, `Vector{UInt16}`, `Vector{UInt32}`, and `Vector{Char}`. ([#11241]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, you should put parenthetical citations inside punctuation (like this).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did... ([#11241]).
Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see, before the . I'll fix...
I was about to hit merge on this one because I think it's ready, are there any remaining objections to that? |
LGTM. |
Great, this is an excellent cleanup. Will be good to get these new methods into Compat.jl when you get a chance. |
Add isvalid(Type, value) methods, to replace is_valid_*
Thanks! I'll get this into Compat.jl now. |
This adds a new generic function, intended to eventually replace the following:
is_valid_ascii
is_valid_utf8
is_valid_utf16
is_valid_utf32
is_valid_char
This was suggested by @quinnj on the thread for #11140