Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Unicode related error messages #11573

Merged
merged 1 commit into from
Jun 5, 2015

Conversation

ScottPJones
Copy link
Contributor

This improves the error messages for the Unicode support, in utf8.jl, utf16.jl, and utf32.jl
They will save the position in the string where the error occurred, as well as the character or code unit that caused the error, and the show method for the UnicodeError exception type displays that extra information.

@tkelman tkelman added the domain:unicode Related to unicode characters and encodings label Jun 4, 2015
@@ -39,7 +29,7 @@ function next(s::UTF16String, i::Int)
elseif length(s.data)-1 > i && utf16_is_lead(s.data[i]) && utf16_is_trail(s.data[i+1])
return utf16_get_supplementary(s.data[i], s.data[i+1]), i+2
end
throw(ArgumentError("invalid UTF-16 character index"))
throw(UnicodeError(UTF_ERR_INVALID_INDEX,0,0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the GC frame concern still apply here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuyichao Please get that fixed! 😀

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've actually got a new idea this morning and would like to try out. (Sth like a throw local noinline). Maybe that can bring down the gap between #11284 and #11508 .

@tkelman tkelman added the domain:error handling Handling of exceptions by Julia or the user label Jun 4, 2015
@@ -84,6 +84,8 @@ include("iterator.jl")
include("osutils.jl")

# strings & printing
include("utferror.jl")
include("utftype.jl")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally superficial, sorry... I think this would be a tiny bit better as utftypes.jl ?

p.s. thank you thank you thank you for opening this separately. 👍

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

Would be nice to test at least a few of these error paths if we can do that yet. Address the comments inline and I'd say this is an easy merge, thanks!

UTF32String(data::Vector{UInt32}) = UTF32String(reinterpret(Char, data))

const empty_utf16 = UTF16String(UInt16[0])
const empty_utf32 = UTF32String(UInt32[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these wait to be added until the commit that first uses them?

@ScottPJones
Copy link
Contributor Author

The full testing really happens when you have utfcheck.jl, which I'm getting ready right now, but I did add some more in test/unicode.jl

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

The full testing really happens when you have utfcheck.jl, which I'm getting ready right now, but I did add some more in test/unicode.jl

Yes, that makes sense that the coverage of this will go way up with the more extensive changes that will come separately. Tests of the error paths would've caught the typos like having UTF_ERR_ODD_32 instead of UTF_ERR_ODD_BYTES_32. Push any updates as they're ready.

Would be nice to make your commit messages a bit more descriptive too, if possible.

@@ -0,0 +1,30 @@
# This file is a part of Julia. License is MIT: http://julialang.org/license

##\brief Error messages for Unicode / UTF support
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you did there. At least the backslash is less confusing here than @. @jakebolewski what do you think, can we live with this? Or merge anyway and delete it afterwards if we really don't like the doxygen-style comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see the point about looking like a macro... that was done because in Python, for a docstring, you must use @ instead of \, because \ is an escape character inside a string.
Maybe for Julia, to put doxygen information into doc strings, we'd need to use a different start character, to avoid the confusion with macros, and then convert it to @ before passing them to
the doxygen processor.
The only difference now between "plain text" and working doxygen (it actually is the valid Python doxygen style), is the added \. Whether or not doxygen use becomes prevalent (see #11123) in the rest of julia, that information is very useful, so I'd hope you'd accept having a few extra \ characters...
If nobody starts documenting the code in Base, it never is going to happen!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments are awesome. Extra annotations in the comments that don't do anything yet and don't mean anything to the majority of us who've probably never used doxygen are just noise.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but to me, they help a lot when working on the code, even without the very nice stuff that doxygen builds from them... (have you ever tried doxygen on a very large project? It really is helpful...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, but you're contributing to a pre-existing large project that doesn't use doxygen (yet, maybe that will change but that's not for this PR to worry about), and at least 2 core contributors have told you to knock it off with the doxygen annotations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@one-more-minute You've convinced me... (as a matter of fact, I've worked for a while doing just that, pulling information out of unstructured data... I should have thought of that!).
However, don't you need at least some standardization of tags, so you could pick out which part documents the parameters, which part the return value, which part the errors thrown (if any)?
For example: I've seen ### Arguments:, ### Returns:, Inputs:, Outputs:, ### Arguments, ### Keyword Arguments, etc... as you see, no consistency...
It doesn't even have to be complete consistency, it can be a many-to-one mapping... fairly free form, as long as there aren't too many options for the metadata parser to figure out, or for people to understand.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would at least give other Julia contributors time to evaluate the format, good or bad.

Well I've been evaluating by spending more time than I'd like reviewing your PR's, and so far I don't like the doxygen markup. But I don't care enough to let it block this PR which is otherwise worth merging, so I'm just going to do that now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I was just going to change this to using something more along the lines of Mike's idea... 😀 I guess I can do that in a later PR, as soon as the dust as settled a bit on the parsing Markdown ideas... Thanks! (and thanks for taking the time, and pushing back on the doxygen format, because now I will push for something better for documentation, which will still allow me to generate output for doxygen, but which fits better with Julia, and which I hope you do like)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, don't you need at least some standardization of tags

Sure, I think it's completely reasonable to settle on some conventions eventually (these things usually happen naturally anyway). I think @catawbasam is right about side issues, though, so let's keep that discussion focussed in #8966 so that things can move forward here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tkelman and I've been spending more time than I'd like changing things so frequently 😀 but nobody has been holding a gun to either of our heads to make you review, or me to respond to your reviews. A long process yes, and I'm sorry for the time it has taken, but I think in the end, the final result for julia is much better. Thanks again for your time.

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

Alright aside from the doxygen nits I think we can merge this one anyway.

Anyone have objections or additional feedback?

@tkelman
Copy link
Contributor

tkelman commented Jun 4, 2015

By the way, reading through my old emails, here's one pretty easy way of making multi-line commit messages from the command line:

git commit -m 'The first line which should be <50 chars

^ be sure to put that first extra newline there, by convention
then go nuts, as descriptive as you like, until you are done
at which point you close the quote'

tkelman added a commit that referenced this pull request Jun 5, 2015
Improve Unicode related error messages
@tkelman tkelman merged commit 47a0eb1 into JuliaLang:master Jun 5, 2015
@ScottPJones ScottPJones deleted the spj/utferror branch June 5, 2015 11:02
@Sacha0 Sacha0 mentioned this pull request Oct 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:error handling Handling of exceptions by Julia or the user domain:unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants