Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement titlecase function. Fix #19465. #19469

Merged
merged 1 commit into from
Dec 6, 2016

Conversation

helgee
Copy link
Contributor

@helgee helgee commented Dec 1, 2016

No description provided.

@@ -403,6 +403,25 @@ julia> lowercase("STRINGS AND THINGS")
lowercase(s::AbstractString) = map(lowercase, s)

"""
titlecase(s::AbstractString)

Returns `s` with all initial characters converted to titlecase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you define "initial character"? Maybe also explain what's exactly titlecase.

```
"""
function titlecase(s::AbstractString)
words = split(lowercase(s))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why call lowercase? That's not mentioned in the docstring.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the behaviour I know from e.g. Python. The question is whether we want this.

This might be more useful

julia> str = "ISS - international space station"
julia> titlecase(str)
"ISS - International Space Station"

than this

julia> str = "ISS - international space station"
julia> titlecase(str)
"Iss - International Space Station"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, the first one sounds more useful, and you can always get the second one by calling lowercase first (while the contrary isn't possible).

@stevengj stevengj added the unicode Related to unicode characters and encodings label Dec 1, 2016
@helgee
Copy link
Contributor Author

helgee commented Dec 1, 2016

Thanks for the feedback @nalimilan and @stevengj.

@test titlecase('lj') == 'Lj'
@test titlecase("ljubljana") == "Ljubljana"
@test titlecase("aBc ABC") == "ABc ABC"
@test titlecase("abcD EFG\n\thij") == "AbcD EFG\n\tHij"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a couple of spaces after D to make sure multiple spaces are preserved.

@@ -403,6 +403,31 @@ julia> lowercase("STRINGS AND THINGS")
lowercase(s::AbstractString) = map(lowercase, s)

"""
titlecase(s::AbstractString)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll also need to add a placeholder to doc/stdlib/strings.rst for titlecase to be included in the manual, and re-run julia doc/genstdlib.jl

@stevengj
Copy link
Member

stevengj commented Dec 1, 2016

Needs a NEWS.md item, too

@helgee helgee force-pushed the titlecase branch 2 times, most recently from fdbfac9 to 30b4b7f Compare December 1, 2016 18:36
@helgee
Copy link
Contributor Author

helgee commented Dec 1, 2016

Done.

@stevengj
Copy link
Member

stevengj commented Dec 1, 2016

LGTM.

@@ -62,6 +62,8 @@ Library improvements

* New `accumulate` and `accumulate!` functions, which generalize `cumsum` and `cumprod`. Also known as a [scan](https://en.wikipedia.org/wiki/Prefix_sum) operation ([#18931]).

* New `titlecase` function, which capitalizes the first character of each word within a string ([#19469]).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should doc/NEWS-update.jl be run in order to create the link?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we are holding off on running that... we will update a bunch of links at once in a separate PR. Updating the NEWS links in individual PRs creates too many merge conflicts.

Clarify docstring.

Single-pass version without call to lowercase.

Test subsequent spaces.

NEWS.md entry and RST docs.

Add NEWS.md links.
@stevengj
Copy link
Member

stevengj commented Dec 1, 2016

(@helgee, no need to rebase and squash the commits in the future. Github does that for you when we hit the "squash and merge" button.)

@helgee
Copy link
Contributor Author

helgee commented Dec 1, 2016

@stevengj Good to know. IIRC Tony asked me to squash last time. Shall I revert the NEWS.md link changes?

@StefanKarpinski
Copy link
Member

GitHub added the feature relatively recently, so last time it probably was necessary. No need to revert anything though.

@stevengj
Copy link
Member

stevengj commented Dec 2, 2016

Travis failure seems to be an unrelated problem with cmake on OSX that has been appearing in several PRs.

@stevengj
Copy link
Member

stevengj commented Dec 2, 2016

Hmm, maybe we should be following the algorithm in UTR#21.

@helgee
Copy link
Contributor Author

helgee commented Dec 2, 2016

That algorithm does not handle the desired exceptions (e.g. acronyms) though.

@nalimilan
Copy link
Member

Indeed, I'm torn. Anyway better follow other rules from that algorithm unless we have reasons not to.

@stevengj
Copy link
Member

stevengj commented Dec 2, 2016

Yes, I agree that we shouldn't convert to lowercase, since you could always do titlecase(lowercase(s)) if you want that effect. However, the other recommendations are worth looking at carefully.

@helgee
Copy link
Contributor Author

helgee commented Dec 5, 2016

AFAICT does the currently implemented lowercase also not follow the spec. I do not know anything about Turkish but the following behaviour in Greek

julia> lowercase("OΔΥΣΣΕΥΣ")
"oδυσσευσ" # wrong
"oδυσσευς" # would be correct

is wrong, i.e. the lowercase sigma at the end is the non-final form σ but should be the final form ς instead.

EDIT: Python handles it correctly.

@stevengj
Copy link
Member

stevengj commented Dec 5, 2016

Good point; we should consider a fix to all of them, but for now the simple titlecase implementation is probably sufficient.

@helgee
Copy link
Contributor Author

helgee commented Dec 6, 2016

I will open a new issue for the special case conventions.

Is there anything left to do here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
unicode Related to unicode characters and encodings
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants