diff --git a/base/char.jl b/base/char.jl
index 08bed28e86bc9..9daae1ea607fd 100644
--- a/base/char.jl
+++ b/base/char.jl
@@ -18,7 +18,8 @@ representable in a given `AbstractChar` type.
 Internally, an `AbstractChar` type may use a variety of encodings. Conversion
 to `UInt32` will not reveal this encoding because it always returns the
 Unicode value of the character. (Typically, the raw encoding can be obtained
-via [`reinterpret`](@ref).)
+via [`reinterpret`](@ref).) Character I/O uses UTF-8 by default for all
+character types, regardless of their internal encoding.
 """
 AbstractChar
 
@@ -148,8 +149,7 @@ hash(x::Char, h::UInt) =
 # fallbacks:
 isless(x::AbstractChar, y::AbstractChar) = isless(Char(x), Char(y))
 ==(x::AbstractChar, y::AbstractChar) = Char(x) == Char(y)
-hash(x::AbstractChar, h::UInt) =
-    hash_uint64(((UInt32(x) + UInt64(0xd060fad0)) << 32) ⊻ UInt64(h))
+hash(x::AbstractChar, h::UInt) = hash(Char(x), h)
 widen(::Type{T}) where {T<:AbstractChar} = T
 
 -(x::AbstractChar, y::AbstractChar) = Int(x) - Int(y)
diff --git a/base/strings/basic.jl b/base/strings/basic.jl
index 88ad487a058ea..f96c44b5da61b 100644
--- a/base/strings/basic.jl
+++ b/base/strings/basic.jl
@@ -14,8 +14,8 @@ about strings:
   * String indexing is done in terms of these code units:
     * Characters are extracted by `s[i]` with a valid string index `i`
     * Each `AbstractChar` in a string is encoded by one or more code units
-    * Only the index of the first code unit of a `AbstractChar` is a valid index
-    * The encoding of a `AbstractChar` is independent of what precedes or follows it
+    * Only the index of the first code unit of an `AbstractChar` is a valid index
+    * The encoding of an `AbstractChar` is independent of what precedes or follows it
   * String encodings are [self-synchronizing] – i.e. `isvalid(s, i)` is O(1)
 
 [self-synchronizing]: https://en.wikipedia.org/wiki/Self-synchronizing_code
diff --git a/base/strings/util.jl b/base/strings/util.jl
index ef06c1ee51dfa..44cd811df6938 100644
--- a/base/strings/util.jl
+++ b/base/strings/util.jl
@@ -410,7 +410,7 @@ If `count` is provided, replace at most `count` occurrences.
 or a regular expression.
 If `r` is a function, each occurrence is replaced with `r(s)`
 where `s` is the matched substring (when `pat`is a `Regex` or `AbstractString`) or
-character (when `pat` is a `AbstractChar` or a collection of `AbstractChar`).
+character (when `pat` is an `AbstractChar` or a collection of `AbstractChar`).
 If `pat` is a regular expression and `r` is a `SubstitutionString`, then capture group
 references in `r` are replaced with the corresponding matched text.
 To remove instances of `pat` from `string`, set `r` to the empty `String` (`""`).
diff --git a/doc/src/manual/strings.md b/doc/src/manual/strings.md
index d39147ed780cb..37ba9a9a43359 100644
--- a/doc/src/manual/strings.md
+++ b/doc/src/manual/strings.md
@@ -28,8 +28,9 @@ There are a few noteworthy high-level features about Julia's strings:
     additional `AbstractString` subtypes (e.g. for other encodings). If you define a function
     expecting a string argument, you should declare the type as `AbstractString` in order to
     accept any string type.
-  * Like C and Java, but unlike most dynamic languages, Julia has a first-class type representing
-    a single character, called `AbstractChar`. This is just a special kind of 32-bit primitive type whose numeric value represents a Unicode code point.
+  * Like C and Java, but unlike most dynamic languages, Julia has a first-class type for representing
+    a single character, called `AbstractChar`. The built-in `Char` subtype of `AbstractChar`
+    is a 32-bit primitive type that can represent any Unicode character.
   * As in Java, strings are immutable: the value of an `AbstractString` object cannot be changed.
     To construct a different string value, you construct a new string from parts of other strings.
   * Conceptually, a string is a *partial function* from indices to characters: for some index values,