Remove string [] indexing #12710

huonw · 2014-03-05T10:49:44Z

String `[]` indexing is byte indexing (not character indexing), which encourages poor UTF-8 hygiene. The behaviour can be regained either with the `.bytes()` iterator or with `s.as_bytes()`, which gives a `&[u8]` view at zero cost.
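A minimal sketch of the two replacements named above, written in present-day Rust rather than the 2014 syntax this thread was using. The helper names `first_byte` and `count_non_ascii_bytes` are hypothetical and exist only for illustration.

```rust
// Hypothetical helper: the first byte of a string, if it has one.
// `as_bytes()` borrows the existing buffer as `&[u8]`; nothing is copied.
fn first_byte(s: &str) -> Option<u8> {
    s.as_bytes().first().copied()
}

// Hypothetical helper: count the bytes that are not ASCII,
// using the `.bytes()` iterator instead of indexing.
fn count_non_ascii_bytes(s: &str) -> usize {
    s.bytes().filter(|b| !b.is_ascii()).count()
}

fn main() {
    let s = "héllo"; // 'é' is encoded as two bytes in UTF-8

    // Indexing happens on the byte view, so the intent is explicit.
    let bytes: &[u8] = s.as_bytes();
    assert_eq!(bytes[1], 0xC3); // first byte of 'é', not a character

    assert_eq!(first_byte(s), Some(b'h'));
    assert_eq!(count_non_ascii_bytes(s), 2);
}
```

The point of the sketch is that byte access stays available, but the call site has to say that it wants bytes.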

huonw · 2014-03-05T10:49:51Z

nominating

bstrie · 2014-03-05T14:54:43Z

Seconded. In a UTF-8 world, what we think of as a "string" is emphatically not an array.

thestinger · 2014-03-05T15:26:13Z

+1, it's rarely correct to index strings at all, and especially not by bytes

thestinger · 2014-03-05T15:27:07Z

I would also be for removing the len method and using as_bytes().len() instead. Strings don't need to implement the Container trait as they're not actually a container of a specific type.

sfackler · 2014-03-05T17:01:44Z

+1

pongad · 2014-03-05T21:02:07Z

I volunteer to work on this. By the way, we do have a function to decode code points, right?

pongad · 2014-03-06T00:02:21Z

~str in patterns seems to be blocking this. I'll try to work on that and come back to this later.

huonw · 2014-03-06T00:05:18Z

Why would string pattern matching affect string indexing?

pongad · 2014-03-06T00:09:01Z

I'm not sure myself. After removing string indexing, I got "internal compiler errors". Tracked it to #[lang=uniq_str_eq].

nikomatsakis · 2014-03-06T16:57:07Z

This will probably fall out of the changes we plan for DST, or at least could easily do so.

pnkfelix · 2014-03-13T20:50:42Z

Accepted for 1.0, P-backcompat-libs.

lilyball · 2014-05-05T16:35:19Z

@thestinger Removal of [] indexing makes some small sense, but removal of len() is too far. If you remove len() then you have to remove slice() as well, and that's just too painful (going through .as_bytes() to slice loses all information about whether the result is valid utf-8 and is extremely wasteful to get back to &str).
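To illustrate the cost described above, here is a hedged sketch in present-day Rust (the `slice()` method of 2014 corresponds roughly to range slicing and `str::get` today). The helper `slice_via_bytes` is hypothetical: it slices through the byte view and then has to re-validate the result with `std::str::from_utf8`, which scans the bytes again.

```rust
use std::str;

// Hypothetical helper: slice by byte range through `as_bytes()`.
// The byte slice carries no UTF-8 guarantee, so getting back to `&str`
// needs a validation pass over the bytes.
fn slice_via_bytes(s: &str, start: usize, end: usize) -> Option<&str> {
    let bytes = s.as_bytes().get(start..end)?; // None if out of range
    str::from_utf8(bytes).ok() // None if the range split a character
}

fn main() {
    let s = "héllo";
    assert_eq!(s.len(), 6); // length in bytes, not characters

    // Slicing the string directly only checks the two boundaries.
    assert_eq!(s.get(0..3), Some("hé"));

    // Going through bytes gives the same answer, but only after re-validation.
    assert_eq!(slice_via_bytes(s, 0, 3), Some("hé"));

    // A range that ends inside 'é' is rejected either way.
    assert_eq!(s.get(0..2), None);
    assert_eq!(slice_via_bytes(s, 0, 2), None);
}
```

Both paths need a byte length to express the range, which is why removing `len()` would pull slicing along with it.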

brson · 2014-06-17T17:50:20Z

Any hints what has to change in the code to fix this? I took a peek but got lost in trait lookups, while I thought slice indexing was built-in.

nikomatsakis · 2014-06-17T20:18:23Z

@brson hmm, I think the PRIMARY thing that has to change is middle::ty::ty_index(), which says that a str is indexable to type u8

nikomatsakis · 2014-06-17T20:19:14Z

the function middle::mem_categorization::element_kind could also be updated to remove that case

brson · 2014-06-18T17:11:05Z

Thanks, @nikomatsakis. I've started a patch.

Being able to index into the bytes of a string encourages poor UTF-8 hygiene. To get a view of `&[u8]` from either a `String` or `&str` slice, use the `as_bytes()` method. Closes rust-lang#12710. [breaking-change]

alexchandel · 2014-07-02T05:29:52Z

Rust strings feel much less intuitive to me than Python's str. I'm fine with UTF-8, but in Python str/strings are unicode and indexing/slicing them yields a unicode character. Python has a separate bytes class for encoded strings, whether they're UTF-8, 16, or 32 encoded, and uses the b"" syntax to designate byte-string literals versus u"" for unicode string literals.

Being able to index into the bytes of a string encourages poor UTF-8 hygiene. To get a view of `&[u8]` from either a `String` or `&str` slice, use the `as_bytes()` method. Closes #12710. [breaking-change]

If the diffstat is any indication, this shouldn't have a huge impact, but it will have some. Most changes are in the `str` and `path` modules. A lot of the existing usages were in tests where ASCII is expected. There are a number of other legitimate uses where the characters are known to be ASCII.
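For code that relied on ASCII-only byte indexing, the migration might look like the sketch below. It uses present-day Rust, and the helpers `is_absolute` and `digit_at` are hypothetical examples, not functions taken from the patch.

```rust
// Hypothetical helper: does the path start with '/'?
// Before the change this could have been written with `path[0]`;
// afterwards the byte view has to be requested explicitly.
fn is_absolute(path: &str) -> bool {
    path.as_bytes().first() == Some(&b'/')
}

// Hypothetical helper: read an ASCII digit at a byte offset.
// Returns None if the offset is out of range or the byte is not 0-9.
fn digit_at(s: &str, i: usize) -> Option<u32> {
    let b = *s.as_bytes().get(i)?;
    if b.is_ascii_digit() {
        Some((b - b'0') as u32)
    } else {
        None
    }
}

fn main() {
    assert!(is_absolute("/usr/lib"));
    assert!(!is_absolute("usr/lib"));
    assert_eq!(digit_at("v2", 1), Some(2));
    assert_eq!(digit_at("v2", 0), None);
}
```

Comparing against byte literals such as `b'/'` keeps the ASCII assumption visible in the code.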

huonw · 2014-07-02T05:48:44Z

> in Python str/strings are unicode

So are Rust strings, the major difference is we do not disguise the underlying representation.

(Also, it's not exactly clear what your point is? If it is that indexing should be removed, then @brson's patch #15085 is being tested as we speak.)
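As a rough illustration of the difference in representation, the sketch below (present-day Rust, hypothetical helper names `nth_char` and `char_offsets`) shows that character access is still possible, but it is an explicit walk over the UTF-8 data rather than a constant-time index.

```rust
// Hypothetical helper: the n-th character of a string.
// This decodes from the start, so it takes time proportional to n.
fn nth_char(s: &str, n: usize) -> Option<char> {
    s.chars().nth(n)
}

// Hypothetical helper: every character paired with its byte offset.
fn char_offsets(s: &str) -> Vec<(usize, char)> {
    s.char_indices().collect()
}

fn main() {
    let s = "héllo";

    // Five characters, six bytes: the two counts differ once text is non-ASCII.
    assert_eq!(s.chars().count(), 5);
    assert_eq!(s.len(), 6);

    assert_eq!(nth_char(s, 1), Some('é'));

    // 'é' starts at byte 1 and the next character starts at byte 3.
    assert_eq!(
        char_offsets(s),
        vec![(0, 'h'), (1, 'é'), (3, 'l'), (4, 'l'), (5, 'o')]
    );
}
```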

huonw added the I-nominated label Mar 5, 2014

pnkfelix added P-backcompat-libs and removed I-nominated labels Mar 13, 2014

pnkfelix added this to the 1.0 milestone Mar 13, 2014

pongad mentioned this issue Apr 5, 2014

std::str::StrSlice::char_at() should returns Option(char) #12882

Closed

This was referenced May 2, 2014

RFC: Rename StrBuf to String rust-lang/rfcs#60

Merged

Syntax for slices rust-lang/rfcs#13

Closed

brson mentioned this issue Jun 21, 2014

rustc: Remove &str indexing from the language. #15085

Merged

bors closed this as completed in #15085 Jul 2, 2014

kagiasoldaccount mentioned this issue Jul 19, 2014

Rust0.11 tailhook/rust-argparse#1

Merged

japaric mentioned this issue Sep 24, 2014

Use slicing syntax for str/String #17502

Closed

nalimilan mentioned this issue Dec 10, 2014

Restrict indexing into strings to a special ByteIndex or StringIndex type JuliaLang/julia#9297

Closed

eddyb mentioned this issue Feb 18, 2017

Ch8 edits after technical review rust-lang/book#450

Merged
