Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Copy all AsciiExt methods to the primitive types directly in order to deprecate it later #44042

Merged
merged 16 commits into from
Nov 5, 2017

Conversation

LukasKalbertodt
Copy link
Member

@LukasKalbertodt LukasKalbertodt commented Aug 22, 2017

EDIT: this PR is ready now. I edited this post to reflect the current status of discussion, which is (apart from code review) pretty much settled.


This is my current progress in order to prepare stabilization of #39658. As discussed there (and in #39659), the idea is to deprecated AsciiExt and copy all methods to the type directly. Apparently there isn't really a reason to have those methods in an extension trait¹.

This is work in progress: copy&pasting code while slightly modifying the documentation isn't the most exciting thing to do. Therefore I wanted to already open this WIP PR after doing basically 1/4 of the job (copying methods to &[u8], char and &str is still missing) to get some feedback before I continue. Some questions possibly worth discussing:

  1. Does everyone agree that deprecating AsciiExt is a good idea? Does everyone agree with the goal of this PR? => apparently yes
  2. Are my changes OK so far? Did I do something wrong?
  3. The issue of the unstable-attribute is currently set to 0. I would wait until you say "Ok" to the whole thing, then create a tracking issue and then insert the correct issue id. Is that ok?
  4. I tweaked eq_ignore_ascii_case(): it now takes the argument other: u8 instead of other: &u8. The latter was enforced by the trait. Since we're not bound to a trait anymore, we can drop the reference, ok? => I reverted this, because the interface has to match the AsciiExt interface exactly.

¹ Could it be that we can't write impl [u8] {}? This might be the reason for AsciiExt. If that is the case: is there a good reason we can't write such an impl block? What can we do instead? => we couldn't at the time this PR was opened, but Simon made it possible.

/cc @SimonSapin @zackw

@LukasKalbertodt
Copy link
Member Author

I don't think the Travis-fail is my fault ... Right? 😕

@steveklabnik
Copy link
Member

/Users/travis/.travis/job_stages: line 166: shell_session_update: command not found

I do not believe so, seems... strange?

@bluss
Copy link
Member

bluss commented Aug 23, 2017

“The” impl<T> [T] is located in liballoc (collections). I'd be thrilled if things have changed, but I think that it still restricts it so no intrinsic methods can come from core for those types. Same with impl str.

@SimonSapin
Copy link
Contributor

  • Actual deprecations should only land in Nightly after the replacement has reached the stable channel, so that a single code base using these features can be made at any point in time to compile without warnings on all three channels.

    (Or does the compile now support something like #[deprecated(since = "1.23")] that is silent in earlier versions?)

  • Rather than duplicating code, could one of the each pair of equivalent API call the other?

@kennytm
Copy link
Member

kennytm commented Aug 23, 2017

@LukasKalbertodt Any error other than x86_64-gnu-llvm-3.7 (the first job) from the pull request CI is spurious, especially the Macs. Don't worry about that.

@LukasKalbertodt
Copy link
Member Author

@kennytm Ok, thanks!

@SimonSapin

  • Ok yes, that makes sense. I won't add the #[deprecated] in this PR.
  • Yes, I planned to implement AsciiExt with the native methods once they are all implemented. So yes, there shouldn't really be duplicate code once I'm finished.

@bluss (and potentially everyone) I'm afraid I don't quite understand. There is this one impl-block in liballoc/slice.rs:

#[lang = "slice"]
#[cfg(not(test))]
impl<T> [T] { ... }

If I try to add impl [u8], I get this error:

error[E0390]: only a single inherent implementation marked with `#[lang = "slice"]` is allowed for the `[T]` primitive
   --> src/liballoc/slice.rs:169:1
    |
169 | / impl [u8] {
170 | |     fn foobar(&self) {}
171 | | }
    | |_^
    |
help: consider using a trait to implement these methods
   --> src/liballoc/slice.rs:169:1
    |
169 | / impl [u8] {
170 | |     fn foobar(&self) {}
171 | | }
    | |_^

If I add #[lang = "slice"] to the new impl block, I get:

error[E0152]: duplicate lang item found: `slice`.
    --> src/liballoc/slice.rs:176:1
     |
176  | / impl<T> [T] {
177  | |     /// Returns the number of elements in the slice.
178  | |     ///
179  | |     /// # Example
...    |
1506 | |     }
1507 | | }
     | |_^
     |
note: first defined here.
    --> src/liballoc/slice.rs:170:1
     |
170  | / impl [u8] {
171  | |     fn foobar(&self) {}
172  | | }
     | |_^

So... is that just impossible? Is that what you were saying, @bluss?

@SimonSapin
Copy link
Contributor

A creative hack to work around impl [u8] {} not being allowed could be something like:

// The existing impl block
#[lang = "slice"]
impl<T> [T] {
    // ...

    // New methods:
    fn is_ascii(&self) where Self: AsRef<[u8]> {
        let self_as_bytes = self.as_ref();
        // ...
    }
}

Given the available impls of AsRef, that is_ascii method would effectively only be available for [u8]. However the way it shows up in rustdoc would not be pretty.

@aidanhs aidanhs added the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label Aug 23, 2017
@LukasKalbertodt
Copy link
Member Author

So am I right and it is indeed not possible to add methods to [u8] (except for hacks, like the one @SimonSapin proposed)? If yes, does it even make sense to continue working on this PR?

I see the following possibilities:

  1. We close this PR, doing nothing ...
    a. ... and keep AsciiExt and eventually stabilize the <ctype.h> methods from <ctype.h> functions for AsciiExt #39658
    b. ... until it is possible to add methods to [u8] someday in the future
  2. I continue working on this PR, using Simon's hack to implement it for [u8] ...
    a. ... and we stabilize it exactly like that later
    b. ... and never stabilize the hack; stabilize the methods on u8, char and str; replace the hack once it is possible to add methods to [u8] directly
  3. I continue working on this PR without adding the methods to [u8]. But as long as those methods aren't there and stabilized, we can't deprecated AsciiExt.

In 1b, 2b and 3 we assume that the possibility to add methods to [u8] directly will be implemented eventually. What do you think: will that be the case? How hard is it to implement that? Are there already discussions about this topic?

@SimonSapin
Copy link
Contributor

I believe it is not possible purely in library code, just like impl<T> [T] {} wouldn’t be allowed if it weren’t for the special support in the compiler in the form of #[lang = "slice"]. Probably we could add another such special case in the compiler, but I don’t know how to do that.

@aidanhs
Copy link
Member

aidanhs commented Aug 31, 2017

r? @sfackler since highfive flaked and this seems library related. Not sure if you'll need compiler team input though...

@sfackler
Copy link
Member

sfackler commented Sep 2, 2017

This might be a bit tricky to land unstable since inherent methods override trait-provided methods. So every existing use of AsciiExt traits is going to start hitting stability issues.

cc @rust-lang/libs.

@sfackler sfackler added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Sep 2, 2017
@aidanhs aidanhs added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 7, 2017
@alexcrichton
Copy link
Member

The libs team discussed this during triage the other day and the conclusion was that this is good to go, @sfackler do you know what specifically needs to be updated here?

@alexcrichton alexcrichton added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Sep 14, 2017
@SimonSapin
Copy link
Contributor

@alexcrichton There’s still two issues to resolve:

  • Modifying a compiler to add a bytes_slice lang item (or something) to allow impl [u8] {} to exist next to impl<T> [T] {}. (Or mentoring from someone who’d know how to do that.)
  • Decide if we’re ok with insta-stable inherent methods. If not, having them shadow trait methods is gonna make things stop compiling for users of the trait methods.

@sfackler
Copy link
Member

We're fine landing these as immediately stable since they're just being moved around. Not sure if we should block on figuring out the impl [u8] issues first or not.

@SimonSapin
Copy link
Contributor

Am I missing something? I don’t see how we can just decide to not block on having a technical solution to get past this compilation error:

error[E0390]: only a single inherent implementation marked with `#[lang = "slice"]` is allowed for the `[T]` primitive

@LukasKalbertodt
Copy link
Member Author

We do have the

impl [T] {
    fn foo() where Self: AsRef<[u8]> {}
}

solution. But as mentioned above, it has several disadvantages and really feels like a hack. Additionally, changing it to impl [u8] later is probably a breaking change (I think). So I'd very much prefer to make impl [u8] possible before merging this change.

I could write the code to make impl [u8] possible, but I would need mentoring, since right now, I don't have a clue what needs to be done. Of course, I'm also fine with someone else implementing this ^_^

@sfackler
Copy link
Member

The not-blocking on it solution would be to keep using the extension trait for [u8].

@bors
Copy link
Contributor

bors commented Sep 18, 2017

☔ The latest upstream changes (presumably #44678) made this pull request unmergeable. Please resolve the merge conflicts.

@alexcrichton
Copy link
Member

ping @LukasKalbertodt would you be willing to keep up w/ this PR?

@kennytm
Copy link
Member

kennytm commented Sep 21, 2017

I think we are still waiting for the team's decision on whether to introduce impl [u8], or workaround with where Self: AsRef<[u8]>, or keep the extension trait.

@SimonSapin
Copy link
Contributor

I’ll try to find a mentor in a couple weeks at the "impl days" after RustFest Zurich who could help with adding a lang item or something for impl [u8].

@LukasKalbertodt
Copy link
Member Author

@alexcrichton Sure! I just wasn't sure if we decided already (what @kennytm said) ^_^

I'll rebase in the next couple of days and then move/copy the methods for str and char. Then we can see whether we want to wait for the lang item (thanks @SimonSapin! That would be awesome!) or if we want to merge without methods on [u8].

Otherwise changes to the compiler are unable to introduce new
warnings: some crates tested by cargotest deny all warnings and
thus, the CI build fails.

Thanks SimonSapin for the patch!
@LukasKalbertodt
Copy link
Member Author

This is cargotest, which ensures some important packages like xsv, ripgrep, servo etc. are compilable with your changes to guarantee stability.

Interesting, I didn't know that.

I can send a PR, but it might be easier to include this patch in this PR.

I just pushed the patch here; I guess it's easier ;-)

Thanks a lot for quickly coming up with the patch and testing it! I tested locally again, too, and no issues so far. So I guess... this time, bors might actually return with a nice result ^_^

@kennytm
Copy link
Member

kennytm commented Nov 5, 2017

@bors r=alexcrichton

@bors
Copy link
Contributor

bors commented Nov 5, 2017

📌 Commit ea55596 has been approved by alexcrichton

@bors
Copy link
Contributor

bors commented Nov 5, 2017

⌛ Testing commit ea55596 with merge 94ede93...

bors added a commit that referenced this pull request Nov 5, 2017
…r=alexcrichton

Copy all `AsciiExt` methods to the primitive types directly in order to deprecate it later

**EDIT:** [this PR is ready now](#44042 (comment)). I edited this post to reflect the current status of discussion, which is (apart from code review) pretty much settled.

---

This is my current progress in order to prepare stabilization of #39658. As discussed there (and in #39659), the idea is to deprecated `AsciiExt` and copy all methods to the type directly. Apparently there isn't really a reason to have those methods in an extension trait¹.

~~This is **work in progress**: copy&pasting code while slightly modifying the documentation isn't the most exciting thing to do. Therefore I wanted to already open this WIP PR after doing basically 1/4 of the job (copying methods to `&[u8]`, `char` and `&str` is still missing) to get some feedback before I continue. Some questions possibly worth discussing:~~

1. ~~Does everyone agree that deprecating `AsciiExt` is a good idea? Does everyone agree with the goal of this PR?~~ => apparently yes
2. ~~Are my changes OK so far? Did I do something wrong?~~
3. ~~The issue of the unstable-attribute is currently set to 0. I would wait until you say "Ok" to the whole thing, then create a tracking issue and then insert the correct issue id. Is that ok?~~
4. ~~I tweaked `eq_ignore_ascii_case()`: it now takes the argument `other: u8` instead of `other: &u8`. The latter was enforced by the trait. Since we're not bound to a trait anymore, we can drop the reference, ok?~~ => I reverted this, because the interface has to match the `AsciiExt` interface exactly.

¹ ~~Could it be that we can't write `impl [u8] {}`? This might be the reason for `AsciiExt`. If that is the case: is there a good reason we can't write such an impl block? What can we do instead?~~ => we couldn't at the time this PR was opened, but Simon made it possible.

/cc @SimonSapin @zackw
@bors
Copy link
Contributor

bors commented Nov 5, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 94ede93 to master...

@bors bors merged commit ea55596 into rust-lang:master Nov 5, 2017
@LukasKalbertodt
Copy link
Member Author

Yeah, finally :) My first non-tiny PR

Thanks everyone for all the kind help!


With this landed, I think there are the following related things:

  1. Remove all the #[cfg(stage0)] stuff once a new stage0 is released (around the 24th, right?). I can already prepare a patch and create a PR once the stage has been updated.
  2. Deprecate the AsciiExt trait. When do we want to do that?
  3. Stabilize some of the ascii_ctype methods, as discussed in <ctype.h> functions for AsciiExt #39658. Do we want to do that immediately?

@kennytm
Copy link
Member

kennytm commented Nov 5, 2017

Step 1 will be done by a team member (e.g. #45285), so you don't need to submit that PR yourself.

@SimonSapin
Copy link
Contributor

🎉🎉🎉🎉

Thank you Lukas for pushing through all this, and sorry it took so much time and so many attempts.

The deprecation warning should wait until the replacement reaches the stable channel. That is after 1.23.0 is released.

#39658 (comment) says “stabilize these methods only on the u8 and char types” so I think it’s fine to send a PR doing that for the libs team to approve.

/// but without allocating and copying temporaries.
#[stable(feature = "ascii_methods_on_intrinsics", since = "1.21.0")]
#[inline]
pub fn eq_ignore_ascii_case(&self, other: &[u8]) -> bool {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we can move this and similar non-alloc methods to libcore?

With to_*_lowercase()/_uppercase() methods missing in libcore, this is the only method available for case-insensitive string comparison, which is a common operation in many application, even embedded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree this would be good to have, but it’s tricky. I’ve opened #45803.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#49896 does this (though #49698 also made the Unicode to_lowercase and to_uppercase available in libcore).

bors added a commit that referenced this pull request Dec 9, 2017
rustdoc: Include `impl [u8]` in the docs

The impl was added in #44042 but wasn't visible in the docs.
petertseng added a commit to exercism/rust that referenced this pull request Jan 15, 2018
In Rust 1.23, the `AsciiExt` methods are implemented directly on the
concerned types, so attempting to import `AsciiExt` results in an
unused import.

Rust language PR:
rust-lang/rust#44042

PSA:
https://users.rust-lang.org/t/psa-dealing-with-warning-unused-import-std-ascii-asciiext-in-today-s-nightly/13726

This started warning as an unused import on nightly sometime between
2017-11-02 and 2017-11-09:

https://travis-ci.org/exercism/rust/builds/296391447
https://travis-ci.org/exercism/rust/builds/299770403

This started warning on beta exactly when 1.23.0 became Beta (it was
accepted on 1.22.0):

https://travis-ci.org/exercism/rust/builds/303139434
https://travis-ci.org/exercism/rust/builds/306431846

As 1.23 is now stable and our repo disallows warnings on stable, we will
need this.
@albel727
Copy link

albel727 commented Mar 5, 2018

And so char::is_ascii_* methods take (&self), unlike char::is_alphabetic / is_lowercase / is_whitespace and a whole bunch of other prior methods, which take (self), like a Copy type that is smaller than a reference on most arches, such as char, pretty much deserves.

Not only this is inconsistent (and possibly has worse performance too, like in the iter() vs iter.cloned() case), but it also means that is_ascii_* methods can't be used in shorthand way in methods like str::split(char::is_ascii_punctuation), which expect fn(char) -> bool.

Since str::split() docs even contain an example of using str::split(char::is_uppercase), my sheer surprise when the same didn't work with is_ascii_punctuation knew no limits.

it's of course the fault of AsciiExt trait, which takes &self in methods, because impl AsciiExt for &'a str | &'a mut str was harder to write than just impl AsciiExt for str, and so impl AsciiExt for char suffered from &self too.

But why nobody even considered changing this before stabilizing it in 1.24? 😕

@Kimundi
Copy link
Member

Kimundi commented Mar 5, 2018

I think this was a unfortunate oversight. A lot of the discussion about those methods was about how to handle deprecating AsciiExt correctly, and moving the methods around. Since for most types keeping methods at &self is the only sensible way, changing to self might have just slipped.

I'm not sure what we can do at this point outside of possibly changing it for the next epoch, but could you open an issue for this?

@LukasKalbertodt
Copy link
Member Author

I did consider changing it to self (and I even did for a short time, as it seems so natural). Sadly, this would render the new functions incompatible to the AsciiExt ones (which, as you noted, take &self as parameter). I changed it back because I got compiler errors in some code which was written with AsciiExt in mind (I can't remember the exact code, though...). This whole change's backwards compatibility was already questionable (see #46510 for example); changing &self to self just would not be possible without breaking a lot of stuff.

@kennytm
Copy link
Member

kennytm commented Mar 5, 2018

@Kimundi Epoch only affects syntax and doesn't allow breaking API compatibility, meaning this change requires a real Rust 2.0 unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.