-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update to rust-url 1.0 #2428
Update to rust-url 1.0 #2428
Conversation
r? @huonw (rust_highfive has picked a reviewer for you, use r? to override) |
Thanks @SimonSapin! |
☔ The latest upstream changes (presumably #2486) made this pull request unmergeable. Please resolve the merge conflicts. |
Thanks again for the PR @SimonSapin! I'm gonna close this for now but we can reopen/resubmit once rust-url 1.0 is out |
Version 1.0, rewrite the data structure and API **Do not merge yet**, I’d like to get feedback on API design. This is a large rewrite of the data structures and API that aims to eventually become version 1.0.0. Motivations are: * I designed rust-url when I was relatively new to Rust, coming from Python where memory allocation and ownership are not concerns. The URL Standard also writes algorithms to optimize for human comprehension, not computer resources. As a result, each component of the `Url` struct is a `String` or a `Vec<String>`. * At the same time, I tried to use enums to enforce some invariants in the type system. For example, “non-relative” like `data:` don’t have a host or port number. In practice I think this makes the API a pain to use more than it’s worth. * The spec [has changed](whatwg/url@b266a43) a lot (notably around dealing with protocols/schemes outside the small list it knows about), and I’m not sure the API can be adapted backward-compatibly anyway. I think the API can be more "rusty", and a bunch of long-standing issues be fixed along the way. The big idea is that the `Url` struct now contains a single `String` of the URL’s serialization (the only heap memory allocation), plus some indices into it (and some other meta-data). The indices allow accessor methods for URL components to return `&str` in O(1) time. The fields of `Url` are now private, as they need to maintain a number of invariants. Getting invariants wrong is not a memory-safety hazard (e.g. string indexing is still checked) but can cause non-sensical results. Rather than multiple ad-hoc methods like `serialize_authority` and `serialize_without_fragment`, `Url` implements indexing with ranges of `Position`, which is an enum indicating a position between URL components. For example, `serialize_without_fragment` is now `&url[..Position::AfterQuery]`, takes O(1) time, and returns a `&str` slice. While we’re making breaking changes, this is an opportunity to clean up APIs that didn’t turn out so great: * Drop `UrlParser`. The builder pattern works well for `std::process::Command`, but for URLs in practice almost only `base_url` is ever used. v0.5.2 already has [`base_url.join(str)`](https://github.com/servo/rust-url/blob/v0.5.2/src/lib.rs#L867-L874), which stays the same. * Drop all the `lossy_percent_decode_*` methods. They don’t seem to be used at all. Maybe accessors could return a `PercentEncodedStr` wrapper that dereferences to `&str` and has a `percent_decode()` method? * In the `percent_encoding` module, there was “duplicated” APIs that returned a new `String` or `Vec<u8>` for convenience, or pushed to an existing `&mut String` or `&mut Vec<u8>` for flexibility. This was a bit awkward, so I’ve changed it to return iterators of `char` or `u8` which can be used with `.extend(…)` or `.collect()`. I’m not sure this an improvement and might revert it. Feedback appreciated. This new design is great for accessing components, but not so much for modifying them. We need to make changes in the middle of the string, and with percent-encoding we don’t know the encoded size of the new thing until we’ve encoded it. Then we need to fix up indices. As a result, the `set_*` methods have lots of fiddly code where it’s easy to make subtle mistakes. This code should be well-tested before this PR is merged, but it’s not at all yet. To see what the new API looks like more easily than by reading the code, you can find the output of `cargo doc` at https://simonsapin.github.io/rust-url-1.0-docs/url/ Too see what the API usage is like in practice, I’ve updated some crates that use rust-url for this PR: * https://github.com/alexcrichton/git2-rs/pull/115 * rust-lang/cargo#2428 * rwf2/cookie-rs#42 * hyperium/hyper#740 * https://github.com/cyderize/rust-websocket/pull/70 * servo/servo#9840 Since this PR isn’t merged yet, to do the same in your crates you’ll need a clone of the git repository at the right branch: ``` git clone https://github.com/servo/rust-url --branch 1.0 ``` … and a Cargo [path override](http://doc.crates.io/guide.html#overriding-dependencies) in the directory of your crate: `.cargo/config` ``` path = [ "../rust-url", ] ``` Since we’re making breaking changes, this is the opportunity to make *more* (or different) breaking changes before version 1.0 is published and we commit to this new API. Please comment here to say what are your pain points with the 0.5.x API, what could be improved, what I’ve done in this PR that doesn’t seem like a good idea, etc. Thanks! <!-- Reviewable:start --> <!-- (Simon) Hidden for now [<img src="https://reviewable.io/review_button.svg" height="40" alt="Review on Reviewable"/>](https://reviewable.io/reviews/servo/rust-url/176) --> <!-- Reviewable:end -->
@alexcrichton |
This depends on functionality that I ended up removing from url 1.0 and plan to add back in 1.1 soon: servo/rust-url#188 I’ll resubmit this PR once that’s done. |
Do not merge yet: rust-url 1.0 is not published yet and may still get breaking changes. The goal of this PR for now is to demonstrate API usage.
Depends on servo/rust-url#176 and https://github.com/alexcrichton/git2-rs/pull/115