Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a way to parse IP addresses without having to utf8-validate #94821

Closed
jyn514 opened this issue Mar 10, 2022 · 15 comments · Fixed by #94890
Closed

Add a way to parse IP addresses without having to utf8-validate #94821

jyn514 opened this issue Mar 10, 2022 · 15 comments · Fixed by #94890
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@jyn514
Copy link
Member

jyn514 commented Mar 10, 2022

The only standard way I've found to parse an IP address is IpAddr::from_str. That function immediately converts the string to bytes:

Parser { state: input.as_bytes() }

It would be great to expose some way (not necessarily Parser::new, but something similar) that doesn't require doing unnecessary utf8 validation.

@rustbot label: +T-libs +C-enhancement

@rustbot rustbot added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Mar 10, 2022
@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

Hi @jyn514, where is the utf8 validation happening? As you said, the string slice gets converted to a byte slice but then it looks like the parser just works with u8s which get cast to chars at some points in the algorithm

@jyn514
Copy link
Member Author

jyn514 commented Mar 11, 2022

@kckeiks it happens before you ever pass the string to from_str: I have a byte sequence, and in order to parse it, I'm forced to call std::str::from_utf8.

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

IpAddr implements From for arrays [u8; 4] and [u8; 16]

@jyn514
Copy link
Member Author

jyn514 commented Mar 11, 2022

I don't have an array. I have a dynamically sized &[u8], and I'm not sure whether it's a valid IP address or not.

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

Gotcha. If this &[u8] is a valid ipv6 address (so it has at least 16 bytes) but the first four bytes make up a valid ipv4 address too then we'd have to make a decision. We could require that the &[u8] be of size 4 or 16 then we'd know how to interpret it?

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

@rustbot claim

@jyn514
Copy link
Member Author

jyn514 commented Mar 11, 2022

4 and 16 are misleading I think. This is a normal IP address, like "127.0.0.1", it just happens to have come from the network and not be utf8 validated. "127.0.0.1" is 9 bytes, and in general, IP addresses can be variable length.

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

If the address, say "127.0.0.1", comes from the network, as inside an IPv4 header, then it must be a 32 bit value, it would be of the form 01111111 00000000 00000000 000000001 but utf8 encoded IP addresses would have variable length as you said so maybe that is what you're referring to receiving on the network?

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

We could add something like this and add in the docs that this parses IP addresses and assumes that they're utf8 encoded?

impl<'a> TryFrom<&'a [u8]> for IpAddr {
    type Error = AddrParseError;

    fn try_from(b: &'a [u8]) -> Result<IpAddr, Self::Error> {
        Parser::from_bytes(b).parse_with(|p| p.read_ip_addr())
    }
}

@jyn514
Copy link
Member Author

jyn514 commented Mar 11, 2022

this parses IP addresses and assumes that they're utf8 encoded

It doesn't. No part of that code requires b to be utf8 encoded - if it contains a non-utf8 byte, then it's not a valid IP address anyway, and it returns AddrParseError like normal.

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

It doesn't. No part of that code requires b to be utf8 encoded - if it contains a non-utf8 byte, then it's not a valid IP address anyway, and it returns AddrParseError like normal.

I disagree. The slice of bytes must represent an IP address string in dot notation so the slice of bytes "encodes" this string; some comment should be added to the docs to explain what the slice of bytes is.

For instance, this slice of bytes is bad and would cause a parse error. It doesn't contain non-utf8 byte but it doesn't encode an IP address string in dot notation.

&[127, 0, 0 1]

instead it should be

&[49, 50, 55, 46, 48, 46, 48, 46, 49] 

This is what as_bytes() does - we get the underlying bytes encoding the string.

Parser { state: input.as_bytes() }

@jyn514
Copy link
Member Author

jyn514 commented Mar 11, 2022

Oh, I see what you mean - you're saying that it should be a byte string, and not an array of the ip numbers. Yes, I agree that's what the code is doing.

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

So is this issue for exposing a new way to parse ip byte strings or a slice of the ip numbers? Not sure how to move forward.

Thanks for working with me on this btw.

@jyn514
Copy link
Member Author

jyn514 commented Mar 11, 2022

I don't understand the question. I have b"127.0.0.1" and I want to turn it into an IpAddr.

@kckeiks
Copy link
Contributor

kckeiks commented Mar 11, 2022

This is a normal IP address, like "127.0.0.1", it just happens to have come from the network and not be utf8 validated.

This made me think that you were taking about an array/slice of the numbers of the ip address and not a byte string.

I don't understand the question. I have b"127.0.0.1" and I want to turn it into an IpAddr.

I'll add the snippet above and add some comments from this discussion. Actually, I'm gonna let someone else have a crack at this. There's some useful info here and enough to get started. @rustbot release-assignment

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Aug 26, 2022
… r=joshtriplett

Support parsing IP addresses from a byte string

Fixes rust-lang#94821

The goal is to be able to parse addresses from a byte string without requiring to do any utf8 validation. Since internally the parser already works on byte strings, this should be possible and I personally already needed this in the past too.

~~I used the proposed approach from the issue by implementing `TryFrom<&'a [u8]>` for all 6 address types (3 ip address types and 3 socket address types). I believe implementing stable traits for stable types is insta-stable so this will probably need an FCP?~~

Switched to an unstable inherent method approach called `parse_ascii` as requested.

cc `@jyn514`
compiler-errors added a commit to compiler-errors/rust that referenced this issue Aug 27, 2022
… r=joshtriplett

Support parsing IP addresses from a byte string

Fixes rust-lang#94821

The goal is to be able to parse addresses from a byte string without requiring to do any utf8 validation. Since internally the parser already works on byte strings, this should be possible and I personally already needed this in the past too.

~~I used the proposed approach from the issue by implementing `TryFrom<&'a [u8]>` for all 6 address types (3 ip address types and 3 socket address types). I believe implementing stable traits for stable types is insta-stable so this will probably need an FCP?~~

Switched to an unstable inherent method approach called `parse_ascii` as requested.

cc ``@jyn514``
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Aug 28, 2022
… r=joshtriplett

Support parsing IP addresses from a byte string

Fixes rust-lang#94821

The goal is to be able to parse addresses from a byte string without requiring to do any utf8 validation. Since internally the parser already works on byte strings, this should be possible and I personally already needed this in the past too.

~~I used the proposed approach from the issue by implementing `TryFrom<&'a [u8]>` for all 6 address types (3 ip address types and 3 socket address types). I believe implementing stable traits for stable types is insta-stable so this will probably need an FCP?~~

Switched to an unstable inherent method approach called `parse_ascii` as requested.

cc `@jyn514`
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Aug 28, 2022
… r=joshtriplett

Support parsing IP addresses from a byte string

Fixes rust-lang#94821

The goal is to be able to parse addresses from a byte string without requiring to do any utf8 validation. Since internally the parser already works on byte strings, this should be possible and I personally already needed this in the past too.

~~I used the proposed approach from the issue by implementing `TryFrom<&'a [u8]>` for all 6 address types (3 ip address types and 3 socket address types). I believe implementing stable traits for stable types is insta-stable so this will probably need an FCP?~~

Switched to an unstable inherent method approach called `parse_ascii` as requested.

cc ``@jyn514``
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 28, 2022
… r=joshtriplett

Support parsing IP addresses from a byte string

Fixes rust-lang#94821

The goal is to be able to parse addresses from a byte string without requiring to do any utf8 validation. Since internally the parser already works on byte strings, this should be possible and I personally already needed this in the past too.

~~I used the proposed approach from the issue by implementing `TryFrom<&'a [u8]>` for all 6 address types (3 ip address types and 3 socket address types). I believe implementing stable traits for stable types is insta-stable so this will probably need an FCP?~~

Switched to an unstable inherent method approach called `parse_ascii` as requested.

cc ```@jyn514```
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 28, 2022
… r=joshtriplett

Support parsing IP addresses from a byte string

Fixes rust-lang#94821

The goal is to be able to parse addresses from a byte string without requiring to do any utf8 validation. Since internally the parser already works on byte strings, this should be possible and I personally already needed this in the past too.

~~I used the proposed approach from the issue by implementing `TryFrom<&'a [u8]>` for all 6 address types (3 ip address types and 3 socket address types). I believe implementing stable traits for stable types is insta-stable so this will probably need an FCP?~~

Switched to an unstable inherent method approach called `parse_ascii` as requested.

cc ````@jyn514````
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Aug 28, 2022
… r=joshtriplett

Support parsing IP addresses from a byte string

Fixes rust-lang#94821

The goal is to be able to parse addresses from a byte string without requiring to do any utf8 validation. Since internally the parser already works on byte strings, this should be possible and I personally already needed this in the past too.

~~I used the proposed approach from the issue by implementing `TryFrom<&'a [u8]>` for all 6 address types (3 ip address types and 3 socket address types). I believe implementing stable traits for stable types is insta-stable so this will probably need an FCP?~~

Switched to an unstable inherent method approach called `parse_ascii` as requested.

cc `````@jyn514`````
@bors bors closed this as completed in 52016a1 Aug 29, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants