Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parsing string to integer accepting '0x' + hexadecimal #1098

Open
comex opened this issue Apr 29, 2015 · 15 comments
Open

Parsing string to integer accepting '0x' + hexadecimal #1098

comex opened this issue Apr 29, 2015 · 15 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@comex
Copy link

comex commented Apr 29, 2015

In many cases, it is useful for a program that parses integers (e.g. from the command line) to accept both decimal and hexadecimal constants, by letting the user prefix the latter with 0x. Currently, FromStr::from_str (also the destination of str::parse) only accepts base 10; from_str_radix allows an arbitrary base to be specified, but doesn't have any special magic to parse prefixes.

If I were writing from_str from scratch, I would suggest having it accept 0x, perhaps along with 0o and 0b like Rust source, because even if many applications don't have any particular reason to expect hexadecimal, there's very little advantage to rejecting it*. At this point, I guess that would be something of a breaking change, so a new method would be needed; still, I claim such a method belongs in the standard library for the same reason. Alternatively, from_str_radix could accept radix 0 like some other languages' equivalents, although this seems like a hack.

Other languages' behavior:

  • C atoi only accepts base 10.
  • The C strtol family has a mandatory base parameter; base=0 is decimal by default but allows 0x and 0755 (octal); base=16 accepts 0x
  • Python int() is base 10 only by default, but has a base parameter; same behavior as C except 0b and 0o are also accepted
  • Ruby: '0x123'.to_i is 0, but Integer('0x123') is 291. I have no idea. The latter seems to accept the same as Python.
  • JavaScript: parseInt by default accepts 0x and, in some browsers, 0-for-octal, which ES5 says not to.

(I certainly don't propose interpreting leading zeroes as octal.)

  • One possible reason to reject alternate forms would be to give each integer a unique input representation, but of course -0 and 0, and 008 and 8 parse to the same value.
@lilyball
Copy link
Contributor

I think we should expose some mechanism to parse these prefixes. I'm not convinced that FromStr::from_str() is that mechanism, but I don't have a particularly strong opinion about that. I'm concerned that automatically parsing prefixes might be surprising to some people, because those prefixes really only make sense to developers and aren't so familiar to other users (and this function may be used to parse user input, or to parse integral values in data formats that don't allow for prefixes). But I'm not sure that parsing prefixes like that would necessarily be a problem for anybody either. And of course anyone that wants to guarantee that they don't parse the prefixes can use from_str_radix() instead (which will of course not parse prefixes).


An alternative to consider is an extension to FromStr that adds an associated type Config and a method from_str_config(&str, Config) (tentative name) that allows types to provide a generic mechanism to configure parsing. The integral types could vend a struct for Config that has a field parse_prefix that enables prefix parsing (and perhaps a radix field of type Option<bool> as well). Then from_str(s) could be considered equivalent to from_str_config(s, <Config as Default>::default()) (although it can't literally be that since it's already stabilized). I can't decide if this is a good idea or just overly complicated. It depends on whether there are any other types that want to have options for string parsing (and whether there's any actual benefit to doing this in a standardized way in this trait, instead of just adding ad-hoc methods to each type that wants to be custom).

@lilyball
Copy link
Contributor

Incidentally, we don't currently support hexadecimal float literals, but if we wanted to add support for those, that would have the same concerns that parsing prefixes on integral literals does (and may want a way to enable/disable the prefix handling when parsing strings).

@SimonSapin
Copy link
Contributor

I’d prefer the "default" FromStr impl for integer types not to do this. This can be useful, but it should be explicit / opt-in.

When this kind of thing is default/implicit, we end up with headaches like 0x7f.0.0.1 being accidentally a valid IPv4 address, but only on some systems. web-platform-tests/wpt#1104

@lilyball
Copy link
Contributor

Hah, nice example. That's a really good point. I'm now in full agreement that it should be opt-in behavior.

The simplest change here would be to follow C's lead and allow a 0 value given to from_str_radix() to mean autodetect, but I agree with @comex that it's kind of an ugly hack. We could make the radix an optional value, but that makes the common case (of providing a radix value) more awkward, and it's not clear if this actually has any benefit over just using 0.

I am still a bit partial to my "alternative" idea from before, of having a from_str_config() method. This would allow us to provide a way to control not only integral prefix parsing, but whether a leading 0 should be treated as octal or whether that should require 0o, and similarly whether we should accept a binary prefix of 0b. But I am a bit worried that this is straying to far towards behavior that should be done with a "real" parser.

@Mignet
Copy link

Mignet commented Apr 13, 2016

So,what's the decision?

@SimonSapin
Copy link
Contributor

I think it would be a breaking change to do this now in existing APIs. But new APIs could be added.

@nrc nrc added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Aug 30, 2016
@inferiorhumanorgans
Copy link

BTW with Ruby:

2.3.3 :001 > '0x123'.to_i
 => 0 
2.3.3 :002 > '0x123'.to_i(16)
 => 291 

@oxalica
Copy link

oxalica commented Feb 15, 2019

The default parser should be simple and generic. Having special cases for radix 2, 8 or 16 is terrible when we don't need it.
Adding this feature is simple while removing it is not.

If required, implementing another parse_with_prefix() is better. Though I think it should not be a part of std.

@dns2utf8
Copy link

dns2utf8 commented Jan 5, 2020

For future reference: use from_str_radix

@inferiorhumanorgans
Copy link

For future reference: use from_str_radix

Why?

@dns2utf8
Copy link

dns2utf8 commented Jan 5, 2020

Because you can use this example for example.

Cheers 😃

@inferiorhumanorgans
Copy link

Right, that misses the point of this issue. The ask was that there would be one function that will handle common prefixes.

@dns2utf8
Copy link

dns2utf8 commented Jan 6, 2020

Ah, I see more something along the lines of parse_int::parse()

@Timmmm
Copy link

Timmmm commented Jun 6, 2023

I think this is probably what most people want:

fn parse_int(s: &str) -> std::result::Result<u64, std::num::ParseIntError> {
    if let Some(s) = s.strip_prefix("0x") {
        u64::from_str_radix(s, 16)
    } else if let Some(s) = s.strip_prefix("0o") {
        u64::from_str_radix(s, 8)
    } else if let Some(s) = s.strip_prefix("0b") {
        u64::from_str_radix(s, 2)
    } else {
        u64::from_str_radix(s, 10)
    }
}

Playground

I had a look at the parse_int() crate and it seems ok too. It also supports stripping _, and it uses num_traits to make it generic. Also it can optionally support 0777 style octals 🤮.

@dns2utf8
Copy link

I am the author of parse_int where you can enable the "implicit-octal" feature if needed, but it is not enabled by default. Feel free to open an issue if needed on the repo:
https://gitlab.com/dns2utf8/parse_int

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests

9 participants