Parsing string to integer accepting '0x' + hexadecimal #1098

comex · 2015-04-29T06:44:44Z

In many cases, it is useful for a program that parses integers (e.g. from the command line) to accept both decimal and hexadecimal constants, by letting the user prefix the latter with 0x. Currently, FromStr::from_str (also the destination of str::parse) only accepts base 10; from_str_radix allows an arbitrary base to be specified, but doesn't have any special magic to parse prefixes.

If I were writing from_str from scratch, I would suggest having it accept 0x, perhaps along with 0o and 0b like Rust source, because even if many applications don't have any particular reason to expect hexadecimal, there's very little advantage to rejecting it*. At this point, I guess that would be something of a breaking change, so a new method would be needed; still, I claim such a method belongs in the standard library for the same reason. Alternatively, from_str_radix could accept radix 0 like some other languages' equivalents, although this seems like a hack.

Other languages' behavior:

C atoi only accepts base 10.
The C strtol family has a mandatory base parameter; base=0 is decimal by default but allows 0x and 0755 (octal); base=16 accepts 0x
Python int() is base 10 only by default, but has a base parameter; same behavior as C except 0b and 0o are also accepted
Ruby: '0x123'.to_i is 0, but Integer('0x123') is 291. I have no idea. The latter seems to accept the same as Python.
JavaScript: parseInt by default accepts 0x and, in some browsers, 0-for-octal, which ES5 says not to.

(I certainly don't propose interpreting leading zeroes as octal.)

One possible reason to reject alternate forms would be to give each integer a unique input representation, but of course -0 and 0, and 008 and 8 parse to the same value.

The text was updated successfully, but these errors were encountered:

lilyball · 2015-04-29T16:29:46Z

I think we should expose some mechanism to parse these prefixes. I'm not convinced that FromStr::from_str() is that mechanism, but I don't have a particularly strong opinion about that. I'm concerned that automatically parsing prefixes might be surprising to some people, because those prefixes really only make sense to developers and aren't so familiar to other users (and this function may be used to parse user input, or to parse integral values in data formats that don't allow for prefixes). But I'm not sure that parsing prefixes like that would necessarily be a problem for anybody either. And of course anyone that wants to guarantee that they don't parse the prefixes can use from_str_radix() instead (which will of course not parse prefixes).

An alternative to consider is an extension to FromStr that adds an associated type Config and a method from_str_config(&str, Config) (tentative name) that allows types to provide a generic mechanism to configure parsing. The integral types could vend a struct for Config that has a field parse_prefix that enables prefix parsing (and perhaps a radix field of type Option<bool> as well). Then from_str(s) could be considered equivalent to from_str_config(s, <Config as Default>::default()) (although it can't literally be that since it's already stabilized). I can't decide if this is a good idea or just overly complicated. It depends on whether there are any other types that want to have options for string parsing (and whether there's any actual benefit to doing this in a standardized way in this trait, instead of just adding ad-hoc methods to each type that wants to be custom).

lilyball · 2015-04-29T16:30:30Z

Incidentally, we don't currently support hexadecimal float literals, but if we wanted to add support for those, that would have the same concerns that parsing prefixes on integral literals does (and may want a way to enable/disable the prefix handling when parsing strings).

SimonSapin · 2015-04-29T19:03:04Z

I’d prefer the "default" FromStr impl for integer types not to do this. This can be useful, but it should be explicit / opt-in.

When this kind of thing is default/implicit, we end up with headaches like 0x7f.0.0.1 being accidentally a valid IPv4 address, but only on some systems. web-platform-tests/wpt#1104

lilyball · 2015-04-30T23:25:16Z

Hah, nice example. That's a really good point. I'm now in full agreement that it should be opt-in behavior.

The simplest change here would be to follow C's lead and allow a 0 value given to from_str_radix() to mean autodetect, but I agree with @comex that it's kind of an ugly hack. We could make the radix an optional value, but that makes the common case (of providing a radix value) more awkward, and it's not clear if this actually has any benefit over just using 0.

I am still a bit partial to my "alternative" idea from before, of having a from_str_config() method. This would allow us to provide a way to control not only integral prefix parsing, but whether a leading 0 should be treated as octal or whether that should require 0o, and similarly whether we should accept a binary prefix of 0b. But I am a bit worried that this is straying to far towards behavior that should be done with a "real" parser.

Mignet · 2016-04-13T15:09:38Z

So,what's the decision?

SimonSapin · 2016-04-13T17:52:42Z

I think it would be a breaking change to do this now in existing APIs. But new APIs could be added.

inferiorhumanorgans · 2017-11-30T11:37:21Z

BTW with Ruby:

2.3.3 :001 > '0x123'.to_i
 => 0 
2.3.3 :002 > '0x123'.to_i(16)
 => 291

oxalica · 2019-02-15T07:07:09Z

The default parser should be simple and generic. Having special cases for radix 2, 8 or 16 is terrible when we don't need it.
Adding this feature is simple while removing it is not.

If required, implementing another parse_with_prefix() is better. Though I think it should not be a part of std.

dns2utf8 · 2020-01-05T21:54:18Z

For future reference: use from_str_radix

inferiorhumanorgans · 2020-01-05T22:26:46Z

For future reference: use from_str_radix

Why?

dns2utf8 · 2020-01-05T23:33:11Z

Because you can use this example for example.

Cheers 😃

inferiorhumanorgans · 2020-01-05T23:43:56Z

Right, that misses the point of this issue. The ask was that there would be one function that will handle common prefixes.

dns2utf8 · 2020-01-06T14:52:06Z

Ah, I see more something along the lines of parse_int::parse()

Timmmm · 2023-06-06T21:03:12Z

I think this is probably what most people want:

fn parse_int(s: &str) -> std::result::Result<u64, std::num::ParseIntError> {
    if let Some(s) = s.strip_prefix("0x") {
        u64::from_str_radix(s, 16)
    } else if let Some(s) = s.strip_prefix("0o") {
        u64::from_str_radix(s, 8)
    } else if let Some(s) = s.strip_prefix("0b") {
        u64::from_str_radix(s, 2)
    } else {
        u64::from_str_radix(s, 10)
    }
}

Playground

I had a look at the parse_int() crate and it seems ok too. It also supports stripping _, and it uses num_traits to make it generic. Also it can optionally support 0777 style octals 🤮.

dns2utf8 · 2023-06-11T16:31:02Z

I am the author of parse_int where you can enable the "implicit-octal" feature if needed, but it is not enabled by default. Feel free to open an issue if needed on the repo:
https://gitlab.com/dns2utf8/parse_int

nrc added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Aug 30, 2016

thejpster mentioned this issue Jul 16, 2019

Handle hex numbers rust-embedded/svd#70

Closed

rezural mentioned this issue Sep 16, 2020

Support alternative integer shapes, such as binary and hexadecimal nushell/nushell#2536

Closed

ojeda mentioned this issue Feb 16, 2021

Add Rust int param types Rust-for-Linux/linux#87

Merged

hcsch mentioned this issue Aug 10, 2021

Implement lexing and parsing for all WGSL number literal types gfx-rs/naga#1184

Merged

cuiwc mentioned this issue Jul 3, 2024

Trim 0b/0o/0x prefixes when parsing numbers. mdsteele/rust-ar#35

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parsing string to integer accepting '0x' + hexadecimal #1098

Parsing string to integer accepting '0x' + hexadecimal #1098

comex commented Apr 29, 2015

lilyball commented Apr 29, 2015

lilyball commented Apr 29, 2015

SimonSapin commented Apr 29, 2015

lilyball commented Apr 30, 2015

Mignet commented Apr 13, 2016

SimonSapin commented Apr 13, 2016

inferiorhumanorgans commented Nov 30, 2017

oxalica commented Feb 15, 2019

dns2utf8 commented Jan 5, 2020

inferiorhumanorgans commented Jan 5, 2020

dns2utf8 commented Jan 5, 2020

inferiorhumanorgans commented Jan 5, 2020

dns2utf8 commented Jan 6, 2020

Timmmm commented Jun 6, 2023

dns2utf8 commented Jun 11, 2023

Parsing string to integer accepting '0x' + hexadecimal #1098

Parsing string to integer accepting '0x' + hexadecimal #1098

Comments

comex commented Apr 29, 2015

lilyball commented Apr 29, 2015

lilyball commented Apr 29, 2015

SimonSapin commented Apr 29, 2015

lilyball commented Apr 30, 2015

Mignet commented Apr 13, 2016

SimonSapin commented Apr 13, 2016

inferiorhumanorgans commented Nov 30, 2017

oxalica commented Feb 15, 2019

dns2utf8 commented Jan 5, 2020

inferiorhumanorgans commented Jan 5, 2020

dns2utf8 commented Jan 5, 2020

inferiorhumanorgans commented Jan 5, 2020

dns2utf8 commented Jan 6, 2020

Timmmm commented Jun 6, 2023

dns2utf8 commented Jun 11, 2023