From 141fb3b8e7a21ced99615cd2e205f16b27451885 Mon Sep 17 00:00:00 2001
From: Alex Crichton
Date: Tue, 27 May 2014 23:54:16 -0700
Subject: [PATCH 1/2] RFC: Remove internationalization from format!

---
 active/0000-remove-format-intl.md | 109 ++++++++++++++++++++++++++++++
 1 file changed, 109 insertions(+)
 create mode 100644 active/0000-remove-format-intl.md

diff --git a/active/0000-remove-format-intl.md b/active/0000-remove-format-intl.md
new file mode 100644
index 00000000000..f3ba9db3428
--- /dev/null
+++ b/active/0000-remove-format-intl.md
@@ -0,0 +1,109 @@
+- Start Date: (fill me in with today's date, YYYY-MM-DD)
+- RFC PR #: (leave this empty)
+- Rust Issue #: (leave this empty)
+
+# Summary
+
+Remove internationalization features from format!, and change the set of escapes
+accepted by format strings. The `plural` and `select` methods would be removed,
+`#` would no longer need to be escaped, and `{{`/`}}` would become escapes for
+`{` and `}`, respectively.
+
+# Motivation
+
+Internationalization is difficult to implement correctly, and doing so will
+likely not be done in the standard library, but rather in an external library.
+After talking with others much more familiar with internationalization, it has
+come to light that our ad-hoc "internationalization support" in our format
+strings are woefully inadequate for most real use cases of support for
+internationalization.
+
+Instead of having a half-baked unused system adding complexity to the compiler
+and libraries, the support for this functionality would be removed from format
+strings.
+
+# Detailed design
+
+The primary internationalization features that `format!` supports today are the
+`plural` and `select` methods inside of format strings. These methods are
+choices made at format-time based on the input arguments of how to format a
+string. This functionality would be removed from the compiler entirely.
+
+As fallout of this change, the `#` special character, a back reference to the
+argument being formatted, would no longer be necessary. In that case, this
+character no longer needs an escape sequence.
+
+The new grammar for format strings would be as follows:
+
+```
+format_string := [ format ] *
+format := '{' [ argument ] [ ':' format_spec ] '}'
+argument := integer | identifier
+
+format_spec := [[fill]align][sign]['#'][0][width]['.' precision][type]
+fill := character
+align := '<' | '>'
+sign := '+' | '-'
+width := count
+precision := count | '*'
+type := identifier | ''
+count := parameter | integer
+parameter := integer '$'
+```
+
+The current syntax can be found at http://doc.rust-lang.org/std/fmt/#syntax to
+see the diff between the two
+
+## Choosing a new escape sequence
+
+Upon landing, there was a significant amount of discussion about the escape
+sequence that would be used in format strings. Some context can be found on some
+[old pull requests][1], and the current escape mechanism has been the source of
+[much confusion][2]. With the removal of internationalization methods, and
+namely nested format directives, it is possible to reconsider the choices of
+escaping again.
+
+[1]: https://github.com/mozilla/rust/pull/9161
+[2]: https://github.com/mozilla/rust/issues/12814
+
+The only two characters that need escaping in format strings are `{` and `}`.
+One of the more appealing syntaxes for escaping was to double the character to
+represent the character itself. This would mean that `{{` is an escape for a `{`
+character, while `}}` would be an escape for a `}` character.
+
+Adopting this scheme would avoid clashing with Rust's string literal escapes.
+There would be no "double escape" problem. More details on this can be found in
+the comments of an [old PR][1].
+
+# Drawbacks
+
+The internationalization methods of select/plural are genuinely used for
+applications that do not involve internationalization. For example, the compiler
+and rustdoc often use plural to easily create plural messages. Removing this
+functionality from format strings would impose a burden of likely dynamically
+allocating a string at runtime or defining two separate format strings.
+
+Additionally, changing the syntax of format strings is quite an invasive change.
+Raw string literals serve as a good use case for format strings that must escape
+the `{` and `}` characters. The current system is arguably good enough to pass
+with for today.
+
+# Alternatives
+
+The major internationalization approach explored has been l20n, which has shown
+itself to be fairly incompatible with the way format strings work today.
+Different internationalization systems, however, have not been explored. Systems
+such as gettext would be able to leverage format strings quite well, but it
+was claimed that gettext for internationalization is inadequate for modern
+use-cases.
+
+It is also an unexplored possibility whether the current format string syntax
+could be leveraged by l20n. It is unlikely that time will be allocated to polish
+off an internationalization library before 1.0, and it is currently seen as
+undesirable to have a half-baked system in the libraries rather than a
+first-class well designed system.
+
+# Unresolved questions
+
+* Should internationalization support be left in `std::fmt` as a "poor man's"
+  implementation for those to use as they see fit?
From 86456e4b660243850e02dc6e1b134145983e33d3 Mon Sep 17 00:00:00 2001
From: Alex Crichton
Date: Wed, 28 May 2014 11:07:36 -0700
Subject: [PATCH 2/2] Change internationalization to localization in terminology to reflect what is actually being removed

---
 active/0000-remove-format-intl.md | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/active/0000-remove-format-intl.md b/active/0000-remove-format-intl.md
index f3ba9db3428..6c0bdbc730c 100644
--- a/active/0000-remove-format-intl.md
+++ b/active/0000-remove-format-intl.md
@@ -4,19 +4,19 @@
 
 # Summary
 
-Remove internationalization features from format!, and change the set of escapes
+Remove localization features from format!, and change the set of escapes
 accepted by format strings. The `plural` and `select` methods would be removed,
 `#` would no longer need to be escaped, and `{{`/`}}` would become escapes for
 `{` and `}`, respectively.
 
 # Motivation
 
-Internationalization is difficult to implement correctly, and doing so will
+Localization is difficult to implement correctly, and doing so will
 likely not be done in the standard library, but rather in an external library.
-After talking with others much more familiar with internationalization, it has
-come to light that our ad-hoc "internationalization support" in our format
+After talking with others much more familiar with localization, it has
+come to light that our ad-hoc "localization support" in our format
 strings are woefully inadequate for most real use cases of support for
-internationalization.
+localization.
 
 Instead of having a half-baked unused system adding complexity to the compiler
 and libraries, the support for this functionality would be removed from format
@@ -24,7 +24,7 @@ strings.
 
 # Detailed design
 
-The primary internationalization features that `format!` supports today are the
+The primary localization features that `format!` supports today are the
 `plural` and `select` methods inside of format strings. These methods are
 choices made at format-time based on the input arguments of how to format a
 string. This functionality would be removed from the compiler entirely.
@@ -59,7 +59,7 @@ see the diff between the two
 Upon landing, there was a significant amount of discussion about the escape
 sequence that would be used in format strings. Some context can be found on some
 [old pull requests][1], and the current escape mechanism has been the source of
-[much confusion][2]. With the removal of internationalization methods, and
+[much confusion][2]. With the removal of localization methods, and
 namely nested format directives, it is possible to reconsider the choices of
 escaping again.
 
@@ -77,8 +77,8 @@ the comments of an [old PR][1].
 
 # Drawbacks
 
-The internationalization methods of select/plural are genuinely used for
-applications that do not involve internationalization. For example, the compiler
+The localization methods of select/plural are genuinely used for
+applications that do not involve localization. For example, the compiler
 and rustdoc often use plural to easily create plural messages. Removing this
 functionality from format strings would impose a burden of likely dynamically
 allocating a string at runtime or defining two separate format strings.
@@ -90,20 +90,20 @@ with for today.
 
 # Alternatives
 
-The major internationalization approach explored has been l20n, which has shown
+The major localization approach explored has been l20n, which has shown
 itself to be fairly incompatible with the way format strings work today.
-Different internationalization systems, however, have not been explored. Systems
+Different localization systems, however, have not been explored. Systems
 such as gettext would be able to leverage format strings quite well, but it
-was claimed that gettext for internationalization is inadequate for modern
+was claimed that gettext for localization is inadequate for modern
 use-cases.
 
 It is also an unexplored possibility whether the current format string syntax
 could be leveraged by l20n. It is unlikely that time will be allocated to polish
-off an internationalization library before 1.0, and it is currently seen as
+off an localization library before 1.0, and it is currently seen as
 undesirable to have a half-baked system in the libraries rather than a
 first-class well designed system.
 
 # Unresolved questions
 
-* Should internationalization support be left in `std::fmt` as a "poor man's"
+* Should localization support be left in `std::fmt` as a "poor man's"
   implementation for those to use as they see fit?
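
Reviewer note: the escaping rules and the `plural` drawback described in the RFC text can be sketched in Rust roughly as follows. This is only an illustration, assuming a toolchain in which the proposed `{{`/`}}` escapes are accepted and `#` is an ordinary character outside a format spec. The function names `braced`, `issue_ref`, and `files_changed` are hypothetical and do not appear in the RFC.

```rust
/// Hypothetical helper: wraps a value in literal braces.
/// `{{` emits `{`, `{}` formats the argument, and `}}` emits `}`.
fn braced(value: i32) -> String {
    format!("{{{}}}", value)
}

/// Hypothetical helper: `#` needs no escape outside a format spec.
fn issue_ref(number: u32) -> String {
    format!("issue #{}", number)
}

/// Hypothetical stand-in for the removed `plural` method: the caller
/// picks between two separate format strings, as the Drawbacks
/// section anticipates.
fn files_changed(count: usize) -> String {
    if count == 1 {
        format!("{} file changed", count)
    } else {
        format!("{} files changed", count)
    }
}
```

Calling `braced(5)` yields `{5}`, and `files_changed` shows the "two separate format strings" cost the RFC mentions: the selection moves out of the format string and into ordinary control flow.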