Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Standardisation proposal feedback #518

Closed
38 of 44 tasks
vitaut opened this issue Jun 7, 2017 · 41 comments
Closed
38 of 44 tasks

Standardisation proposal feedback #518

vitaut opened this issue Jun 7, 2017 · 41 comments

Comments

@vitaut
Copy link
Contributor

vitaut commented Jun 7, 2017

I'll capture standardization proposal comments here, then prioritize and address them.

  • Use of basic_cstring_view.
    Expressed by: Lee Howes, Nicol Bolas, Bengt Gustafsson, Thiago Macieira
    Pros: less code bloat if already have a null-terminated string
    Cons: overhead for non-null-terminated strings
    Alternatives:

    1. basic_string_view
    2. separate overload for const char* and basic_string_view.
  • Compile-time format string processing
    Expressed by: LEWG

  • Named arguments.
    Expressed by: Bengt Gustafsson
    Joël Lamotte: mention in a section about potential extensions

  • Internationalization (translation) facilities.
    Expressed by: Bengt Gustafsson
    This seems out of scope of the library, but can be built on top of it.

  • Make passing a locale easier. Currently it requires creating a buffer.
    Expressed by: Bengt Gustafsson

  • Reordering of format specifiers.
    Expressed by: Bengt Gustafsson
    May introduce ambiguity in the grammar.

  • Pass a range (string_view) instead of a cursor to format_value.
    Expressed by: Bengt Gustafsson

  • Make creating a formatting object from format specification string easy.
    Expressed by: Bengt Gustafsson, LEWG
    Experimental branch: https://github.com/fmtlib/fmt/tree/ext

    Option 1 - separate class:

     template <>
     class formatter<MyClass> {
      public:
       explicit formatter(context &ctx) {
         // Parse the format string.
       }
     
       void format(buffer &buf, MyClass &value) {
         // Format value.
       }
     
      private:
       // Formatting state.
     };

    Option 2 - lambda:

     template <typename T, typename U>
     using enable_format = typename std::enable_if<std::is_same<T, U>::value>::type;
     
     template <typename T, typename Char, typename = enable_format<T, MyClass>>
     auto parse_format(basic_context<Char> &ctx) {
       // Parse the format string.
       return [/* Formatting state */](basic_buffer<Char> &buf, MyClass &value) {
         // Format value.
       };
     }
  • Add a way to define a global error handler for formatting errors (where currently exceptions are thrown) to make it possible, for example, to "ignore the error, default to default format (corresponding to {}), and log the error to the non-critical error log".
    Expressed by: Sergey Ignatchenko

  • Refer to d0355r4 and mention integration possibilities. According to Howard Hinnant there should be no problem plugging P0355 into P0645.

  • Make sure impl can pre-compute output size.
    Expressed by: LEWG

  • Look at using or explain why not to use an output iterator.
    Expressed by: LEWG

  • Overloads of formatting functions for wide strings.
    Expressed by: Howard Hinnant.

  • Benchmark results.
    Expressed by: LEWG

  • Consider using floating-point format with a round-trip guarantee (as the one described in P0067) as default.
    Expressed by: [fr_dav], Sean Parent (https://www.reddit.com/r/cpp/comments/81ta6q/the_fmt_formatting_library_is_now_available_in/dv7gpj9/), http://disq.us/p/1r3a4tt

  • Consider using standalone functions found by ADL instead of specialized struct template for the extension API: https://www.reddit.com/r/cpp/comments/856p3z/text_formatting_at_the_iso_c_standards_meeting_in/dvwdxyf/
    Expressed by: matthieum, Eric Niebler, LEWG, Sean Middleditch

  • Clarify that compile-time checks are possible for user-defined types with custom parsers.
    Expressed by: LEWG

  • Add to_chars to benchmark.
    Expressed by: Nicolai Josuttis

  • Add format_to_n that takes an output iterator and a size / maximum inserted length
    Expressed by: LEWG, Bengt Gustafsson

  • Reduce use of nested namespaces
    Expressed by: Titus Winters

  • Choose a better name for count
    Expressed by: LEWG

  • Custom formatting of built-in types, e.g. to escape strings when formatting JavaScript code
    Expressed by: Sean Parent (http://disq.us/p/1r3a4tt).

  • Give a realistic example of the visitation API.
    Expressed by: LEWG, Nicolai Josuttis

  • Rename visit to something more specific like visit_format_arg
    Expressed by: Mateusz Pusz.

  • Disallow consecutive zeros in arg-id ({000}).
    Expressed by: Zhihao Yuan

  • Specify formatting behavior for arithmetic types using to_chars where possible.
    Expressed by: Zhihao Yuan

  • Clarify what is the current locale.
    Expressed by: Zhihao Yuan

  • Default floating point formatting should be shortest round-trip format
    Expressed by: Zhihao Yuan

  • Add missing specializations of formatter for basic_string, basic_string_view, and char*.
    Expressed by: LEWG

  • Use format_args instead of make_*format_args.
    Expressed by: Zhihao Yuan

  • Clarify which specifiers (particularly sign) apply to nan & inf.
    Expressed by: Antony Polukhin

  • Provide formatters for more standard types.
    Expressed by: Lars Gullik Bjønnes

  • Consider adding ostream operator<< support.
    Expressed by: Lars Gullik Bjønnes

  • Clarify behavior when mixing character and string types
    Expressed by: Zhihao Yuan

  • Consider reducing the number of overloads by parameterizing formatting functions on format string type similarly to path(Source).
    Expressed by: LWG

  • Clarify why is_integral and is_arithmetic are needed or remove.
    Expressed by: Stephan T. Lavavej

  • Clarify when formatter::parse should stop consuming the input (greedy, '}', etc.)
    Expressed by: Tomasz Kamiński

  • Allow the use of # with hexfloat a format to print leading 0x.
    Expressed by: Tomasz Kamiński

  • Consider changing the behavior of = to produce "+ 0x6" instead of "+0x 6" on format("{: =+#8x}", 6) or remove = altogether.
    Expressed by: Tomasz Kamiński, Zhihao Yuan

  • It should be possible to format integer as char with the c format specifier, e.g. format("{:c}", 65).
    Expressed by: Zhihao Yuan

  • Add a context type with an iterator used by formatted_size to be able to do something like:

    template<>
    struct formatter<MyType, char> { ... };
    
    extern template formatter<MyType, char>::format(MyType, format_context&);
    extern template formatter<MyType, char>::format(MyType, formatted_size_context&);

    and define format functions in a source file.

    Expressed by: Tomasz Kamiński

  • Add a specifier for textual formatting of bools, e.g. s or t. This is the default but would be nice to have an explicit specifier for consistency.
    Expressed by: Zhihao Yuan

  • Consider making the default pointer format not add "0x" prefix unless the # specifier is passed.
    Expressed by: LWG

  • Add a specifier to format negative zero without '-'.
    Expressed by: Alan Talbot

@vitaut
Copy link
Contributor Author

vitaut commented Jul 22, 2017

basic_cstring_view bites the dust in 2f4f49f.

Performance impact looks negligible, even on the int_generator benchmark:

  • Null-terminated: 0.54859 s
  • string_view: 0.560266 s

There is a slight increase in binary code due to passing size around but this can be mitigated with extra overloads if there is interest (probably only relevant for embedded).

@polyvertex polyvertex mentioned this issue Aug 27, 2017
@vitaut
Copy link
Contributor Author

vitaut commented Sep 10, 2017

The parse function in the extension API now accepts a range (string_view):

template <>
struct formatter<MyType> {
  const char* parse(std::string_view format) {
    // Parse format specifiers, store them in the current formatter object
    // and return a pointer past the end of the parsed range.
  }
  // ...
};

See also http://zverovich.net/2017/08/20/format-api-improvements.html.

Reordering of specifiers doesn't seem like a good idea. For example, is # a fill or an alternative form flag in "{:#^10}"? Marking as complete (won't fix).

@vitaut
Copy link
Contributor Author

vitaut commented Oct 22, 2017

Internationalization seems out of scope. Marking as complete (won't fix) as well.

@vitaut
Copy link
Contributor Author

vitaut commented Nov 8, 2017

Design review comments: https://issues.isocpp.org/show_bug.cgi?id=322

@vitaut
Copy link
Contributor Author

vitaut commented Nov 11, 2017

Added section "Compile-time processing of format strings" http://fmtlib.net/Text%20Formatting.html#CompileTimeFormat.

@vitaut
Copy link
Contributor Author

vitaut commented Nov 19, 2017

Need to document parse context and explain how the new extension API can be used to implement constexpr checks for user-defined types.

@vitaut
Copy link
Contributor Author

vitaut commented Nov 25, 2017

As discussed with Howard Hinnant, added a note regarding integration with D0355 "Extending to Calendars and Time Zones" to the Extensibility section

@vitaut
Copy link
Contributor Author

vitaut commented Nov 29, 2017

Documented parse_context in fmtlib/fmt.dev@afa5eb8. Now do we need both parse_context and context? If yes, the latter should probably be renamed to format_context.

@vitaut
Copy link
Contributor Author

vitaut commented Nov 29, 2017

Added a note about named arguments in fmtlib/fmt.dev@cc4cc49.

@vitaut
Copy link
Contributor Author

vitaut commented Dec 2, 2017

context is still useful for implementing different formatters, e.g. printf formatter. Clarified in fmtlib/fmt.dev@a34409a
and unhacked handling of user-defined types.

@vitaut
Copy link
Contributor Author

vitaut commented Dec 9, 2017

Need to rename "numeric" to "arithmetic" because this is the term used in the standard.

@vitaut
Copy link
Contributor Author

vitaut commented Jan 13, 2018

Preliminary work to support iterators: c095445

@vitaut
Copy link
Contributor Author

vitaut commented Jan 15, 2018

As of 1029119 the library supports back_insert_iterators including non-contiguous ones. A bit more work is required for arbitrary output iterators.

@vitaut
Copy link
Contributor Author

vitaut commented Jan 18, 2018

Output iterators and pre-computation of output size (with fmt::count) are fully supported now. The paper is updated except for the following parts:

  • basic_context API
  • locales: since buffers are gone there should be another way to pass a locale
  • explicit conversion from arg_store to format_args since implicit is no longer working due to templatization

@vitaut
Copy link
Contributor Author

vitaut commented Jan 21, 2018

Re locales: most formatting is locale-independent but for n specifier will rely on global locale as std::to_string does.

@vitaut
Copy link
Contributor Author

vitaut commented Jan 27, 2018

Added a benchmark: http://fmtlib.net/Text%20Formatting.html#Benchmarks

Would be good to show the reduction in code bloat due to vformat.

@vitaut
Copy link
Contributor Author

vitaut commented Jan 31, 2018

Added an appendix comparing binary code sizes: http://fmtlib.net/Text%20Formatting.html#BinaryCode

@vitaut
Copy link
Contributor Author

vitaut commented Mar 28, 2018

There seems to be a strong preference for having format in the std namespace rather than a subnamespace (https://twitter.com/vzverovich/status/975058968316956673):
screenshot 2018-03-28 07 52 00_preview

Most names are already unambiguous but need to double check.

@vitaut
Copy link
Contributor Author

vitaut commented Mar 31, 2018

Added format_to_n.

@vitaut
Copy link
Contributor Author

vitaut commented Mar 31, 2018

Renamed count to formatted_size. Other options: format_size, output_size: https://twitter.com/vzverovich/status/979520967528521728

@vitaut
Copy link
Contributor Author

vitaut commented Apr 4, 2018

Dropped nested namespace fmt in fmtlib/fmt.dev@8f6b2f3. Potentially non-unique names:

template <class charT>
  class basic_parse_context;

template <class OutputIterator, class charT>
  class basic_context;

using parse_context = basic_parse_context<char>;

using context = basic_context<implementation-defined>;
using wcontext = basic_context<implementation-defined>;

template <class Context, class... Args>
  format_arg_store<Context, Args...> make_args(const Args&... args);

template <class... Args>
  format_arg_store<context, Args...> make_args(const Args&... args);

@vitaut
Copy link
Contributor Author

vitaut commented Apr 8, 2018

Update names:

basic_context -> basic_format_context
context -> format_context
wcontext -> wformat_context
make_args -> make_format_args

leaving parse_context for now.

@vitaut
Copy link
Contributor Author

vitaut commented Apr 15, 2018

Use of ADL instead of specialization is problematic because the parse function doesn't take an argument of formatted type T. matthieum suggested passing the type info via an extra argument:

namespace ns {
struct S {};

struct format_spec {};

constexpr std::pair<fmt::parse_context::iterator, format_specs>
  parse(fmt::type<S>, fmt::parse_context& ctx) { ... }

template <typename FormatContext>
auto format(const format_specs& spec, const S&, FormatContext& ctx) { ... }
}

vs

namespace ns {
struct S {};
}

namespace fmt {
template <>
struct formatter<ns::S> {
  constexpr auto parse(parse_context& ctx) { ... }

  template <typename FormatContext>
  auto format(const ns::S&, FormatContext& ctx) { ... }
};
}

https://godbolt.org/g/p1wQPr

However, this will complicate the API because parse will now have to return both the end iterator and the parsed specs (or communicate the end iterator by other means e.g. via the parse_context) and format will have to additionally take parsed specs.

@vitaut
Copy link
Contributor Author

vitaut commented Apr 29, 2018

Creating a formatting object from format specification string should be easy now:

formatter<T> f;
parse_context ctx(fmt);
f.parse(ctx);

@vitaut
Copy link
Contributor Author

vitaut commented Apr 29, 2018

The header <charconv> is not yet available on my system, so marking "Add to_chars to benchmark" as complete (won't fix).

@seanmiddleditch
Copy link

@vitaut - I'm having some trouble understanding why the ADL support requires such a large change to the parse API. If I'm understanding the fmlib internals correctly, then I would imagine that you could instead just use ADL to find/construct the correct formatter instance and leave the formatter interface itself completely untouched.

Replace typename Context::template formatter_type<T>::type f; with auto f = select_formatter(type<T>{});, using:

// default implementation in fmt/core.h
template<typename T> auto fmt::format_value(type<T>) { return formatter<T>{}; }

// selector in fmt/core.h
template<typename T> auto fmt::select_formatter(type<T> t) {
  using fmt::format_value;
  return format_value(t);
}

// user-defined type in user's mystuff.h
namespace mystuff {
  struct foo {};
  struct my_formatter { ... };
  auto format_value(type<foo>) { return my_formatter{}; }
}

Example: https://gcc.godbolt.org/z/IGxGva

(obviously would need a little bit extra to deal with char_type but that's the gist, and you can probably find a more appropriate name than format_value I'm sure)

That is roughly the kind of approach I took with formatxx. Instead of constructing a formatter type I instead take a function pointer, but the idea is the same: a wrapper uses ADL to find the right function pointer which is later used to do the actual formatting work.

@vitaut
Copy link
Contributor Author

vitaut commented Sep 21, 2018

you could instead just use ADL to find/construct the correct formatter instance and leave the formatter interface itself completely untouched.

That's a great idea, thanks a lot! This is indeed much better than having separate parse and format functions. It still adds an extra step compared to the current approach because the user needs to provide both the formatter class and the factory function. On the other hand there is no need for specialization.

@vitaut
Copy link
Contributor Author

vitaut commented Sep 23, 2018

Added a visitation API example in fmtlib/fmt.dev@3a663df.

@vitaut
Copy link
Contributor Author

vitaut commented Sep 23, 2018

With compile-time checks, it will be possible to guarantee that formatting functions don't throw (if the underlying iterator operations don't throw), so providing a global error handler is unnecessary.

@vitaut
Copy link
Contributor Author

vitaut commented Nov 8, 2018

Disallowed leading zeros in cases other than 0 as in format("{000}", 42) to prevent confusion with octal as per LEWG (Zhihao's) feedback.

@vitaut
Copy link
Contributor Author

vitaut commented Nov 16, 2018

Reopened the ADL question per additional feedback from LEWG and others. Problems with ADL:

  • slower than specialization: can be worked around by using inline friends (abseil does this)
  • parse function doesn't take the object: can be worked around with an additional getter function

@vitaut
Copy link
Contributor Author

vitaut commented Nov 24, 2018

Cannot use format_args instead of make_format_args because the latter needs to return an intermediate store object that has the actual array of arguments.

@vitaut
Copy link
Contributor Author

vitaut commented Dec 28, 2018

Escaping (incl. for built-in types) can be done by providing a custom formatter for a wrapper type.

@vitaut
Copy link
Contributor Author

vitaut commented Jan 5, 2019

Removed is_arithmetic and is_integral. They are non-essential and can be replaced with visit.

@cgmb
Copy link

cgmb commented Feb 8, 2019

I just wanted to give a big +1 to the locale section of the proposal. Locale-independent floating point formatting is important for text-based interchange formats, and it's a real pain at the moment. Thank you for your efforts on this process.

@vitaut
Copy link
Contributor Author

vitaut commented Feb 8, 2019

Thanks for the feedback, @cgmb.

@vitaut
Copy link
Contributor Author

vitaut commented Mar 16, 2019

Clarified when parse should stop consuming the input in R7 of the paper (Formatter requirements): http://fmtlib.net/P0645R7.html#gensection-8

@vitaut
Copy link
Contributor Author

vitaut commented Mar 22, 2019

Parameterizing formatting functions is not a good idea because (1) we don't want multiple instantiations of type-erased functions and (2) we want implicit conversions into string_view. Marking as won't fix.

@vitaut
Copy link
Contributor Author

vitaut commented Apr 14, 2019

Lars is working on a paper that proposes formatters for more standard types, so marking this as fixed.

@vitaut
Copy link
Contributor Author

vitaut commented Jun 10, 2019

Closing issues addressed by https://isocpp.org/files/papers/P1652R0.html

@vitaut
Copy link
Contributor Author

vitaut commented Aug 10, 2019

http://eel.is/c++draft/format

@vitaut vitaut closed this as completed Aug 10, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants