
Add dedicated field types for durations and byte sizes #31244

@jpountz

Description

I'm opening this feature request as a follow-up to a conversation with @ruflin. Today, users who want to store durations or byte sizes typically use numeric types (e.g. long, float, scaled_float) together with a unit convention, sometimes made explicit in the field name (e.g. transferred_bytes or duration_ms). We could make this experience better by supporting these fields natively in Elasticsearch:

  • Transparent unit handling: Elasticsearch would internally store e.g. nanoseconds for durations and bytes for byte sizes, and would handle conversions automatically. This would be transparent to applications, which would only need to make sure that their values carry explicit units so that they are not rejected.
  • Better query parsing support: query parsers could understand things like +bytes_transferred:[1MB TO 1GB] +duration:[1s TO 1d].
  • Better storage efficiency: for such data types, the order of magnitude is often much more useful than the exact value. It might be acceptable to only guarantee e.g. 0.1% relative accuracy, which would in turn allow storing all reasonable values (up to thousands of terabytes or thousands of years) using only 16 bits per value.
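To make the 16-bit claim concrete, here is a minimal sketch assuming a log-scale quantization (the issue does not prescribe an encoding, and `encode`, `decode`, `REL_ERR` and `BASE` are hypothetical names). With at most 0.1% relative error, values can be mapped to geometric buckets whose bounds grow by a factor of about 1.002, so 2^16 buckets span a dynamic range of roughly e^131 (about 10^57). That is far beyond thousands of terabytes (~10^15 bytes) or thousands of years (~3×10^19 ns).

```python
import math

# Sketch only: a hypothetical 16-bit log-scale encoding, not an existing
# Elasticsearch feature. Code 0 is reserved for the value 0; codes
# 1..65535 index geometric buckets [BASE**(c-1), BASE**c).
REL_ERR = 0.001                        # target: at most 0.1% relative error
BASE = (1 + REL_ERR) / (1 - REL_ERR)   # bucket growth factor, about 1.002
MAX_CODE = 0xFFFF                      # largest 16-bit code


def encode(value: int) -> int:
    """Map a non-negative integer (e.g. bytes or nanoseconds) to a 16-bit code."""
    if value < 0:
        raise ValueError("negative values are not supported")
    if value == 0:
        return 0
    # floor(log_BASE(value)) + 1 selects the bucket containing value.
    code = math.floor(math.log(value) / math.log(BASE)) + 1
    if code > MAX_CODE:
        raise OverflowError(f"{value} is too large for a 16-bit code")
    return code


def decode(code: int) -> float:
    """Return the bucket's representative value, chosen to bound relative error."""
    if not 0 <= code <= MAX_CODE:
        raise ValueError(f"invalid code {code}")
    if code == 0:
        return 0.0
    low = BASE ** (code - 1)
    high = BASE ** code
    # Harmonic mean of the bucket bounds: the relative error is exactly
    # REL_ERR at both bucket edges and smaller everywhere in between.
    return 2 * low * high / (low + high)
```

Because `encode` is monotonic, a range query such as duration:[1s TO 1d] could in principle be evaluated by comparing codes, with up to 0.1% imprecision at the range boundaries.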

One risk is that we end up with lots of feature requests to support distances, weights, etc. Where do we draw the line? It has been suggested to have a single field type that is configured with the kind of value it stores, but this might not be practical given that some units have their own specificities: e.g. k means 1024 for byte sizes but 1000 for weights, and some durations (months, years, etc.) do not have a fixed length. At first sight, it looks cleaner to have one type per unit, which doesn't prevent them from sharing code internally.
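A minimal sketch of "one type per unit, shared code internally", also covering the explicit-unit values and range syntax from the bullets above. The unit tables, suffixes and function names are hypothetical illustrations, not an Elasticsearch API: each type owns its own table (k = 1024 for bytes, 1000 for weights), while a single parser is shared. Durations list only fixed-length units, so inputs such as months or years are rejected, as are values without units.

```python
import re
from fractions import Fraction

# Hypothetical per-type unit tables (suffixes are lowercase; input is lowercased).
BYTE_UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3,
              "tb": 1024**4, "pb": 1024**5}                  # k means 1024
WEIGHT_UNITS_G = {"g": 1, "kg": 1000, "t": 1000**2}          # k means 1000
# Durations stored as nanoseconds; only fixed-length units, no months/years.
DURATION_UNITS_NS = {"ns": 1, "us": 10**3, "ms": 10**6, "s": 10**9,
                     "min": 60 * 10**9, "h": 3600 * 10**9, "d": 86_400 * 10**9}

_NUMBER_WITH_UNIT = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\s*$")
_INCLUSIVE_RANGE = re.compile(r"^\[\s*(.+?)\s+TO\s+(.+?)\s*\]$")


def parse_with_units(text: str, units: dict[str, int]) -> int:
    """Shared parser: convert e.g. '1.5kb' to an integer in the type's base unit."""
    match = _NUMBER_WITH_UNIT.match(text)
    if match is None:
        raise ValueError(f"expected a number followed by a unit, got {text!r}")
    number, unit = match.groups()
    factor = units.get(unit.lower())
    if factor is None:
        raise ValueError(f"unsupported unit {unit!r} in {text!r}")
    # Fraction avoids float rounding ('1.5kb' -> exactly 1536); int() truncates.
    return int(Fraction(number) * factor)


def parse_range(text: str, units: dict[str, int]) -> tuple[int, int]:
    """Parse an inclusive range such as '[1MB TO 1GB]' into base-unit bounds."""
    match = _INCLUSIVE_RANGE.match(text)
    if match is None:
        raise ValueError(f"expected '[<from> TO <to>]', got {text!r}")
    low, high = (parse_with_units(part, units) for part in match.groups())
    if low > high:
        raise ValueError(f"empty range {text!r}")
    return low, high
```

For example, `parse_range("[1MB TO 1GB]", BYTE_UNITS)` yields the bounds 1024**2 and 1024**3 in bytes. Exclusive bounds (`{...}`) and open-ended ranges are left out of this sketch.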

Metadata

Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
