
support nanosecond resolution in fluentd #461

Closed
mr-salty opened this issue Oct 22, 2014 · 29 comments
Labels
feature request *Deprecated Label* Use enhancement label in general v0.14

Comments

@mr-salty
Contributor

Currently fluentd timestamps have second granularity; this is a feature request to support nanoseconds (or milliseconds, although personally I prefer the former).

See also #96 - there was some discussion there of opening a new issue but I don't see one so I thought I would do it. Proposals there were passing milli/nanoseconds independent of the existing timestamp, or treating the existing timestamp as floating point (currently good enough for 1/4ns granularity assuming unix epoch time).

@repeatedly
Member

https://github.com/fluent/fluentd/commits/default-time-unit-option

I have a branch to support millisecond resolution, but I'm not sure whether this approach is good.
There may also be some plugins that assume time is in whole seconds.
I heard @sonots' plugins passed their tests with milliseconds, but we need to check the major plugins.

Yeah, I also think Fluentd should support millisecond (and nanosecond?) resolution.

@mr-salty
Contributor Author

I'm not sure what the best approach is; initially I liked the idea of just making the existing timestamp floating point, but there is a risk that some plugins will not deal well with that. I also took a quick look and saw a few instances where existing plugins and tests use to_i to truncate.

I can volunteer to help with development but it might take a few weeks before I have time to do it. I just wanted to get the ball rolling.

I'd definitely vote for ns instead of ms; it's easy enough for someone to truncate if they want the latter.

@mr-salty mr-salty reopened this Oct 22, 2014
@repeatedly repeatedly added the feature request *Deprecated Label* Use enhancement label in general label Oct 30, 2014
@mr-salty
Contributor Author

mr-salty commented Nov 3, 2014

I realized I misspoke above about floating point being able to handle nanoseconds - I was mixing up 'long double' (80-bit extended precision, 64-bit mantissa) with a standard double (64 bits, 53-bit mantissa). Given current seconds-since-epoch values, the latter can only represent microseconds, and a standard double is generally what we have to work with. So, to support nanoseconds we'd need another representation, like separate (integer) seconds and nanoseconds.
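To make the precision limit concrete, here is a quick sketch in Ruby (values illustrative): with epoch seconds around 1.4e9, a double's 53-bit mantissa leaves steps of roughly 2^-22 s (~0.24 µs), so nanosecond digits are rounded away, while a 64-bit integer count of nanoseconds holds the value exactly.

```ruby
# Epoch seconds in 2014 (~1.4e9) need ~31 bits, so a 53-bit mantissa
# leaves ~22 bits for the fraction: roughly 0.24-microsecond steps.
t = 1415551534.745872935          # intended: nanosecond precision
puts format("%.9f", t)            # the last few digits are rounded away

# Nanoseconds since the epoch need ~61 bits, which a 64-bit integer
# (and Ruby's arbitrary-precision Integer) holds exactly:
ns = 1415551534 * 1_000_000_000 + 745_872_935
puts ns                           # 1415551534745872935, no loss
```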

@mr-salty mr-salty closed this as completed Nov 3, 2014
@mr-salty
Contributor Author

mr-salty commented Nov 3, 2014

argh, I hate the close-and-comment button :(

@mr-salty mr-salty reopened this Nov 3, 2014
@szhem

szhem commented Nov 5, 2014

What about representing the millisecond/nanosecond resolution as unix time (with millisecond/nanosecond precision)?
At least it should be portable.

@repeatedly
Member

What about representing the millisecond/nanosecond resolution as unix time

How would we represent nanoseconds as unix time?
By using a multiple-precision integer instead of a 64-bit integer?

@szhem

szhem commented Nov 6, 2014

How to represent nanosecond as unix time?

In most cases millisecond/microsecond resolution should be quite enough (compared to seconds).

To support nanosecond time there are multiple options:

  1. a multiple-precision integer, as you mentioned (I'm not sure it is the best solution, as it would be quite difficult to support across different languages)
  2. [unix_time].[nano_part], where unix_time is the unix time in seconds (or millis) and nano_part is the nanosecond fraction of that second. For example, 1415551534.745872935 means Sun, 09 Nov 2014 16:45:34.745 GMT (millisecond precision), Sun, 09 Nov 2014 16:45:34.745873 GMT (microsecond precision), or Sun, 09 Nov 2014 16:45:34.745872935 GMT (nanosecond precision). This approach would not require bignum arithmetic libraries in the corresponding languages.
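A minimal sketch of option 2 (helper name is hypothetical, not anything fluentd defines): parse the `[unix_time].[nano_part]` string as two integers, so no floating-point rounding is ever involved.

```ruby
# Hypothetical helper: split "sec.nano_fraction" into integer parts.
# The fractional part is padded/truncated to 9 digits (nanoseconds),
# so ".745" means 745 ms and ".745872935" means full nanoseconds.
def split_epoch(str)
  sec, frac = str.split('.')
  nsec = (frac || '').ljust(9, '0')[0, 9].to_i
  [sec.to_i, nsec]
end

split_epoch("1415551534.745872935")  # => [1415551534, 745872935]
split_epoch("1415551534.745")        # => [1415551534, 745000000]
```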

@tagomoris
Member

I think there are two major problems with representing nanosecond timestamps:

  • protocol layer representation:
    • How do in/out_forward and other plugins represent time by default?
    • What types should these plugins accept? (Only the new representation? MUST they accept both?)
  • internal representation:
    • An array of two separate values, seconds and nanoseconds? time #=> [sec, nsec]
    • A custom-made class? time #=> AnyAwesomeTime, time.to_i #=> sec
    • A double (of any size)?
    • Or something else?
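As a rough sketch of the "custom class" option (class name and methods are hypothetical, not fluentd's actual API): wrap [sec, nsec] internally, but keep to_i returning whole seconds so plugins that truncate with to_i keep working.

```ruby
# Sketch of a custom time class holding [sec, nsec] internally.
class AwesomeTimeSketch
  attr_reader :sec, :nsec

  def initialize(sec, nsec = 0)
    @sec  = sec
    @nsec = nsec
  end

  # Backward compatibility: plugins calling to_i still get whole seconds.
  def to_i
    @sec
  end

  def to_f
    @sec + @nsec / 1_000_000_000.0
  end
end

t = AwesomeTimeSketch.new(1415551534, 745_872_935)
t.to_i  # => 1415551534
```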

@szhem

szhem commented Nov 7, 2014

Honestly speaking, I don't think it's really necessary to have a nanotime representation.
Millisecond/microsecond precision is enough for most use cases.
In that case the existing protocol already supports it, and we just have to make sure that the main plugins work correctly with such precision.

As for nanoseconds - I would prefer a custom-made class, as it adds more flexibility compared to the other approaches.

P.S. What about adding a "version" field to the protocol? It would help plugin implementers decide how to parse the incoming message.

@repeatedly
Member

Milliseconds/microseconds precision is enough for the most of use cases.

I agree. Millisecond precision is enough for almost all users, but nanosecond support would also be good.
Maybe that's future work. The high priority is millisecond support.

What about adding "version" field to the protocol

Does this mean the forward plugin protocol?

@szhem

szhem commented Nov 10, 2014

Maybe it's a future work.

Agree too.

Does this mean forward plugin protocol?

That too, but mainly the internal event representation (I agree that "protocol" is not the correct word in this case). A version is also a pretty useful feature for backward compatibility.

For example, fluentd could automatically convert v2 events, which support nanoseconds, into v1 events, which support only seconds, before passing an event to a plugin that does not support v2 event handling.

@mr-salty
Contributor Author

Another option is a 64-bit integer "nanoseconds since the epoch" like Ruby Time - or would we have trouble making sure it doesn't get converted to a float somewhere? In any case, I think if we use a new representation it should use nanoseconds, because that won't be any more difficult than micros.

Microseconds do have the advantage that they could work with the existing protocol, although there are definitely places in the existing code that truncate the time to an integer.
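For reference, a signed 64-bit nanosecond counter is not a tight fit: a quick back-of-the-envelope check (plain integer arithmetic, nothing fluentd-specific) shows it covers roughly 292 years past the epoch.

```ruby
# Range of a signed 64-bit "nanoseconds since the epoch" value:
max_ns   = 2**63 - 1
max_secs = max_ns / 1_000_000_000        # whole seconds
years    = max_secs / (365 * 24 * 3600)  # rough years
puts years                               # ~292 years past 1970
```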

@repeatedly
Member

A 64-bit integer of nanoseconds seems hard to use.
Implementing a Fluentd Time class, or simply a float object, seems better.
I will ask other Ruby experts which is best for supporting milliseconds/nanoseconds.

@repeatedly
Member

We have a plan to release fluentd v0.12 on Dec 12.
After that, I will implement this feature.

@davidwartell

+1 for milliseconds in fluentd. I am using Fluentd to store logs in Elasticsearch, and events are out of order without better time resolution.

I worked around this using http://stackoverflow.com/q/27928479/2848111

@repeatedly
Member

We have two discussion points.

millisecond or nanosecond?

From our experience, milliseconds are enough for popular use cases.
The merit of nanoseconds is that they cover almost all use cases.

How to implement it?

For milliseconds, using floating point instead of an integer is enough.
For nanoseconds, we have maybe two options for the serialization format.

  • [sec, usec] pair

This is the straightforward approach.
Ruby's Time has an at(time, usec) method, so it's easy to handle in the fluentd world.

  • 64-bit integer, like ktime

This is a lighter-weight approach, so the serialized data is smaller than the above.

In fluentd, we will implement a Fluent::Time-like object to keep backward compatibility.
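The two serialization shapes above can be sketched like this (illustrative only; Time.at with a microseconds second argument is standard Ruby, everything else is just arithmetic):

```ruby
sec, nsec = 1415551534, 745_872_935

# Option 1: [sec, usec]-style pair, reconstructed via Time.at(sec, usec).
# (Ruby's Time.at takes microseconds as its second argument.)
t1 = Time.at(sec, nsec / 1000)

# Option 2: a single 64-bit integer, like the kernel's ktime.
ktime = sec * 1_000_000_000 + nsec
t2 = Time.at(ktime / 1_000_000_000, (ktime % 1_000_000_000) / 1000)

t1 == t2  # => true (both carry microsecond precision here)
```

The pair is self-describing and easy to inspect; the single integer is smaller on the wire but loses the sub-microsecond digits when round-tripped through Time.at this way.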

@repeatedly
Member

I will try to check the performance of nanosecond approach.

mr-salty added a commit to GoogleCloudPlatform/fluent-plugin-google-cloud that referenced this issue May 10, 2015
- timeNanos is deprecated because the 53 bit mantissa used for doubles
  does not have sufficient precision to represent the current time in
  nanoseconds past the unix epoch (which currently requires 61 bits).
- instead, we now support formats with split seconds and nanoseconds:
  - timestampSeconds, timestampNanos
  - timestamp { seconds, nanos }
  In both cases, the "seconds" part represents seconds since the unix epoch,
  and the "nano" part is the nanosecond component only (0..999999999).
  Managed VMs is using the latter, but it can only be ingested via
  json/msgpack/etc, while the former is suitable for use in an in_tail regex.

This should be considered an interim solution until
fluent/fluentd#461 is resolved.
pgrm added a commit to logTank/fluent-plugin-http that referenced this issue May 12, 2015
Use raw Time.now.to_f instead of Engine.now until
fluent/fluentd@c9be93c
is merged or fluent/fluentd#461 resolved
@browny

browny commented Jun 2, 2015

Does fluentd support milliseconds now?

@repeatedly
Member

Not yet. We plan to add sub-second support starting from v0.14.
We recently finished developing v0.12 and are now working on v0.14.

@repeatedly
Member

memo: We need a faster TimeParser, because the parse cache doesn't work with sub-second support.

@repeatedly
Member

We are considering adding a nanosecond time type to the msgpack ext format in fluentd.
That way, msgpack chunks can handle nanosecond time natively.

@repeatedly
Member

@mururu is now working on this issue.
If the implementation keeps backward compatibility, we plan to backport the feature to v0.12.

@repeatedly
Member

Let's continue discussing this issue in the #653 PR!

@mr-salty
Contributor Author

mr-salty commented Nov 6, 2015

Hi, do you still plan to backport this to v0.12, per the comment above?

@repeatedly
Member

@mr-salty Sorry, I missed your comment.
I don't have a backport plan.
If you want to test this feature, please try v0.14.0-pre.1 version.

@repeatedly
Member

Closing, because this has been implemented as of v0.14.

@yissachar

@repeatedly Is there anything special that needs to be done to make this work? I can't find any docs covering this.

I'm using v0.14.8 and have the following config:

<source>
  @type tail
  read_from_head true
  path /var/log/my.log
  pos_file /var/log/fluentd.pos
  time_format %Y-%m-%d %H:%M:%S.%L
  tag foo.*
</source>

<match foo.**>
  @type aws-elasticsearch-service
  logstash_format true
  flush_interval 5s

  <endpoint>
    url <my-url>
    region <my-region>
  </endpoint>
</match>

But I don't end up with millisecond precision - I still only get resolution to the second. As you can see from my config, I'm using the AWS Elasticsearch plugin, but I don't see why that would make a difference.

@cosmo0920
Contributor

@yissachar The aws-elasticsearch-service plugin uses the v0.12 API. Plugins using the v0.14 API can handle nanosecond precision, but plugins using the v0.12 API cannot.
Could you report this and request v0.14 API support at the https://github.com/atomita/fluent-plugin-aws-elasticsearch-service repository?

@yissachar

@cosmo0920 Thanks for the tip. I've opened uken/fluent-plugin-elasticsearch#222


8 participants