S3 cost optimisation: Remember the last index value that was used #160

@tomfitzhenry

The problem

Suppose you use the defaults:

  • s3_object_key_format of %{path}%{time_slice}_%{index}.%{file_extension}
  • time_slice_format of %Y%m%d%H

And suppose you flush every 30 seconds. So 120 files per hour.

The first file will check whether foo-2016010109_1.gz exists via an S3 HEAD request, see it doesn't exist, and then upload to that filename.

The next file will first check whether foo-2016010109_1.gz exists via an S3 HEAD request, see that it exists, increment the index to foo-2016010109_2.gz, check whether that exists via another S3 HEAD request, see that it doesn't, and then upload to that filename.

This will continue. When we get to the final file of the hour (the 120th file), we'll first do 119 HEAD requests for files that already exist!

That's 1+2+...+119 = 7140 wasted S3 requests over the hour. And that's per input log file, per instance.
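
To make the count concrete, here is a minimal simulation of the probing described above. It is only a sketch: the `bucket` set stands in for S3, the function name is made up for illustration, and the key name and 1-based index follow the example in this issue rather than the plugin's actual code.

```python
def head_requests_for_slice(num_files):
    """Simulate naive index probing within a single time slice.

    Returns (total HEAD requests, HEAD requests that hit an existing key).
    """
    bucket = set()   # stand-in for the S3 bucket: the set of existing keys
    total = 0
    hits = 0
    for _ in range(num_files):
        index = 1    # assumption: probing restarts at index 1 for every file
        while True:
            key = f"foo-2016010109_{index}.gz"
            total += 1              # one simulated HEAD request
            if key not in bucket:
                break               # free key found, stop probing
            hits += 1               # key already exists, try the next index
            index += 1
        bucket.add(key)             # simulated upload
    return total, hits


total, hits = head_requests_for_slice(120)
print(total, hits)
```

The second number is the 1+2+...+119 from above; the first also counts the one probe per file that finds the free key.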

S3 HEAD requests are "$0.004 per 10,000 requests". So the monthly cost of the above, for 5 log files on 100 instances, amounts to 7140 * 5 * 100 * 24 * 30 * $0.004 / 10,000 = $1028.
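
The same arithmetic as a small helper, in case you want to plug in your own numbers. The function and parameter names are made up for illustration; the defaults are the figures quoted in this issue (5 log files, 100 instances, a 30-day month, $0.004 per 10,000 requests), not current AWS pricing.

```python
def monthly_head_cost(requests_per_hour, log_files=5, instances=100,
                      price_per_10k=0.004, hours_per_month=24 * 30):
    """Estimate the monthly cost in dollars of S3 HEAD requests."""
    requests_per_month = (requests_per_hour * log_files
                          * instances * hours_per_month)
    return requests_per_month * price_per_10k / 10_000


# 7140 HEAD requests per hour, per log file, per instance
print(round(monthly_head_cost(7140)))
```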

More generally, 1+2+...+n is O(n^2) and we can reduce this to O(n).
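
Spelled out, assuming the k-th file in a time slice costs k HEAD requests in the current scheme and a single HEAD request in the improved one:

```latex
\sum_{k=1}^{n} k = \frac{n(n+1)}{2} = O(n^2),
\qquad
\sum_{k=1}^{n} 1 = n = O(n)
```

With n = 119 the first sum gives 119 * 120 / 2 = 7140.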

Solutions

(a) The user can modify time_slice_format to include %M. Or the default could include %M.
(b) fluent-plugin-s3 could remember the last index it uploaded to, and so not have to check whether the n-1 earlier files already exist: fluentd would know they do.

If either solution were implemented, we'd have reduced the number of HEAD requests from O(n^2) to O(n). (Technically (a) doesn't reduce the complexity to O(n), it just makes our n tiny.)

So rather than 7140 S3 requests per hour per log file per instance, we'd only do 120.

This reduces the monthly cost from $1028 to $17.
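
A rough sketch of what (b) could look like, to show where the saving comes from. This is not fluent-plugin-s3's actual code: `RememberedIndexUploader`, its `bucket` set (a stand-in for S3) and the `foo-` key prefix are all hypothetical. It still issues one HEAD request per upload, so it won't overwrite files left behind by a previous process.

```python
from collections import defaultdict


class RememberedIndexUploader:
    """Hypothetical sketch of solution (b): remember the last index used."""

    def __init__(self, bucket=None):
        # Stand-in for the S3 bucket: the set of keys that already exist.
        self.bucket = set() if bucket is None else bucket
        self.head_requests = 0
        # time_slice -> first index worth probing (1 for an unseen slice)
        self.next_index = defaultdict(lambda: 1)

    def _exists(self, key):
        self.head_requests += 1     # one simulated HEAD request
        return key in self.bucket

    def upload(self, time_slice):
        index = self.next_index[time_slice]
        # Start probing at the remembered index instead of at 1.
        while self._exists(f"foo-{time_slice}_{index}.gz"):
            index += 1
        key = f"foo-{time_slice}_{index}.gz"
        self.bucket.add(key)        # simulated upload
        self.next_index[time_slice] = index + 1
        return key


uploader = RememberedIndexUploader()
for _ in range(120):
    uploader.upload("2016010109")
print(uploader.head_requests)
```

After a restart the remembered index is gone, so the first upload in a slice would probe from 1 again; that is a one-off cost rather than a per-file one. By the same counting, option (a) with 30-second flushes would put two files in each one-minute slice, so roughly 1 + 2 = 3 HEAD requests per minute (about 180 per hour), which is the same order of magnitude.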
