S3 cost optimisation: Remember the last index value that was used #160
Comments
The issue with (b) is that if you have multiple instances writing to the same path, you'd still need the collision check.
Oops. I updated (b) to clarify that the HEAD request would still occur for the index we're trying, but it then wouldn't need to make the n-1 earlier requests. If we're up to index n, we needn't check 1, 2, ..., n-1. But fluent-plugin-s3 seems to: see https://github.com/fluent/fluent-plugin-s3/blob/master/lib/fluent/plugin/out_s3.rb#L190 and line 219.
I should also say it was @cnorthwood who spotted this issue. :)
For less sensationalism: a more common case of 20 instances uploading 5 files every 5 minutes costs about $2.20 per month with the O(n^2) behaviour and would cost about $0.34 with O(n).
The hard point of (b) is that the s3 plugin has ... Hmm...
If default includes ..., some users add ... For reducing HEAD cost, one way is adding ...
We also encountered the same problem. The main reason for us to use the s3 plugin was to reduce the cost of storing huge log files. We use fluentd+s3 next to a regular ELK-like solution. But now we are actually paying more for S3 than we would pay to increase our main log storage.
We ran into this issue as well. There have been months where we received literally billions of HEAD requests. It's hard to say exactly how much it has cost us, but based on our historical usage I'd guess around 5,000 dollars†. That's right, Jeff Bezos has gotten so much money off this bug he could buy a used 2005 Mazda Mazda3. (Just kidding, but the point is we should probably warn newcomers or fix the bug unless we want to buy Bezos another car :P )
You could create a variable to store the last log name and lock it behind a mutex so threads can set & access it safely. I'm tempted to try it out but I don't know Ruby... hmm.

There's also a slight problem with the approach in that at the start of the program you would still have to go through all the prefixed S3 items to find the last index. If td-agent is restarted frequently, or if there is a crazy amount of logs in the bucket, this could also lead to high costs. I think it would be better to make ...

As a temporary workaround we have simplified our td-agent config and upgraded to td-agent v4. I'm not even sure how we ran into this problem - the config is set to flush only once every 24 hours, but somehow we flushed about once a minute. The really odd thing is that the rapid flushing happened in a periodic cycle, where it would continue for weeks and then pause for a long time before starting back up again. I'll be monitoring the number of requests to see if our flushing problem (and thus this problem) is solved for us.

† This is a very rough guess, I won't know for sure until I see how much costs come down by.
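A minimal Ruby sketch of that idea, for illustration only (the class, names, and AWS SDK usage here are my own assumptions, not the plugin's actual code): cache the last index per key prefix behind a Mutex so each upload usually needs a single HEAD request instead of re-checking 1..n-1.

```ruby
require 'aws-sdk-s3'

# Hypothetical helper, not part of fluent-plugin-s3: remembers the last index
# used per key prefix so the existence check can resume from there.
class LastIndexCache
  def initialize
    @mutex = Mutex.new
    @last_index = Hash.new(0)
  end

  # Returns the next free index for `prefix`. `exists` is a callable that
  # performs the HEAD request for a candidate index.
  def next_index(prefix, exists)
    @mutex.synchronize do
      i = @last_index[prefix] + 1
      i += 1 while exists.call(i)   # normally one HEAD request, not n
      @last_index[prefix] = i
      i
    end
  end
end

# Usage sketch (bucket and key names are placeholders):
bucket = Aws::S3::Bucket.new('my-log-bucket')
cache  = LastIndexCache.new
prefix = 'foo-2016010109'
idx = cache.next_index(prefix, ->(i) { bucket.object("#{prefix}_#{i}.gz").exists? })
# ...then upload the chunk to "#{prefix}_#{idx}.gz"
```

As noted above, this only helps within one process; multiple workers or instances writing to the same path would still race.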
I think this has a slow-upload issue with multiple threads. If a previous upload takes 20 seconds, other threads wait 20 seconds or more. It causes buffer overflow easily. Older s3 plugin doesn't have ...
The s3 plugin uses a default object key that is problematic in a few ways.

1. It makes HEAD requests for each chunk it uploads, starting from 1 each time. If you have uploaded 2000 log files within the same time slice, it will make 2001 HEAD requests to figure out if it exists. fluent/fluent-plugin-s3#160
2. The above check is not thread-safe, and two threads can race and decide to use the same `%{index}` value, with the loser of the race overwriting the chunk from the winner. fluent/fluent-plugin-s3#326

This is planned to change for v2, but there's no clear path to v2 right now. The plugin does warn already if you use multiple threads and don't use either `%{chunk_id}` or `%{uuid_hash}` in the object key.
As mentioned in a warning, as well as fluent#326 and fluent#160, the process of determining the index added to the default object key is not thread-safe. This adds some thread-safety until version 2.x is out where chunk_id is used instead of an index value. This is not a perfect implementation, since there can still be races between different workers if workers are enabled in fluentd, or if there are multiple fluentd instances uploading to the same bucket. This commit is just to resolve this problem short-term in a way that's backwards compatible. Signed-off-by: William Orr <[email protected]>
see fluent#160 Signed-off-by: caleb15 <[email protected]>
@caleb15 Apart from the warning added - is there a recommended solution to avoid the index checking, which in turn will avoid the HEAD requests? This is the s3_object_key_format we are using: %{path}%{hostname}-%{time_slice}_%{index}.%{file_extension}. We are seeing a high number of HEAD requests. Does #326 locally cache the last index for the file between threads to avoid the issue? Thanks!
For avoiding HEAD request call, set check_object false and use %{uuid_flush} in s3_object_key_format instead of %{index}.
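For example, a configuration along these lines (the match pattern, bucket name, and path are placeholders chosen for illustration) skips the existence check and relies on a unique id in the object key instead:

```
<match app.logs>
  @type s3
  s3_bucket my-log-bucket    # placeholder
  path logs/
  # skip the per-upload HEAD existence check
  check_object false
  # without the check, %{index} could collide, so make the key unique instead
  s3_object_key_format %{path}%{hostname}-%{uuid_flush}-%{time_slice}.%{file_extension}
  time_slice_format %Y%m%d%H
</match>
```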
Thanks @repeatedly - what is the implication of avoiding the HEAD request call? Would it overwrite an existing file, or know from a cache what index it needs to write to? Guessing %{uuid_flush} is the id then picked up from cache.
@formanojhr uuid = universally unique id. Because the id is totally unique, overwriting becomes virtually impossible. https://stackoverflow.com/questions/1155008/how-unique-is-uuid
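For a concrete sense of what a random UUID looks like (illustrative only; this is not necessarily how the plugin generates %{uuid_flush}):

```ruby
require 'securerandom'

# A version-4 UUID carries 122 random bits, so two uploads picking the
# same value is astronomically unlikely.
3.times { puts SecureRandom.uuid }
# e.g. "f3a0c1de-8a4b-4c2e-9d6f-0b1a2c3d4e5f"
```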
@repeatedly I think since you are the dev on this, this might be something you can answer. Once I add check_object false and set s3_object_key_format %{path}%{hostname}-%{uuid_flush}-%{time_slice}_%{index}.%{file_extension} ...
This issue has been automatically marked as stale because it has been open 90 days with no activity. Remove the stale label or comment, or this issue will be closed in 30 days.
@cosmo0920 can you remove the stale lifecycle and add a bug label please? This is a rare but serious issue.
I agree this seems to be a bug. I have also experienced this default behavior, where we only found out about it after a $20K bill at the end of the month due to the billions of HEAD requests. I switched to check_object=false and used a combo of UUID and other variables to ensure uniqueness and avoid conflicts - of course the downside is the challenge of looking up objects sequentially. I do believe this needs to be treated with higher urgency, or at least the default behavior should be switched so that it is not making billions of HEAD calls. Some may argue this can be a feature, with "suffix" as a variable with options and documentation of the pros and cons of each option.
The problem
Suppose you use the defaults:

- `s3_object_key_format` of `%{path}%{time_slice}_%{index}.%{file_extension}`
- `time_slice_format` of `%Y%m%d%H`

And suppose you flush every 30 seconds. So 120 files per hour.
The first file will check whether `foo-2016010109_1.gz` exists via an S3 HEAD request, see it doesn't exist, and then upload to that filename. The next file will first check whether `foo-2016010109_1.gz` exists via an S3 HEAD request, see it exists, and so increment the index to `foo-2016010109_2.gz`, check whether that exists with an S3 HEAD request, see it doesn't exist, and then upload to that filename.

This will continue. When we get to the final file of the hour (the 120th file), we'll first do 119 HEAD requests!
That's 1+2+...+119 = 7140 S3 requests over the hour. And that's per input log file, per instance.
S3 HEAD requests are "$0.004 per 10,000 requests". So the monthly cost of the above, for 5 log files on 100 instances, amounts to 7140 × 5 × 100 × 24 × 30 × $0.004 / 10,000 ≈ $1028.
More generally, 1+2+...+n is O(n^2) and we can reduce this to O(n).
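To make the quadratic behaviour concrete, here is a simplified sketch of the per-upload probing (my own illustration, not the plugin's exact code): every upload starts probing at index 1, so upload k costs roughly k HEAD requests.

```ruby
require 'aws-sdk-s3'

# Simplified illustration of the current behaviour: each upload probes
# indices starting from 1 until it finds a key that doesn't exist yet.
# Bucket and prefix names are placeholders.
bucket = Aws::S3::Bucket.new('my-log-bucket')
prefix = 'foo-2016010109'

def next_free_index(bucket, prefix)
  i = 1
  i += 1 while bucket.object("#{prefix}_#{i}.gz").exists?  # one HEAD per probe
  i
end

# Over an hour of 120 uploads this performs roughly 1 + 2 + ... + 120
# HEAD requests, i.e. O(n^2) in the number of files per time slice.
idx = next_free_index(bucket, prefix)
# ...upload the chunk to "#{prefix}_#{idx}.gz"
```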
Solutions
(a) The user can modify `time_slice_format` to include `%M`. Or the default could include `%M`.

(b) fluent-plugin-s3 could remember the last index it uploaded to, and so not have to check whether the n-1 earlier files already exist: fluentd would know they do.
If either solution was implemented, we'd have reduced the number of HEAD requests from O(n^2) to O(n). (Technically (a) doesn't reduce the complexity to O(n), it just makes our n tiny.)
So rather than 7140 S3 requests per hour per log file per instance, we'd only do 120.
This reduces the monthly cost from $1028 to $17.
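As a sanity check on those numbers, here is a small Ruby calculation (the scenario parameters are the ones from this issue; the helper itself is just for illustration):

```ruby
HEAD_COST_PER_10K = 0.004  # USD, per the S3 pricing quoted above

# Monthly HEAD-request cost for `files` log files on `instances` instances,
# with `per_hour` uploads per hourly time slice, over a 30-day month.
def monthly_cost(per_hour:, files:, instances:, quadratic:)
  # 1 + 2 + ... + (per_hour - 1), matching the estimate in the issue
  per_slice = quadratic ? (1...per_hour).sum : per_hour
  requests  = per_slice * files * instances * 24 * 30
  requests * HEAD_COST_PER_10K / 10_000
end

# 120 uploads/hour, 5 log files, 100 instances (the example above):
puts monthly_cost(per_hour: 120, files: 5, instances: 100, quadratic: true).round   # => 1028
puts monthly_cost(per_hour: 120, files: 5, instances: 100, quadratic: false).round  # => 17
```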