[WIP] Idea notes for next generation Buffer/Input/Output plugins #309

Closed
tagomoris opened this issue Apr 26, 2014 · 24 comments

@tagomoris
Member

Related to #287

@tagomoris
Member Author

  • Base classes are Fluent::Plugin::Buffer::Base and Input::Base, Output::Base
    • Plugins are like Fluent::Plugin::Awesome::Output
      • in fluent-plugin-awesome gem
      • in lib/fluent/plugin/awesome/output.rb
  • What about file path collisions within the fluentd repository?
  • Use a Fluentd:: prefix or not?
  • Change the plugin gem naming rule?

@tagomoris
Member Author

Callback based start-up checking:

def start
  @thread = Thread.new { @started = true  }  # use actor actually
end

def check_start?
  @started
end

We need a way to check, via a callback (from the fluentd engine or a test library), whether threads have actually started.
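
A minimal sketch of how such a check could be driven from the engine or a test library. `SketchPlugin` and `wait_until_started` are hypothetical names, not existing Fluentd APIs, and a plain Thread stands in for the actor.

```ruby
require 'timeout'

# Hypothetical plugin used only for illustration.
class SketchPlugin
  def initialize
    @started = false
    @running = true
  end

  def start
    @thread = Thread.new do
      @started = true            # the thread body is now running
      sleep 0.01 while @running  # stand-in for the plugin's real work
    end
  end

  # Callback polled by the engine or a test library.
  def check_start?
    @started
  end

  def shutdown
    @running = false
    @thread.join if @thread
  end
end

# Hypothetical helper: poll the callback until it returns true,
# raising Timeout::Error if the plugin never reports that it started.
def wait_until_started(plugin, timeout: 1.0)
  Timeout.timeout(timeout) do
    sleep 0.01 until plugin.check_start?
  end
  true
end
```

A test library could call `wait_until_started(plugin)` right after `plugin.start` and only then begin feeding events.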

@tagomoris
Member Author

Buffer plugins should have some additional hook points to:

  • count emitted records
  • check chunk size
  • check elapsed time

Ultimately, both output plugins and buffer plugins should be able to determine when buffers are flushed.
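
One way the three hook points could feed a flush decision, as a rough sketch. `SketchChunk` and `SketchFlushPolicy` are hypothetical names, and the thresholds are arbitrary example values.

```ruby
# Hypothetical chunk that tracks the three quantities listed above.
class SketchChunk
  attr_reader :records, :bytes, :created_at

  def initialize(now: Time.now)
    @records = 0
    @bytes = 0
    @created_at = now
  end

  def append(data)
    @records += 1
    @bytes += data.bytesize
  end
end

# Hypothetical policy object; either the buffer plugin or the output plugin
# could own one and ask it whether a chunk should be flushed.
class SketchFlushPolicy
  def initialize(max_records:, max_bytes:, max_age:)
    @max_records = max_records
    @max_bytes = max_bytes
    @max_age = max_age # seconds
  end

  def flush?(chunk, now: Time.now)
    chunk.records >= @max_records ||
      chunk.bytes >= @max_bytes ||
      (now - chunk.created_at) >= @max_age
  end
end
```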

@tagomoris
Member Author

Output plugins should be able to override the buffer chunk key, in order to:

  • determine the forwarding host by chunk key in out_forward
  • make buffer files human-readable in out_file

#314
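
A sketch of what an overridable chunk key could look like for the out_forward case. The class names and the `chunk_key` method are assumptions made for illustration, not the existing Fluentd API.

```ruby
# Hypothetical buffered output base class.
class SketchBufferedOutput
  attr_reader :chunks

  def initialize
    @chunks = Hash.new { |hash, key| hash[key] = [] }
  end

  # Default: every event goes to the same chunk.
  def chunk_key(_tag, _time, _record)
    ''
  end

  def emit(tag, time, record)
    @chunks[chunk_key(tag, time, record)] << [tag, time, record]
  end
end

# Hypothetical forward-like output: the chunk key is the destination host,
# so each chunk can be sent to exactly one node.
class SketchForwardOutput < SketchBufferedOutput
  def initialize(hosts)
    super()
    @hosts = hosts
  end

  def chunk_key(tag, _time, _record)
    @hosts[tag.bytes.sum % @hosts.size]
  end
end
```

Events with the same tag always land in the same chunk here; a real plugin would likely pick hosts by weight or health rather than a byte sum.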

@repeatedly
Member

The v11 branch has new implementations: https://github.com/fluent/fluentd/tree/v11/lib/fluentd/plugin
We can merge these API ideas.

@repeatedly repeatedly added v1 and removed feature labels Jun 27, 2014
@tagomoris
Member Author

APIs to open files for reading/writing (and return the fd) would be very helpful for re-opening files on SIGHUP, or for re-reading configuration from files.

@sonots
Member

sonots commented Dec 15, 2014

Output plugins should be able to override the buffer chunk key, in order to:
make buffer files human-readable in out_file

For out_file, I doubt we really need a file buffer before writing to a file, since both are files. With that in mind, I have written a re-implementation of out_file named out_file2 (not released yet): https://github.com/sonots/fluent-plugin-file2. This plugin provides only human-readable files and thread (process) safety.

Plugins are like Fluent::Plugin::Awesome::Output

If the base class is Fluent::Plugin::Output::Base, I think Fluent::Plugin::Output::Awesome is better. Its filename becomes lib/fluent/plugin/output/awesome.rb. I do not like bundled plugins having filenames such as lib/fluent/plugin/stdout/output.rb and lib/fluent/plugin/copy/output.rb, which gives us lots of directories.

@tagomoris
Member Author

memo: #157

@tagomoris
Member Author

@sonots What gem name should we use for plugins in lib/fluent/plugin/output/awesome?
fluent-plugin-output-awesome?
What about a gem package that contains both input and output plugins?

@sonots
Member

sonots commented Dec 17, 2014

I feel plugin developers should release gems separately, like fluent-plugin-output-awesome, because that makes it easy for users to identify the purpose of the plugin and makes it possible to categorize it automatically in our plugin repository.

However, for developers who want to bundle both input and output, we may still accept gem names like fluent-plugin-awesome, although I don't know how to categorize them.

EDIT: Well, first of all, I do not think we should change the directory layout of plugins or the naming convention of gems. It would confuse users, and I do not think users want it.

Just provide the new API, and users will use it. I think this scenario is best.

@tagomoris
Member Author

Use buffer_dir_path instead of buffer_path, so that a plugin instance manages the specified directory exclusively, separate from other plugin instances. All files in the specified directory are flushed by the plugin instance that specifies it.

  • That is useful for fluent-plugin-forest to flush all buffer files at startup
  • buffer_path takes priority over buffer_dir_path.

@tagomoris
Member Author

New configuration parameter set for buffer plugins:

  • buffer_space_limit (buffer plugin)
    • to specify the total size of buffer chunks (== buffer_chunk_limit * buffer_queue_limit)
    • to replace buffer_queue_limit, so that disk/memory usage is stated explicitly
  • <buffer> section to specify parameters of buffer plugins (output plugin)
    • by using config_section
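
The relation between the three parameters can be sketched as below. `queue_limit_for` is a hypothetical helper, and the sizes are example values only, not defaults.

```ruby
# Hypothetical helper: derive the queue length implied by a total space limit.
def queue_limit_for(buffer_space_limit, buffer_chunk_limit)
  raise ArgumentError, 'chunk limit must be positive' unless buffer_chunk_limit.positive?

  buffer_space_limit / buffer_chunk_limit # integer division: whole chunks only
end

MB = 1024 * 1024

# A 2048 MB space limit with 8 MB chunks corresponds to a queue of 256 chunks.
queue_limit_for(2048 * MB, 8 * MB) # => 256
```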

@tagomoris
Member Author

Fluent::PluginSupport::* mixin modules instead of Actor

  • Fluent::PluginSupport::Thread
  • Fluent::PluginSupport::ChildProcess (for exec_filter and others)
  • Fluent::PluginSupport::Timer
  • Fluent::PluginSupport::Socket (provides listener, tcpsocket(?), sslsocket(?))
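
To illustrate the mixin approach, here is a rough sketch of a thread helper module. The `SketchPluginSupport` namespace and the `thread_create` / `thread_join_all` methods are assumptions for illustration, not an existing API.

```ruby
# Hypothetical mixin namespace, standing in for Fluent::PluginSupport.
module SketchPluginSupport
  module Thread
    # Create a thread and remember it so it can be joined at shutdown.
    def thread_create(name, &block)
      @_threads ||= []
      thread = ::Thread.new(&block)
      thread[:name] = name
      @_threads << thread
      thread
    end

    # Wait for every thread created through thread_create.
    def thread_join_all
      (@_threads || []).each(&:join)
    end
  end
end

# A plugin opts in to thread support by including the module.
class SketchInput
  include SketchPluginSupport::Thread

  attr_reader :ticks

  def start
    @ticks = 0
    thread_create(:ticker) { 3.times { @ticks += 1 } }
  end

  def shutdown
    thread_join_all
  end
end
```

Because the module tracks the threads it creates, the plugin author does not need to keep thread references around for shutdown.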

@tagomoris
Member Author

Common API to generate paths with timestamp placeholders like %Y, %m and %d.
Related:

  • buffer chunk id
  • time slice key generation
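
A minimal sketch of such a helper built on Ruby's `Time#strftime`. The method names and the example path are hypothetical.

```ruby
# Hypothetical helper: expand strftime-style placeholders in a path template.
# The time is converted to UTC first so the result does not depend on the
# local time zone.
def expand_time_placeholders(template, time)
  time.getutc.strftime(template)
end

# The same helper could produce a time slice key.
def time_slice_key(time, format = '%Y%m%d')
  expand_time_placeholders(format, time)
end

t = Time.utc(2014, 4, 26, 12, 0, 0)
expand_time_placeholders('/var/log/fluent/app.%Y%m%d.log', t)
# => "/var/log/fluent/app.20140426.log"
time_slice_key(t)
# => "20140426"
```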

@tagomoris
Member Author

Benchmark plugins:

  • [in/out] forward(w/, w/o SSL) (done: in)
  • [in/out] scribe
  • [out] td, webhdfs, bigquery, s3, mongo, file, exec
  • [in] exec, http, syslog, unix (done: exec, http, syslog)
  • [filter] exec, grep, record_transformer
  • [out? filter?] some of *-counter
  • [meta] forest(?)

@tagomoris
Member Author

Simple datastore for plugin instances & plugin types, to store in-memory data at shutdown (and to resume from it at startup).

  • Serialization format? (msgpack or json?)
  • Datastore? (plain text file, sqlite or pluggable?)

My idea:

  • format: JSON for human readability
  • datastore: Pluggable datastore, but plain text file only at first

Implemented:

  • key-value (JSON compatible value)
  • pluggable datastore
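
As an illustration of the plain-text-file variant, a key-value store persisted as JSON might look roughly like this. `SketchFileStore` is a hypothetical class, not the implemented one.

```ruby
require 'json'

# Hypothetical file-backed key-value store; values must be JSON compatible.
class SketchFileStore
  def initialize(path)
    @path = path
    # Resume from the previous run if a file exists.
    @data = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  def get(key)
    @data[key.to_s]
  end

  def put(key, value)
    @data[key.to_s] = value
  end

  # Write to a temporary file and rename it, so a crash in the middle of
  # saving does not leave a half-written store behind.
  def save
    tmp = "#{@path}.tmp"
    File.write(tmp, JSON.pretty_generate(@data))
    File.rename(tmp, @path)
  end
end
```

A plugin would call `save` at shutdown and construct the store with the same path at startup. A pluggable design would put other backends behind the same `get` / `put` / `save` methods.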

@tagomoris
Member Author

v11 actors: async_actor, background_actor, io_actor, socket_actor, timer_actor

Notes:

  • async and background are actually the same? (run once or many times)
  • socket actor should be separated into tcp, udp and unix socket utils (and sslsocket?)
  • is a file io actor needed? It seems to be required only by out_file

@tagomoris
Member Author

API & configuration param to flush records in buffers ASAP.

@repeatedly
Member

Support pluggable compression.
Currently, each plugin has its own compression mechanism.
This is not good for the plugin ecosystem.

FYI, s3 example: https://github.com/fluent/fluent-plugin-s3#use-your-compression-algorithm

@tagomoris
Member Author

Idea for compression:

  • CompressedBuffer super class as base class to support compression
  • GzipBuffer < CompressedBuffer in built-in class
  • new Buffer API should have hook point to execute compression for specified size

And forward protocol should be extended:

  • add a new flag to opts to specify compressed buffers and its algorithm
  • Only MessagePackEventStream should be supported with compression flag
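
The class hierarchy above could be sketched like this. The `Sketch`-prefixed classes and the `compress_threshold` parameter are hypothetical, and Ruby's standard `Zlib.gzip` / `Zlib.gunzip` stand in for the built-in algorithm. The forward protocol extension is not covered.

```ruby
require 'zlib'

# Hypothetical base class: subclasses supply the algorithm, while the base
# class decides when to compress (the "hook point for specified size").
class SketchCompressedBuffer
  attr_reader :compressed_parts

  def initialize(compress_threshold:)
    @threshold = compress_threshold # bytes of raw data before compressing
    @raw = String.new               # binary string of uncompressed data
    @compressed_parts = []
  end

  def compress(_data)
    raise NotImplementedError
  end

  def decompress(_data)
    raise NotImplementedError
  end

  def append(data)
    @raw << data.b
    return if @raw.bytesize < @threshold

    @compressed_parts << compress(@raw)
    @raw = String.new
  end

  # Return all data in its original order, decompressing as needed.
  def read
    @compressed_parts.map { |part| decompress(part) }.join + @raw
  end
end

# Gzip variant using Ruby's standard zlib bindings.
class SketchGzipBuffer < SketchCompressedBuffer
  def compress(data)
    Zlib.gzip(data)
  end

  def decompress(data)
    Zlib.gunzip(data)
  end
end
```

An output plugin would then choose a buffer class instead of carrying its own compression code.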

@tagomoris
Member Author

Plugin lifecycle with v0.12 APIs:

  1. initialize
  2. configure
  3. start
  4. before_shutdown
  5. shutdown

Plugin lifecycle with v0.14 APIs:

  1. initialize
  2. configure
  3. start
  4. (/) stop
  5. (/) before_shutdown
  6. shutdown
  7. (/) close
  8. (/) terminate

The (/) mark means plugin authors do not need to take care of that step.

stop is used to:

  • prepare for shutdown sequence
  • tell all threads/event_loops/child_processes/sockets to stop/close ASAP under plugin's responsibility (break from while loop, etc)
  • tell plugins not to emit records any more (not by force)
  • stop timers

before_shutdown is used to:

  • flush buffers configured as flush_at_shutdown

shutdown is used to:

  • detach event watchers
  • send SIGTERM to child processes
  • mark all input plugins (and output plugins that emit records) as prohibited from emitting records
  • store all data to be saved

close is used to:

  • close buffer files so they are no longer appended to
  • close all i/o handles
  • stop all event loops
  • save all plugin storage data (last chance)

terminate is used to:

  • join & kill all threads/processes
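
The v0.14 ordering can be sketched as follows. `SketchPluginBase`, `SketchTailInput` and `run_lifecycle` are hypothetical names. The base class gives every phase a default implementation, so a plugin author only overrides the unmarked ones.

```ruby
# Hypothetical base class that records each lifecycle call in order.
class SketchPluginBase
  PHASES = %i[configure start stop before_shutdown shutdown close terminate].freeze

  attr_reader :calls

  def initialize
    @calls = [:initialize]
  end

  # Default implementation for every phase: just record the call.
  PHASES.each do |phase|
    define_method(phase) { |*_args| @calls << phase }
  end
end

# A plugin author overrides only configure / start / shutdown here.
class SketchTailInput < SketchPluginBase
  attr_reader :path

  def configure(conf)
    super
    @path = conf['path']
  end

  def start
    super
    @running = true
  end

  def shutdown
    super
    @running = false
  end
end

# Hypothetical engine-side driver that walks the phases in order.
def run_lifecycle(plugin, conf)
  plugin.configure(conf)
  plugin.start
  %i[stop before_shutdown shutdown close terminate].each do |phase|
    plugin.public_send(phase)
  end
end
```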

@tagomoris
Member Author

Predefined buffer format

  • To create a common format shared among many kinds of plugins

@tagomoris
Member Author

New buffer plugin buf_x to:

  • store data in memory while there are no output errors
  • store data in files while output errors occur
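
A rough sketch of that behaviour, assuming the buffer learns about output errors from a failed flush. `SketchHybridBuffer` and its methods are hypothetical, and recovery back to memory mode is left out.

```ruby
# Hypothetical hybrid buffer: memory while output works, file after an error.
class SketchHybridBuffer
  def initialize(spill_path)
    @spill_path = spill_path
    @memory = []
    @degraded = false
  end

  def degraded?
    @degraded
  end

  def append(data)
    if @degraded
      spill(data)
    else
      @memory << data
    end
  end

  # Hand the buffered data to the output (the block). If the output raises,
  # switch to file mode and move the pending data to disk.
  def flush
    return if @memory.empty?

    yield @memory.join
    @memory.clear
  rescue StandardError
    @degraded = true
    spill(@memory.join)
    @memory.clear
  end

  private

  def spill(data)
    File.open(@spill_path, 'ab') { |file| file.write(data) }
  end
end
```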

@tagomoris
Member Author

This issue is already outdated.
