
Google Summer of Code 2015

Eduardo Silva edited this page Feb 20, 2015 · 34 revisions

This page describes the project ideas that students can develop with our mentors and the community in general. Keep in mind that communication is fundamental for us: when applying, describe in detail how you plan to approach the project.

Fluentd UI

Difficulty: medium

Tags: C Ruby, Ruby, JSON, Web Services

Description

The Fluentd data collector is a complete command-line service, but in some cases a UI is required to manage multiple instances and perform complex configurations. Fluentd supports many types of input and output data, and recent versions support filters that alter the data content. A UI aims to simplify the general overview of a Fluentd service instance.

This project requires a complete implementation of a Fluentd UI written in Ruby, with the following requirements:

  • Handle Fluentd configuration schemas (sources and matches).
  • Import the Fluentd library and perform basic administration tasks such as start, stop, and restart.
  • Import and export configurations to different formats (e.g., JSON).
  • Templates support: templates aim to simplify the configuration when connecting with different services such as Logstash, MongoDB, and Elasticsearch. The number of templates and their details will be defined later; the important part is the engine that provides this capability.
  • On the design side, it must use Twitter Bootstrap and be mobile friendly.
  • The complete implementation must use only Ruby libraries and cannot use sources that are not part of the Ruby ecosystem.
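As a rough illustration of the "handle configuration schemas" and "export to JSON" requirements, the Python sketch below parses flat `<source>`/`<match>` sections into plain data, dumps them as JSON, and renders that JSON back into Fluentd syntax. It is not Fluentd code and makes simplifying assumptions: no nested sections, no `@include`, and every parameter is a single `key value` line. The names `parse_config`, `export_json`, and `import_json` are hypothetical; the actual UI would be written in Ruby and could reuse Fluentd's own configuration parser instead.

```python
import json
import re

# "<source>" or "<match app.**>": section name plus an optional tag pattern.
OPEN_TAG = re.compile(r"^<(\w+)(?:\s+([^>]+))?>$")
# "</source>", "</match>"
CLOSE_TAG = re.compile(r"^</(\w+)>$")


def parse_config(text):
    """Parse flat <source>/<match> sections into a list of dicts.

    Assumptions: no nested sections, no @include, one "key value" per line.
    """
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        opening = OPEN_TAG.match(line)
        if current is None:
            if not opening:
                raise ValueError(f"expected a section tag, got {line!r}")
            current = {"section": opening.group(1), "params": {}}
            if opening.group(2):
                current["pattern"] = opening.group(2).strip()
            continue
        if opening:
            raise ValueError("nested sections are not supported in this sketch")
        closing = CLOSE_TAG.match(line)
        if closing:
            if closing.group(1) != current["section"]:
                raise ValueError(f"mismatched closing tag {line!r}")
            sections.append(current)
            current = None
            continue
        key, _, value = line.partition(" ")
        current["params"][key] = value.strip()
    if current is not None:
        raise ValueError(f"unclosed section <{current['section']}>")
    return sections


def export_json(text):
    """Export a configuration string as a JSON document."""
    return json.dumps(parse_config(text), indent=2)


def import_json(doc):
    """Render a JSON document produced by export_json back to config syntax."""
    lines = []
    for sec in json.loads(doc):
        header = sec["section"]
        if "pattern" in sec:
            header += " " + sec["pattern"]
        lines.append(f"<{header}>")
        for key, value in sec["params"].items():
            lines.append(f"  {key} {value}")
        lines.append(f"</{sec['section']}>")
    return "\n".join(lines) + "\n"
```

Keeping the JSON form as plain lists and dicts means other exporters (YAML, templates) could be added on top of `parse_config` without touching the parser.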

C# Logger Library

Difficulty: medium

Tags: C, C Ruby, Documentation, Markdown, Networking

Description

Fluentd is not just a data collector; it is also a complete ecosystem that provides different tools for different targets. One component is currently missing and we aim to implement it: the C# Logger Library.

Students applying to this project need previous knowledge of the C# language. The library must be fully documented and contain several usage examples that demonstrate its capabilities. The required features are the following:

  • The core library must support the MsgPack format; binary communication with Fluentd is fundamental for performance and to reduce network bandwidth.

  • The library must be able to perform network operations with the main Fluentd service; it needs to support the TCP_FASTOPEN option on Linux and offer a KeepAlive persistent-connection feature.

  • On the networking side, it must be able to handle lost connections and detect latency problems, offering a set of callbacks that notify the caller and let it decide how to react.

  • Thread Mode Engine: this optional mode aims to let the developer start a logger instance that runs in service mode; it needs to be asynchronous and use lock-free techniques when handling global data structures.

  • Documentation: the library must be fully documented, with examples as stated before. This is fundamental for its adoption.
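To make the MsgPack requirement concrete, the Python sketch below hand-encodes a subset of the MsgPack format (nil, booleans, integers, floats, strings, arrays, maps) and uses it to build one event in the Message mode of the Fluentd forward protocol, `[tag, time, record]`, with `time` as integer Unix seconds. It only illustrates the bytes on the wire; `pack` and `forward_message` are hypothetical names, and the C# library itself would presumably build on an existing MsgPack implementation for .NET.

```python
import struct


def _pack_int(n):
    if 0 <= n < 0x80:
        return bytes([n])                      # positive fixint
    if -32 <= n < 0:
        return struct.pack(">b", n)            # negative fixint (0xe0..0xff)
    if n >= 0:
        for code, fmt, limit in ((0xCC, ">B", 2**8), (0xCD, ">H", 2**16),
                                 (0xCE, ">I", 2**32), (0xCF, ">Q", 2**64)):
            if n < limit:
                return bytes([code]) + struct.pack(fmt, n)
    else:
        for code, fmt, limit in ((0xD0, ">b", 2**7), (0xD1, ">h", 2**15),
                                 (0xD2, ">i", 2**31), (0xD3, ">q", 2**63)):
            if n >= -limit:
                return bytes([code]) + struct.pack(fmt, n)
    raise OverflowError(f"integer out of MsgPack range: {n}")


def _header(n, fix_base, code16, code32):
    """Length header for arrays (fix_base 0x90) and maps (fix_base 0x80)."""
    if n < 16:
        return bytes([fix_base | n])
    if n < 0x10000:
        return code16 + struct.pack(">H", n)
    return code32 + struct.pack(">I", n)


def pack(obj):
    """Encode a Python value using a subset of the MsgPack spec."""
    if obj is None:
        return b"\xc0"
    if obj is True:                            # check bools before int
        return b"\xc3"
    if obj is False:
        return b"\xc2"
    if isinstance(obj, int):
        return _pack_int(obj)
    if isinstance(obj, float):
        return b"\xcb" + struct.pack(">d", obj)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        n = len(data)
        if n < 32:
            return bytes([0xA0 | n]) + data    # fixstr
        if n < 0x100:
            return b"\xd9" + struct.pack(">B", n) + data
        if n < 0x10000:
            return b"\xda" + struct.pack(">H", n) + data
        return b"\xdb" + struct.pack(">I", n) + data
    if isinstance(obj, (list, tuple)):
        body = b"".join(pack(item) for item in obj)
        return _header(len(obj), 0x90, b"\xdc", b"\xdd") + body
    if isinstance(obj, dict):
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return _header(len(obj), 0x80, b"\xde", b"\xdf") + body
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def forward_message(tag, time, record):
    """One event in forward-protocol Message mode: [tag, time, record]."""
    return pack([tag, time, record])
```

Compared with sending the same event as JSON text, small integers and short strings cost one header byte each, which is where the bandwidth savings come from.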

Fluentd Core Profiling stack

Difficulty: High

Tags: C, C Ruby, Valgrind, pprof, Kcachegrind

Description

Although Fluentd is a very fast log collector written in C Ruby, third-party input or output plugins may sometimes affect its performance. We aim to implement a preventive mechanism with which Fluentd users can easily detect these issues and generate performance reports with real data.

The Fluentd profiling stack project needs to add the following features to the core product:

  • Runtime activation: think about a service that is running in production and facing high latency while processing thousands of events. In that case, the user should be able to activate the profiling mode and gather statistics for a fixed amount of time. Since activation is on demand, it must never require a restart.

  • Profiling in our context means gathering metrics and timings associated with key calls in the stack, covering networking inputs, routing, filtering, and outputs. The software must be able to report where the bottleneck is.

  • The output data needs to be compatible with common formats; one example is the Valgrind format produced by the callgrind tool.
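One possible shape for these requirements is sketched below in Python: a hypothetical `StageProfiler` that records nothing until `enable(seconds)` is called, then collects call counts and wall-clock time per pipeline stage until the window closes, reports the slowest stage, and dumps totals in a minimal subset of the callgrind text format. How `enable` would be triggered inside a running Fluentd (for example from a signal handler or an RPC endpoint) is an assumption and is left open here.

```python
import time
from collections import defaultdict
from functools import wraps


class StageProfiler:
    """Hypothetical on-demand profiler for Fluentd-like pipeline stages.

    Nothing is recorded until enable() opens a time window, so profiling
    can be switched on inside a running process without a restart.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock                    # injectable for tests
        self._deadline = float("-inf")         # disabled by default
        self.calls = defaultdict(int)
        self.total_us = defaultdict(int)

    def enable(self, seconds):
        """Profile for `seconds` from now; calling again restarts the window."""
        self._deadline = self._clock() + seconds

    def active(self):
        return self._clock() < self._deadline

    def stage(self, name):
        """Decorator that times a function as the pipeline stage `name`."""
        def decorate(fn):
            @wraps(fn)
            def wrapper(*args, **kwargs):
                if not self.active():
                    return fn(*args, **kwargs)  # fast path: profiling off
                start = self._clock()
                try:
                    return fn(*args, **kwargs)
                finally:
                    elapsed = self._clock() - start
                    self.calls[name] += 1
                    self.total_us[name] += round(elapsed * 1_000_000)
            return wrapper
        return decorate

    def bottleneck(self):
        """Stage with the largest cumulative time, or None if nothing ran."""
        return max(self.total_us, key=self.total_us.get, default=None)

    def to_callgrind(self):
        """Render totals using a minimal subset of the callgrind format."""
        lines = ["version: 1", "creator: stage-profiler-sketch",
                 "events: Microseconds", ""]
        for name in sorted(self.total_us):
            lines += ["fl=fluentd", f"fn={name}", f"0 {self.total_us[name]}", ""]
        return "\n".join(lines)
```

The recorded times are inclusive: if one stage calls another, the inner stage's time is also part of the outer stage's total. A fuller implementation would also emit call edges (`cfn=` and `calls=` lines) so that KCachegrind can separate self cost from inclusive cost.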

This is a very hard project, and before applying for it the student needs to understand Fluentd internals. We suggest writing some input and output plugins, as well as playing with different configuration schemas. It requires good research skills.

Fluentd Cluster mode

Difficulty: High

Tags: C, C Ruby, Networking, TCP, High Availability, Cluster techniques

Description

The Fluentd cluster mode aims to let instances talk to each other and behave as a full cluster. The expectation is that a Fluentd cluster supports the following features:

  • Auto balancing: as a data collector, Fluentd may face high load and large volumes of input data. The auto-balancing mode aims to make each instance aware of its Fluentd neighbors and delegate processing to them. This can be done through different balancing metrics and criteria, such as least recently used or least loaded service.

  • Failover: a connection should never fail. If a request cannot be handled because the slave (or delegated) services are not alive, buffer that request and retry once the service becomes aware of an instance that can fulfill it.

  • Master/Slave: Fluentd instances can be masters or slaves, and a master can also switch to slave mode.

  • Remote configuration: a master node must be able to push a configuration to a slave. The slave must be able to disable old configuration rules and apply the new ones without discarding any active connection. This is a hard requirement and adds a lot of complexity.
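The auto-balancing and failover requirements can be sketched together. The hypothetical `ClusterBalancer` below (Python, not Fluentd code) sends each chunk to the least loaded alive node, keeps chunks in a buffer while no node is alive, and redispatches them when a node comes back. Node discovery, health checks, and the MsgPack transport between nodes are assumed to exist elsewhere.

```python
from collections import deque


class ClusterBalancer:
    """Hypothetical master-side balancer with a failover buffer."""

    def __init__(self, nodes):
        self.load = {name: 0 for name in nodes}  # in-flight chunks per node
        self.alive = set(nodes)
        self.pending = deque()                   # chunks waiting for a node

    def _pick(self):
        candidates = [n for n in self.load if n in self.alive]
        if not candidates:
            return None
        # Least loaded first; ties broken by name so the choice is stable.
        return min(candidates, key=lambda n: (self.load[n], n))

    def dispatch(self, chunk):
        """Assign a chunk to a node, or buffer it if no node can take it."""
        node = self._pick()
        if node is None:
            self.pending.append(chunk)
            return None
        self.load[node] += 1
        return node

    def ack(self, node):
        """Called when `node` reports that it finished one chunk."""
        self.load[node] = max(0, self.load[node] - 1)

    def mark_down(self, node):
        self.alive.discard(node)

    def mark_up(self, node):
        """Bring a node (back) online and redispatch buffered chunks."""
        self.load.setdefault(node, 0)            # also admits new nodes
        self.alive.add(node)
        flushed = []
        while self.pending:
            chunk = self.pending.popleft()
            flushed.append((chunk, self.dispatch(chunk)))
        return flushed
```

Least loaded is only one of the criteria mentioned above; replacing the `min` key with a last-used timestamp per node would give least-recently-used selection instead. Chunks already in flight on a node that goes down are not reassigned in this sketch.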

The cluster mode project requires the student to know about networking and TCP in general. The design and architecture of the implementation must take all the described requirements into consideration and be open enough to be extended as needed.

All messaging between cluster nodes must use MsgPack, which means you will need to implement an in_cluster input plugin.