Rename *_older to *_inactive #2051

Merged
merged 1 commit into from
Jul 19, 2016
4 changes: 3 additions & 1 deletion CHANGELOG.asciidoc
Expand Up @@ -26,7 +26,6 @@ https://github.com/elastic/beats/compare/v5.0.0-alpha4...master[Check the HEAD d
*Filebeat*

- Stop following symlink. Symlinks are now ignored: {pull}1686[1686]
- Deprecate force_close_files option and replace it with close_removed and close_renamed {issue}1600[1600]

*Winlogbeat*

Expand Down Expand Up @@ -85,6 +84,9 @@ https://github.com/elastic/beats/compare/v5.0.0-alpha4...master[Check the HEAD d
*Topbeat*

*Filebeat*
- Deprecate close_older option and replace it with close_inactive {issue}2051[2051]
- Deprecate force_close_files option and replace it with close_removed and close_renamed {issue}1600[1600]


*Winlogbeat*

30 changes: 15 additions & 15 deletions filebeat/docs/reference/configuration/filebeat-options.asciidoc
Expand Up @@ -180,28 +180,28 @@ For comparison, `ignore_older` relies on the modification time of the file. In c

`ignore_older` can be especially useful if you keep log files for a long time. For example, when you start filebeat you may want to send only the newest files and the files from the last week to elasticsearch, but not all files.

To remove the state from the registry file for files which were harvested before, the `clean_idle` configuration option has to be used.
To remove the state from the registry file for files which were harvested before, the `clean_inactive` configuration option has to be used.


Requirement: ignore_older > close_idle
Requirement: ignore_older > close_inactive

Before a file can be ignored by the prospector, it must be closed. To ensure a file is no longer harvested when it is ignored, `ignore_older` must be set to a longer duration than `close_idle`. A file can still be harvested while it already falls under `ignore_older` if the harvester has not finished yet. In that case the harvester finishes reading and closes the file after `close_idle` is reached.
Before a file can be ignored by the prospector, it must be closed. To ensure a file is no longer harvested when it is ignored, `ignore_older` must be set to a longer duration than `close_inactive`. A file can still be harvested while it already falls under `ignore_older` if the harvester has not finished yet. In that case the harvester finishes reading and closes the file after `close_inactive` is reached.

[[close-options]]
===== close_*

All `close_*` configuration options are used to close the harvester after a certain criterion or time is reached. Closing the harvester means closing the file handler. If a file is updated after the harvester is closed, it will be picked up again after <<scan-frequency>>. Note that if the file was moved away or deleted during this period, filebeat will not be able to pick up the file again, and any data that the harvester has not read so far is lost.

[[close-idle]]
===== close_idle
[[close-inactive]]
===== close_inactive

After a file was not harvested for the duration of `close_idle`, the file handle will be closed. The counter for the defined period starts when the last log line was read by the harvester; it is not based on the modification time of the file. If the closed file changes again, a new harvester is started, at the latest after `scan_frequency`.
After a file was not harvested for the duration of `close_inactive`, the file handle will be closed. The counter for the defined period starts when the last log line was read by the harvester; it is not based on the modification time of the file. If the closed file changes again, a new harvester is started, at the latest after `scan_frequency`.

It is recommended to set `close_idle` to a value that is larger than the interval between the least frequent updates to your log file. If your log file gets updated every few seconds, you can safely set it to `1m`. If there are log files with very different update rates, multiple prospector configurations with different values can be used.
It is recommended to set `close_inactive` to a value that is larger than the interval between the least frequent updates to your log file. If your log file gets updated every few seconds, you can safely set it to `1m`. If there are log files with very different update rates, multiple prospector configurations with different values can be used.

Setting `close_idle` to a lower value means file handles are closed sooner, but has the side effect that new log lines are not sent in near real time once the harvester has been closed.
Setting `close_inactive` to a lower value means file handles are closed sooner, but has the side effect that new log lines are not sent in near real time once the harvester has been closed.

The timestamp for closing a file does not depend on the modification time of the file but on an internal timestamp that is updated when the file was last harvested. If `close_idle` is set to 5 minutes, the countdown for the 5 minutes starts the last time the harvester read a line from the file.
The timestamp for closing a file does not depend on the modification time of the file but on an internal timestamp that is updated when the file was last harvested. If `close_inactive` is set to 5 minutes, the countdown for the 5 minutes starts the last time the harvester read a line from the file.

You can use time strings like 2h (2 hours) and 5m (5 minutes). The default is 1h.

Expand All @@ -218,7 +218,7 @@ WINDOWS: In case under windows your log rotation system shows errors because it

WARNING: Only use this option if you understand the potential side effects, including potential data loss.

Close removed can be used to close a harvester directly when a file is removed. Normally a file should only be removed after it already falls under `close_idle`. If files are removed earlier and this option is not set, filebeat keeps the file open so that the harvester can finish reading it. Use this option if the file handle should be released immediately after removal.
Close removed can be used to close a harvester directly when a file is removed. Normally a file should only be removed after it already falls under `close_inactive`. If files are removed earlier and this option is not set, filebeat keeps the file open so that the harvester can finish reading it. Use this option if the file handle should be released immediately after removal.


WINDOWS: If your log rotation system on Windows shows errors because it can't rotate the files, this is the option to enable.
Expand All @@ -241,17 +241,17 @@ Close timeout gives every harvester a predefined lifetime. Independent of the lo

The `clean_*` variables are used to clean up the state entries. This helps to reduce the size of the registry file and can prevent a potential <<inode-reuse-issue>>. These options are disabled by default, as wrong settings can lead to data duplication because complete log files are sent again.

===== clean_idle
===== clean_inactive

WARNING: Only use this option if you understand the potential side effects, including potential data loss.

`clean_idle` removes the state of the file after the given period. The state for a file can only be removed if the file is already ignored by filebeat, meaning it falls under `ignore_older`. The requirement for clean idle is `clean_idle > ignore_older + scan_frequency`, to make sure no states are removed while a file is still harvested. Otherwise it could lead to resending the full content constantly, as clean_idle removes state for files which are still detected by the prospector. If a file is updated or appears again, the file is read from the beginning.
`clean_inactive` removes the state of the file after the given period. The state for a file can only be removed if the file is already ignored by filebeat, meaning it falls under `ignore_older`. The requirement for clean inactive is `clean_inactive > ignore_older + scan_frequency`, to make sure no states are removed while a file is still harvested. Otherwise it could lead to resending the full content constantly, as `clean_inactive` removes state for files which are still detected by the prospector. If a file is updated or appears again, the file is read from the beginning.

The `clean_idle` configuration option is useful to reduce the size of the registry file, especially if a large number of new files is generated every day.
The `clean_inactive` configuration option is useful to reduce the size of the registry file, especially if a large number of new files is generated every day.

In addition, this config option is useful to prevent the <<inode-reuse-issue>>. If a file is deleted, the inode can be reused by a newly created file. If the inode is the same, filebeat assumes it already knows the file and continues at the old position. As this issue becomes more probable over time, it is good to clean up old states to make sure filebeat does not assume it already knows the file.

NOTE: Every time a file is renamed, the file state will be updated and the counter for `clean_idle` will start at 0 again.
NOTE: Every time a file is renamed, the file state will be updated and the counter for `clean_inactive` will start at 0 again.

===== clean_removed

Expand All @@ -270,7 +270,7 @@ directory is scanned for files using the frequency specified by
`scan_frequency`. Specify 1s to scan the directory as frequently as possible
without causing Filebeat to scan too frequently. We do not recommend setting this value to less than `1s`.

If you require log lines to be sent in near real time, do not use a very low `scan_frequency`; instead, adjust `close_idle` so the file handler stays open and constantly polls your files.
If you require log lines to be sent in near real time, do not use a very low `scan_frequency`; instead, adjust `close_inactive` so the file handler stays open and constantly polls your files.

The default setting is 10s.

8 changes: 4 additions & 4 deletions filebeat/docs/troubleshooting.asciidoc
Expand Up @@ -15,7 +15,7 @@ include::../../libbeat/docs/getting-help.asciidoc[]

== Reduce open file handlers

Filebeat keeps the file handler open when it reaches the end of a file so it can read new log lines in near real time. If filebeat is harvesting a large number of files, the number of open files can become an issue. In most environments, the number of files which are actively updated is low. The configuration `close_idle` should be set accordingly to close files which are not active any more.
Filebeat keeps the file handler open when it reaches the end of a file so it can read new log lines in near real time. If filebeat is harvesting a large number of files, the number of open files can become an issue. In most environments, the number of files which are actively updated is low. The configuration `close_inactive` should be set accordingly to close files which are not active any more.

There are 4 more configuration options which can be used to close file handlers, but all of them should be used carefully as they can have side effects. The options are:

Expand All @@ -32,16 +32,16 @@ Before using any of these variables, make sure to study the documentation on eac
[[reduce-registry-size]]
== Reduce Registry File Size

Filebeat keeps the state of every file and persists the states on disk in the `registry_file`. The states are used to continue file reading at a previous position in case filebeat is restarted. If a large number of new files is produced every day, the registry file grows over time. To reduce the size of the registry file, there are two configuration variables: `clean_removed` and `clean_idle`.
Filebeat keeps the state of every file and persists the states on disk in the `registry_file`. The states are used to continue file reading at a previous position in case filebeat is restarted. If a large number of new files is produced every day, the registry file grows over time. To reduce the size of the registry file, there are two configuration variables: `clean_removed` and `clean_inactive`.

If old files are not touched anymore and fall under `ignore_older`, it is recommended to use `clean_idle`. If, on the other hand, old files get removed from disk, `clean_removed` can be used.
If old files are not touched anymore and fall under `ignore_older`, it is recommended to use `clean_inactive`. If, on the other hand, old files get removed from disk, `clean_removed` can be used.

[[inode-reuse-issue]]
== Inode Reuse Issue

On Linux, Filebeat uses the inode and device to identify files. When a file is removed from disk, the inode can be assigned to a new file. In the case of file rotation, where an old file is removed and a new one is created directly afterwards, it can happen that the new file has the exact same inode. In this case, Filebeat assumes that the new file is the same as the old one and tries to continue reading at the old position, which is not correct.

By default, states are never removed from the registry file. To address the inode reuse issue, it is recommended to use the `clean_*` options, especially `clean_idle`. If your files get rotated every 24 hours and the rotated files are not updated anymore, `ignore_older` could be set to 48 hours and `clean_idle` to 72 hours.
By default, states are never removed from the registry file. To address the inode reuse issue, it is recommended to use the `clean_*` options, especially `clean_inactive`. If your files get rotated every 24 hours and the rotated files are not updated anymore, `ignore_older` could be set to 48 hours and `clean_inactive` to 72 hours.

`clean_removed` can be used for files that are removed from disk. Be aware that `clean_removed` also applies if during one scan a file cannot be found anymore. In case the file shows up at a later stage again, it will be sent again from scratch.

8 changes: 4 additions & 4 deletions filebeat/etc/beat.full.yml
Expand Up @@ -163,10 +163,10 @@ filebeat.prospectors:

### Harvester closing options

# Close idle closes the file handler after the predefined period.
# Close inactive closes the file handler after the predefined period.
# The period starts when the last line of the file was read, not at the file ModTime.
# Time strings like 2h (2 hours), 5m (5 minutes) can be used.
#close_idle: 1h
#close_inactive: 1h

# Close renamed closes a file handler when the file is renamed or rotated.
# Note: Potential data loss. Make sure to read and understand the docs for this option.
Expand All @@ -191,9 +191,9 @@ filebeat.prospectors:

### State options

# For files whose modification date is older than clean_older, the state is removed from the registry.
# For files whose modification date is older than clean_inactive, the state is removed from the registry.
# By default this is disabled.
#clean_idle: 0
#clean_inactive: 0

# Immediately removes the state for files which can no longer be found on disk
#clean_removed: false
14 changes: 7 additions & 7 deletions filebeat/filebeat.full.yml
Expand Up @@ -163,10 +163,10 @@ filebeat.prospectors:

### Harvester closing options

# Close idle closes the file handler after the predefined period.
# Close inactive closes the file handler after the predefined period.
# The period starts when the last line of the file was read, not at the file ModTime.
# Time strings like 2h (2 hours), 5m (5 minutes) can be used.
#close_idle: 1h
#close_inactive: 1h

# Close renamed closes a file handler when the file is renamed or rotated.
# Note: Potential data loss. Make sure to read and understand the docs for this option.
Expand All @@ -191,9 +191,9 @@ filebeat.prospectors:

### State options

# For files whose modification date is older than clean_older, the state is removed from the registry.
# For files whose modification date is older than clean_inactive, the state is removed from the registry.
# By default this is disabled.
#clean_idle: 0
#clean_inactive: 0

# Immediately removes the state for files which can no longer be found on disk
#clean_removed: false
Expand Down Expand Up @@ -273,8 +273,8 @@ filebeat.prospectors:

#================================ Processors =====================================

# Processors are used to reduce the number of fields in the exported event or to
# enhance the event with external meta data. This section defines a list of processors
# Processors are used to reduce the number of fields in the exported event or to
# enhance the event with external meta data. This section defines a list of processors
# that are applied one by one and the first one receives the initial event:
#
# event -> filter1 -> event1 -> filter2 ->event2 ...
Expand Down Expand Up @@ -380,7 +380,7 @@ output.elasticsearch:
#template.overwrite: false

# If set to true, filebeat checks the Elasticsearch version at connect time, and if it
# is 2.x, it loads the file specified by the template.versions.2x.path setting. The
# is 2.x, it loads the file specified by the template.versions.2x.path setting. The
# default is true.
#template.versions.2x.enabled: true

9 changes: 8 additions & 1 deletion filebeat/harvester/config.go
Expand Up @@ -22,7 +22,7 @@ var (
Backoff: 1 * time.Second,
BackoffFactor: 2,
MaxBackoff: 10 * time.Second,
CloseOlder: 1 * time.Hour,
CloseInactive: 1 * time.Hour,
MaxBytes: 10 * humanize.MiByte,
CloseRemoved: false,
CloseRenamed: false,
Expand All @@ -41,6 +41,7 @@ type harvesterConfig struct {
Backoff time.Duration `config:"backoff" validate:"min=0,nonzero"`
BackoffFactor int `config:"backoff_factor" validate:"min=1"`
MaxBackoff time.Duration `config:"max_backoff" validate:"min=0,nonzero"`
CloseInactive time.Duration `config:"close_inactive"`
CloseOlder time.Duration `config:"close_older"`
CloseRemoved bool `config:"close_removed"`
CloseRenamed bool `config:"close_renamed"`
Expand All @@ -62,6 +63,12 @@ func (config *harvesterConfig) Validate() error {
logp.Warn("DEPRECATED: force_close_files was set to true. Use close_removed + close_renamed")
}

// DEPRECATED: remove in 6.0
if config.CloseOlder > 0 {
config.CloseInactive = config.CloseOlder
logp.Warn("DEPRECATED: close_older is deprecated. Use close_inactive")
}

// Check input type
if _, ok := cfg.ValidInputType[config.InputType]; !ok {
return fmt.Errorf("Invalid input type: %v", config.InputType)
13 changes: 13 additions & 0 deletions filebeat/harvester/config_test.go
Expand Up @@ -2,6 +2,7 @@ package harvester

import (
"testing"
"time"

"github.com/stretchr/testify/assert"
)
Expand All @@ -20,3 +21,15 @@ func TestForceCloseFiles(t *testing.T) {
assert.True(t, config.CloseRemoved)
assert.True(t, config.CloseRenamed)
}

func TestCloseOlder(t *testing.T) {

config := defaultConfig
assert.Equal(t, config.CloseOlder, 0*time.Hour)
assert.Equal(t, config.CloseInactive, defaultConfig.CloseInactive)

config.CloseOlder = 5 * time.Hour
config.Validate()

assert.Equal(t, config.CloseInactive, 5*time.Hour)
}
2 changes: 1 addition & 1 deletion filebeat/harvester/log.go
Expand Up @@ -277,7 +277,7 @@ func (h *Harvester) newLogFileReaderConfig() reader.LogFileReaderConfig {
return reader.LogFileReaderConfig{
CloseRemoved: h.config.CloseRemoved,
CloseRenamed: h.config.CloseRenamed,
CloseOlder: h.config.CloseOlder,
CloseInactive: h.config.CloseInactive,
CloseEOF: h.config.CloseEOF,
Backoff: h.config.Backoff,
MaxBackoff: h.config.MaxBackoff,
2 changes: 1 addition & 1 deletion filebeat/harvester/log_test.go
Expand Up @@ -59,7 +59,7 @@ func TestReadLine(t *testing.T) {

h := Harvester{
config: harvesterConfig{
CloseOlder: 500 * time.Millisecond,
CloseInactive: 500 * time.Millisecond,
Backoff: 100 * time.Millisecond,
MaxBackoff: 1 * time.Second,
BackoffFactor: 2,
6 changes: 3 additions & 3 deletions filebeat/harvester/reader/log.go
Expand Up @@ -32,7 +32,7 @@ type LogFileReaderConfig struct {
MaxBackoff time.Duration
BackoffFactor int
CloseEOF bool
CloseOlder time.Duration
CloseInactive time.Duration
CloseRenamed bool
CloseRemoved bool
}
Expand Down Expand Up @@ -129,9 +129,9 @@ func (r *logFileReader) errorChecks(err error) error {
return ErrFileTruncate
}

// Check file wasn't read for longer than CloseOlder
// Check file wasn't read for longer than CloseInactive
age := time.Since(r.lastTimeRead)
if age > r.config.CloseOlder {
if age > r.config.CloseInactive {
return ErrInactive
}

4 changes: 2 additions & 2 deletions filebeat/prospector/config.go
Expand Up @@ -13,7 +13,7 @@ var (
IgnoreOlder: 0,
ScanFrequency: 10 * time.Second,
InputType: cfg.DefaultInputType,
CleanOlder: 0,
CleanInactive: 0,
CleanRemoved: false,
}
)
Expand All @@ -24,7 +24,7 @@ type prospectorConfig struct {
Paths []string `config:"paths"`
ScanFrequency time.Duration `config:"scan_frequency"`
InputType string `config:"input_type"`
CleanOlder time.Duration `config:"clean_older" validate:"min=0"`
CleanInactive time.Duration `config:"clean_inactive" validate:"min=0"`
CleanRemoved bool `config:"clean_removed"`
}
