Improve prospector state handling #2840
How about a dummy prospector that handles files not handled by any other prospector? This dummy prospector could check the mod-time from time to time to see whether a file was updated at all, and update the TTL in the registry accordingly. This way a file would be kept in the registry between restarts even if it is not handled by any prospector due to an accidental misconfiguration.
After some more internal conversations, I will try to summarise the different options here with their pros and cons:
Option 1: Keep obsolete states forever
For states which will not be picked up by a prospector, TTL is set to
Pro
Con
Option 2: Keep TTL for obsolete states
Obsolete states will be removed from the registry file after their TTL.
Pro:
Con:
Option 3: Hidden prospector
A hidden prospector takes care of the obsolete states.
Pro
Con
Conclusion
Currently I would opt for option 1, as it seems to me the easiest to understand and the one with the fewest side effects. If problems with inode reuse come up in the future, there are potential solutions, such as a flag on startup to clean up obsolete states. But as inode reuse is normally local to a prospector, I don't expect this to happen.
Previously each prospector held the old states initially loaded from the registry file. This was changed so that each prospector only loads the states whose path matches its glob pattern. This reduces the number of states handled by each prospector and reduces overlap.

One consequence is that if a log file "moves" from one prospector to another through renaming while filebeat is running, no state will be found. This behaviour is not expected, however, and would already lead to other issues today. The expectation is that a file stays inside the same prospector over its full lifetime.

If filebeat is restarted with a config change, it can happen that some states are no longer managed by any prospector. These "obsolete" states will stay in the registrar until a further config change introduces a glob pattern that includes them again. As an example:

Filebeat is started with the following config.

```
filebeat.prospectors:
- paths: ['*.log']
  clean_removed: true
```

There are two log files, `a.log` and `b.log`. Both files are harvested and the states for `a.log` and `b.log` are persisted. Then filebeat is stopped and the config file is modified to:

```
filebeat.prospectors:
- paths: ['a.log']
  clean_removed: true
```

Filebeat is started again. The prospector will now only handle the state for `a.log`. If `b.log` is removed from disk, the state for `b.log` will stay in the registry file.

If the config file sets `clean_inactive: 5min`, all TTLs are reset on restart: to `-2` by the registry, and then by the prospector to `-1` or to the new `clean_inactive` value. Using `-2` in the registrar can be useful in the future to detect which states are no longer managed by any prospector. As all TTLs are reset on restart, persisting the TTL is not required anymore, but it can become useful for cleanup, so the information is kept in the registry file.
Further changes:

* Add tests for matchFile method
* Add tests for prospector state init filtering
* States are passed to `Prospector.Init` on startup instead of being set directly in the object
* `Prospector.Init` and `Prospectorer.Init` now return an error and filebeat exits on startup problems
* Remove lastClean as it is not needed anymore
* Have one log file for each bom file test
* Update to new vagrant box with Golang 1.7.3
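To illustrate the `Init` change in the list above, here is a minimal sketch of an `Init` that receives the states and returns an error, with the caller exiting on failure. The interface is reduced to a single method, states are plain strings, and `logProspector` and `startAll` are hypothetical names; the real filebeat types differ.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// Prospectorer is a reduced, hypothetical version of the interface:
// states are handed over in Init, which can fail.
type Prospectorer interface {
	Init(states []string) error
}

// logProspector is a hypothetical prospector that only knows its globs.
type logProspector struct {
	globs  []string
	states []string
}

// Init validates the configured globs and stores the states passed in,
// instead of having them set directly on the object.
func (p *logProspector) Init(states []string) error {
	if len(p.globs) == 0 {
		return errors.New("no paths configured")
	}
	for _, g := range p.globs {
		// Match against a dummy name only to detect malformed patterns.
		if _, err := filepath.Match(g, "x"); err != nil {
			return fmt.Errorf("invalid glob %q: %w", g, err)
		}
	}
	p.states = states
	return nil
}

// startAll initialises every prospector and stops at the first error.
func startAll(ps []Prospectorer, states []string) error {
	for i, p := range ps {
		if err := p.Init(states); err != nil {
			return fmt.Errorf("prospector %d: %w", i, err)
		}
	}
	return nil
}

func main() {
	ps := []Prospectorer{&logProspector{globs: []string{"*.log"}}}
	if err := startAll(ps, []string{"a.log", "b.log"}); err != nil {
		fmt.Fprintln(os.Stderr, "exiting:", err)
		os.Exit(1)
	}
	fmt.Println("all prospectors initialised")
}
```

Returning the error to the caller keeps the decision to exit in one place rather than inside each prospector.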
LGTM.