-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Dynamic configuration reloading for modules in Metricbeat #3281
Conversation
What if a dynamic module resembles a global module?
No. This is what Connect is for, connecting multiple producers to the publisher pipeline. Just close the client once you're done.
Hmmm... One solution might be to unpack the full module config into a |
go func() { | ||
// Start Config monitoring | ||
configReloader.Start() | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
== go configReloader.Start()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
mutex sync.Mutex | ||
} | ||
|
||
func NewConfigReloader(config ReloaderConfig, p publisher.Publisher) ConfigReloader { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why no pointer? Using NewXXX implies a pointer being returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
} | ||
} | ||
|
||
func (r *ConfigReloader) Start() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/Start/Run
? This method implements the run-loop itself. Using Start
implies a go-routine being started, but returns immediately, Run
implies the method blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
close(r.done) | ||
r.wg.Wait() | ||
// Stop all running modules | ||
r.StopModules(r.registry.CopyList()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively StopModules
can be done via defer
in run loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agree. moved.
continue | ||
} | ||
|
||
lastScan = lastScanTemp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why set lastScan only if any files have been changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even thought that was lost during refactoring and initially the purpose was to only update lastScan if no error, I think it is still correct like this. If there are no file updates, it does not matter if the comparison of the files is to the most recent lastScan or one of the previous where no files were found. The resulting outcome should be the same and it keeps the code simpler. I will add a comment to the code.
@urso Thanks for feedback:
|
@@ -0,0 +1,252 @@ | |||
package beater |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the code in this file could be generalized into a libbeat package. It seems to me that it contains relatively little Metricbeat specific logic, except for the StartModules
and StopModules
functions which can be easily injected.
I'm thinking that we'll want to make it easy to extend this type of reloading functionality to other beats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fully agree, that we probably should extend this to other beats, especially filebeat. Means this functionality will move to libbeat. I prefer to keep it currently in metricbeat to keep the changes as minimal as possible and do the abstraction in a second step when we add it to other beats.
@@ -10,41 +10,13 @@ | |||
#========================== Modules configuration ============================ | |||
metricbeat.modules: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think metricbeat.modules
needs to be removed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monicasarbu This one is WIP. I will restore the old metricbeat.yml as soon as I added automated tests. This file should not change.
3cc2565
to
802860d
Compare
e94bb57
to
aad532c
Compare
metricbeat.reload: | ||
|
||
# Glob pattern for configuration reloading | ||
path: config/*.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with conf.d/*.yml
. I think that name might provide a hint about the purpose of the directory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could create this directory automatically in /etc/metricbeat/conf.d
when we install the rpm/deb. And do the same for Filebeat.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to conf.d
. I like the idea of creating this automatically. Reminds me to check to what the path is relative. Should the default be ${path.conf}/conf.d/*.yml
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW: Creation of the directory I would suggest not to do in this PR and look into later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! My comments are all just minor issues.
|
||
// Config is the root of the Metricbeat configuration hierarchy. | ||
type Config struct { | ||
// Modules is a list of module specific configuration data. | ||
Modules []*common.Config `config:"modules" validate:"required"` | ||
Modules []*common.Config `config:"modules"` | ||
ReloadModules ModulesReloadConfig `config:"reload.modules"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be consistent with other parts of the config that are disabled by default should this be a pointer along with the Enabled *bool
? It's a subtle difference, but it means that setting
metricbeat.reload.modules.path: conf.d/*.yml
would implicitly enable the reloading without having to also specify
metricbeat.reload.modules.enabled: true
An example of this is the tls config in the outputs which have an IsEnabled helper method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it to the same logic as the output. +1 on consistency here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, that didn't work as I expected. Because I have a default config, which I already load on the metricbeat level, the config is never nil. I plan to extract the reload part anyways. Ok if I do it in a second step?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's fine. With using the pointer you'll need handle defaults when reading the individual values (haven't found a better way).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to make ReloadModules
a pointer too. But with the default present it won't be nil.
continue | ||
} | ||
|
||
// Directories are symlinks are skipped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Directories and symlinks"?
// Check if one of the files was changed recently | ||
// File modification time can be in seconds. -1 is to cover for files which | ||
// were created during this second. | ||
if info.ModTime().Sub(gcd.lastScan) >= -1*time.Second { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use if info.ModTime().After(gcd.lastScan)
here? That seems more readable. Then to account for the one second resolution you could Truncate the last scan time using if info.ModTime().After(gcd.lastScan.Truncate(time.Second))
.
What systems still store the modification time in seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to your suggestion. OS X seems to store mtime in seconds. At least I only see seconds when I use stat
.
|
||
hash, err := hashstructure.Hash(files, nil) | ||
if err != nil { | ||
logp.Err("Error hashing files list: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this method continue if hashing fails? I didn't look to see what can cause an error here, but if it does fail maybe it should return the error OR not return an error and base its decision solely on the updatedFails
flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I initially ignored the error as I didn't see an option how we ever get here an error. But to be sure we catch it in case it happens, I added the log line.
If there is an error, the hash is 0. So in case hashing fails one time because of reason x, scan would still return changed which is the save way to go. I prefer to return changed once too many. If it continously fails, it will in the future depend on updatedFiles
only. The not so nice part about this is that it does not detect removed files with updatedFiles.
Thinking about this again, probably the most consistent behaviour would be directly return files, true, error
in case of an error, to make sure in an error case changed
is true. Will change it to this.
|
||
// Create random temp directory | ||
id := strconv.Itoa(rand.New(rand.NewSource(int64(time.Now().Nanosecond()))).Int()) | ||
dir := os.TempDir() + "/" + id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use ioutil.TempDir("", id)
for this. https://golang.org/pkg/io/ioutil/#TempDir
And you should cleanup the dir with a defer os.RemoveAll(dir)
.
wg sync.WaitGroup | ||
} | ||
|
||
type RunningRegistry struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is used internally only by the ConfigReloader. Can this be unexported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, changed
|
||
type RunningRegistry struct { | ||
List map[uint64]ModuleRunner | ||
mutex sync.Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the mutex is required to access List
then they should have the same visibility. But I like embedding the mutex in the struct then you can call runningReg.Lock()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed
return c, nil | ||
} | ||
|
||
func NewRunningRegistry() RunningRegistry { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that this struct contains a sync.Mutex
, returning a *RunningRegistry
would better indicate to the caller that this object needs to be passed by reference than by value (as you should not copy a sync.Mutex
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
|
||
debugr("Number of module wrappers created: %v", len(s)) | ||
|
||
startList := []*ModuleWrapper{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use var startList []*ModuleWrapper
to avoid allocating an empty map if it doesn't get used. ref
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
metricbeat.reload: | ||
|
||
# Glob pattern for configuration reloading | ||
path: config/*.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could create this directory automatically in /etc/metricbeat/conf.d
when we install the rpm/deb. And do the same for Filebeat.
@andrewkroh I pushed 2 new commits. One for the general refactoring and one for the config changes to make it relative. Could you have a look? |
period: 10s | ||
------------------------------------------------------------------------------ | ||
|
||
A path with a glob most be defined on which files should be checked for changes. A period is set on how often |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/s/most/must
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
|
||
# Config reloading allows to dynamically load modules. Each file which is | ||
# monitored must contain one or multiple modules as a list. | ||
metricbeat.reload: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be metricbeat.reload.modules
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitively, thanks for spotting. Fixed.
metricbeat.reload: | ||
|
||
# Glob pattern for configuration reloading | ||
path: conf.d/*.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about setting this to:
path: '${path.config}/conf.d/*.yml'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed
"github.com/mitchellh/hashstructure" | ||
) | ||
|
||
type GlobChangeDetector struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GlobWatcher
, Wacher
, or FileWatcher
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like GlobWatcher. Changing ...
// Directories and symlinks are skipped | ||
if !info.Mode().IsRegular() { | ||
continue | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm... having symlink support would be nice to have. This way op can enable modules selectively via ln -s
... (kinda common practice in init systems, even systemd).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, enabled symlinks
@andrewkroh @monicasarbu @urso I applied the changes. If ok for all of you I will squash and rebase to update the commit message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good.
|
||
// Config is the root of the Metricbeat configuration hierarchy. | ||
type Config struct { | ||
// Modules is a list of module specific configuration data. | ||
Modules []*common.Config `config:"modules" validate:"required"` | ||
Modules []*common.Config `config:"modules"` | ||
ReloadModules ModulesReloadConfig `config:"reload.modules"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll need to make ReloadModules
a pointer too. But with the default present it won't be nil.
"fmt" | ||
"sync" | ||
"time" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra line here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About Reload: Yeah, tried different versions, but didn't work. Extracting it and having a separate unpack should work (in a later PR). Not sure why I can't comment above -> Github fail.
Will remove the extra line when rebasing.
Currently if new module configs have to be added or changed, Metricbeat needs to be restarted. This change allows to define a configuration directory where new files can be added, removed or modified and updates will automatically processed by Metricbeat. New modules will started / stopped accordingly. This is especially useful in container environments where 1 container is used to monitor all services in other containers on the same host. New containers appear and disappear dynamically which requires changes to which modules are needed and which hosts must be monitored. **Configuration** The configuration in the main metricbeat.yml config file looks as following: ``` metricbeat.reload.modules: enabled: true path: configs/*.yml period: 10s ``` A path with a glob most be defined on which files should be checked for changes. A period is set on how often the files are checked for changes. The configuration inside the files which are found by the glob look as following: ``` - module: system metricsets: ["cpu"] enabled: false perdiod: 1s - module: system metricsets: ["network"] enabled: true period: 10s ``` Each file directly contains a list of modules. Each file can contain one or multiple module definitions. **How does it work** When Metricbeat is started a process is started that monitors the defined glob files for changes. If a change is detected, all config files are reloaded and module wrappers are created. A hash is created for the module wrapper to identify if it is a module that is already running or a new module. After loading all modules, all modules which are not in the list anymore will be stopped and removed. All new ones will be started. The effect of the above is that if 2 identical module configuration exists, only 1 is started. It is not expected to have 2 modules with the exact same configuration. The dynamic modules have no effect on the modules defined in the global configuration file. These modules will never change and keep running. Also if there is a dynamic module with the exact same config as a global module, these two will not be correlated and both will be run. **Changes** * Introduce configuration variables for config reloading * Change metricbeat behaviour that if `metricbeat.config.enabled: true`, on startup also 0 modules can be defined. Also it is possible that at some point in time no modules are running. * Change shutdown to also shut down dynamic modules * Expvar stats for reloading added * Add common configuration files to metricbeat to have additional configs except from modules * Add basic system tests
c075357
to
8fed57c
Compare
jenkins, test it |
Currently if new module configs have to be added or changed, Metricbeat needs to be restarted. This change allows to define a configuration directory where new files can be added, removed or modified and updates will automatically processed by Metricbeat. New modules will started / stopped accordingly. This is especially useful in container environments where 1 container is used to monitor all services in other containers on the same host. New containers appear and disappear dynamically which requires changes to which modules are needed and which hosts must be monitored.
Configuration
The configuration in the main metricbeat.yml config file looks as following:
A path with a glob most be defined on which files should be checked for changes. A period is set on how often the files are checked for changes.
The configuration inside the files which are found by the glob look as following:
Each file directly contains a list of modules. Each file can contain one or multiple module definitions.
How does it work
When Metricbeat is started a process is started that monitors the defined glob files for changes. If a change is detected, all config files are reloaded and module wrappers are created. A hash is created for the module wrapper to identify if it is a module that is already running or a new module. After loading all modules, all modules which are not in the list anymore will be stopped and removed. All new ones will be started.
The effect of the above is that if 2 identical module configuration exists, only 1 is started. It is not expected to have 2 modules with the exact same configuration.
The dynamic modules have no effect on the modules defined in the global configuration file. These modules will never change and keep running. Also if there is a dynamic module with the exact same config as a global module, these two will not be correlated and both will be run.
Changes
metricbeat.config.enabled: true
, on startup also 0 modules can be defined. Also it is possible that at some point in time no modules are running.