-
Notifications
You must be signed in to change notification settings - Fork 135
NR-228258 Logging too many files warning msg #1823
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NR-228258 Logging too many files warning msg #1823
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice work @Sivakumar3695 ! I left you some comments for you to consider. 🙇
pkg/integrations/v4/logs/cfg.go
Outdated
} | ||
files, err := filepath.Glob(l.File) | ||
if err != nil { | ||
cfgLogger.WithField("filePath", l.File).Error("Error while reading files under the given file path") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This will also tell us if any of the filepaths is haveing any permission issue. Would it be possible to include the err
in the logged message as well? What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have included the error in the logger as below:
cfgLogger.WithField("filePath", l.File).WithError(err).Error("Error while reading files under the given file path")
name string | ||
logCfg LogCfg | ||
expectedCount int | ||
}{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add test cases with:
- One that contains a glob of an existing folder but with no matches (should produce a 0 in the output message). i.e.
./*_test.starwars
- One that contains a glob of a non-existing folder (should throw an error due to not having access to that folder or not existing):
./starwars/*_test.go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a test case for the first one -> with no matches, will produce 0 target files
For the second test case suggested, the current code consumes the error inside the method itself. Hence, that cannot be tested in the unit testing. Instead, I have added a unit test case to check if 0 is returned as Target files count. I hope this is fine. Please feel free to improve or provide alternative solution.
12e51a3
to
949d800
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, thanks a lot Siva! I have made up my mind again regarding the displayed message, feel free to ignore the suggestion if you think it's already alright.
Please note that this is a friendly warning message. If your operating system allows more than {{ .DefaultFileLimit }} file descriptors/inotify watchers | ||
or if you already increased their maximum amount by following the above link, you can safely ignore this message. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that this is a friendly warning message. If your operating system allows more than {{ .DefaultFileLimit }} file descriptors/inotify watchers | |
or if you already increased their maximum amount by following the above link, you can safely ignore this message. | |
Please note that this is a friendly warning message. You can safely ignore this message if your operating system allows more than {{ .DefaultFileLimit }} file descriptors/inotify watchers | |
or if you have already increased their maximum amount by following the above link. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. This would be much more opt than the previous message. I agree with you on this one and have update the message accordingly. Nice suggestion @jsubirat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code LGTM, but I have a minor concern regarding the amount of lines this warning can generate. Actually, the infastructure-agent does not have any multiline log entry and the format can be changed to "json" instead of plain text. Could it happen that the logging directory has many subdirectories? If so, the warning text would be quite long. Is it required to show detailed information of each directory? Could we move the details of each subdirectory to trace level?
@jsubirat @Sivakumar3695 Currently, the Fluent-bit infrastructure agent's supervisor auto removes temporal configuration files if there are more than 50: https://github.com/newrelic/infrastructure-agent/blob/master/pkg/integrations/v4/supervisor_fb.go#L32 Should the fb supervisor auto remove this files too? |
Related to the generated warning message, would it be correctly parsed by fluent-bit itself? Would it take into account the multiline text? Have you considered pointing to public documentation URL or local file path instead? |
Hi @rogercoll
I will try to work on this and will come up with an alternative way to convey the same message. Moving to Json? Idk, probably, I need to check to confirm this. Currently, we print the generated fluent-bit config file in debug and I will look if the same can be done here as well. |
No. These are log files of the users that they want to ship the log data to newrelic. This is not something that can't be removed by the Infra-agent. We provide the warning message to inform the user that "Too many files are opened for log forwarding" and they need to increase the number of files that can be opened by a process in an OS. |
I will check once on whether the multi-line warning message can be correctly processed by fluent-bit. For the public doc link suggested by you, in fact, we provided a public documentation URL in the warning message itself. However, the detailed warning message is only to help users / a GTSE engineer to understand the root cause of the issue by giving insights into the folders containing too many files. |
@jsubirat anything that you would like to add here? |
@rogercoll Would it be fine to move the warning message to a log field as below?
This way, the multi-line warning message in discussion would be wrapped inside a log field similar to how we log fluent-bit configuration file here. @jsubirat We can reduce the log level for this to Trace. WDYT? |
e16ddd4
to
197dca1
Compare
197dca1
to
66c55d6
Compare
Hi @rogercoll.
![]()
![]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Sivakumar3695 ! New code changes look good to me, just added a small comment
ede93ce
to
e09b06a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Description:
When there are too many files watched by the FB process, the OS will stop the process. To help users understand what went wrong, a warning log message will be produced which the user can take a look to understand the number of files targeted by each file paths provided for log forwarding.
With the changes in this PR, now, the Infra-Agent (while starting up) will compute the amount of target files for each file path config. If the total target files exceed the recommended maximum, the warning message will be printed in the logs.