Improve symlink handling in Filebeat #1686
I would try to go without the second one, and see what backlash we get. I think in this case symlinks can provoke a lot of corner cases.
@tsg What do you mean by the second option? Both changes are required.
I was thinking of simply not supporting symlinks. It's removing Tudor
@tsg Yes. OK, then we are on the same page. First remove it and potentially reintroduce it in a second step.
We are also seeing this issue. We use github.com/golang/glog for logging. glog typically creates the log file with a long name and places a symlink to it in the same folder, the one specified by the log_dir flag. In this case, Filebeat reads the file twice.
Previously, symlinks were followed. As a consequence, if both a symlink and the file itself existed, the file was read twice. Symlinks are no longer followed. This closes elastic#1686
#1767 now removes following symlinks. I suggest keeping it that way and only introducing a config option to enable it if we get feature requests for it.
* Stop following symlinks. Previously symlinks were followed. This had the consequence if a symlink and the file itself existed, the file was read twice. Now symlinks are not followed anymore. This closes #1686
* Add symlink for windows
* Turn around params
* Remove symlink comment
* Fix for windows symlink
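For readers who want to see what "not following symlinks" can look like in practice, here is a minimal Go sketch. It is not the code from #1767; the function names `regularFiles` and `demo`, and the file names, are made up for illustration. It assumes a Unix-like system where creating symlinks is permitted. The idea is that `os.Lstat` describes the link itself rather than its target, so symlinks matched by a glob can be filtered out before harvesting.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// regularFiles expands pattern and keeps only regular files.
// os.Lstat does not follow symlinks, so a symlink is reported as a
// symlink and is dropped here (directories are dropped as well).
func regularFiles(pattern string) ([]string, error) {
	matches, err := filepath.Glob(pattern)
	if err != nil {
		return nil, err
	}
	var kept []string
	for _, path := range matches {
		info, err := os.Lstat(path)
		if err != nil {
			continue // the file may have vanished between glob and stat
		}
		if info.Mode().IsRegular() {
			kept = append(kept, path)
		}
	}
	return kept, nil
}

// demo builds a glog-like layout in a temp dir (one real log file plus a
// symlink to it in the same folder) and returns the base names kept.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "app.long-name.log")
	if err := os.WriteFile(target, []byte("line\n"), 0o644); err != nil {
		return nil, err
	}
	if err := os.Symlink(target, filepath.Join(dir, "app.log")); err != nil {
		return nil, err
	}

	paths, err := regularFiles(filepath.Join(dir, "*.log"))
	if err != nil {
		return nil, err
	}
	names := make([]string, 0, len(paths))
	for _, p := range paths {
		names = append(names, filepath.Base(p))
	}
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Both paths match the glob, but only the real file survives the filter, so its content would be harvested once.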
We use Filebeat to collect Docker logs in a Kubernetes cluster. Kubernetes provides a handy path with all container logs which are symlinks to We could just use the original path instead of the Kubernetes symlinked one, but we rely on the filename and create fields based on the filename. So we won't be able to upgrade to version 5 until we can re-enable symlinks.
@shamil Thanks for reporting this and sharing the insights. Can you share some more details on how you use the filename and what the original filename would look like? I'm not too familiar with Kubernetes.
@ruflin, I'm doing something similar to this: https://github.com/ApsOps/filebeat-kubernetes
Thanks for the link. The interesting part for me is the following:
As far as I understand, these files are symlinks to a file somewhere else. The file name can be used in Logstash to add additional metadata to the event. Some questions:
In the past we discussed following symlinks and reading the original file to prevent some symlink edge cases. But that seems not to work in your case, as it would send the original file name and not the one you have above.
The symlinks are automatically updated by Kubernetes. The original files are regular Docker logs in
Sorry for all the questions. But I'm kind of surprised that it hasn't caused any issues so far, and I want to understand it in more detail. My assumption so far:
Seems like I need to write some tests to see what the actual behaviour is.
OK, let me explain.
I guess the Kubernetes folks thought about possible problems and took the necessary steps to avoid the issues you mentioned. I think if it is possible to have a non-default option for enabling symlinks, that would be great! Here is an example symlink from
Thanks
A quote from the logrotate docs:
With Filebeat tailing the files and not harvesting the new file (as it is copied and not found), the chance of losing some log lines is even higher. My current conclusion is that this seems to be an acceptable trade-off for people. It would mean loosening the Filebeat guarantee that all log lines are sent for symlinks, but the same actually applies to all copytruncate use cases.
For people interested in this issue, there is now also a PR with a potential implementation: #2478
Currently Filebeat treats symlinks as normal files. If a file appears in the glob both as a symlink and as a regular file, the content is read twice. The following changes should be made:
Is the second option needed?
For reference also see: https://discuss.elastic.co/t/filebeat-fails-to-harvest-if-a-file-and-a-symlink-to-that-file-is-in-the-same-directory/49743/3
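To make the double-read problem concrete, here is a small Go sketch of one possible way to detect that two matched paths point at the same underlying file. This is only an illustration under stated assumptions, not how Filebeat handles it: the names `uniqueFiles` and `demo` are hypothetical, and the sketch assumes a Unix-like system. It relies on `os.Stat`, which follows symlinks, and `os.SameFile`, which compares the resolved files.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// uniqueFiles keeps the first path seen for each underlying file.
// os.Stat follows symlinks, so a symlink and its target yield FileInfo
// values that os.SameFile reports as the same file.
func uniqueFiles(paths []string) []string {
	var kept []string
	var seen []os.FileInfo
	for _, p := range paths {
		info, err := os.Stat(p)
		if err != nil {
			continue // unreadable path or dangling symlink
		}
		duplicate := false
		for _, s := range seen {
			if os.SameFile(s, info) {
				duplicate = true
				break
			}
		}
		if !duplicate {
			kept = append(kept, p)
			seen = append(seen, info)
		}
	}
	return kept
}

// demo creates a file and a symlink to it in the same directory, then
// returns the base names that survive deduplication.
func demo() ([]string, error) {
	dir, err := os.MkdirTemp("", "dedupe-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	realPath := filepath.Join(dir, "real.log")
	linkPath := filepath.Join(dir, "link.log")
	if err := os.WriteFile(realPath, []byte("line\n"), 0o644); err != nil {
		return nil, err
	}
	if err := os.Symlink(realPath, linkPath); err != nil {
		return nil, err
	}

	var names []string
	for _, p := range uniqueFiles([]string{realPath, linkPath}) {
		names = append(names, filepath.Base(p))
	}
	return names, nil
}

func main() {
	names, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(names)
}
```

Note the trade-off: this approach keeps whichever path comes first, so the reported file name depends on ordering, which matters for setups that derive fields from the symlink's name.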