diff --git a/packages/filestream/_dev/build/docs/README.md b/packages/filestream/_dev/build/docs/README.md index d6c6f5c99b5..5638b287d82 100644 --- a/packages/filestream/_dev/build/docs/README.md +++ b/packages/filestream/_dev/build/docs/README.md @@ -3,6 +3,13 @@ WARNING: Migrating from the "Custom Logs (Deprecated)" to "Custom Logs (Filestream)" will cause files to be re-ingested because the state is not migrated. +IMPORTANT: The Filestream integration will only start ingesting files +**when they are 1024 bytes in size or larger**. This can be adjusted by +setting "Fingerprint length", however it will influence how files are +identified. Refer to the +[fingerprint](https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-filestream#filebeat-input-filestream-file-identity-fingerprint) +documentation for more details. + In future releases it's expected to have an automated way to migrate the state. However, this is not possible at the moment. The current best option for minimizing the data duplication while migrating to "Custom Logs (Filestream)" is to use the 'Ignore Older' or 'Exclude Files' options. 
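The 1024-byte threshold in the IMPORTANT note follows from fingerprint-based file identity. The identifier comes from a fixed byte range at the start of the file, so a shorter file has no identity yet and is skipped until it grows. The Python sketch below models that behaviour under stated assumptions. It assumes a SHA-256 digest over bytes `offset` to `offset + length`, with offset and length defaults of 0 and 1024 bytes. The `fingerprint` function is a hypothetical illustration, not Elastic Agent or Filebeat code.

```python
import hashlib
import os
import tempfile
from typing import Optional

# Assumed defaults, modelled on the Fingerprint offset/length settings (0 and 1024 bytes).
DEFAULT_OFFSET = 0
DEFAULT_LENGTH = 1024


def fingerprint(path: str, offset: int = DEFAULT_OFFSET, length: int = DEFAULT_LENGTH) -> Optional[str]:
    """Hypothetical helper: hex SHA-256 of bytes [offset, offset + length), or None if too short.

    A None result models a file that is not ingested until it grows past offset + length.
    """
    if os.path.getsize(path) < offset + length:
        return None  # not enough bytes yet, so the file has no identity
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(length)
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "app.log")
        with open(log, "wb") as f:
            f.write(b"x" * 1000)  # 1000 bytes: below the 1024-byte threshold
        print(fingerprint(log))  # None
        with open(log, "ab") as f:
            f.write(b"y" * 24)  # file is now exactly 1024 bytes
        print(fingerprint(log) is not None)  # True
```

In this model, lowering `length` (for example `fingerprint(log, length=512)`) lets smaller files get an identity sooner. The trade-off is that any two files sharing the same first 512 bytes then produce the same digest and are treated as one file. This is the identification side effect the note warns about.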
diff --git a/packages/filestream/changelog.yml b/packages/filestream/changelog.yml index ef1e35b7402..2c4e3c6e242 100644 --- a/packages/filestream/changelog.yml +++ b/packages/filestream/changelog.yml @@ -1,4 +1,9 @@ # newer versions go on top +- version: "1.1.4" + changes: + - description: Add warning about only ingesting files >= 1024 bytes + type: enhancement + link: https://github.com/elastic/integrations/pull/14209 - version: "1.1.3" changes: - description: Correct the readme diff --git a/packages/filestream/data_stream/generic/manifest.yml b/packages/filestream/data_stream/generic/manifest.yml index 4585d2170f9..7aaa5ce0f71 100644 --- a/packages/filestream/data_stream/generic/manifest.yml +++ b/packages/filestream/data_stream/generic/manifest.yml @@ -98,7 +98,7 @@ streams: title: Enable symlinks description: | The symlinks option allows Elastic Agent to harvest symlinks in addition to regular files. When harvesting symlinks, Elastic Agent opens and reads the original file even though it reports the path of the symlink. - ** Because this option may lead to data loss, it is disabled by default. ** + **Because this option may lead to data loss, it is disabled by default.** required: false show_user: false - name: resend_on_touch @@ -112,7 +112,7 @@ streams: type: text title: Check Interval description: | - How often Elastic Agent checks for new files in the paths that are specified for harvesting. For example Specify 1s to scan the directory as frequently as possible without causing Elastic Agent to scan too frequently. ** We do not recommend to set this value <1s. ** + How often Elastic Agent checks for new files in the paths that are specified for harvesting. For example, specify 1s to scan the directory as frequently as possible without causing Elastic Agent to scan too frequently.
**We do not recommend setting this value below 1s.** required: false show_user: false - name: ignore_older @@ -142,7 +142,7 @@ streams: type: bool title: Close on State Changed Renamed description: | - ** Only use this option if you understand that data loss is a potential side effect. ** + **Only use this option if you understand that data loss is a potential side effect.** When this option is enabled, Elastic Agent closes the file handler when a file is renamed. This happens, for example, when rotating files. By default, the harvester stays open and keeps reading the file because the file handler does not depend on the file name. required: false show_user: false @@ -157,7 +157,7 @@ streams: type: bool title: Close Reader EOF description: | - ** Only use this option if you understand that data loss is a potential side effect. ** + **Only use this option if you understand that data loss is a potential side effect.** When this option is enabled, Elastic Agent closes a file as soon as the end of a file is reached. This is useful when your files are only written once and not updated from time to time. For example, this happens when you are writing every single log event to a new file. This option is disabled by default. required: false show_user: false @@ -165,7 +165,7 @@ streams: type: text title: Close Reader After Interval description: | - ** Only use this option if you understand that data loss is a potential side effect. Another side effect is that multiline events might not be completely sent before the timeout expires. ** + **Only use this option if you understand that data loss is a potential side effect. Another side effect is that multiline events might not be completely sent before the timeout expires.** This option is particularly useful in case the output is blocked, which makes Elastic Agent keep open file handlers even for files that were deleted from the disk.
For more information see the [documentation](https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html#filebeat-input-filestream-close-timeout). required: false @@ -175,7 +175,7 @@ streams: title: Clean Inactive default: -1 description: | - ** Only use this option if you understand that data loss is a potential side effect. ** + **Only use this option if you understand that data loss is a potential side effect.** When this option is enabled, Elastic Agent removes the state of a file after the specified period of inactivity has elapsed. E.g: "30m", Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h". By default cleaning inactive states is disabled, -1 is used to disable it. required: false @@ -185,7 +185,7 @@ streams: title: Clean Removed description: | When this option is enabled, Elastic Agent cleans files from the registry if they cannot be found on disk anymore under the last known name. - ** You must disable this option if you also disable Close Removed. ** + **You must disable this option if you also disable Close Removed.** required: false show_user: false - name: harvester_limit @@ -209,7 +209,7 @@ streams: title: Backoff Max description: | The maximum time for Elastic Agent to wait before checking a file again after EOF is reached. The default is 10s. - ** Requirement: Set Backoff Max to be greater than or equal to Backoff Init and less than or equal to Check Interval (Backoff Init <= Backoff Max <= Check Interval). ** + **Requirement: Set Backoff Max to be greater than or equal to Backoff Init and less than or equal to Check Interval (Backoff Init <= Backoff Max <= Check Interval).** required: false show_user: false - name: fingerprint @@ -217,8 +217,8 @@ streams: type: bool default: true description: | - ** Changing file_identity methods between runs may result in - duplicated events in the output. 
** + **Changing file_identity methods between runs may result in + duplicated events in the output.** Uses a fingerprint generated from the first few bytes (1k is the default, this can be configured via Fingerprint offset and length) to identify a file instead inode + device ID. diff --git a/packages/filestream/docs/README.md b/packages/filestream/docs/README.md index 30922eabca9..c53a93d1242 100644 --- a/packages/filestream/docs/README.md +++ b/packages/filestream/docs/README.md @@ -3,6 +3,13 @@ WARNING: Migrating from the "Custom Logs (Deprecated)" to "Custom Logs (Filestream)" will cause files to be re-ingested because the state is not migrated. +IMPORTANT: The Filestream integration will only start ingesting files +**when they are 1024 bytes in size or larger**. This can be adjusted by +setting "Fingerprint length", however it will influence how files are +identified. Refer to the +[fingerprint](https://www.elastic.co/docs/reference/beats/filebeat/filebeat-input-filestream#filebeat-input-filestream-file-identity-fingerprint) +documentation for more details. + In future releases it's expected to have an automated way to migrate the state. However, this is not possible at the moment. The current best option for minimizing the data duplication while migrating to "Custom Logs (Filestream)" is to use the 'Ignore Older' or 'Exclude Files' options. diff --git a/packages/filestream/manifest.yml b/packages/filestream/manifest.yml index fe7530c2f14..bf7bf9bc2ff 100644 --- a/packages/filestream/manifest.yml +++ b/packages/filestream/manifest.yml @@ -3,7 +3,7 @@ name: filestream title: Custom Logs (Filestream) description: Collect log data using filestream with Elastic Agent. type: integration -version: 1.1.3 +version: 1.1.4 conditions: kibana: version: "^8.15.0 || ^9.0.0"
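The Backoff Max description states an ordering requirement: Backoff Init <= Backoff Max <= Check Interval. The Python sketch below shows one way to validate it before a policy is saved. `parse_duration` and `check_backoff` are hypothetical helpers. The parser assumes each value consists of one or more number-and-unit pairs, such as "500ms" or "1m30s", using the units listed under Clean Inactive ("ns", "us" or "µs", "ms", "s", "m", "h"). It rejects anything else, including negative values.

```python
import re

# Seconds per unit, for the units listed in the Clean Inactive description.
_UNITS = {"ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}
# "ms" is tried before "m" and "s" so that "10ms" is not split incorrectly.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ns|us|µs|ms|s|m|h))+")


def parse_duration(text: str) -> float:
    """Hypothetical helper: parse "500ms", "10s" or "1m30s" into seconds."""
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"invalid duration: {text!r}")
    return sum(float(num) * _UNITS[unit] for num, unit in _TOKEN.findall(text))


def check_backoff(backoff_init: str, backoff_max: str, check_interval: str) -> None:
    """Hypothetical helper: raise ValueError unless Backoff Init <= Backoff Max <= Check Interval."""
    init, mx, interval = (parse_duration(v) for v in (backoff_init, backoff_max, check_interval))
    if not (init <= mx <= interval):
        raise ValueError(
            f"expected Backoff Init ({backoff_init}) <= Backoff Max ({backoff_max}) "
            f"<= Check Interval ({check_interval})"
        )


if __name__ == "__main__":
    check_backoff("1s", "10s", "10s")  # valid: 1s <= 10s <= 10s
    try:
        check_backoff("1s", "30s", "10s")  # invalid: Backoff Max exceeds Check Interval
    except ValueError as err:
        print(err)
```

Comparing parsed seconds rather than raw strings matters here. As strings, "10s" sorts before "2s", so a naive string comparison would get the ordering wrong.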