@belimawr commented Jun 11, 2025

Proposed commit message

Filestream now logs one line at warn level per scan with the number of files that are too small to be ingested.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Create a few small files

    echo "foo" > /tmp/small-1.log
    echo "bar" > /tmp/small-2.log
    echo "foo bar" > /tmp/small-3.log
  2. Build & start Filebeat with the following configuration:

    filebeat.inputs:
    - type: filestream
      id: log-small-files-cannot-be-ingested
      enabled: true
      paths:
        - /tmp/small*.log
    
    output.discard:
      enabled: true
    
    logging:
      level: debug
      to_stderr: true
  3. Look for the following logs

    {
      "log.level": "debug",
      "@timestamp": "2025-06-12T13:41:17.034-0400",
      "log.logger": "scanner",
      "log.origin": {
        "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*fileScanner).GetFiles",
        "file.name": "filestream/fswatch.go",
        "file.line": 398
      },
      "message": "cannot start ingesting from file \"/tmp/small-1.log\": filesize of \"/tmp/small-1.log\" is 4 bytes, expected at least 1024 bytes for fingerprinting: file size is too small for ingestion",
      "service.name": "filebeat",
      "filestream_id": "log-small-files-cannot-be-ingested",
      "ecs.version": "1.6.0"
    }
    {
      "log.level": "debug",
      "@timestamp": "2025-06-12T13:41:17.034-0400",
      "log.logger": "scanner",
      "log.origin": {
        "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*fileScanner).GetFiles",
        "file.name": "filestream/fswatch.go",
        "file.line": 398
      },
      "message": "cannot start ingesting from file \"/tmp/small-2.log\": filesize of \"/tmp/small-2.log\" is 4 bytes, expected at least 1024 bytes for fingerprinting: file size is too small for ingestion",
      "service.name": "filebeat",
      "filestream_id": "log-small-files-cannot-be-ingested",
      "ecs.version": "1.6.0"
    }
    {
      "log.level": "debug",
      "@timestamp": "2025-06-12T13:41:17.034-0400",
      "log.logger": "scanner",
      "log.origin": {
        "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*fileScanner).GetFiles",
        "file.name": "filestream/fswatch.go",
        "file.line": 398
      },
      "message": "cannot start ingesting from file \"/tmp/small-3.log\": filesize of \"/tmp/small-3.log\" is 8 bytes, expected at least 1024 bytes for fingerprinting: file size is too small for ingestion",
      "service.name": "filebeat",
      "filestream_id": "log-small-files-cannot-be-ingested",
      "ecs.version": "1.6.0"
    }
    {
      "log.level": "warn",
      "@timestamp": "2025-06-12T13:41:17.034-0400",
      "log.logger": "scanner",
      "log.origin": {
        "function": "github.com/elastic/beats/v7/filebeat/input/filestream.(*fileScanner).GetFiles",
        "file.name": "filestream/fswatch.go",
        "file.line": 421
      },
      "message": "3 files are too small to be ingested, files need to be at least 1024 in size for ingestion to start. To change this behaviour set 'prospector.scanner.fingerprint.length' and 'prospector.scanner.fingerprint.offset'. Enable debug logging to see all file names.",
      "service.name": "filebeat",
      "filestream_id": "log-small-files-cannot-be-ingested",
      "ecs.version": "1.6.0"
    }

    These log lines repeat on every scan of the file system (the default scan interval is 10s).

@botelastic bot added the needs_team label Jun 11, 2025
mergify bot commented Jun 11, 2025

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @belimawr? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit
  • backport-active-all is the label that automatically backports to all active branches.
  • backport-active-8 is the label that automatically backports to all active minor branches for the 8 major.
  • backport-active-9 is the label that automatically backports to all active minor branches for the 9 major.

@belimawr added the Team:Elastic-Agent-Data-Plane label Jun 11, 2025
@botelastic bot removed the needs_team label Jun 11, 2025
@belimawr added the needs_team and backport-active-all labels Jun 11, 2025
@botelastic bot removed the needs_team label Jun 11, 2025
@belimawr marked this pull request as ready for review June 11, 2025 15:40
@belimawr requested a review from a team as a code owner June 11, 2025 15:40
@belimawr requested review from faec and leehinman June 11, 2025 15:40
@elasticmachine

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@leehinman left a comment

Would it be possible to add the input id to the log message (maybe it is a field added to the logger?). If we had that, the user would know which input needs to be changed.

Right now, you have to switch to debug mode, and then you just get the file name, which means you have to work backwards from the globs to determine which input needs to be modified.

@belimawr

> Would it be possible to add the input id to the log message (maybe it is a field added to the logger?). If we had that, the user would know which input needs to be changed.
>
> Right now, you have to switch to debug mode, and then you just get the file name, which means you have to work backwards from the globs to determine which input needs to be modified.

Thanks Lee, that was a very good point. I added it in 76ccd18, but I don't like having to use the global logger in so many tests 😞. All the "testing loggers" I found in elastic-agent-libs/logp build on the development logger, which ends up logging everything 😭.

I'll put this PR in draft and write a noop logger for elastic-agent-libs/logp.
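
For illustration, a no-op logger in the general sense looks like the sketch below. It uses Go's standard `log/slog` package rather than elastic-agent-libs/logp, so it shows the concept only and says nothing about the API that was eventually added to logp; `newNoopLogger` is a hypothetical name.

```go
package main

import (
	"io"
	"log/slog"
)

// newNoopLogger returns a logger that discards every entry, so tests that
// need a logger neither touch global state nor produce output.
func newNoopLogger() *slog.Logger {
	return slog.New(slog.NewTextHandler(io.Discard, nil))
}

func main() {
	log := newNoopLogger()
	log.Warn("this entry is written to io.Discard and never shown")
}
```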

@belimawr marked this pull request as draft June 11, 2025 21:16
@belimawr marked this pull request as ready for review June 12, 2025 16:47
@belimawr requested a review from a team as a code owner June 12, 2025 16:47
@belimawr requested a review from leehinman June 12, 2025 16:47
@leehinman left a comment

LGTM

One optional suggestion.

@belimawr merged commit b91d891 into elastic:main Jun 13, 2025
200 of 203 checks passed
@belimawr deleted the log-files-too-small branch June 13, 2025 15:30
@github-actions

@Mergifyio backport 8.17 8.18 8.19 9.0

mergify bot commented Jun 13, 2025

backport 8.17 8.18 8.19 9.0

✅ Backports have been created

@belimawr added the backport-active-9 label and removed the backport-active-all label Jun 13, 2025
mergify bot pushed a commit that referenced this pull request Jun 13, 2025
(cherry picked from commit b91d891)

# Conflicts:
#	NOTICE.txt
#	go.mod
#	go.sum
mergify bot pushed a commit that referenced this pull request Jun 13, 2025
(cherry picked from commit b91d891)

# Conflicts:
#	NOTICE.txt
#	filebeat/input/filestream/prospector_creator.go
#	go.mod
#	go.sum
mergify bot pushed a commit that referenced this pull request Jun 13, 2025
mergify bot pushed a commit that referenced this pull request Jun 13, 2025
(cherry picked from commit b91d891)

# Conflicts:
#	NOTICE.txt
#	go.mod
#	go.sum
belimawr added a commit that referenced this pull request Jun 13, 2025
… level (#44811)

(cherry picked from commit b91d891)

# Conflicts:
#	NOTICE.txt
#	go.mod
#	go.sum

---------

Co-authored-by: Tiago Queiroz <[email protected]>
belimawr added a commit that referenced this pull request Jun 13, 2025
belimawr added a commit that referenced this pull request Jun 16, 2025
…n level (#44808)

# Conflicts:
#	NOTICE.txt
#	filebeat/input/filestream/prospector_creator.go
#	go.mod
#	go.sum

---------

Co-authored-by: Tiago Queiroz <[email protected]>
belimawr added a commit that referenced this pull request Jun 16, 2025
…n level (#44809)

# Conflicts:
#	NOTICE.txt
#	go.mod
#	go.sum

---------

Co-authored-by: Tiago Queiroz <[email protected]>