
[Osquery] Cypress - Add SHA512 integrity validation for cached agent #258842

Merged
tomsonpl merged 8 commits into elastic:main from tomsonpl:fix-corrupted-agent-file
Mar 23, 2026
Conversation

@tomsonpl
Contributor

@tomsonpl tomsonpl commented Mar 20, 2026

Summary

Adds SHA512 integrity validation for cached Elastic Agent downloads used by Defend Workflows Cypress tests. Corrupt or truncated tarballs are now automatically detected and re-downloaded, preventing persistent CI failures.

Problem

The Defend Workflows Cypress tests (cy.task('createEndpointHost')) provision Vagrant VMs with Elastic Agent. The agent tarball is pre-downloaded during CI image build and cached at ~/.kibanaSecuritySolutionCliTools/agent_download_storage/.

Example
https://buildkite.com/elastic/kibana-on-merge/builds/91361#019d0abf-4e4e-43e2-a693-44b1023278b4/L2306-L2428

There was no integrity validation anywhere in this chain. If a download was truncated (by a network hiccup or a CDN issue), the partial file was cached and reused for up to 2 days, causing every test run on that CI agent to fail with:

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

The Elastic artifacts API already exposes sha_url alongside url in search responses, but it was completely ignored.

Changes

  • SHA512 validation after download (agent_downloads_service.ts): fetch expected hash from artifacts API sha_url, compute SHA512 of downloaded file, compare. On mismatch, retry (up to 3 attempts). On sha_url fetch failure, proceed without validation (best-effort — don't block CI if hash infra is down).

  • Cache integrity validation on reuse (agent_downloads_service.ts): cached files now require a .sha512 sidecar file. On cache hit, re-compute hash and compare with sidecar. If mismatch or missing sidecar → delete corrupt file and re-download. This is the key self-healing behavior.

  • SHA URL propagation (fleet_services.ts): getAgentDownloadUrl() now returns shaUrl from the artifacts API response. All callers updated to pass it through: agent_downloader CLI (packer cache), create_and_enroll_endpoint_host_ci.ts (Cypress), enrollHostVmWithFleet().

  • Vagrantfile defense-in-depth: added gzip -t check before tar -zxf extraction. Catches any corruption that slips through (e.g., SCP transfer issue). Fails with "Agent tarball integrity check failed" instead of cryptic tar errors.

  • Sidecar cleanup: .sha512 files are cleaned up alongside their parent tarballs during the existing cache TTL cleanup.
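The download-time validation described above can be sketched roughly as follows. This is an illustrative, hedged sketch only; the function names are assumptions, not the actual `agent_downloads_service.ts` exports, and it assumes the `sha_url` body follows the usual `sha512sum` format.

```typescript
import { createHash } from 'crypto';

// Illustrative sketch — names below are assumptions, not the real
// agent_downloads_service.ts API.

// sha_url responses typically follow the `sha512sum` format:
//   "<sha512-hex>  elastic-agent-9.4.0-linux-x86_64.tar.gz"
// so the expected hash is the first whitespace-separated token.
export const parseExpectedHash = (shaFileBody: string): string =>
  shaFileBody.trim().split(/\s+/)[0];

export const computeSha512 = (data: Buffer): string =>
  createHash('sha512').update(data).digest('hex');

// On mismatch the caller deletes the file and retries (up to 3 attempts);
// if fetching sha_url itself fails, validation is skipped (best-effort).
export const hashMatches = (data: Buffer, expectedHash: string): boolean =>
  computeSha512(data) === expectedHash;
```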

How it works

Before (no validation):
  cached file exists? ──yes──► use as-is ──► tar -zxf ──► 💥 Unexpected EOF

After (self-healing):
  cached file exists? ──yes──► has .sha512 sidecar? ──no──► delete, re-download
                                       │
                                      yes
                                       │
                                       ▼
                               compute hash, compare
                                       │
                              match? ──yes──► use it ✓
                                       │
                                      no
                                       │
                                       ▼
                               delete, re-download ──► validate ──► cache ✓
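The self-healing branch in the diagram can be sketched like this. It is a simplified, synchronous illustration; identifiers are assumptions rather than the exact service code (which hashes the file as a stream).

```typescript
import * as fs from 'fs';
import { createHash } from 'crypto';

// Simplified sketch — not the exact agent_downloads_service.ts code.
export const computeFileHash = (filePath: string): string =>
  createHash('sha512').update(fs.readFileSync(filePath)).digest('hex');

// Returns true when the cached tarball can be reused as-is; otherwise the
// cached artifacts are removed so the caller falls through to a re-download.
export const isCachedAgentUsable = (tarballPath: string): boolean => {
  const sidecarPath = `${tarballPath}.sha512`;
  if (fs.existsSync(tarballPath) && fs.existsSync(sidecarPath)) {
    const expected = fs
      .readFileSync(sidecarPath, 'utf8')
      .trim()
      .split(/\s+/)[0];
    if (computeFileHash(tarballPath) === expected) {
      return true; // hash matches -> safe to reuse
    }
  }
  // Missing sidecar or hash mismatch: delete and signal a re-download.
  fs.rmSync(tarballPath, { force: true });
  fs.rmSync(sidecarPath, { force: true });
  return false;
};
```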

Test plan

  • agent_downloads_service.test.ts — 14 tests covering:
    • Fresh download without cache
    • Cache hit with valid sidecar hash
    • Cache hit with invalid hash → re-download
    • Cache hit with missing sidecar → re-download
    • Hash validation after fresh download with shaUrl
    • All retry attempts fail hash validation → error
    • sha_url fetch fails → proceed without validation
    • Sidecar file created on successful download
    • Sidecar file deleted during cleanup
    • isAgentDownloadFromDiskAvailable with/without sidecar
    • fetchExpectedHash parsing and error handling
  • agent_downloader.test.ts — 8 tests verify shaUrl passthrough in all scenarios
  • All 22 tests pass

@tomsonpl tomsonpl self-assigned this Mar 20, 2026
@tomsonpl
Contributor Author

/ci

@tomsonpl tomsonpl added the release_note:skip, Team:Defend Workflows, backport:version, v9.4.0, v9.3.3, v9.2.8, and v8.19.14 labels Mar 20, 2026
@tomsonpl tomsonpl marked this pull request as ready for review March 20, 2026 13:30
@tomsonpl tomsonpl requested a review from a team as a code owner March 20, 2026 13:30
@tomsonpl tomsonpl requested review from gergoabraham and pzl March 20, 2026 13:30
@elasticmachine
Contributor

Pinging @elastic/security-defend-workflows (Team:Defend Workflows)

@tomsonpl tomsonpl requested review from patrykkopycinski and szwarckonrad and removed request for gergoabraham March 20, 2026 14:14
Contributor

@szwarckonrad szwarckonrad left a comment


Two things I noticed:


computeFileHash will hang at runtime

await finished(stream.pipe(hash)) never resolves because Hash is a Transform stream and nobody reads from its readable side, so finished() waits forever. Tests pass because createHash is fully mocked. I verified locally on Node 22; it hangs.

Quick fix:

import fs from 'fs';
import { createHash } from 'crypto';

export const computeFileHash = async (filePath: string): Promise<string> => {
  const hash = createHash('sha512');
  const stream = fs.createReadStream(filePath);
  for await (const chunk of stream) {
    hash.update(chunk);
  }
  return hash.digest('hex');
};

Cache breaks when the SHA URL is down

If fetchExpectedHash fails, expectedHash stays undefined, so no sidecar file gets written. Next run sees the file but no sidecar, treats it as unavailable, and re-downloads. This loops forever while the hash endpoint is unreachable, effectively disabling the cache.

One option: compute the local hash after download and store it as the sidecar regardless, so at least disk-level corruption is caught on reuse even without remote validation.
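That fallback could look something like the following. This is an illustrative sketch of the suggestion above, not the merged code; the function name is hypothetical.

```typescript
import * as fs from 'fs';
import { createHash } from 'crypto';

// Sketch of the suggestion: always persist a sidecar. If the remote sha_url
// fetch succeeded, store that hash; otherwise hash the file locally so
// disk-level corruption is still caught on the next cache hit.
// Names are illustrative, not the actual service API.
export const writeSidecar = (
  tarballPath: string,
  remoteHash?: string
): string => {
  const hash =
    remoteHash ??
    createHash('sha512').update(fs.readFileSync(tarballPath)).digest('hex');
  fs.writeFileSync(`${tarballPath}.sha512`, hash);
  return hash;
};
```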

@elasticmachine
Contributor

elasticmachine commented Mar 23, 2026

⏳ Build in-progress, with failures

Failed CI Steps

History

cc @tomsonpl

@tomsonpl tomsonpl requested a review from szwarckonrad March 23, 2026 12:04
@tomsonpl
Contributor Author

Thanks @szwarckonrad, my implementation wasn't complete without your suggestions. Big thank you! Fixed now :)

@tomsonpl tomsonpl merged commit 8cf183e into elastic:main Mar 23, 2026
18 checks passed
@kibanamachine
Contributor

Starting backport for target branches: 8.19, 9.2, 9.3

https://github.com/elastic/kibana/actions/runs/23451616679

@kibanamachine
Contributor

💔 All backports failed

Status Branch Result
8.19 Backport failed because of merge conflicts
9.2 Backport failed because of merge conflicts
9.3 Backport failed because of merge conflicts

Manual backport

To create the backport manually run:

node scripts/backport --pr 258842

Questions ?

Please refer to the Backport tool documentation

@kibanamachine kibanamachine added the backport missing label (added to PRs automatically when they are determined to be missing a backport) Mar 25, 2026
@kibanamachine
Contributor

Friendly reminder: Looks like this PR hasn't been backported yet.
To create backports automatically, add a backport:* label, or prevent reminders by adding the backport:skip label.
You can also create backports manually by running node scripts/backport --pr 258842 locally
cc: @tomsonpl

13 similar comments
