Contributor

@yacovm yacovm commented Nov 14, 2025

Why this should be merged

The current thresholds for free disk space are very low, so the warning may come too late (see #4517).

This commit changes them to something more reasonable.

How this works

Changes default thresholds

How this was tested

CI

Need to be documented in RELEASES.md?

Copilot AI review requested due to automatic review settings November 14, 2025 16:23
Contributor

Copilot AI left a comment

Pull Request Overview

This PR increases the default disk space thresholds for node health monitoring and shutdown. The changes raise the minimum required disk space from 0.5 GiB to 10 GiB (shutdown threshold) and the warning threshold from 1 GiB to 100 GiB.

Key changes:

  • Increase shutdown threshold from 0.5 GiB to 10 GiB
  • Increase warning threshold from 1 GiB to 100 GiB
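A minimal sketch of how a two-tier check like this might behave, assuming the 100 GiB warning and 10 GiB shutdown values from the summary above. The `diskStatus` type and `classify` function are hypothetical names for illustration, not avalanchego APIs, and the strict `<` comparison is an assumption.

```go
package main

import "fmt"

const gib = uint64(1) << 30

// diskStatus is a hypothetical classification, not an avalanchego type.
type diskStatus int

const (
	diskOK diskStatus = iota
	diskWarn
	diskShutdown
)

// classify compares the available bytes against both thresholds.
// The shutdown threshold is checked first because it is the lower of the two.
func classify(availableBytes, warnBytes, shutdownBytes uint64) diskStatus {
	switch {
	case availableBytes < shutdownBytes:
		return diskShutdown
	case availableBytes < warnBytes:
		return diskWarn
	default:
		return diskOK
	}
}

func main() {
	// Assumed defaults, taken from the PR summary.
	warn, shutdown := 100*gib, 10*gib
	for _, avail := range []uint64{500 * gib, 50 * gib, 5 * gib} {
		fmt.Println(avail/gib, "GiB available, status:", classify(avail, warn, shutdown))
	}
}
```

With a wider gap between the two thresholds, an operator sees the warning state for much longer before the shutdown state is reached.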

Contributor

@StephenButtolph StephenButtolph left a comment

These are certainly better than the current values.

I do wonder if we should be even more aggressive (or make them a percentage of the volume size rather than hard numbers).

But I'd be happy to merge this at the least.

joshua-kim previously approved these changes Nov 14, 2025
@joshua-kim joshua-kim enabled auto-merge November 14, 2025 17:12
}

flags.SetDefaults(FlagsMap{
	config.SystemTrackerRequiredAvailableDiskSpaceKey: "1GB",
Contributor Author

Looks like GitHub's runners don't have much disk space...

@yacovm yacovm force-pushed the changeLimits branch 2 times, most recently from db6ca47 to 361eeba Compare November 14, 2025 17:38
@yacovm
Contributor Author

yacovm commented Nov 14, 2025

@maru-ava do you have any idea why the bootstrap monitor task fails in CI?

@maru-ava
Contributor

> @maru-ava do you have any idea why the bootstrap monitor task fails in CI?

I've created a new PR from this one that saves the kind logs as an artifact. I'm thinking those logs might be required to identify the problem.

Contributor

@maru-ava maru-ava left a comment

The PR for which I enabled log collection reports the following FATAL (entry kind-control-plane/pods/bootstrap-test-e2e-5bs8z_avalanchego-node-cz6sv-0_31b6ce5d-f40a-4189-94d6-c0f7765a0300/avago/4.log in the artifact provided by the test run):

2025-11-14T18:36:31.211528364Z stdout F [11-14|18:36:31.211] FATAL node/node.go:1463 low on disk space. Shutting down... {"remainingDiskBytes": 6975516672}

This is consistent with my inline comments regarding the configuration changes you've proposed. In #4516 I also cleaned up configuration of the bootstrap monitor pods so that they start from the tmpnet configuration you've changed rather than being entirely distinct.

I recommend cherry-picking my commits from #4516 and reverting your change to tests/fixture/bootstrapmonitor/e2e/e2e_test.go.
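As a quick check of the numbers, the `remainingDiskBytes` value in the FATAL line above works out to about 6.5 GiB, which is under a 10 GiB shutdown threshold. The sketch below does that arithmetic; the 10 GiB figure is assumed from the PR summary, and `toGiB` is a hypothetical helper.

```go
package main

import "fmt"

const gib = 1 << 30

// toGiB converts a byte count to gibibytes (hypothetical helper).
func toGiB(b uint64) float64 {
	return float64(b) / float64(gib)
}

func main() {
	// remainingDiskBytes copied from the FATAL log line.
	const remaining uint64 = 6975516672
	// Assumed new shutdown threshold, per the PR summary.
	const shutdown uint64 = 10 * gib

	fmt.Printf("remaining: %.2f GiB\n", toGiB(remaining)) // remaining: 6.50 GiB
	fmt.Println("below shutdown threshold:", remaining < shutdown)
}
```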

@StephenButtolph StephenButtolph dismissed stale reviews from joshua-kim and themself November 14, 2025 19:20

CI failing


Development

Successfully merging this pull request may close these issues.

Eventually validators run out of disk space, shut down and become unrecoverable.

5 participants