Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[JENKINS-72009, JENKINS-72200, JENKINS-24947] various fixes and improvements around disk space monitoring #8593

Merged
merged 17 commits into from
Nov 24, 2023

Conversation

mawinter69
Copy link
Contributor

@mawinter69 mawinter69 commented Oct 10, 2023

JENKINS-24947:

  • Introduce a NodeProperty to define agent specific thresholds for free disk and temp space.

JENKINS-72009

  • Enhance the tooltip in the executor widget so it includes the threshold (also fixes a wrong conversion)
  • Show a tooltip in the nodes table when agent was marked offline or has a warning about low diskspace
  • Use binary prefixes when showing the disk space (GB -> GiB)
  • allow to configure also a warning level
  • Change text of configure button to make it clearer what it does
  • Store totalsize to so we can show it in tooltips
  • Show a tooltip also when all is good that contains total size and the path

JENKINS-72200:

  • ensure an agent is taken offline/online due to diskspace when the background nodemonitor runs.

an admin monitor for disk space warnings is very hard to implement, so skipping for now

support-core and metrics plugin should be adjusted to reflect agent individual thresholds.

image

See JENKINS-72009
JENKINS-72200
JENKINS-24947

Testing done

  • Manual testing as described in JENKINS-72200
  • Configure node properties for agents and verify they are taken offline or show a warning

Proposed changelog entries

  • Allow configuration of disk thresholds globally and for each agent. Improve warning when disk space is too low. Ensure agents are taken offline when disk space is low.

Proposed upgrade guidelines

N/A

Submitter checklist

Desired reviewers

N/A

Before the changes are marked as ready-for-merge:

Maintainer checklist

Introduces a NodeProperty to define agent specific thresholds for free
disk and temp space.
Enhance the tooltip in the executor widget so it includes the threshold
Show a tooltip in the nodes table when agent was marked offline

todo: add a warning threshold
@mawinter69 mawinter69 changed the title [JENKINS-72009] allow agents to define disk space monitoring thresholds [JENKINS-72009, JENKINS-72200, JENKINS-24947] various fixes and improvements around disk space monitoring Oct 18, 2023
@mawinter69 mawinter69 marked this pull request as ready for review October 18, 2023 13:11
@comment-ops-bot
Copy link

I wasn't able to add the following labels: needs-more-review

Check that the label exists and is spelt right then try again.

@abhishekmaity
Copy link
Contributor

/label needs-more-reviews

@comment-ops-bot comment-ops-bot bot added the needs-more-reviews Complex change, which would benefit from more eyes label Oct 18, 2023
@abhishekmaity
Copy link
Contributor

@jenkinsci/core-pr-reviewers

@NotMyFault NotMyFault removed the needs-more-reviews Complex change, which would benefit from more eyes label Oct 19, 2023
@NotMyFault NotMyFault requested a review from a team October 19, 2023 22:02
@NotMyFault NotMyFault added the needs-security-review Awaiting review by a security team member label Oct 19, 2023
Copy link
Contributor

@Kevin-CB Kevin-CB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine from a security perspective

Comment on lines 200 to 202
* Returns the HTML representation of the space.
*/
public String toHtml() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not accurate anymore, it will never return HTML

@Kevin-CB Kevin-CB removed the needs-security-review Awaiting review by a security team member label Oct 26, 2023
@Kevin-CB Kevin-CB added the security-approved @jenkinsci/core-security-review reviewed this PR for security issues label Oct 26, 2023
@NotMyFault NotMyFault added web-ui The PR includes WebUI changes which may need special expertise rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted labels Oct 26, 2023
@NotMyFault NotMyFault requested review from a team October 26, 2023 20:40
Copy link
Contributor

@MarkEWaite MarkEWaite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like the direction this is moving very much. Interactive testing found no issues. Thanks!

@MarkEWaite MarkEWaite self-requested a review November 19, 2023 19:29
@MarkEWaite
Copy link
Contributor

Interactive testing showed that all the settings worked very well. I confirmed that the settings are visible through configuration as code and that agents are taken offline when the thresholds are exceeded.

Copy link
Contributor

@MarkEWaite MarkEWaite left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much!

@MarkEWaite MarkEWaite self-assigned this Nov 19, 2023
mawinter69 added a commit to mawinter69/versioncolumn-plugin that referenced this pull request Nov 19, 2023
[JENKINS-72284] when an agent with wrong java version connects, it will
be taken offline right away. Before the agent was only taken offline
when one visited the /computer page.
Use monitor specific offline causes that inherit from
MonitorOfflineCause. This allows to take the agent automatically online
again, after it was restarted with the correct java/remoting version.
Also with jenkinsci/jenkins#8618 it would then
reset the Admin monitor if that was the only reason that it fired.

Requires jenkinsci/jenkins#8593 that made 2
mthods public.
@MarkEWaite MarkEWaite requested a review from Kevin-CB November 22, 2023 21:43
Copy link
Member

@timja timja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/label ready-for-merge


This PR is now ready for merge, after ~24 hours, we will merge it if there's no negative feedback.

Thanks!

@comment-ops-bot comment-ops-bot bot added the ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback label Nov 23, 2023
@timja timja merged commit aad79ef into jenkinsci:master Nov 24, 2023
17 checks passed
MarkEWaite added a commit to jenkinsci/versioncolumn-plugin that referenced this pull request Feb 9, 2024
* improve how agents are token offline and online

[JENKINS-72284] when an agent with wrong java version connects, it will
be taken offline right away. Before the agent was only taken offline
when one visited the /computer page.
Use monitor specific offline causes that inherit from
MonitorOfflineCause. This allows to take the agent automatically online
again, after it was restarted with the correct java/remoting version.
Also with jenkinsci/jenkins#8618 it would then
reset the Admin monitor if that was the only reason that it fired.

Requires jenkinsci/jenkins#8593 that made 2
mthods public.

* fix spotless

* order imports

* update core version

* remove disconnect

* Spotless format

* keep disconnect methods for casc backwards compatibility

* keep disconnect help for CasC

* spotless

---------

Co-authored-by: Mark Waite <[email protected]>
Comment on lines -27 to +29
this.freeSpaceThreshold = "1GB";
this.freeSpaceThreshold = "1GiB";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this is one of the unusual cases where a settings format change is not forward-compatible. So if you run a controller from a version after this PR and then downgrade to a version prior to it and try to save monitor config, you will get an error

java.lang.NumberFormatException: For input string: "1GI"
	at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:2054)
	at java.base/jdk.internal.math.FloatingDecimal.parseDouble(FloatingDecimal.java:110)
	at java.base/java.lang.Double.parseDouble(Double.java:792)
	at hudson.node_monitors.DiskSpaceMonitorDescriptor$DiskSpace.parse(DiskSpaceMonitorDescriptor.java:163)
	at hudson.node_monitors.AbstractDiskSpaceMonitor.<init>(AbstractDiskSpaceMonitor.java:23)
	at hudson.node_monitors.DiskSpaceMonitor.<init>(DiskSpaceMonitor.java:49)
	at …
Caused: java.lang.IllegalArgumentException: Failed to instantiate class hudson.node_monitors.DiskSpaceMonitor from {"freeSpaceThreshold":"1GiB"}
	at org.kohsuke.stapler.RequestImpl$TypePair.convertJSON(RequestImpl.java:771)
	at org.kohsuke.stapler.RequestImpl.bindJSON(RequestImpl.java:551)
	at org.kohsuke.stapler.RequestImpl.bindJSON(RequestImpl.java:546)
	at hudson.model.Descriptor.bindJSON(Descriptor.java:623)
	at hudson.model.Descriptor.newInstance(Descriptor.java:593)
Caused: java.lang.LinkageError: Failed to instantiate class hudson.node_monitors.DiskSpaceMonitor from {"freeSpaceThreshold":"1GiB"}
	at hudson.model.Descriptor.newInstance(Descriptor.java:596)
	at hudson.util.DescribableList.rebuild(DescribableList.java:183)
	at hudson.model.ComputerSet.doConfigSubmit(ComputerSet.java:355)
	at …

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge The PR is ready to go, and it will be merged soon if there is no negative feedback rfe For changelog: Minor enhancement. use `major-rfe` for changes to be highlighted security-approved @jenkinsci/core-security-review reviewed this PR for security issues web-ui The PR includes WebUI changes which may need special expertise
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants