Skip to content

Commit

Permalink
[JENKINS-72371] rework node monitor configuration (#8719)
Browse files Browse the repository at this point in the history
* [JENKINS-72371] rework node monitor configuration

This PR reworks the way node monitors are configured. It ensures that
also monitors which are set to ignored, can be configured.
Previously, there was a checkbox before each monitor, where the
behaviour was not totally clear. It meant that the monitor is ignored
but still collecting data and not disabled as one might think. But when
the monitor was disabled any configuration was lost and default values
were used.

* improve description

* fix formatting

* add since
  • Loading branch information
mawinter69 authored Dec 18, 2023
1 parent 7a2e389 commit 7258cad
Show file tree
Hide file tree
Showing 15 changed files with 115 additions and 33 deletions.
24 changes: 21 additions & 3 deletions core/src/main/java/hudson/model/ComputerSet.java
Original file line number Diff line number Diff line change
Expand Up @@ -352,15 +352,33 @@ public synchronized HttpResponse doConfigSubmit(StaplerRequest req) throws IOExc
BulkChange bc = new BulkChange(MONITORS_OWNER);
try {
Jenkins.get().checkPermission(Jenkins.MANAGE);
monitors.rebuild(req, req.getSubmittedForm(), getNodeMonitorDescriptors());
JSONObject json = req.getSubmittedForm();
monitors.rebuild(req, json, getNodeMonitorDescriptors());

// add in the rest of instances are ignored instances
for (Descriptor<NodeMonitor> d : NodeMonitor.all())
if (monitors.get(d) == null) {
for (Descriptor<NodeMonitor> d : NodeMonitor.all()) {
NodeMonitor monitor = monitors.get(d);
if (monitor == null) {

Check warning on line 361 in core/src/main/java/hudson/model/ComputerSet.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Partially covered line

Line 361 is only partially covered, one branch is missing
NodeMonitor i = createDefaultInstance(d, true);
if (i != null)
monitors.add(i);
} else {

Check warning on line 365 in core/src/main/java/hudson/model/ComputerSet.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Not covered line

Line 365 is not covered by tests
/*
* Some monitors in plugins do not have a DataBoundConstructor
* but a Descriptor that overrides newInstance. For those the ignored
* field is not set, so we have to explicitly set it.
*/
String name = d.getJsonSafeClassName();
JSONObject o = json.optJSONObject(name);
boolean ignored = true;
if (o != null) {

Check warning on line 374 in core/src/main/java/hudson/model/ComputerSet.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Partially covered line

Line 374 is only partially covered, one branch is missing
if (o.containsKey("ignored")) {

Check warning on line 375 in core/src/main/java/hudson/model/ComputerSet.java

View check run for this annotation

ci.jenkins.io / Code Coverage

Partially covered line

Line 375 is only partially covered, one branch is missing
ignored = o.getBoolean("ignored");
}
}
monitor.setIgnored(ignored);
}
}

// recompute the data
for (NodeMonitor nm : monitors) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -65,6 +65,25 @@ protected AbstractNodeMonitorDescriptor() {
this(PERIOD);
}

/**
* Indicates if this monitor is capable to take agents offline in case it detects a problem.
* If true, this will enable the configuration option to ignore the monitor.
* Defaults to {@code true} so plugins that do not override this method behave as before.
* Plugins that do implement a monitor that will not take agents offline should override this
* method and return false.
*
* @return true if this monitor might take agents offline
* @since TODO
*/
public boolean canTakeOffline() {
return true;
}

@Override
public String getConfigPage() {
return getViewPage(clazz, "configure.jelly");
}

/**
* @deprecated as of 1.522
* Extend from {@link AbstractAsyncNodeMonitorDescriptor}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -30,10 +30,8 @@
import hudson.remoting.Callable;
import java.io.IOException;
import jenkins.security.MasterToSlaveCallable;
import net.sf.json.JSONObject;
import org.jenkinsci.Symbol;
import org.kohsuke.stapler.DataBoundConstructor;
import org.kohsuke.stapler.StaplerRequest;

/**
* Discovers the architecture of the system to display in the agent list page.
Expand All @@ -60,8 +58,8 @@ public String getDisplayName() {
}

@Override
public NodeMonitor newInstance(StaplerRequest req, JSONObject formData) throws FormException {
return new ArchitectureMonitor();
public boolean canTakeOffline() {
return false;
}
}

Expand Down
12 changes: 5 additions & 7 deletions core/src/main/java/hudson/node_monitors/ClockMonitor.java
Original file line number Diff line number Diff line change
Expand Up @@ -32,12 +32,10 @@
import hudson.remoting.Callable;
import hudson.util.ClockDifference;
import java.io.IOException;
import net.sf.json.JSONObject;
import org.jenkinsci.Symbol;
import org.kohsuke.accmod.Restricted;
import org.kohsuke.accmod.restrictions.NoExternalUse;
import org.kohsuke.stapler.DataBoundConstructor;
import org.kohsuke.stapler.StaplerRequest;

/**
* {@link NodeMonitor} that checks clock of {@link Node} to
Expand Down Expand Up @@ -72,6 +70,11 @@ public DescriptorImpl() {
DESCRIPTOR = this;
}

@Override
public boolean canTakeOffline() {
return false;
}

@Override
protected Callable<ClockDifference, IOException> createCallable(Computer c) {
Node n = c.getNode();
Expand All @@ -84,10 +87,5 @@ protected Callable<ClockDifference, IOException> createCallable(Computer c) {
public String getDisplayName() {
return Messages.ClockMonitor_DisplayName();
}

@Override
public NodeMonitor newInstance(StaplerRequest req, JSONObject formData) throws FormException {
return new ClockMonitor();
}
}
}
2 changes: 2 additions & 0 deletions core/src/main/java/hudson/node_monitors/NodeMonitor.java
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
import hudson.util.DescriptorList;
import java.util.List;
import jenkins.model.Jenkins;
import org.kohsuke.stapler.DataBoundSetter;
import org.kohsuke.stapler.export.Exported;
import org.kohsuke.stapler.export.ExportedBean;

Expand Down Expand Up @@ -133,6 +134,7 @@ public boolean isIgnored() {
return ignored;
}

@DataBoundSetter
public void setIgnored(boolean ignored) {
this.ignored = ignored;
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -34,10 +34,8 @@
import java.util.Map;
import java.util.logging.Logger;
import jenkins.security.MasterToSlaveCallable;
import net.sf.json.JSONObject;
import org.jenkinsci.Symbol;
import org.kohsuke.stapler.DataBoundConstructor;
import org.kohsuke.stapler.StaplerRequest;
import org.kohsuke.stapler.export.Exported;
import org.kohsuke.stapler.export.ExportedBean;

Expand Down Expand Up @@ -103,11 +101,6 @@ protected Map<Computer, Data> monitor() throws InterruptedException {
public String getDisplayName() {
return Messages.ResponseTimeMonitor_DisplayName();
}

@Override
public NodeMonitor newInstance(StaplerRequest req, JSONObject formData) throws FormException {
return new ResponseTimeMonitor();
}
}

private static final class Step1 extends MasterToSlaveCallable<Data, IOException> {
Expand Down
12 changes: 5 additions & 7 deletions core/src/main/java/hudson/node_monitors/SwapSpaceMonitor.java
Original file line number Diff line number Diff line change
Expand Up @@ -33,12 +33,10 @@
import java.io.IOException;
import jenkins.model.Jenkins;
import jenkins.security.MasterToSlaveCallable;
import net.sf.json.JSONObject;
import org.jenkinsci.Symbol;
import org.jvnet.hudson.MemoryMonitor;
import org.jvnet.hudson.MemoryUsage;
import org.kohsuke.stapler.DataBoundConstructor;
import org.kohsuke.stapler.StaplerRequest;
import org.kohsuke.stapler.export.Exported;
import org.kohsuke.stapler.export.ExportedBean;

Expand Down Expand Up @@ -103,6 +101,11 @@ public DescriptorImpl() {
DESCRIPTOR = this;
}

@Override
public boolean canTakeOffline() {
return false;
}

@Override
protected MonitorTask createCallable(Computer c) {
return new MonitorTask();
Expand All @@ -113,11 +116,6 @@ protected MonitorTask createCallable(Computer c) {
public String getDisplayName() {
return Messages.SwapSpaceMonitor_DisplayName();
}

@Override
public NodeMonitor newInstance(StaplerRequest req, JSONObject formData) throws FormException {
return new SwapSpaceMonitor();
}
}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,8 @@ THE SOFTWARE.
<f:form method="post" action="configSubmit" name="config">
<f:descriptorList title="${%Preventive Node Monitoring}"
descriptors="${it.nodeMonitorDescriptors}"
instances="${it.nonIgnoredMonitors}" />
instances="${it.monitors}"
forceRowSet="true"/>

<l:hasAdministerOrManage>
<f:bottomButtonBar>
Expand Down
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
description=Jenkins monitors each attached node for disk space, free temp space, free swap, clock time/sync, and response time. Nodes will be taken offline if any of these values go outside of the configured threshold.
description=Jenkins monitors each attached node for various metrics like free disk space, free temp space, free swap, clock time/sync, response time and others provided by plugins. \
Some of these monitors are able to mark nodes temporarily offline, when the values go outside the configured thresholds.
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
<div>
This monitor just shows the architecture of the agent for your information. It
never marks the agent offline.
never marks an agent offline.
</div>
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,8 @@
systems, version control activities and distributed file access (such as NFS,
Windows file shares) tend to have strange problems when systems involved have
different clocks.
<br />
It never marks an agent offline.

<p>To keep clocks in sync, refer to NTP.</p>
</div>
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
<?jelly escape-by-default='true'?>
<j:jelly xmlns:j="jelly:core" xmlns:st="jelly:stapler" xmlns:f="/lib/form">
<j:set var="help" value="${descriptor.helpFile}"/>
<div class="jenkins-form-label help-sibling tr">
${descriptor.displayName}
<f:helpLink url="${help}" featureName="${descriptor.displayName}"/>
</div>
<f:helpArea />
<div class="optionalBlock-container jenkins-form-item">
<div class="form-container">
<j:if test="${descriptor.canTakeOffline()}">
<f:entry title="${%Don't mark agents temporarily offline}" field="ignored">
<f:checkbox checked="${instance.ignored}" />
</f:entry>
</j:if>
<j:if test="${!descriptor.canTakeOffline()}">
<f:invisibleEntry>
<f:checkbox name="ignored" checked="false" />
</f:invisibleEntry>
</j:if>
<st:include page="config.jelly" class="${descriptor.clazz}" optional="true"/>
</div>
</div>
</j:jelly>
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
<div>
When checked the agent will
<i>not</i>
be marked temporarily offline in case the monitor detects a problem. But the
monitor will still run to collect data and the corresponding field will be
highlighted in case of a problem.
</div>
Original file line number Diff line number Diff line change
@@ -1,7 +1,6 @@
<div>
This monitors the round trip network response time from the controller to the
agent, and if it goes above a threshold repeatedly, it marks the agent
offline.
agent, and if it goes above a threshold repeatedly, it disconnects the agent.

<p>
This is useful for detecting unresponsive agents, or other network problems
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
<div>
This monitors the available virtual memory (also known as
<code>swap space</code>
).
<p>The exact definition of swap space is platform dependent.</p>
<ul>
<li>
On Windows this is the available space of the page file. As Windows can
automatically increase the page file size, this value doesn't mean much.
</li>
<li>
On Linux this value is retrieved from
<code>/proc/meminfo</code>
.
</li>
<li>
For other Unix systems, the value is obtained from the
<code>top</code>
command.
</li>
</ul>
</div>

0 comments on commit 7258cad

Please sign in to comment.