20 changes: 20 additions & 0 deletions README.md
@@ -403,6 +403,7 @@ The following sets of tools are available (toolsets marked with ✓ in the Defau
| config | View and manage the current local Kubernetes configuration (kubeconfig) | ✓ |
| core | Most common tools for Kubernetes management (Pods, Generic Resources, Events, etc.) | ✓ |
| kcp | Manage kcp workspaces and multi-tenancy features | |
| openshift | OpenShift-specific tools for cluster management and troubleshooting, check the [OpenShift documentation](docs/OPENSHIFT.md) for more details. | |
| ossm | Most common tools for managing OSSM, check the [OSSM documentation](https://github.com/openshift/openshift-mcp-server/blob/main/docs/OSSM.md) for more details. | |
| kubevirt | KubeVirt virtual machine management tools | |
| observability | Cluster observability tools for querying Prometheus metrics and Alertmanager alerts | ✓ |
@@ -696,6 +697,25 @@ Common use cases:

</details>

<details>

<summary>openshift</summary>

- **plan_mustgather** - Plan for collecting a must-gather archive from an OpenShift cluster. Must-gather is a tool for collecting cluster data related to debugging and troubleshooting, such as logs and Kubernetes resources.
  - `node_name` (`string`) - Optional node on which to run the must-gather pod. If not provided, a random control-plane node is selected automatically
  - `node_selector` (`string`) - Optional node label selector to use; only relevant when specifying a command and image that need to capture data on a set of cluster nodes simultaneously
  - `host_network` (`boolean`) - Optionally run the must-gather pods in the host network of the node. This is only relevant if a specific gather image needs to capture host-level data
  - `gather_command` (`string`) - Optionally specify a custom gather command to run a specialized script, e.g. /usr/bin/gather_audit_logs (default: /usr/bin/gather)
  - `all_component_images` (`boolean`) - Optional. When enabled, collects and runs multiple must-gathers for all operators and components on the cluster that have an annotated must-gather image available
  - `images` (`array`) - Optional list of images to use for gathering custom information about specific operators or cluster components. If not specified, OpenShift's default must-gather image is used
  - `source_dir` (`string`) - Optional directory from which the pod copies gathered data (default: /must-gather)
  - `timeout` (`string`) - Timeout of the gather process, e.g. 30s, 6m20s, or 2h10m30s
  - `namespace` (`string`) - Optional existing privileged namespace where must-gather pods should run. If not provided, a temporary namespace will be created
  - `keep_resources` (`boolean`) - Optional flag to retain all temporary resources when the must-gather completes; otherwise the output advises cleaning up the temporary resources that were created
  - `since` (`string`) - Optional relative duration, e.g. 5s, 2m5s, or 3h6m10s; only logs newer than it are collected. If unspecified, all available logs are collected

</details>


<!-- AVAILABLE-TOOLSETS-TOOLS-END -->

219 changes: 219 additions & 0 deletions docs/OPENSHIFT.md
@@ -0,0 +1,219 @@
# OpenShift Toolset

This toolset provides OpenShift-specific prompts for cluster management and troubleshooting.

## Prompts

### plan_mustgather

IMO this seems more like an MCP prompt than an MCP tool. See for example

    func initHealthChecks() []api.ServerPrompt {
        return []api.ServerPrompt{
            {
                Prompt: api.Prompt{
                    Name:        "cluster-health-check",
                    Title:       "Cluster Health Check",
                    Description: "Perform comprehensive health assessment of Kubernetes/OpenShift cluster",
                    Arguments: []api.PromptArgument{
                        {
                            Name:        "namespace",
                            Description: "Optional namespace to limit health check scope (default: all namespaces)",
                            Required:    false,
                        },
                        {
                            Name:        "check_events",
                            Description: "Include recent warning/error events (true/false, default: true)",
                            Required:    false,
                        },
                    },
                },
                Handler: clusterHealthCheckHandler,
            },
        }
    }

Our thinking here was that, while it does guide a workflow, the complexity of the parameters makes it better suited as a tool than a prompt. @swghosh, did we investigate this route?

@swghosh (Member, Author), Feb 4, 2026

At the time of initially writing this PR, the upstream MCP server lacked support for Prompts so we ended up using the tools approach.

Also, as we understood it earlier, prompts are mainly static descriptions/instructions to guide the agent. However, judging by the health_check example shared, it seems we can have a fully dynamic prompt with parameter support, generated by the MCP server, that prints full YAMLs (which is pretty much what we need for the planning). It sounds reasonable to investigate the agent flow by flipping the Tools -> Prompt, assuming we can print the same text blurb as in the current tool response.

@swghosh (Member, Author), Feb 4, 2026

> It sounds reasonable to investigate the agent flow by flipping the Tools -> Prompt

IMO one concern comes to mind: OpenShift Lightspeed, one of the primary agents we're targeting for this use case, probably does not support MCP prompts at this time (only tools).

Member

Hmm, is it worth raising this there, to get prompts supported? They are part of the MCP spec.

There is a bit of similarity with what @Cali0707 shared for the health check, which is a ServerPrompt.


Plan for collecting a must-gather archive from an OpenShift cluster. Must-gather is a tool for collecting cluster data related to debugging and troubleshooting like logs, Kubernetes resources, and more.

This prompt generates YAML manifests for the must-gather resources that can be applied to the cluster.

**Arguments:**
- `node_name` (optional) - Specific node name to run must-gather pod on
- `node_selector` (optional) - Node selector in `key=value,key2=value2` format to filter nodes for the pod
- `source_dir` (optional) - Custom gather directory inside pod (default: `/must-gather`)
- `namespace` (optional) - Privileged namespace to use for must-gather (auto-generated if not specified)
- `gather_command` (optional) - Custom gather command e.g. `/usr/bin/gather_audit_logs` (default: `/usr/bin/gather`)
- `timeout` (optional) - Timeout duration for gather command (e.g., `30m`, `1h`)
- `since` (optional) - Only gather data newer than this duration (e.g., `5s`, `2m5s`, or `3h6m10s`), defaults to all data
- `host_network` (optional) - Use host network for must-gather pod (`true`/`false`)
- `keep_resources` (optional) - Keep pod resources after collection (`true`/`false`, default: `false`)
- `all_component_images` (optional) - Include must-gather images from all installed operators (`true`/`false`)
- `images` (optional) - Comma-separated list of custom must-gather container images

**Example:**
```
# Basic must-gather collection
{}

# Collect with custom timeout and since
{
  "timeout": "30m",
  "since": "1h"
}

# Collect from all component images
{
  "all_component_images": "true"
}

# Collect from specific operator image
{
  "images": "registry.redhat.io/openshift-logging/cluster-logging-rhel9-operator@sha256:..."
}
```
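
The `node_selector` argument above uses a `key=value,key2=value2` format. As a rough illustration of how such a string could be split into labels, here is a minimal Go sketch. The helper name `parseNodeSelector` and its error behavior are assumptions made for this example, not the server's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// parseNodeSelector is a hypothetical helper (not part of the server's API).
// It splits a "key=value,key2=value2" string into a label map and rejects
// entries that have no "=" or an empty key.
func parseNodeSelector(s string) (map[string]string, error) {
	labels := map[string]string{}
	if strings.TrimSpace(s) == "" {
		return labels, nil // the argument is optional
	}
	for _, pair := range strings.Split(s, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(pair), "=")
		if !ok || key == "" {
			return nil, fmt.Errorf("invalid node_selector entry %q", pair)
		}
		labels[key] = value
	}
	return labels, nil
}

func main() {
	labels, err := parseNodeSelector("node-role.kubernetes.io/worker=,zone=us-east-1a")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(labels), labels["zone"])
}
```

An empty value (as in `node-role.kubernetes.io/worker=`) is accepted here because Kubernetes role labels commonly carry no value; whether the real parser allows this is not stated in the documentation.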

## Enable the OpenShift Toolset

### Option 1: Command Line

```bash
kubernetes-mcp-server --toolsets core,config,helm,openshift
```

### Option 2: Configuration File

```toml
toolsets = ["core", "config", "helm", "openshift"]
```

### Option 3: MCP Client Configuration

```json
{
  "mcpServers": {
    "kubernetes": {
      "command": "npx",
      "args": ["-y", "kubernetes-mcp-server@latest", "--toolsets", "core,config,helm,openshift"]
    }
  }
}
```

## Prerequisites

The OpenShift toolset requires:

1. **OpenShift cluster** - These prompts are designed for OpenShift and automatically detect the cluster type
2. **Proper RBAC** - The user/service account must have permissions to:
   - Create namespaces
   - Create service accounts
   - Create cluster role bindings
   - Create pods with privileged access
   - List ClusterOperators and ClusterServiceVersions (for `all_component_images`)

## How It Works

### Must-Gather Plan Generation

The `plan_mustgather` prompt generates YAML manifests for collecting diagnostic data from an OpenShift cluster:

1. **Namespace** - A temporary namespace (e.g., `openshift-must-gather-xyz`) is created unless an existing namespace is specified
2. **ServiceAccount** - A service account with cluster-admin permissions is created for the must-gather pod
3. **ClusterRoleBinding** - Binds the service account to the cluster-admin role
4. **Pod** - Runs the must-gather container(s) with the specified configuration
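
The ordering of the steps above can be sketched in a few lines of Go. This is only an illustration of the described behavior (the Namespace is skipped when an existing one is supplied); the function name `plannedKinds` is hypothetical and does not come from the server's code.

```go
package main

import "fmt"

// plannedKinds is a hypothetical helper listing, in apply order, the resource
// kinds the plan would contain. The Namespace is omitted when the caller
// supplies an existing namespace.
func plannedKinds(existingNamespace string) []string {
	kinds := []string{}
	if existingNamespace == "" {
		kinds = append(kinds, "Namespace")
	}
	return append(kinds, "ServiceAccount", "ClusterRoleBinding", "Pod")
}

func main() {
	fmt.Println(plannedKinds(""))                 // no namespace given
	fmt.Println(plannedKinds("my-privileged-ns")) // existing namespace given
}
```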

### Component Image Discovery

When `all_component_images` is enabled, the prompt discovers must-gather images from:
- **ClusterOperators** - Looks for the `operators.openshift.io/must-gather-image` annotation
- **ClusterServiceVersions** - Checks OLM-installed operators for the same annotation
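
As a sketch of the discovery step, the Go snippet below collects the annotation value from a set of annotation maps, one per ClusterOperator or ClusterServiceVersion. The function name, the de-duplication, and the sorting are assumptions made for illustration, and the image references are placeholders; the real implementation may behave differently.

```go
package main

import (
	"fmt"
	"sort"
)

// Annotation key named in the documentation above.
const mustGatherImageAnnotation = "operators.openshift.io/must-gather-image"

// collectMustGatherImages is a hypothetical helper. Given the annotations of
// each discovered object, it returns the distinct, sorted image references
// stored under the must-gather annotation.
func collectMustGatherImages(annotationSets []map[string]string) []string {
	seen := map[string]bool{}
	images := []string{}
	for _, annotations := range annotationSets {
		image, ok := annotations[mustGatherImageAnnotation]
		if !ok || image == "" || seen[image] {
			continue // not annotated, or already collected
		}
		seen[image] = true
		images = append(images, image)
	}
	sort.Strings(images)
	return images
}

func main() {
	// Placeholder annotation sets standing in for objects read from a cluster.
	objects := []map[string]string{
		{mustGatherImageAnnotation: "example.com/logging-must-gather:v1"},
		{"unrelated": "value"},
		{mustGatherImageAnnotation: "example.com/logging-must-gather:v1"},
	}
	fmt.Println(collectMustGatherImages(objects))
}
```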

### Multiple Images Support

Up to 8 gather images can be run concurrently. Each image runs in a separate container within the same pod, sharing the output volume.
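
A minimal Go sketch of the one-container-per-image layout follows. The limit of 8 comes from the text above; the container naming scheme and the choice to return an error (rather than, say, truncate the list) when the limit is exceeded are assumptions for this example.

```go
package main

import "fmt"

// Limit stated in the documentation above.
const maxGatherImages = 8

// container is a simplified stand-in for a pod container spec.
type container struct {
	Name  string
	Image string
}

// gatherContainers is a hypothetical helper that maps each gather image to
// its own container. Rejecting more than maxGatherImages images is an
// assumption; the documentation does not say how the excess is handled.
func gatherContainers(images []string) ([]container, error) {
	if len(images) > maxGatherImages {
		return nil, fmt.Errorf("got %d images, at most %d are supported", len(images), maxGatherImages)
	}
	containers := make([]container, 0, len(images))
	for i, image := range images {
		containers = append(containers, container{
			Name:  fmt.Sprintf("gather-%d", i), // hypothetical naming scheme
			Image: image,
		})
	}
	return containers, nil
}

func main() {
	containers, err := gatherContainers([]string{"example.com/a:v1", "example.com/b:v1"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, c := range containers {
		fmt.Println(c.Name, c.Image)
	}
}
```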

## Common Use Cases

### Basic Cluster Diagnostics

Collect general cluster diagnostics:
```json
{}
```

### Audit Logs Collection

Collect audit logs with a custom gather command:
```json
{
  "gather_command": "/usr/bin/gather_audit_logs",
  "timeout": "2h"
}
```

### Recent Logs Only

Collect logs from the last 30 minutes:
```json
{
  "since": "30m"
}
```

### Specific Operator Diagnostics

Collect diagnostics for a specific operator:
```json
{
  "images": "registry.redhat.io/openshift-logging/cluster-logging-rhel9-operator@sha256:..."
}
```

### Host Network Access

For gather scripts that need host-level network access:
```json
{
  "host_network": "true"
}
```

### All Component Diagnostics

Collect diagnostics from all operators with must-gather images:
```json
{
  "all_component_images": "true",
  "timeout": "1h"
}
```

## Troubleshooting

### Permission Errors

If you see permission warnings, ensure your user has the required RBAC permissions:
```bash
oc auth can-i create namespaces
oc auth can-i create clusterrolebindings
oc auth can-i create pods --as=system:serviceaccount:openshift-must-gather-xxx:must-gather-collector
```

### Pod Not Starting

Check if the node has enough resources and can pull the must-gather image:
```bash
oc get pods -n openshift-must-gather-xxx
oc describe pod <pod-name> -n openshift-must-gather-xxx
```

### Timeout Issues

For large clusters or audit log collection, increase the timeout:
```json
{
  "timeout": "2h"
}
```

### Image Pull Errors

Ensure the must-gather image is accessible:
```bash
oc get secret -n openshift-config pull-secret
```

## Security Considerations

### Privileged Access

The must-gather pods run with:
- `cluster-admin` ClusterRoleBinding
- `system-cluster-critical` priority class
- Tolerations for all taints
- Optional host network access

### Temporary Resources

By default, all created resources (namespace, service account, cluster role binding) should be cleaned up after the must-gather collection is complete. Use `"keep_resources": "true"` to retain them for debugging.

### Image Sources

The prompt uses these default images:
- **Must-gather**: `registry.redhat.io/openshift4/ose-must-gather:latest`
- **Wait container**: `registry.redhat.io/ubi9/ubi-minimal`

Custom images should be from trusted sources.