feat(kubevirt): add troubleshoot action to vm_lifecycle tool by ksimon1 · Pull Request #653 · containers/kubernetes-mcp-server

ksimon1 · 2026-01-15T10:17:15Z

Add a new "troubleshoot" action to the vm_lifecycle tool that generates a step-by-step troubleshooting guide for diagnosing VirtualMachine issues.

The troubleshoot action:

- Renders a diagnostic plan for VMs
- Guides the AI through checking VM, VMI, DataVolumes, PVCs, pods, and events
- Includes a summary template for reporting findings

This helps AI assistants systematically diagnose VM issues by providing structured instructions on which MCP tools to use and how to interpret the results.
Original credit goes to @lyarwood

Code was assisted by Cursor AI

Signed-off-by: Karel Simon <ksimon@redhat.com>

ksimon1 · 2026-01-15T11:01:14Z

@lyarwood, @Cali0707, @manusa can you please review this PR?

lyarwood

Looks like a good start but I think this is going to need to be rebased on #626 with evals written for various scenarios before we can merge. @ksimon1 if you agree would you mind trying it and reviewing #626?

I'm also concerned about the actual impact rendered plans like this might have on token use. IIRC we don't have a way of measuring this clearly through gevals yet, but I wouldn't be surprised if we ended up asking for plans like this to be converted into code-based solutions with simple returns to the calling agent and model.

ksimon1 · 2026-01-15T13:36:52Z

/hold

codingben · 2026-01-15T15:25:58Z

pkg/toolsets/kubevirt/vm/lifecycle/tool.go

 		return api.NewToolCallResult("", err), nil
 	}

-	dynamicClient := params.DynamicClient()


What is the reason for removing it from here and putting the dynamicClient in each switch-case?

Cali0707 · 2026-01-21T17:19:21Z

> IIRC we don't have a way of measuring this clearly through gevals yet

This is on the roadmap, but not implemented yet (sorry 😞 )

Cali0707 · 2026-01-21T17:20:45Z

pkg/toolsets/kubevirt/vm/lifecycle/tool.go

 		}
 		message = "# VirtualMachine restarted successfully\n"

+	case ActionTroubleshoot:


IMO this is more of an MCP prompt than a tool. We have added support for toolsets to define their own prompts (see https://github.com/containers/kubernetes-mcp-server/blob/main/docs/PROMPTS.md#toolset-prompts)

TIL looks like this dropped while I was out, thanks!

@ksimon1 are you able to rework this to use a prompt? Would be good to rebase and land this in the next 2 weeks with some basic eval coverage.

ksimon1 · 2026-01-27T14:51:11Z

@lyarwood, @Cali0707, @manusa can you please review this PR?

ksimon1 · 2026-01-28T08:43:46Z

/retest

manusa · 2026-01-28T14:10:43Z

pkg/toolsets/kubevirt/vm_troubleshoot.go

+Follow these steps to diagnose issues with the VirtualMachine:
+
+## Step 1: Check VirtualMachine Status
+Use resources_get with apiVersion=kubevirt.io/v1, kind=VirtualMachine, namespace=%s, name=%s


Some ideas that float into my head that might or might not apply:

The prompt could probably benefit from some dynamic content injection (as defined in Claude Skills).

The idea here would be to execute the relevant queries ourselves and inject the results, instead of delegating the task to the model, which avoids the extra round-trips and overhead. (I believe this is a pattern we're already following with the cluster-health-check prompt).

I would use this dynamic content injection at least for those requests that we know beforehand will have to be performed by the model.

Regarding token usage, I understand the initial toll of this pattern should be higher, but it should compensate for the extra overhead and round-trips the model+agent would otherwise need to complete the actual troubleshooting task.

Thoughts?

IIUC you mean to prepopulate the prompt text with basic information like vm/vmi manifest, pod description, ...?

@manusa would you please review the new code, which injects the content to the prompt?

> IIUC you mean to prepopulate the prompt text with basic information like vm/vmi manifest, pod description, ...?

Yes, exactly

> @manusa would you please review the new code, which injects the content to the prompt?

Sorry, last Friday I was completely focused on the code mode feature and demo. Let me check this now.

I'm not sure what the compiled prompt looks like, but you're definitely doing what I was suggesting.
This should prevent the LLM from doing multiple roundtrips which we know beforehand it's going to try because we were instructing it to do so.
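
To make the dynamic content injection idea from this thread concrete, here is a minimal Go sketch. It is not the code from this PR: the `Fetcher` interface, `mapFetcher`, and `buildPrompt` are hypothetical names invented for illustration, and the in-memory fetcher stands in for whatever client the server actually uses. The point is only the shape: the server fetches what it knows the model would ask for and renders the results into the prompt text.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// Fetcher is a hypothetical stand-in for the server's cluster client.
type Fetcher interface {
	Get(kind, namespace, name string) (string, error)
}

// mapFetcher is an in-memory Fetcher used only for this example.
type mapFetcher map[string]string

func (m mapFetcher) Get(kind, namespace, name string) (string, error) {
	v, ok := m[kind+"/"+namespace+"/"+name]
	if !ok {
		return "", fmt.Errorf("%s %s/%s not found", kind, namespace, name)
	}
	return v, nil
}

type section struct{ Title, Body string }

type promptData struct {
	Namespace, Name string
	Sections        []section
}

var promptTmpl = template.Must(template.New("vm-troubleshoot").Parse(
	"# Troubleshooting VirtualMachine {{.Namespace}}/{{.Name}}\n" +
		"{{range .Sections}}\n## {{.Title}}\n{{.Body}}\n{{end}}"))

// buildPrompt fetches the resources up front and injects them into the
// prompt, so the model does not need one tool call per resource.
func buildPrompt(f Fetcher, namespace, name string) (string, error) {
	data := promptData{Namespace: namespace, Name: name}
	for _, kind := range []string{"VirtualMachine", "VirtualMachineInstance"} {
		body, err := f.Get(kind, namespace, name)
		if err != nil {
			// A missing resource is itself a finding, so record it
			// in the prompt instead of failing the whole request.
			body = "(not available: " + err.Error() + ")"
		}
		data.Sections = append(data.Sections, section{Title: kind, Body: body})
	}
	var sb strings.Builder
	if err := promptTmpl.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	f := mapFetcher{"VirtualMachine/vm-test/broken-vm": "status: Stopped"}
	prompt, err := buildPrompt(f, "vm-test", "broken-vm")
	if err != nil {
		panic(err)
	}
	fmt.Print(prompt)
}
```

The trade-off matches the one discussed above: the rendered prompt is larger up front, but the model starts with the data it would otherwise have to request in separate round-trips.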

lyarwood · 2026-01-29T12:25:28Z

evals/tasks/kubevirt/troubleshoot-vm/task.yaml

+      kubectl delete namespace "$NS" --ignore-not-found
+  prompt:
+    inline: |-
+      There is a VirtualMachine named "broken-vm" in the ${EVAL_NAMESPACE:-vm-test} namespace that is not working correctly.


Have you tried running this with mcpchecker? IIRC it no longer supports bash substitutions, something we can address in the project, but it will lead to this being passed directly to the agent/model in its current form.

The mcpchecker run passed. Since this substitution is used in all of the kubevirt eval tasks, I would update it across all tasks in a separate PR.
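
For context on what the `${EVAL_NAMESPACE:-vm-test}` form is expected to do, the following Go sketch resolves `${NAME}` and `${NAME:-default}` before a prompt is handed to an agent. This is an illustration only, not mcpchecker's implementation; `expandDefaults` and `paramRe` are hypothetical names, and the sketch covers just these two forms of shell parameter expansion.

```go
package main

import (
	"fmt"
	"os"
	"regexp"
)

// paramRe matches ${NAME} and ${NAME:-default}.
var paramRe = regexp.MustCompile(`\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}`)

// expandDefaults replaces each ${NAME} or ${NAME:-default} in s.
// As in a shell, the default is used when NAME is unset or empty;
// a bare ${NAME} that is unset expands to the empty string.
func expandDefaults(s string, lookup func(string) (string, bool)) string {
	return paramRe.ReplaceAllStringFunc(s, func(m string) string {
		g := paramRe.FindStringSubmatch(m)
		if v, ok := lookup(g[1]); ok && v != "" {
			return v
		}
		return g[2] // empty when no default was given
	})
}

func main() {
	prompt := `There is a VirtualMachine named "broken-vm" in the ${EVAL_NAMESPACE:-vm-test} namespace.`
	// os.LookupEnv has the right signature to serve as the lookup function.
	fmt.Println(expandDefaults(prompt, os.LookupEnv))
}
```

Without a step like this, the literal `${EVAL_NAMESPACE:-vm-test}` text reaches the model unchanged, which is the concern raised in the review comment.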

lyarwood · 2026-01-29T12:32:12Z

evals/tasks/kubevirt/troubleshoot-vm/task.yaml

+      echo ""
+      echo "=== Troubleshooting Eval Complete ==="
+      echo "The agent should have:"
+      echo "  1. Used the vm-troubleshoot prompt with namespace=$NS and name=broken-vm"


This is also missing from the Task API at the moment IMHO. We can define this in Evals, but I also think each Task should be able to assert that tools and/or prompts are called.

@lyarwood +1 here - we have an open discussion trying to figure out how we want to solve this: mcpchecker/mcpchecker#126

Interested in hearing if you have any thoughts 😄

…tics

Add a new "vm-troubleshoot" MCP prompt to the kubevirt toolset that generates a step-by-step troubleshooting guide for diagnosing VirtualMachine issues.

The prompt:
- Provides a structured diagnostic plan for VMs
- Guides the AI through checking VM, VMI, DataVolumes, PVCs, pods, and events
- Includes a summary template for reporting findings
- Tries to fix the VM state

This is implemented as an MCP Prompt (not a tool action).

Code was assisted by Cursor AI

Signed-off-by: Karel Simon <ksimon@redhat.com>

ksimon1 · 2026-02-02T11:06:38Z

@Cali0707, @manusa can you please review this PR?

manusa

The prompt logic looks good to me, thx!
For the eval part, I think Calum should give his blessing.

Cali0707

Evals look fine to me, thanks @ksimon1 !

Only thing to note is that we are hoping to move onto the new task format and off of using bash scripts as much as possible. So in the future, we will need to rework this to e.g. leverage the kubernetes extension

lyarwood · 2026-02-02T15:22:43Z

> Evals look fine to me, thanks @ksimon1 !
>
> Only thing to note is that we are hoping to move onto the new task format and off of using bash scripts as much as possible. So in the future, we will need to rework this to e.g. leverage the kubernetes extension

ACK thanks for the reminder, something for a follow up.

I can create an issue if you want us to track this?

Cali0707 · 2026-02-02T15:34:16Z

> I can create an issue if you want us to track this?

That would be helpful!

ksimon1 · 2026-02-03T06:44:38Z

So can we merge this PR?

manusa · 2026-02-03T09:28:59Z

> So can we merge this PR?

Merged 🚀, thx!

ksimon1 force-pushed the troubleshoot-vms branch from 8eea2e0 to a2f9f5c Compare January 15, 2026 10:39

lyarwood reviewed Jan 15, 2026

codingben reviewed Jan 15, 2026

Cali0707 reviewed Jan 21, 2026

ksimon1 force-pushed the troubleshoot-vms branch from a2f9f5c to d63a12e Compare January 26, 2026 12:08

ksimon1 commented Jan 26, 2026

pkg/toolsets/kubevirt/vm_troubleshoot.go (resolved)

ksimon1 force-pushed the troubleshoot-vms branch from d63a12e to 86b9339 Compare January 26, 2026 12:35

manusa reviewed Jan 26, 2026

README.md (outdated, resolved)

manusa mentioned this pull request Jan 26, 2026

[DOC] Add prompt documentation generation to update-readme-tools #698

Closed

6 tasks

ksimon1 force-pushed the troubleshoot-vms branch from 86b9339 to c5d4a7e Compare January 27, 2026 08:15

ksimon1 force-pushed the troubleshoot-vms branch from c5d4a7e to 7d32c5c Compare January 27, 2026 19:56

manusa reviewed Jan 28, 2026

ksimon1 force-pushed the troubleshoot-vms branch from 7d32c5c to 78a4c31 Compare January 29, 2026 12:20

lyarwood reviewed Jan 29, 2026

ksimon1 added 2 commits January 30, 2026 13:16

feat: extract GVRs and GVKs into single file called GroupVersion Resources (gvr.go)

1e02176

This will help in next commit to not duplicate GVRs and GVKs.

Signed-off-by: Karel Simon <ksimon@redhat.com>

ksimon1 force-pushed the troubleshoot-vms branch from 78a4c31 to 8aab0e2 Compare January 30, 2026 12:53

lyarwood approved these changes Feb 2, 2026

manusa approved these changes Feb 2, 2026

manusa added this to the 0.1.0 milestone Feb 2, 2026

Cali0707 approved these changes Feb 2, 2026

lyarwood mentioned this pull request Feb 2, 2026

Transition KubeVirt toolset evals to new mcpchecker Task extensions #721

Open

manusa merged commit 6c74e2a into containers:main Feb 3, 2026
7 checks passed

5 participants
