MG-34: Add oc cli like must-gather collection with ServerPrompt #51

swghosh wants to merge 4 commits into openshift:main
Conversation
@swghosh: This pull request references MG-34 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
@harche @ardaguclu referring to #38 (comment), should we move this into
My thought is that we should probably make one or more OpenShift-specific toolgroups eventually.
yes. maybe a
force-pushed from c9616ae to f2b231f
I guess we never did this, but the "core" here is also just one file:

There is a test failing; it would also be nice to do a rebase.
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull request has been approved by: swghosh. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
/test images

/retest
force-pushed from fa5be4e to b61dc09
```markdown
## Tools

### plan_mustgather
```
IMO this seems more like an MCP prompt than an MCP tool. See for example pkg/toolsets/core/health_check.go, lines 22 to 45 at 44e41c4.
Our thinking here was that while it does guide a workflow, the complexity of the parameters makes it better suited as a tool rather than a prompt. @swghosh did we investigate this route?
At the time of initially writing this PR, the upstream MCP server lacked support for Prompts, so we ended up using the tools approach.

Also, our earlier understanding was that prompts are mainly static descriptions/instructions to guide the agent; but unlike that, the health_check example shared suggests we can have a fully dynamic prompt with parameter support, generated by the MCP server, that prints full YAMLs (which is pretty much what we need in the planning step). It sounds reasonable to investigate the agent flow by flipping the tool into a prompt, assuming we can emit the same text blurb as the current tool response. A rough sketch of a parameterized prompt follows.
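For illustration only, a parameterized prompt can be declared roughly like this. This is a minimal sketch against the mark3labs/mcp-go API; the prompt name, argument, and rendered text are assumptions for this discussion, and this repo's own api.ServerPrompt wrapper may differ:

```go
package main

import (
	"context"
	"fmt"

	"github.com/mark3labs/mcp-go/mcp"
	"github.com/mark3labs/mcp-go/server"
)

func main() {
	s := server.NewMCPServer("openshift-mcp-server", "0.0.1")

	// Declare a prompt with a typed, required argument the client fills in.
	prompt := mcp.NewPrompt("plan_mustgather",
		mcp.WithPromptDescription("Plan a must-gather collection for the cluster"),
		mcp.WithArgument("image",
			mcp.ArgumentDescription("must-gather image to run"),
			mcp.RequiredArgument(),
		),
	)

	// The handler can render fully dynamic text (e.g. YAML manifests)
	// from the supplied arguments, much like the current tool response.
	s.AddPrompt(prompt, func(ctx context.Context, req mcp.GetPromptRequest) (*mcp.GetPromptResult, error) {
		image := req.Params.Arguments["image"]
		text := fmt.Sprintf("Plan a must-gather run using image %s ...", image)
		return mcp.NewGetPromptResult("must-gather plan",
			[]mcp.PromptMessage{
				mcp.NewPromptMessage(mcp.RoleUser, mcp.NewTextContent(text)),
			},
		), nil
	})

	_ = server.ServeStdio(s)
}
```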
> It sounds reasonable to investigate the agent flow by flipping the Tools -> Prompt

One concern comes to my mind: OpenShift Lightspeed, one of the primary agents we're targeting for this use case, probably does not support MCP prompts at this time (only tools).
Hmm, worth raising this there, for supporting prompts? They are part of the MCP spec. There is a bit of similarity with what @Cali0707 shared for the "health" check, it being a ServerPrompt.
```go
var _ api.Toolset = (*Toolset)(nil)

func (t *Toolset) GetName() string {
	return "openshift"
}
```
I believe any toolsets that will be included in the openshift-mcp-server payload will be required to pass evals (written with https://github.com/mcpchecker). In particular, the evals will need to work when the openshift and core toolsets are both enabled (but no others).

Feel free to ping @matzew and me for help getting this set up!
@Prashanth684 given that the prompt implementation will be backed by proper evals verifying it achieves what we want (i.e. letting users collect a must-gather from the cluster through minimal conversation, with MCP aiding the flow), it should be a good idea to investigate the Tool -> Prompt transition.
re: evals, yeah - I commented on this one yesterday: #69 (comment)
> Is the goal here to land this and #38 individually?

The goal is to land the plan_mustgather implementation individually first; however, that requires us to start adding the openshift toolset here as a must.
Ok, thanks for the clarification!
Will take a detailed look!
force-pushed from 5da2a62 to 9d601d4

force-pushed from 9d601d4 to d87e167

force-pushed from d87e167 to 672273b
@swghosh: all tests passed!
```go
if err != nil {
	return "", fmt.Errorf("timeout duration is not valid")
}
gatherCmd = fmt.Sprintf("/usr/bin/timeout %s %s", timeout, gatherCmd)
```
```go
// ParseNodeSelector parses a comma-separated key=value selector string into a map.
func ParseNodeSelector(selector string) map[string]string {
```
This returns a non-nil map for the empty string case, which would serialize to an empty node selector (instead of no selector). I think we probably want to return nil instead; see the sketch below.
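A minimal sketch of the suggested fix; the parsing loop body is an assumption, mirroring what the PR's implementation presumably does, rather than the actual code:

```go
package mustgather

import "strings"

// ParseNodeSelector parses a comma-separated key=value selector string into a
// map. It returns nil for an empty selector, so the pod spec serializes with
// no nodeSelector at all rather than an empty one.
func ParseNodeSelector(selector string) map[string]string {
	if selector == "" {
		return nil
	}
	m := map[string]string{}
	for _, pair := range strings.Split(selector, ",") {
		if kv := strings.SplitN(pair, "=", 2); len(kv) == 2 {
			m[strings.TrimSpace(kv[0])] = strings.TrimSpace(kv[1])
		}
	}
	return m
}
```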
```go
gatherCmd := params.GatherCommand
if gatherCmd == "" {
	gatherCmd = defaultGatherCmd
}
```
Do we want to do any validation on this command? Otherwise could this not just run an arbitrary command (which IMO we don't want)?

This is especially important given that this command will run with cluster-admin privileges. Maybe we don't allow setting the gather command at all, or only allow setting params that should be passed to the command? A sketch of one option follows.
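One possible shape for that validation, as a hedged sketch; allowedGatherCommands and its entries are hypothetical, not defined anywhere in the PR:

```go
package mustgather

import "fmt"

// allowedGatherCommands is a hypothetical allowlist; the entries here are
// illustrative defaults only.
var allowedGatherCommands = map[string]bool{
	"/usr/bin/gather":            true,
	"/usr/bin/gather_audit_logs": true,
}

// validateGatherCommand rejects anything outside the allowlist instead of
// letting an arbitrary string run inside the privileged gather pod.
func validateGatherCommand(cmd string) error {
	if !allowedGatherCommands[cmd] {
		return fmt.Errorf("gather command %q is not in the allowlist", cmd)
	}
	return nil
}
```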
```go
var gatherContainers = []corev1.Container{
	*gatherContainerTemplate.DeepCopy(),
}
```
Do we need this deepcopy here? It seems that if any images are there, we reassign this immediately. Maybe we can invert the logic (i.e. this deepcopy is the fallback if there are no images)? See the sketch below.
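A rough sketch of that inversion; buildGatherContainers and the per-image construction are illustrative helpers under the assumption that the PR builds one container per image, not code taken from the PR:

```go
package mustgather

import corev1 "k8s.io/api/core/v1"

// buildGatherContainers returns one container per requested image; the
// DeepCopy of the template is only the fallback when no images were given.
func buildGatherContainers(tmpl *corev1.Container, images []string) []corev1.Container {
	if len(images) == 0 {
		return []corev1.Container{*tmpl.DeepCopy()}
	}
	containers := make([]corev1.Container, 0, len(images))
	for _, img := range images {
		c := tmpl.DeepCopy()
		c.Image = img
		containers = append(containers, *c)
	}
	return containers
}
```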
```go
clusterRoleBindingName := fmt.Sprintf("%s-must-gather-collector", namespace)
clusterRoleBinding := &rbacv1.ClusterRoleBinding{
	TypeMeta: metav1.TypeMeta{
		APIVersion: "rbac.authorization.k8s.io/v1",
		Kind:       "ClusterRoleBinding",
	},
	ObjectMeta: metav1.ObjectMeta{
		Name: clusterRoleBindingName,
	},
	RoleRef: rbacv1.RoleRef{
		APIGroup: "rbac.authorization.k8s.io",
		Kind:     "ClusterRole",
		Name:     "cluster-admin",
	},
	Subjects: []rbacv1.Subject{
		{
			Kind:      "ServiceAccount",
			Name:      serviceAccountName,
			Namespace: namespace,
		},
	},
}
```
I get that this is normal for must-gather, but binding to cluster-admin on a command that is AI-triggerable is a little scary 😅

Let's make sure we validate exactly what is being run by the agent when this rolebinding is present; otherwise this seems like a huge security vuln.
@Prashanth684 too, any thoughts?

The way this works right now is via a prompt, and the user still has to approve the tool call for resources_create_or_update before proceeding with adding the RBAC binding.
@Cali0707 an alternative approach is a more fine-grained RBAC role that grants only "GET"-style access, which is more suitable for must-gather (since it's essentially just a collection tool), though we do require nodes/exec to run some performance analysis (which is part of the default must-gather collection). An example role for reference is sketched below.

The downside of the granular RBAC is that users cannot trigger custom must-gather images, such as a must-gather image specifically targeted at an operator (implemented in the all_images flow), because that requires other privileges.
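For illustration, a hedged sketch of what such a read-mostly role could look like. The exact rule set is an assumption (including pods/exec standing in for the node-level exec mentioned above), not the example role referenced in the comment:

```go
package mustgather

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// collectorClusterRole grants read-only collection access plus the exec
// access needed for the performance-analysis part of the default gather.
var collectorClusterRole = &rbacv1.ClusterRole{
	TypeMeta: metav1.TypeMeta{
		APIVersion: "rbac.authorization.k8s.io/v1",
		Kind:       "ClusterRole",
	},
	ObjectMeta: metav1.ObjectMeta{
		Name: "must-gather-collector",
	},
	Rules: []rbacv1.PolicyRule{
		{
			// read-only access for collecting resources cluster-wide
			APIGroups: []string{"*"},
			Resources: []string{"*"},
			Verbs:     []string{"get", "list", "watch"},
		},
		{
			// exec access (assumption) for running performance analysis
			APIGroups: []string{""},
			Resources: []string{"pods/exec"},
			Verbs:     []string{"create", "get"},
		},
	},
}
```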
My main concern is that the must-gather command (which runs in the pod with all these privileges) is currently something that the agent can set. So the agent could, in theory, get around any RBAC/resource protections that are in place through this.
Agree with these concerns. What we can do here:

- add an allowlist of commands
- add a registry allowlist (for non-custom images; for custom images it is up to the user, but we still need to check that the desired image is used) - see the sketch after this list
- interactive confirmation
- explicitly show security warnings
- restrict this functionality to cluster admins
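As one illustration of the registry allowlist idea, a minimal sketch; allowedRegistries and its entries are assumptions, not taken from the PR, and a real deployment would likely source the list from configuration:

```go
package mustgather

import (
	"fmt"
	"strings"
)

// allowedRegistries is a hypothetical trust list of registry prefixes.
var allowedRegistries = []string{
	"registry.redhat.io/",
	"quay.io/openshift/",
}

// validateImage accepts a must-gather image only when it is pulled from a
// trusted registry prefix.
func validateImage(image string) error {
	for _, prefix := range allowedRegistries {
		if strings.HasPrefix(image, prefix) {
			return nil
		}
	}
	return fmt.Errorf("image %q is not from an allowed registry", image)
}
```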
plan_mustgather: tool for collecting must-gather(s) from an OpenShift cluster.

Example run via the [MCP inspector](https://modelcontextprotocol.io/docs/tools/inspector):

Input (inferred defaults):

```json
{ "gather_command": "/usr/bin/gather", "source_dir": "/must-gather" }
```

Output: the generated plan contains YAML manifests for must-gather pods and required resources (namespace, serviceaccount, clusterrolebinding). It suggests how the user can apply the manifests and copy the results locally (oc cp/kubectl cp), and asks the user if they want to apply the plan with oc apply/kubectl apply instead. Once the must-gather collection is completed, the user may wish to clean up the created resources with kubectl delete.