@MenD32
Contributor

@MenD32 MenD32 commented Oct 15, 2025

Currently OpenShift cannot create H200 machines since they are part of the a3 machineFamily but don't have a quota in the gcp compute library.

@openshift-ci openshift-ci bot requested review from nrb and theobarberbany October 15, 2025 12:32
@elmiko elmiko changed the title fix: added H200 support NO-JIRA: added H200 support Oct 15, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 15, 2025
@openshift-ci-robot
Contributor

@MenD32: This pull request explicitly references no jira issue.

In response to this:

Currently OpenShift cannot create H200 machines since they are part of the a3 machineFamily but don't have a quota in the gcp compute library.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Contributor

@elmiko elmiko left a comment

this makes sense to me, i do wonder if we shouldn't have some warning log message when we are skipping the accelerator validation. if there is no quota, or resource exhaustion, i'm not sure the user will be able to easily detect that.

@elmiko
Contributor

elmiko commented Oct 29, 2025

@MenD32 any thoughts about this question ?

also, cc @damdo ptal

Member

@damdo damdo left a comment

/approve

One nit

@openshift-ci
Contributor

openshift-ci bot commented Oct 29, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: damdo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 29, 2025
@MenD32
Contributor Author

MenD32 commented Oct 29, 2025

> this makes sense to me, i do wonder if we shouldn't have some warning log message when we are skipping the accelerator validation. if there is no quota, or resource exhaustion, i'm not sure the user will be able to easily detect that.

I think this is somewhat broader than just H200 support, since this also affects other machine types (g2, g4, a4, etc.). So I'm not sure how machine-api-provider should work with those...

@MenD32
Contributor Author

MenD32 commented Oct 29, 2025

/retest

@MenD32 MenD32 requested review from damdo and elmiko October 30, 2025 10:16
Member

@damdo damdo left a comment

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Oct 30, 2025
@damdo
Member

damdo commented Oct 30, 2025

@MenD32 do you have a Jira card to track this?

@damdo
Member

damdo commented Oct 30, 2025

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 30, 2025
@MenD32
Contributor Author

MenD32 commented Oct 30, 2025

No, where should I open one?

@damdo
Member

damdo commented Oct 30, 2025

@MenD32 probably on your team's Jira board

@MenD32
Contributor Author

MenD32 commented Oct 30, 2025

My team doesn't currently work on RH's Jira, so IDK if there'd be integration between GitHub and the Jira instance. Nevertheless, I'll create an issue.

@MenD32 MenD32 changed the title NO-JIRA: added H200 support JN-2789: added H200 support Oct 30, 2025
@openshift-ci-robot
Contributor

@MenD32: No Jira issue with key JN-2789 exists in the tracker at https://issues.redhat.com/.
Once a valid jira issue is referenced in the title of this pull request, request a refresh with /jira refresh.

In response to this:

Currently OpenShift cannot create H200 machines since they are part of the a3 machineFamily but don't have a quota in the gcp compute library.

@openshift-ci-robot openshift-ci-robot removed the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Oct 30, 2025
@MenD32
Contributor Author

MenD32 commented Oct 30, 2025

> @MenD32 probably on your team's Jira board

Added a JIRA ticket reference

@damdo
Member

damdo commented Oct 30, 2025

TY, I am chatting internally to see how best to test/verify this. cc @elmiko

@MenD32 MenD32 requested a review from damdo November 10, 2025 14:38
@openshift-merge-robot openshift-merge-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 12, 2025
@openshift-ci openshift-ci bot removed the lgtm Indicates that a PR is ready to be merged. label Nov 12, 2025
@openshift-merge-robot openshift-merge-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 12, 2025
@damdo
Member

damdo commented Nov 12, 2025

Thanks for rebasing @MenD32

It looks like you created a merge commit for this, whereas we normally do a full rebase instead.
Could you please update the PR to reflect that? TY

MenD32 and others added 2 commits November 12, 2025 10:59
@MenD32 MenD32 force-pushed the fix/support-h200-gpus branch from de6fa97 to 495f894 Compare November 12, 2025 08:59
@MenD32
Contributor Author

MenD32 commented Nov 12, 2025

> Thanks for rebasing @MenD32
>
> It looks like you created a merge commit for this, whereas we normally do a full rebase instead. Could you please update the PR to reflect that? TY

for some reason it still wanted a merge after the rebase. I just redid the rebase and now it seems that it's fine

Member

@damdo damdo left a comment

/lgtm

/unhold

@openshift-ci openshift-ci bot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Nov 13, 2025
@damdo
Member

damdo commented Nov 13, 2025

/jira refresh

@openshift-ci-robot
Contributor

@damdo: No Jira issue with key JN-2789 exists in the tracker at https://issues.redhat.com/.
Once a valid jira issue is referenced in the title of this pull request, request a refresh with /jira refresh.

In response to this:

/jira refresh

@openshift-ci-robot
Contributor

@damdo: The referenced Jira(s) [JN-2789] could not be located, all automatically applied jira labels will be removed.

In response to this:

/jira refresh

@sunzhaohua2

/verified by @sunzhaohua2
https://redhat-internal.slack.com/archives/CBZHF4DHC/p1763012438945199?thread_ts=1762265368.286229&cid=CBZHF4DHC

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Nov 13, 2025
@openshift-ci-robot
Contributor

@sunzhaohua2: This PR has been marked as verified by @sunzhaohua2.

In response to this:

/verified by @sunzhaohua2
https://redhat-internal.slack.com/archives/CBZHF4DHC/p1763012438945199?thread_ts=1762265368.286229&cid=CBZHF4DHC

@damdo damdo changed the title JN-2789: added H200 support NO-JIRA: JN-2789: added H200 support Nov 13, 2025
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Nov 13, 2025
@openshift-ci-robot
Contributor

@MenD32: This pull request explicitly references no jira issue.

In response to this:

Currently OpenShift cannot create H200 machines since they are part of the a3 machineFamily but don't have a quota in the gcp compute library.

@damdo
Member

damdo commented Nov 13, 2025

Adding NO-JIRA as JN-2789 is not picked up by the OCP robot

@damdo
Member

damdo commented Nov 13, 2025

/tide refresh

@openshift-merge-bot openshift-merge-bot bot merged commit 8f59a1a into openshift:main Nov 13, 2025
10 checks passed
Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria

6 participants