chore(models): Move models into kubeflow_katib_api package #2579

kramaranya · 2025-10-08T15:04:35Z

What this PR does / why we need it:
Move the generated Python models from sdk/python/v1beta1/kubeflow/katib/models into a new api/kubeflow_katib_api/models package. This is needed so that we can implement the new OptimizerClient and import those models from the Kubeflow SDK.
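
As a rough illustration of what the move means for callers, here is a minimal sketch of a consumer that prefers the new package and falls back to the old location. The module names `kubeflow_katib_api.models` and `kubeflow.katib.models` are inferred from the directory paths above and are assumptions, and `load_katib_models` is a hypothetical helper, not part of Katib or the Kubeflow SDK.

```python
import importlib

# Assumed import names, derived from the directory layout in this PR.
CANDIDATE_MODULES = (
    "kubeflow_katib_api.models",  # new standalone models package
    "kubeflow.katib.models",      # previous location inside the Katib SDK
)


def load_katib_models(candidates=CANDIDATE_MODULES):
    """Return the first importable module from candidates, or None."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Package (or one of its parents) is not installed; try the next one.
            continue
    return None


models = load_katib_models()
if models is None:
    print("No Katib models package is installed")
else:
    print(f"Using Katib models from {models.__name__}")
```

Keeping the models in a package of their own lets another project depend on them without pulling in the rest of the Katib SDK.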

Which issue(s) this PR fixes:
Fixes #2577

cc @kubeflow/kubeflow-sdk-team @kubeflow/wg-automl-leads

Signed-off-by: kramaranya <[email protected]>

andreyvelich

Thank you for this @kramaranya!
I left a few comments.

andreyvelich · 2025-10-08T23:19:10Z

api/README.md

@@ -0,0 +1,3 @@
+# Kubeflow Katib API


Let's keep the folder consistent with Trainer and put it under:

api/python_api/...

andreyvelich · 2025-10-08T23:21:58Z

hack/gen-python-api/gen-api.sh

+POST_GEN_PYTHON_HANDLER="hack/gen-python-api/post_gen.py"
+KATIB_VERSIONS=(v1beta1)
+
+# Download JAR package if file doesn't exist.


Could we use the OpenAPI Generator container to generate the modules, as we do for Trainer?
https://github.com/kubeflow/trainer/blob/master/hack/python-api/gen-api.sh#L34-L48

andreyvelich · 2025-10-08T23:23:24Z

hack/gen-python-api/post_gen.py

@@ -0,0 +1,132 @@
+# Copyright 2025 The Kubeflow Authors.


You don't need this post_gen script for the modules. As in Trainer, we don't need api_client and the other generated files: https://github.com/kubeflow/trainer/blob/master/hack/python-api/gen-api.sh#L50-L54
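
To make the two suggestions above concrete, here is a minimal sketch that builds (but does not run) a containerized OpenAPI Generator invocation restricted to models, which is why no post-generation cleanup script would be needed. It is not the Trainer script: the spec path, output directory, and package name are placeholders, the image tag is left unpinned, and `detect_runtime` and `build_generate_command` are hypothetical helpers.

```python
import shutil

# Public OpenAPI Generator CLI image; pin a specific tag in practice.
GENERATOR_IMAGE = "docker.io/openapitools/openapi-generator-cli"


def detect_runtime(candidates=("docker", "podman")):
    """Return the first container runtime found on PATH, or None."""
    for name in candidates:
        if shutil.which(name):
            return name
    return None


def build_generate_command(runtime, repo_root, spec_path, output_dir,
                           package_name="kubeflow_katib_api"):
    """Build the argument list for a models-only Python generation run."""
    return [
        runtime, "run", "--rm",
        "-v", f"{repo_root}:/local",       # mount the repository into the container
        GENERATOR_IMAGE,
        "generate",
        "-i", f"/local/{spec_path}",       # OpenAPI/Swagger spec (placeholder path)
        "-g", "python",
        "-o", f"/local/{output_dir}",
        "--package-name", package_name,
        # Generate only models; skip model tests and docs. With no API client
        # or supporting files emitted, there is nothing to delete afterwards.
        "--global-property", "models,modelTests=false,modelDocs=false",
    ]


runtime = detect_runtime() or "docker"
command = build_generate_command(
    runtime,
    repo_root="/path/to/katib",            # placeholder
    spec_path="path/to/swagger.json",      # placeholder
    output_dir="api/python_api",
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the placeholder paths are replaced.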

Signed-off-by: Andrey Velichkevich <[email protected]>

andreyvelich · 2025-10-10T03:22:17Z

@kramaranya I've made the required updates so that the Katib models work correctly with the Kubeflow SDK.
We still need to do some testing, but at least I was able to create a Katib Experiment:

def train_func(lr: str, num_epochs: str):
    import time
    import random

    for i in range(10):
        time.sleep(1)
        print(f"Training {i}, lr: {lr}, num_epochs: {num_epochs}")

    print(f"loss={round(random.uniform(0.77, 0.99), 2)}")


OptimizerClient().optimize(
    TrainJobTemplate(
        trainer=CustomTrainer(train_func),
    ),
    search_space={
        "lr": Search.loguniform(0.01, 0.05),
        "num_epochs": Search.choice([2, 4, 5]),
    },
)
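
The last line of `train_func` prints the objective as a `name=value` pair, which is the line format Katib's StdOut metrics collector picks up by default. The sketch below is a simplified illustration of that idea and not Katib's actual parser; `METRIC_LINE` and `parse_metrics` are hypothetical names.

```python
import re

# Simplified pattern: a whole line of the form "<name>=<number>".
METRIC_LINE = re.compile(
    r"^\s*([\w-]+)\s*=\s*([+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)\s*$"
)


def parse_metrics(log_text, wanted=("loss",)):
    """Collect values of the wanted metrics from name=value log lines."""
    found = {}
    for line in log_text.splitlines():
        match = METRIC_LINE.match(line)
        if match and match.group(1) in wanted:
            found.setdefault(match.group(1), []).append(float(match.group(2)))
    return found


sample_log = "Training 9, lr: 0.02, num_epochs: 4\nloss=0.85\n"
print(parse_metrics(sample_log))
```

Progress lines such as `Training 9, ...` are ignored because they are not a bare `name=value` pair.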

Katib Experiment

apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  creationTimestamp: "2025-10-10T03:13:40Z"
  finalizers:
  - update-prometheus-metrics
  generation: 1
  name: j570986812db
  namespace: default
  resourceVersion: "10962"
  uid: ea3b8710-6534-415c-a836-7153b03fbc70
spec:
  algorithm:
    algorithmName: random
  maxTrialCount: 10
  metricsCollectorSpec:
    collector:
      kind: StdOut
  objective:
    metricStrategies:
    - name: loss
      value: min
    objectiveMetricName: loss
    type: minimize
  parallelTrialCount: 1
  parameters:
  - feasibleSpace:
      distribution: logUniform
      max: "0.05"
      min: "0.01"
    name: lr
    parameterType: double
  - feasibleSpace:
      distribution: uniform
      list:
      - "2"
      - "4"
      - "5"
    name: num_epochs
    parameterType: categorical
  resumePolicy: Never
  trialTemplate:
    failureCondition: status.conditions.#(type=="Failed")#|#(status=="True")#
    primaryContainerName: node
    primaryPodLabels:
      batch.kubernetes.io/job-completion-index: "0"
      jobset.sigs.k8s.io/replicatedjob-name: node
    successCondition: status.conditions.#(type=="Complete")#|#(status=="True")#
    trialParameters:
    - name: lr
      reference: lr
    - name: num_epochs
      reference: num_epochs
    trialSpec:
      apiVersion: trainer.kubeflow.org/v1alpha1
      kind: TrainJob
      spec:
        runtimeRef:
          name: torch-distributed
        trainer:
          command:
          - bash
          - -c
          - |2-

            read -r -d '' SCRIPT << EOM

            def train_func(lr: str, num_epochs: str):
                import time
                import random

                for i in range(10):
                    time.sleep(1)
                    print(f"Training {i}, lr: {lr}, num_epochs: {num_epochs}")

                print(f"loss={round(random.uniform(0.77, 0.99), 2)}")

            train_func(**{'lr': '${trialParameters.lr}', 'num_epochs': '${trialParameters.num_epochs}'})

            EOM
            printf "%s" "$SCRIPT" > "test-iceberg.py"
            torchrun "test-iceberg.py"
status:
  conditions:
  - lastTransitionTime: "2025-10-10T03:13:41Z"
    lastUpdateTime: "2025-10-10T03:13:41Z"
    message: Experiment is created
    reason: ExperimentCreated
    status: "True"
    type: Created
  - lastTransitionTime: "2025-10-10T03:14:01Z"
    lastUpdateTime: "2025-10-10T03:14:01Z"
    message: Experiment is running
    reason: ExperimentRunning
    status: "True"
    type: Running
  currentOptimalTrial:
    observation: {}
  pendingTrialList:
  - j570986812db-z7xbnp4m
  startTime: "2025-10-10T03:13:41Z"
  trials: 1
  trialsPending: 1
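
For readers comparing the `search_space` argument with the `parameters` section of the Experiment above, here is a minimal sketch of that mapping. It only mirrors the fields visible in this manifest and is not the SDK's implementation; `LogUniform`, `Choice`, and `to_parameter` are hypothetical stand-ins for whatever `Search.loguniform` and `Search.choice` produce.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class LogUniform:
    """Hypothetical stand-in for Search.loguniform(low, high)."""
    low: float
    high: float


@dataclass(frozen=True)
class Choice:
    """Hypothetical stand-in for Search.choice(values)."""
    values: Sequence[Any]


def to_parameter(name, space):
    """Convert one search-space entry into a Katib-style parameter dict."""
    if isinstance(space, LogUniform):
        return {
            "name": name,
            "parameterType": "double",
            "feasibleSpace": {
                "distribution": "logUniform",
                # Values are serialized as strings, as in the manifest above.
                "min": str(space.low),
                "max": str(space.high),
            },
        }
    if isinstance(space, Choice):
        return {
            "name": name,
            "parameterType": "categorical",
            "feasibleSpace": {
                # The manifest above also carries this field for the categorical entry.
                "distribution": "uniform",
                "list": [str(value) for value in space.values],
            },
        }
    raise TypeError(f"Unsupported search space for {name!r}: {space!r}")


search_space = {"lr": LogUniform(0.01, 0.05), "num_epochs": Choice([2, 4, 5])}
parameters = [to_parameter(name, space) for name, space in search_space.items()]
print(parameters)
```

Each resulting entry lines up with a `trialParameters` reference of the same name, which is how the values reach `train_func` in the trial's TrainJob.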

Signed-off-by: kramaranya <[email protected]>

kramaranya · 2025-10-13T19:55:57Z

/retest

kramaranya · 2025-10-13T19:59:15Z

> @kramaranya I've made the required updates so that the Katib models work correctly with the Kubeflow SDK. We still need to do some testing, but at least I was able to create a Katib Experiment:

Thank you @andreyvelich for this! I've updated the PR with the container runtime script. Is there anything else that should be updated as part of this PR?

andreyvelich

Thanks @kramaranya 🎉
These modules appear to be working fine for the create API: kubeflow/sdk#124

/lgtm
/approve

google-oss-prow · 2025-10-13T21:56:29Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: andreyvelich

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [andreyvelich]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

chore(models): Move models into kubeflow_katib_api package

f6132af

Signed-off-by: kramaranya <[email protected]>

google-oss-prow bot added the size/XXL label Oct 8, 2025

google-oss-prow bot requested review from andreyvelich, anencore94 and gaocegege October 8, 2025 15:05

kramaranya added 4 commits October 8, 2025 16:13

Add generated files to .pre-commit-config.yaml

b3be3b8

Signed-off-by: kramaranya <[email protected]>

Add SDK generator JAR file to .gitignore

79d7ff2

Signed-off-by: kramaranya <[email protected]>

Rename subfolder for gen-python-api

05faa28

Signed-off-by: kramaranya <[email protected]>

Remove subfolder for python-api

63050e8

Signed-off-by: kramaranya <[email protected]>

andreyvelich reviewed Oct 8, 2025

Generate Katib modules for Kubeflow SDK

f988ef0

Signed-off-by: Andrey Velichkevich <[email protected]>

kramaranya added 2 commits October 13, 2025 17:18

Add container runtime models generation

783b493

Signed-off-by: kramaranya <[email protected]>

Fix shellcheck

97265fe

Signed-off-by: kramaranya <[email protected]>

andreyvelich mentioned this pull request Oct 13, 2025

feat: Hyperparameter Optimization APIs in Kubeflow SDK kubeflow/sdk#124

Open

3 tasks

andreyvelich reviewed Oct 13, 2025

google-oss-prow bot assigned andreyvelich Oct 13, 2025

google-oss-prow bot added the lgtm label Oct 13, 2025

google-oss-prow bot added the approved label Oct 13, 2025

google-oss-prow bot merged commit f3bfb48 into kubeflow:master Oct 13, 2025
88 of 89 checks passed

kramaranya mentioned this pull request Oct 13, 2025

Support Hyperparameter Optimization in Kubeflow SDK kubeflow/sdk#46

Open

8 tasks
