
Conversation

Member

@ericspod ericspod commented Nov 23, 2025

Description

This is an attempt to create a slim Docker image, smaller than the current one, to avoid running out of space during testing. Various fixes have been included to account for test failures within the image. These all appear to be either real issues that need to be addressed (e.g. ONNX export) or fixes that should be integrated either way.

This excludes PyTorch 2.9 from the requirements for now to avoid compatibility issues with ONNX, TorchScript, and other legacy features. MONAI needs to be updated for PyTorch 2.9 support, specifically by dropping the use of TorchScript in places, as it is being made obsolete in favour of torch.export.
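
As a rough sketch of that migration direction (illustrative only, assuming PyTorch >= 2.1 and using a small UNet as a stand-in; not the actual changes needed in MONAI):

import torch
from monai.networks.nets import UNet

net = UNet(spatial_dims=2, in_channels=1, out_channels=2, channels=(8, 16, 32), strides=(2, 2))
x = torch.rand(1, 1, 64, 64)

# legacy TorchScript capture, which MONAI currently uses in places:
scripted = torch.jit.script(net)

# torch.export capture, the intended replacement going forward:
exported = torch.export.export(net.eval(), (x,))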

Some tests fail without enough shared memory; the command I'm using to run the image with GPUs 0 and 1 is docker run -ti --rm --gpus '"device=0,1"' --shm-size=10gb -v $(pwd)/tests:/opt/monai/tests monai_slim /bin/bash.
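
To confirm the shared memory allocation took effect from inside the container, a quick standard-library check (a minimal sketch):

import shutil

total, _, free = shutil.disk_usage("/dev/shm")
print(f"/dev/shm: {total / 2**30:.1f} GiB total, {free / 2**30:.1f} GiB free")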

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests --disttests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@Project-MONAI Project-MONAI deleted a comment from coderabbitai bot Nov 23, 2025
Signed-off-by: Eric Kerfoot <[email protected]>
Member Author

ericspod commented Dec 6, 2025

Nine tests in the image currently fail. The first four relate to auto3dseg and mention a value "image_stats" missing from a config file; these tests pass when run in isolation, however. The others relate to the GMM module, which cannot be compiled because nvcc is missing from the image; this is expected, since the CUDA toolkit is omitted for size reasons.
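
If the slim image is not meant to support JIT-compiled extensions, one option would be to gate the GMM tests on the compiler being present; a hypothetical guard (not part of this PR) might look like:

import shutil
import unittest

# skip the whole case when the CUDA compiler needed for JIT builds is absent
@unittest.skipUnless(shutil.which("nvcc") is not None, "CUDA toolkit (nvcc) not available")
class GMMTestCase(unittest.TestCase):
    ...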

Output of the errors

======================================================================
ERROR: test_ensemble (tests.integration.test_auto3dseg_ensemble.TestEnsembleBuilder.test_ensemble)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_ensemble.py", line 155, in test_ensemble
    bundle_generator.generate(self.work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_get_history (tests.integration.test_auto3dseg_hpo.TestHPO.test_get_history)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
    bundle_generator.generate(work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_run_algo (tests.integration.test_auto3dseg_hpo.TestHPO.test_run_algo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
    bundle_generator.generate(work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_run_optuna (tests.integration.test_auto3dseg_hpo.TestHPO.test_run_optuna)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
    look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
    raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats', 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
    bundle_generator.generate(work_dir, num_fold=1)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
    gen_algo.export_to_disk(output_folder, name, fold=f_id)
  File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
    self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
  File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
    raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'

======================================================================
ERROR: test_cuda_0_2_batches_1_dimensions_1_channels_2_classes_2_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_0_2_batches_1_dimensions_1_channels_2_classes_2_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_1_2_1_Linux_3_11_2_28_12_8'

======================================================================
ERROR: test_cuda_1_1_batches_1_dimensions_5_channels_2_classes_1_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_1_1_batches_1_dimensions_5_channels_2_classes_1_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_5_2_1_Linux_3_11_2_28_12_8'

======================================================================
ERROR: test_cuda_2_1_batches_2_dimensions_2_channels_4_classes_4_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_2_1_batches_2_dimensions_2_channels_4_classes_4_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_2_4_1_Linux_3_11_2_28_12_8'

======================================================================
ERROR: test_cuda_3_1_batches_3_dimensions_1_channels_2_classes_1_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_3_1_batches_3_dimensions_1_channels_2_classes_1_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
    return func(*(a + p.args), **p.kwargs, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
    gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
    self.compiled_extension = load_module(
                              ^^^^^^^^^^^^
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_1_2_1_Linux_3_11_2_28_12_8_v1'

======================================================================
ERROR: test_load (tests.networks.layers.test_gmm.GMMTestCase.test_load)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
    subprocess.run(
  File "/usr/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/opt/monai/tests/networks/layers/test_gmm.py", line 310, in test_load
    load_module("gmm", {"CHANNEL_COUNT": 2, "MIXTURE_COUNT": 2, "MIXTURE_SIZE": 3}, verbose_build=True)
  File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
    module = load(
             ^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
    return _jit_compile(
           ^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
    _run_ninja_build(
  File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_2_2_3_Linux_3_11_2_28_12_8'

Forcing the GMM module to build when building the image would be a simple fix, but this also fails when used as a RUN command: python -c 'from monai._extensions import load_module;load_module("gmm", {"CHANNEL_COUNT": 2, "MIXTURE_COUNT": 2, "MIXTURE_SIZE": 3}, verbose_build=True)'

@ericspod ericspod marked this pull request as ready for review December 8, 2025 11:53
@Project-MONAI Project-MONAI deleted a comment from coderabbitai bot Dec 8, 2025
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (4)
.github/workflows/pythonapp.yml (1)

31-37: Consolidate repetitive cleanup logic.

The "Clean unused tools" step is identical across three jobs. Consider using a GitHub Actions composite action or reusable workflow to avoid duplication. Additionally, verify whether the duplicate cleanup commands in the subsequent "Install dependencies" steps (e.g., line 56, line 169) are still necessary now that cleanup runs upfront.

Also applies to: 140-146, 232-238

Dockerfile.slim (1)

88-88: CUDA_HOME set in non-CUDA runtime image.

Line 88 sets ENV CUDA_HOME=/usr/local/cuda, but the CUDA toolkit is not present in the final image (intentionally omitted for size). While MONAI can degrade gracefully, this may cause confusing behavior or spurious build attempts. Consider removing this env var or documenting why it's needed.

requirements.txt (1)

1-2: Add inline comments explaining version constraints.

The <2.9 upper bound (ONNX/TorchScript incompatibility) and Windows !=2.7.0 exclusion (known CUDA/XPU wheel issues) should be documented with comments to help future maintainers understand these constraints.

tests/networks/test_convert_to_onnx.py (1)

73-104: Fix minor docstring typo in SegResNet test

Docstring says “SetResNet” while the model is SegResNet. Consider aligning the wording.

-        """Test converting SetResNet to ONNX."""
+        """Test converting SegResNet to ONNX."""
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 15fd428 and 39ecb09.

📒 Files selected for processing (12)
  • .dockerignore (1 hunks)
  • .github/workflows/pythonapp.yml (3 hunks)
  • Dockerfile.slim (1 hunks)
  • monai/apps/vista3d/inferer.py (1 hunks)
  • monai/networks/nets/vista3d.py (1 hunks)
  • monai/networks/utils.py (2 hunks)
  • requirements-dev.txt (2 hunks)
  • requirements.txt (1 hunks)
  • tests/bundle/test_bundle_download.py (2 hunks)
  • tests/data/meta_tensor/test_meta_tensor.py (1 hunks)
  • tests/losses/test_multi_scale.py (1 hunks)
  • tests/networks/test_convert_to_onnx.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • tests/losses/test_multi_scale.py
  • monai/networks/nets/vista3d.py
  • tests/data/meta_tensor/test_meta_tensor.py
  • monai/apps/vista3d/inferer.py
  • monai/networks/utils.py
  • tests/bundle/test_bundle_download.py
  • tests/networks/test_convert_to_onnx.py
🧬 Code graph analysis (3)
tests/losses/test_multi_scale.py (1)
tests/test_utils.py (1)
  • assert_allclose (119-159)
monai/networks/nets/vista3d.py (1)
monai/data/meta_tensor.py (1)
  • astype (434-461)
tests/data/meta_tensor/test_meta_tensor.py (2)
monai/bundle/scripts.py (1)
  • load (630-770)
monai/networks/nets/transchex.py (1)
  • load (96-103)
🔇 Additional comments (15)
tests/bundle/test_bundle_download.py (2)

18-18: Importing skipIf alongside skipUnless is appropriate

Matches existing usage pattern and keeps decorators local; no issues here.


222-222: NGC private test skip condition is correct

Skipping when NGC_API_KEY is not set is a precise guard for this private-source test and avoids spurious failures in environments without credentials.

tests/data/meta_tensor/test_meta_tensor.py (1)

248-248: Verify that production code loading MetaTensor objects accounts for custom class requirements.

The weights_only=False parameter at line 248 is necessary for MetaTensor to preserve its metadata and custom attributes during deserialization. However, ensure that any production code paths deserializing MetaTensor objects also use weights_only=False, as weights_only=True restricts loading to basic types and will fail for custom classes. For untrusted external model files, consider using safer formats like safetensors or ensure PyTorch ≥ 2.6.0 is used to mitigate known deserialization vulnerabilities.
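
The distinction in a minimal sketch (the file is assumed trusted; the metadata content is illustrative):

import torch
from monai.data import MetaTensor

t = MetaTensor([1.0, 2.0], meta={"some_key": 0})
torch.save(t, "t.pt")

# weights_only=True restricts unpickling to tensors and basic containers,
# so loading a custom class such as MetaTensor can fail;
# weights_only=False permits arbitrary objects and is only safe for trusted files.
t2 = torch.load("t.pt", weights_only=False)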

.dockerignore (1)

6-10: LGTM.

The additions align with Docker build context optimization and complement the Dockerfile.slim copying strategy.

Dockerfile.slim (3)

56-56: Confirm extension build succeeds despite known GMM failures.

Per PR notes, GMM extension build fails in the slim image due to missing CUDA compiler. This build command (line 56) succeeds in the build stage, but GMM tests fail later when run inside the slim image. Clarify whether this is expected behavior or whether the slim image is intended to support GMM-dependent tests.

If the slim image is not meant to support GMM tests, document this limitation or skip GMM tests conditionally in the image.


16-89: Multi-stage build is well-designed for size optimization.

The separation of build (with CUDA) and runtime (without CUDA) stages correctly achieves the slim image goal. Artifact cleanup (line 75-76) appropriately reduces bloat.


38-42: NGC CLI validation is thorough; verify it's used.

The NGC CLI is downloaded and validated via md5sum in the build, but there's no evidence it's invoked in MONAI workflows. If unused, consider removing to reduce image bloat. Verify NGC CLI is actually used by searching the codebase for invocations (e.g., rg -i 'ngc\s+' in monai/ and tests/ directories).

tests/losses/test_multi_scale.py (1)

58-58: Justify the tolerance relaxation.

Loosening rtol from 1e-5 to 1e-4 may mask numerical regressions. Add a comment explaining the reason (e.g., slim image, different CUDA runtime).

monai/apps/vista3d/inferer.py (1)

89-95: LGTM!

Using tuple for immutable slice specification is idiomatic.

requirements-dev.txt (2)

3-3: Unpinning pytorch-ignite may introduce breaking changes.

Consider adding a minimum version constraint if specific features are relied upon.


55-55: LGTM!

Making onnxruntime unconditional is appropriate since it now supports Python 3.11+.

monai/networks/nets/vista3d.py (1)

246-249: LGTM!

Explicit int() cast avoids NumPy bool typing issues. Logic is clearer with intermediate variables.

monai/networks/utils.py (1)

714-745: ONNX export target variable rename is correct and consistent

model_to_export is now set appropriately in both the trace and script branches and passed to torch.onnx.export, resolving the previous naming inconsistency without changing external behavior. Looks good.

tests/networks/test_convert_to_onnx.py (2)

25-32: CPU‑only device options and ONNX import handling are reasonable

Using onnx, _ = optional_import("onnx") together with @SkipIfNoModule("onnx") is safe, and constraining TORCH_DEVICE_OPTIONS to ["cpu"] with a clear FIXME matches the current nondeterministic CUDA behavior and slim-image constraints.


48-70: UNet ONNX export test now robustly exercises both trace and script paths

The refactored test_unet always calls convert_to_onnx, parameterizes use_trace and use_ort, and validates the return type against onnx.ModelProto, giving good coverage of the updated convert_to_onnx behavior on CPU.

@Project-MONAI Project-MONAI deleted a comment from coderabbitai bot Dec 8, 2025
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Signed-off-by: Eric Kerfoot <[email protected]>
Contributor

coderabbitai bot commented Dec 8, 2025

Walkthrough

This PR introduces infrastructure and code refinements across build configuration, CI workflows, and test utilities. Changes include expanding .dockerignore patterns, adding CI cleanup steps with version spec escaping, introducing a new multi-stage Dockerfile.slim for CUDA-enabled builds, performing minor code adjustments (type conversions, variable naming), updating dependency pinning (pytorch-ignite, onnxruntime, torch constraints), and refining test coverage with conditional execution and parameter tweaks. Approximately 12 files affected spanning configuration, Docker, core modules, and tests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • Dockerfile.slim: Most complex change—three-stage build pipeline with CUDA integration, system dependencies, NGC CLI, and environment configuration requires careful validation of layer logic and artifact transfers.
  • requirements.txt: Unified torch upper bound (<2.9) across platforms with Windows-specific exclusion; verify constraint consistency and compatibility implications.
  • CI workflow (pythonapp.yml): Repeated cleanup steps and version spec escaping across multiple jobs; confirm correctness of repeated pattern.
  • Test modifications: Heterogeneous changes (skipIf decorator, weights_only flag, tolerance adjustment, CUDA removal)—each requires separate reasoning for test intent and side effects.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 62.50%, which is insufficient; the required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check ❓ Inconclusive: The title 'Docker Slim Image' is vague and doesn't clearly convey the purpose, whether it's creating a new slim image, optimizing an existing one, or fixing issues within it. Clarify the title to be more specific, e.g. 'Add Dockerfile.slim for space-optimized image' or 'Create slim Docker image with PyTorch <2.9 constraint'.
✅ Passed checks (1 passed)
  • Description check ✅ Passed: The description covers intent, changes, and testing approach; however, it lacks a proper issue reference and doesn't enumerate all major file changes (e.g., Dockerfile.slim, requirements updates, test fixes).

Signed-off-by: Eric Kerfoot <[email protected]>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (4)
tests/losses/test_multi_scale.py (1)

57-58: Relaxed tolerance is reasonable for numeric stability

Bumping rtol to 1e-4 here is still strict for a scalar loss value and should help avoid backend‑dependent noise causing spurious failures. No functional concerns from this change alone. You might optionally consider using the shared assert_allclose helper from tests.test_utils for consistency across tests, but it’s not required for correctness here.

.github/workflows/pythonapp.yml (1)

31-37: Cleanup step placement is ineffectual; consider moving or consolidating.

Running this cleanup before checkout provides minimal value since no artifacts exist yet. Cleanup is most useful after build steps produce large intermediate files. Additionally, the same 5-line block repeats across three jobs—consider extracting to a reusable workflow or composite action to reduce duplication.

Also applies to: 140-146, 232-238

Dockerfile.slim (2)

90-90: BUILD_MONAI flag is build-time only; unnecessary in runtime image.

Line 90 sets BUILD_MONAI=1 in the final image, but this flag is only meaningful during setup.py develop (line 56 of the build stage). Carrying it into the runtime image serves no purpose and may confuse future maintainers.

Apply this diff to remove the unnecessary flag:

 ENV PATH=${PATH}:/opt/tools:/opt/tools/ngc-cli
 ENV POLYGRAPHY_AUTOINSTALL_DEPS=1
-ENV BUILD_MONAI=1
 ENV CUDA_HOME=/usr/local/cuda

76-77: Restrict pycache cleanup to MONAI-specific paths for efficiency.

Line 77 uses find / to remove all __pycache__ directories, which is inefficient and targets too broadly (system libraries, /proc, /sys, etc.). Scope the cleanup to MONAI artifacts and Python site-packages.

Apply this diff:

 RUN rm -rf /opt/monai/build /opt/monai/monai.egg-info && \
-    find / -name __pycache__ | xargs rm -rf
+    find /opt/monai /usr/local/lib -name __pycache__ -type d -exec rm -rf {} + 2>/dev/null || true
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 15fd428 and ee7afe8.

📒 Files selected for processing (12)
  • .dockerignore (1 hunks)
  • .github/workflows/pythonapp.yml (4 hunks)
  • Dockerfile.slim (1 hunks)
  • monai/apps/vista3d/inferer.py (1 hunks)
  • monai/networks/nets/vista3d.py (1 hunks)
  • monai/networks/utils.py (2 hunks)
  • requirements-dev.txt (2 hunks)
  • requirements.txt (1 hunks)
  • tests/bundle/test_bundle_download.py (2 hunks)
  • tests/data/meta_tensor/test_meta_tensor.py (1 hunks)
  • tests/losses/test_multi_scale.py (1 hunks)
  • tests/networks/test_convert_to_onnx.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

Review the Python code for quality and correctness. Ensure variable names adhere to PEP8 style guides, are sensible and informative in regards to their function, though permitting simple names for loop and comprehension variables. Ensure routine names are meaningful in regards to their function and use verbs, adjectives, and nouns in a semantically appropriate way. Docstrings should be present for all definition which describe each variable, return value, and raised exception in the appropriate section of the Google-style of docstrings. Examine code for logical error or inconsistencies, and suggest what may be changed to addressed these. Suggest any enhancements for code improving efficiency, maintainability, comprehensibility, and correctness. Ensure new or modified definitions will be covered by existing or new unit tests.

Files:

  • monai/networks/nets/vista3d.py
  • monai/networks/utils.py
  • tests/networks/test_convert_to_onnx.py
  • tests/data/meta_tensor/test_meta_tensor.py
  • tests/bundle/test_bundle_download.py
  • tests/losses/test_multi_scale.py
  • monai/apps/vista3d/inferer.py
🧬 Code graph analysis (2)
tests/networks/test_convert_to_onnx.py (3)
monai/utils/module.py (1)
  • optional_import (315-445)
monai/networks/nets/unet.py (1)
  • UNet (28-299)
monai/networks/utils.py (1)
  • convert_to_onnx (661-785)
tests/losses/test_multi_scale.py (1)
tests/test_utils.py (1)
  • assert_allclose (119-159)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: packaging
  • GitHub Check: quick-py3 (macOS-latest)
  • GitHub Check: quick-py3 (windows-latest)
  • GitHub Check: quick-py3 (ubuntu-latest)
  • GitHub Check: flake8-py3 (pytype)
  • GitHub Check: min-dep-os (macOS-latest)
🔇 Additional comments (14)
tests/bundle/test_bundle_download.py (2)

18-18: Importing skipIf is appropriate and consistent

Adding skipIf here is correct for the new test gating and keeps skip-related utilities together with skipUnless. No issues.


220-223: Environment‑gated NGC private test looks correct

Skipping test_ngc_private_source_download_bundle when NGC_API_KEY is unset is a sensible way to avoid hard failures in environments without credentials, while still running the test when properly configured.

monai/networks/utils.py (1)

716-736: LGTM - Typo fix in variable naming.

Variable renamed from mode_to_export to model_to_export in both tracing and scripting branches. Semantically correct and consistent.

monai/apps/vista3d/inferer.py (1)

89-95: LGTM - List to tuple for slice construction.

Changed from list to tuple for the slice indexing object. Both work equivalently for tensor indexing; tuple is more idiomatic for immutable sequences.

.dockerignore (1)

6-10: LGTM - Added development artifact ignores.

Good practice to exclude .vscode, .git, and various tool caches from Docker builds. Reduces image size.

monai/networks/nets/vista3d.py (1)

246-249: LGTM - Explicit type conversion for numpy compatibility.

Refactored to build per-point boolean list, apply np.any(), and explicitly cast to int. Comment clarifies this avoids numpy typing issues. Functionally equivalent to previous logic.

tests/networks/test_convert_to_onnx.py (3)

25-32: LGTM - Simplified to CPU-only ONNX testing.

Removed CUDA device options, restricting tests to CPU. Aligns with slim Docker image without CUDA toolkit. Comment at line 29-31 notes CUDA produces different outputs during testing.


48-69: LGTM - Added docstring and type assertion.

Added docstring for test clarity. Now always calls convert_to_onnx and asserts return type is onnx.ModelProto, improving test coverage.


73-104: LGTM - Added docstring for SegResNet test.

Documents that this tests SegResNet ONNX conversion with use_trace=True. Existing tolerance and runtime checks preserved.

tests/data/meta_tensor/test_meta_tensor.py (1)

248-248: Verify necessity of weights_only=False change.

Changed from weights_only=True to weights_only=False when loading the pickled MetaTensor. This allows loading arbitrary Python objects, which is less secure. Ensure this is required for proper MetaTensor deserialization with metadata.

requirements.txt (1)

1-2: Reconsider Windows exclusion of PyTorch 2.7.0 — documented issues are environmental, not version-critical.

PyTorch 2.7.0 (latest stable as of December 2025) has known Windows complications: CUDA/toolkit mismatches (especially CUDA 12.x builds), XPU wheel failures, and dependency conflicts with torchvision/torchaudio. However, these are configuration and environment issues, not version-specific breakages. Excluding 2.7.0 entirely on Windows prevents users from installing the latest stable release. Instead, document the required environment setup (CUDA version matching, wheel source verification, system DLL/Windows 10+ requirement) or keep 2.7.0 available if your tested environment passes.

The upper bound <2.9 is appropriate; MONAI 1.5.x supports PyTorch up to 2.8, and PyTorch 2.9 is not yet released.

Dockerfile.slim (3)

72-73: ✓ Python version handling addressed prior feedback.

The use of ARG PYTHON_VERSION=3.11 with variable substitution in the COPY command (line 73) makes the build maintainable and configurable across Python versions. This directly addresses the prior review concern about hardcoded paths.


38-42: Verify NGC CLI MD5 validation file exists in build context.

Line 41 attempts to validate the NGC CLI download using ngc-cli.md5, but this file must be available in the build context when the RUN command executes. Confirm that ngc-cli.md5 is either:

  • Present in the repository root, or
  • Copied into the Docker image before line 41 via a COPY or ADD statement

89-89: CUDA_HOME points to non-existent directory in runtime image.

Line 89 sets CUDA_HOME=/usr/local/cuda, but the build2 stage (lines 64–68) explicitly does not install the CUDA toolkit. This means the directory doesn't exist at runtime, which could cause failures if code or precompiled extensions try to resolve CUDA libraries or executables.

Either remove this environment variable from the runtime image or install minimal CUDA runtime libraries in the build2 stage if they're required for the precompiled extensions.

# however, "pip install monai*.tar.gz" will build cpp/cuda with an isolated
# fresh torch installation according to pyproject.toml
python -m pip install torch>=2.5.1 torchvision
python -m pip install torch\>=2.5.1 torchvision
Contributor

⚠️ Potential issue | 🔴 Critical

Remove escape before >= in torch dependency.

Line 174 escapes >= as \>=, but line 114 shows the correct pattern without escaping. The backslash will likely cause pip to receive an invalid version specifier. Unescaped >= works correctly inside double quotes in bash.

Apply this diff:

-        python -m pip install torch\>=2.5.1 torchvision
+        python -m pip install torch>=2.5.1 torchvision
🤖 Prompt for AI Agents
In .github/workflows/pythonapp.yml around line 174, the torch version specifier
currently escapes the '>=' as '\>=', which produces an invalid pip version
specifier; remove the backslash and use an unescaped comparator (e.g. change to
python -m pip install "torch>=2.5.1" torchvision or python -m pip install
torch>=2.5.1 torchvision) so pip receives a valid requirement.

# Full requirements for developments
-r requirements-min.txt
pytorch-ignite==0.4.11
pytorch-ignite
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

pytorch-ignite breaking changes after 0.4.11 changelog

💡 Result:

Major BC change after v0.4.11

  • v0.5.0: ignite.contrib.* was removed/moved — all contrib metrics/handlers were moved into the top-level modules (ignite.handlers and ignite.metrics). This is the primary breaking change for users importing from ignite.contrib. [1][2]

Notes/citations

  • GitHub releases (v0.4.13…v0.5.0) showing the change. [1]
  • Documentation: ignite.contrib marked deprecated since 0.5.0 (metrics/handlers moved). [2]

References
[1] PyTorch‑Ignite Releases (v0.4.13…v0.5.0).
[2] ignite.contrib.metrics — documentation (deprecated since 0.5.0).


🏁 Script executed:

#!/bin/bash
# Search for pytorch-ignite usage in the codebase, particularly ignite.contrib imports
rg "ignite\." --type py | head -30

Repository: Project-MONAI/MONAI

Length of output: 182


🌐 Web query:

pytorch-ignite 0.5.0 breaking changes ignite.contrib removal

💡 Result:

Short answer: In v0.5.0 the ignite.contrib package was removed as a location for maintained handlers/metrics — items from ignite.contrib.handlers and ignite.contrib.metrics were moved into the main packages (ignite.handlers, ignite.metrics, ignite.loggers, etc.). Update your imports accordingly.

What changed (practical migration):

  • ignite.contrib.handlers. → ignite.handlers.
  • ignite.contrib.metrics. → ignite.metrics.
  • contrib loggers/other deprecated contrib utilities were moved/renamed under the main modules (check specific class names).
  • Some contrib things had been deprecated earlier; v0.5.0 finalised the move so old contrib imports will break.

References:

  • Release notes / changelog describing the move. [1]
  • PyTorch‑Ignite docs (contrib pages show deprecation and relocation to main namespaces). [2][3]

If you want, tell me one or two specific contrib imports you use and I’ll give the exact new import lines and any small API differences.

Sources
[1] PyTorch‑Ignite v0.5.0 release notes / changelog.
[2] PyTorch‑Ignite documentation — contrib handlers (deprecation/move notes).
[3] PyTorch‑Ignite releases page.


Unpinning pytorch-ignite introduces breaking changes—verify MONAI's ignite imports before merging.

pytorch-ignite v0.5.0 removed the ignite.contrib package entirely. Handlers and metrics previously at ignite.contrib.handlers.* and ignite.contrib.metrics.* were relocated to ignite.handlers.* and ignite.metrics.*. Any code importing from ignite.contrib will break with versions ≥0.5.0. Confirm that MONAI does not use ignite.contrib imports, or update those imports to the new top-level module locations before unpinning.

🤖 Prompt for AI Agents
In requirements-dev.txt around line 3, unpinning pytorch-ignite can break code
using the removed ignite.contrib package; search the repository for any imports
of ignite.contrib and either (A) update those import paths to the new top-level
locations (e.g., move ignite.contrib.handlers.* -> ignite.handlers.* and
ignite.contrib.metrics.* -> ignite.metrics.*) and run unit/integration tests, or
(B) if you cannot update code now, pin pytorch-ignite to a pre-0.5.0 version in
requirements-dev.txt (e.g., pytorch-ignite<0.5.0) and add a TODO to migrate
imports later; ensure tests pass after the chosen change.

optuna
git+https://github.com/Project-MONAI/MetricsReloaded@monai-support#egg=MetricsReloaded
onnx>=1.13.0
onnxruntime
Contributor

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

onnxruntime Python 3.11 3.12 compatibility 2025

💡 Result:

Short answer: Use Python 3.11 for reliable ONNX Runtime support; Python 3.12 support in 2025 is partial and platform-dependent (not consistently available via official pip wheels).

Details & evidence:

  • ONNX Runtime added official Python 3.11 package support (v1.15.0 release notes). (github.com)
  • Python 3.12 has been a frequent feature request and issues exist reporting "No matching distribution" (i.e., missing pip wheels) for some platforms. (github.com)
  • Community/packaging channels (conda-forge) have produced Python 3.12 builds for some ONNX Runtime versions (example: onnxruntime 1.22.0 conda packages for py312 uploaded in 2025), showing partial availability outside official pip wheels. (anaconda.org)
  • ONNX Runtime releases are quarterly and support/platform coverage can change between releases — check the project roadmap and the specific release notes or GitHub releases for the version you plan to install. (onnxruntime.ai)

Recommendation: For production or wide-platform compatibility in Dec 2025 use Python 3.11. If you must use Python 3.12, confirm availability for your OS/arch by checking the ONNX Runtime GitHub releases or package indexes (pip wheels list and conda-forge) for the exact version you plan to install. (github.com)



Unpinned onnxruntime may fail on Python 3.12 due to missing platform-specific wheels.

Removing the python_version condition creates an installation risk: Python 3.11 has official support (v1.15.0+), but Python 3.12 wheels are not consistently available via official pip repositories. "No matching distribution" errors are reported on various platforms for Python 3.12. Community builds via conda-forge provide partial coverage. Consider either:

  • Adding a python_version constraint to restrict onnxruntime to Python 3.11 (see the example after this list), or
  • Pinning to a specific onnxruntime version with confirmed Python 3.12 availability for your target platforms, or
  • Accepting potential installation failures on Python 3.12 and documenting the workaround (conda-forge alternative).
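
For example, the environment-marker form of the first option (hypothetical; the exact bound would need to be agreed) would be a requirements line such as:

onnxruntime; python_version < "3.12"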
🤖 Prompt for AI Agents
In requirements-dev.txt at line 55, the unpinned onnxruntime entry risks "no
matching distribution" errors on Python 3.12; update the dependency to either
pin a specific onnxruntime version that has verified wheels for your target
platforms (e.g., set to a known-good version) or reintroduce a python_version
constraint to restrict installation to Python 3.11, or add a comment documenting
that Python 3.12 users must install via conda-forge; implement one of these
options and ensure the chosen approach is clearly documented in the file.

Member Author

ericspod commented Dec 8, 2025

Hi @KumoLiu, I think we should push this one through to solve some of the issues we're seeing with actions running out of space. Some other PRs are stuck with failing checks, but I think the changes I have here will fix them.
