
Redesign. Dataloaders instead of datasets. Adapters instead of postprocessing and implicit behavior. Containers and our own validation instead of pydantic. Refactor runner and config. #142


Merged: 47 commits, Jan 31, 2025

Conversation

@ibro45 (Collaborator) commented Dec 20, 2024

Description

  • No more datasets; there are now simply dataloaders. Keys like sampler or batch_size have been removed from the system's root config.
  • Adapters TODO: describe when complete
  • Removed postprocessing; adapters can do that now instead.
  • Removed pydantic.
  • Enums, containers, and manual validation of the config. Only the parts of the config relevant to the run are instantiated (e.g., lighter fit won't instantiate the test or predict dataloaders and other unused parts of the config).

Related Issue

#3 - using _disabled_ has issues; it is better to filter out the unused content from the config after resolving it and before instantiating it
#60
#129

Type of Change

  • 📚 Examples / docs / tutorials / dependencies update
  • 🔧 Bug fix (non-breaking change which fixes an issue)
  • 🥂 Improvement (non-breaking change which improves an existing feature)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to change)
  • 🔐 Security fix

Checklist

  • I've read the CODE_OF_CONDUCT.md document.
  • I've read the CONTRIBUTING.md guide.
  • I've updated the code style using make codestyle.
  • I've written tests for all new methods and classes that I created.
  • I've written the docstring in Google format for all the methods and classes that I used.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added configuration management with enhanced schema validation.
    • Introduced new adapters for data processing and transformation.
    • Added new utility types and enumerations for improved type handling.
    • Enhanced CIFAR-10 project configuration for training processes.
    • Comprehensive suite of unit tests for new and existing features.
  • Improvements

    • Simplified class and method naming conventions, including renaming LighterSystem to System.
    • Streamlined system and runner configurations.
    • Enhanced logging and error reporting mechanisms.
    • Improved test coverage and robustness for various components.
  • Breaking Changes

    • Renamed LighterSystem to System.
    • Removed LighterFreezer, LighterFileWriter, and LighterTableWriter.
    • Updated configuration file structures and import paths.
  • Documentation

    • Updated documentation to reflect new class and method names.
    • Improved configuration and project setup guides.
    • Clarified contribution guidelines with updated package management instructions.
  • Bug Fixes

    • Improved error handling in configuration and data processing.
    • Fixed potential issues with module imports and type conversions.
  • Chores

    • Updated CI workflow caching mechanism.
    • Modified pre-commit configuration to use uv instead of poetry.
    • Updated contribution guidelines to reflect new package management instructions.

coderabbitai bot (Contributor) commented Dec 20, 2024

Walkthrough

This pull request introduces modifications across various files, primarily focusing on enhancing the .gitignore file and renaming components within the Lighter framework. Key changes include updating ignored file patterns in .gitignore, replacing references to LighterSystem with System in documentation and code, and implementing new configuration management features. The changes also involve renaming callback classes and updating related tests to reflect these modifications.

Changes

| File/Directory | Change Summary |
| --- | --- |
| .gitignore | Enhanced file ignore patterns; added `.code-workspace`, `.scale_batch_size*`, and `**/*.txt` |
| README.md, docs/* | Replaced `LighterSystem` references with `System` |
| lighter/system.py | Significant refactoring of system initialization and data handling |
| lighter/engine/ | New configuration management with `Config`, `Resolver`, and schema definitions |
| lighter/utils/types/ | Added new type-related modules with enums, containers, and data classes |
| lighter/callbacks/ | Renamed callbacks (e.g., `LighterFreezer` → `Freezer`) |
| tests/ | Updated tests to reflect class and method name changes |


Poem

🐰 A Lighter Hop Through Code's Domain
Farewell, LighterSystem, too verbose and plain,
System now dances with elegance and might,
Refactored paths now shimmer bright!
Callbacks renamed, configurations clean,
A rabbit's touch makes the framework serene! 🌟


@surajpaib (Collaborator) left a comment:

Looks good! Maybe good to generate some tests for this too?

@ibro45 (Collaborator, Author) commented Dec 20, 2024

Absolutely. This was just a quick implementation without even running it; I wanted to see what you think about the idea.

…ntiation of config, add enums, removed pydantic. Comment out postprocessing in system until it is refactored.
@ibro45 (Collaborator, Author) commented Dec 23, 2024

I realized that it was not a good idea: it adds complexity that would have to be explained to the user. I resorted to simply defining dataloaders (see the cifar10 YAML).

Refactoring Lighter as a whole in this PR - WIP.

@ibro45 ibro45 changed the title Allow full DataLoader configuration Redesign. Dataloader instead of dataset. Adapters instead of postprocessing and implicit behavior. Containers and our own validation instead of pydantic. Refactor runner and config. Jan 17, 2025
@ibro45 ibro45 changed the title Redesign. Dataloader instead of dataset. Adapters instead of postprocessing and implicit behavior. Containers and our own validation instead of pydantic. Refactor runner and config. Redesign. Dataloaders instead of datasets. Adapters instead of postprocessing and implicit behavior. Containers and our own validation instead of pydantic. Refactor runner and config. Jan 17, 2025
…. Add back the support for multiple config files.
…lity, return all data from the _step no matter what mode. Add loss, metric, step, and epoch to Data enum. Switch str, Enum to StrEnum. Tuner.scale_batch_size() does not work since we switched to dataloaders in config - it needs read/write access to the batch size, figuring out the fix.
…all mode specific hooks in a `_setup_mode_hooks()`.
@surajpaib surajpaib marked this pull request as ready for review January 24, 2025 21:57
@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Jan 24, 2025
coderabbitai bot (Contributor) left a comment:

Actionable comments posted: 1

🧹 Nitpick comments (22)
lighter/engine/config.py (2)

16-49: Consider verifying overrides for nonexistent keys, and add more thorough docstrings.
The constructor appropriately loads and parses configurations, then applies overrides. However, it might be beneficial to log or handle cases where overridden keys are not present in the underlying config structure to avoid user confusion. Additional docstrings or debug logs could clarify which keys are being overridden.
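For instance, a minimal sketch of the suggested check, assuming the parsed config behaves like a flat mapping of dotted keys (the real Config wraps a parser, so the lookup would differ):

```python
import logging

logger = logging.getLogger(__name__)

def apply_overrides(config: dict, overrides: dict) -> None:
    """Apply overrides in place, warning about keys absent from the config.

    Assumes `config` is a flat dict of dotted keys, e.g.
    {"system.model._target_": "..."}; the actual Config class may
    resolve keys through its parser instead.
    """
    for key, value in overrides.items():
        if key not in config:
            logger.warning("Override key '%s' not found in config; adding it.", key)
        config[key] = value
```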


62-86: Recursively formatting errors is helpful but could be expensive for large schemas.
The approach is fine for moderate configurations. If your config or schema grows, consider short-circuiting after a threshold or summarizing repeated patterns to avoid overly verbose error messages.

lighter/utils/types/containers.py (2)

33-48: Metrics dataclass ensures consistent usage of MetricCollection.
Be mindful of edge cases if the user passes a single metric that is incompatible with MetricCollection. Consider adding a small safety check or helpful error message for that scenario.
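As a rough illustration of such a check (field and class names here are stand-ins, not Lighter's actual containers), metrics could be coerced and validated in `__post_init__`:

```python
from dataclasses import dataclass

from torchmetrics import Metric, MetricCollection

@dataclass
class Metrics:
    """Illustrative container: wraps user-supplied metrics in a MetricCollection."""
    train: Metric | list[Metric] | dict[str, Metric] | None = None

    def __post_init__(self):
        if self.train is None:
            return
        if not isinstance(self.train, (Metric, list, dict)):
            raise TypeError(
                f"Expected a torchmetrics Metric, list, or dict; got {type(self.train).__name__}."
            )
        # MetricCollection accepts a single Metric, a list, or a dict.
        self.train = MetricCollection(self.train)
```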


58-94: Train/Val/Test/Predict dataclasses cleanly organize mode-specific adapters.
This division is intuitive and simple for maintainers. As usage grows, keep an eye on potential duplication across modes—common logic might be extracted if it grows large.

lighter/engine/runner.py (2)

19-25: Runner constructor is minimal but functional.
Deferring initialization until run is called seems logical. Consider logging or partial validation in __init__ if you expect future expansions, especially around concurrency or advanced scenario handling.


48-72: _setup_stage organizes stage-specific config and system/trainer initialization.
Extracting the project module path, system, and trainer from the config is a nice separation of concerns. Keep an eye on error handling for nonexistent or invalid project paths.

lighter/system.py (5)

2-5: Update docstring references to match new dataloader approach
The docstring still mentions "datasets" even though the code now uses dataloaders. Let's maintain consistency by updating the description accordingly.

-This module defines the System class, which encapsulates the components of a deep learning system,
-including the model, optimizer, datasets, and more. It extends PyTorch Lightning's LightningModule.
+This module defines the System class, which encapsulates the components of a deep learning system,
+including the model, optimizer, dataloaders, and more. It extends PyTorch Lightning's LightningModule.

38-39: Add documentation for the adapters parameter
There is a TODO for adapters, but no explanation is provided in the docstring. Consider adding a brief description for clarity on what adapters are and how they’re used.

        ...
        adapters: dict[str, Callable] | None = None,
        ...
+        """
+        adapters: A dictionary of adapters that define how batches, losses, and metrics
+        are prepared for different modes (train, val, test, predict). Each entry corresponds
+        to a mode and implements custom logic tailored for that mode.
+        """

172-184: Handle missing batch size in _log method
The logic assumes that self.dataloaders.<mode>.batch_size is always defined. If a dataloader is custom without a defined batch_size, this may break. Consider providing a fallback.

 def _log(self, name: str, value: Any, on_step: bool = False, on_epoch: bool = False) -> None:
     batch_size = getattr(self.dataloaders, self.mode).batch_size
-    self.log(name, value, logger=True, batch_size=batch_size, on_step=on_step, on_epoch=on_epoch, sync_dist=on_epoch)
+    effective_batch_size = batch_size if batch_size is not None else 1
+    self.log(name, value, logger=True, batch_size=effective_batch_size, on_step=on_step, on_epoch=on_epoch, sync_dist=on_epoch)

227-248: Consider standardizing or streamlining the dynamic hook assignments
Your _setup_mode_hooks approach dynamically binds training_step, validation_step, etc. While flexible, it can be less discoverable than a conventional override approach. Evaluate if the dynamic approach aligns best with your maintainability goals or if a more explicit pattern suits your codebase better.


Line range hint 256-268: Support multiple optimizer parameter groups for learning rate
Currently, setting or getting learning_rate raises an error if multiple parameter groups exist. For advanced users, you might consider returning or accepting a dictionary of learning rates instead, though this may complicate usage.
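A hedged sketch of that alternative, assuming a `self.optimizer` attribute holding a `torch.optim.Optimizer` (the property names are illustrative, not System's implementation):

```python
class LearningRateMixin:
    """Sketch: expose one lr for a single group, a dict for multiple groups."""

    @property
    def learning_rate(self) -> "float | dict[int, float]":
        groups = self.optimizer.param_groups
        if len(groups) == 1:
            return groups[0]["lr"]
        return {i: group["lr"] for i, group in enumerate(groups)}

    @learning_rate.setter
    def learning_rate(self, value: float) -> None:
        # Broadcasts a single value to every group; per-group setting
        # would accept a dict instead.
        for group in self.optimizer.param_groups:
            group["lr"] = value
```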

lighter/__init__.py (1)

12-16: Consider adding exports to __all__ or removing if unused
Static analysis tags these imports as unused. If they're intentionally exposed for convenience, add them to __all__; otherwise, remove them.

 __version__ = "0.0.3a11"

 __all__ = [
+    "Trainer",
+    "Config",
+    "Runner",
+    "System",
 ]
🧰 Tools
🪛 Ruff (0.8.2)

12-12: pytorch_lightning.Trainer imported but unused

(F401)


14-14: .engine.config.Config imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


15-15: .engine.runner.Runner imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


16-16: .system.System imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

lighter/utils/types/enums.py (1)

4-11: Consider simplifying the docstring
While StrEnum is self-explanatory, it would be helpful to include a concise example of usage in the docstring to guide new contributors on how to define or use these enumerations.
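For example, a usage snippet along the lines suggested, assuming the common `str, Enum` mixin pattern (Lighter's own StrEnum may differ in detail):

```python
from enum import Enum

class StrEnum(str, Enum):
    """Enum whose members also behave as their string values."""

class Mode(StrEnum):
    TRAIN = "train"
    VAL = "val"

assert Mode.TRAIN == "train"                       # compares equal to plain strings
assert f"{Mode.TRAIN.value}_loss" == "train_loss"  # .value avoids 3.11+ str() quirks
```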

tests/unit/test_callbacks_utils.py (1)

Line range hint 5-22: Commendable single-image reshape test
This test effectively validates reshaping of a single 3D image. As a good practice, consider adding checks for edge cases, such as an empty tensor or unexpected dimensions, to harden your test coverage.

🧰 Tools
🪛 Ruff (0.8.2)

1-1: pytest imported but unused

Remove unused import: pytest

(F401)

lighter/engine/resolver.py (3)

5-8: Docstring clarity
The class docstring concisely outlines the purpose of Resolver. Consider explicitly mentioning how this resolver interacts with adapters or dataloaders for a more comprehensive overview.


10-17: Stage-to-mode mapping
This mapping design is straightforward. If you add or rename stages or modes, check for possible mismatches to prevent silent errors. Implementing a quick validation test or assertion in the constructor might reduce debugging time.
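A self-contained sketch of that constructor-time check, with stand-in enums modeled on (but not copied from) this PR:

```python
from enum import Enum

class Stage(str, Enum):      # illustrative stand-ins for Lighter's enums
    FIT = "fit"
    VALIDATE = "validate"
    TEST = "test"
    PREDICT = "predict"

class Mode(str, Enum):
    TRAIN = "train"
    VAL = "val"
    TEST = "test"
    PREDICT = "predict"

STAGE_MODES = {
    Stage.FIT: [Mode.TRAIN, Mode.VAL],
    Stage.VALIDATE: [Mode.VAL],
    Stage.TEST: [Mode.TEST],
    Stage.PREDICT: [Mode.PREDICT],
}

# Sanity check at import/construction time: every stage must have a mode
# mapping, so a renamed or added stage fails loudly instead of silently.
assert set(STAGE_MODES) == set(Stage), "Stage/mode mapping is out of sync"
```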


22-25: ValueError check
Raising a ValueError for an invalid stage is good. Consider customizing the error message further by appending the actual input stage for clearer debugging.

tests/unit/test_callbacks_writer_table.py (1)

9-10: Remove unused import.
Static analysis flags System as unused. Consider removing it to clean up imports:

- from lighter.system import System
🧰 Tools
🪛 Ruff (0.8.2)

10-10: lighter.system.System imported but unused

Remove unused import: lighter.system.System

(F401)

lighter/engine/schema.py (2)

1-3: Introductory fields _meta_ and _requires_.
Moving away from Pydantic to a dict scheme might reduce overhead but be mindful of thorough validation logic in subsequent code.


17-17: trainer schema.
A generic dict type is fine, but consider more validation if needed (e.g., max epochs type check).

projects/cifar10/experiments/example.yaml (1)

3-15: Consider enabling GPU acceleration.

The trainer is configured to use CPU (accelerator: cpu). For better performance, consider enabling GPU acceleration by uncommenting the devices and strategy configurations.

lighter/callbacks/__init__.py (1)

1-3: Add __all__ declaration for exported symbols.

The static analysis indicates that these imports are unused. Since these are likely meant to be exported symbols, they should be declared in __all__.

Add this at the beginning of the file:

+__all__ = ["Freezer", "FileWriter", "TableWriter"]
+
 from .freezer import Freezer
 from .writer.file import FileWriter
 from .writer.table import TableWriter
🧰 Tools
🪛 Ruff (0.8.2)

1-1: .freezer.Freezer imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


2-2: .writer.file.FileWriter imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)


3-3: .writer.table.TableWriter imported but unused; consider removing, adding to __all__, or using a redundant alias

(F401)

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 45035ed and 06a5eb6.

📒 Files selected for processing (43)
  • .gitignore (1 hunks)
  • README.md (1 hunks)
  • docs/advanced/callbacks.md (1 hunks)
  • docs/basics/config.md (10 hunks)
  • docs/basics/projects.md (2 hunks)
  • docs/basics/quickstart.md (4 hunks)
  • docs/index.md (2 hunks)
  • lighter/__init__.py (1 hunks)
  • lighter/adapters.py (1 hunks)
  • lighter/callbacks/__init__.py (1 hunks)
  • lighter/callbacks/freezer.py (4 hunks)
  • lighter/callbacks/utils.py (0 hunks)
  • lighter/callbacks/writer/base.py (4 hunks)
  • lighter/callbacks/writer/file.py (3 hunks)
  • lighter/callbacks/writer/table.py (4 hunks)
  • lighter/engine/config.py (1 hunks)
  • lighter/engine/resolver.py (1 hunks)
  • lighter/engine/runner.py (1 hunks)
  • lighter/engine/schema.py (1 hunks)
  • lighter/system.py (2 hunks)
  • lighter/utils/collate.py (0 hunks)
  • lighter/utils/dynamic_imports.py (1 hunks)
  • lighter/utils/logging.py (2 hunks)
  • lighter/utils/misc.py (1 hunks)
  • lighter/utils/model.py (1 hunks)
  • lighter/utils/patches.py (0 hunks)
  • lighter/utils/types/containers.py (1 hunks)
  • lighter/utils/types/enums.py (1 hunks)
  • projects/cifar10/experiments/example.yaml (1 hunks)
  • projects/cifar10/experiments/example_overrides.yaml (1 hunks)
  • projects/cifar10/experiments/monai_bundle_prototype.yaml (0 hunks)
  • pyproject.toml (1 hunks)
  • tests/integration/test_cifar.py (1 hunks)
  • tests/unit/test_callbacks_freezer.py (8 hunks)
  • tests/unit/test_callbacks_utils.py (1 hunks)
  • tests/unit/test_callbacks_writer_base.py (5 hunks)
  • tests/unit/test_callbacks_writer_file.py (13 hunks)
  • tests/unit/test_callbacks_writer_table.py (6 hunks)
  • tests/unit/test_engine_runner.py (9 hunks)
  • tests/unit/test_engine_schema.py (0 hunks)
  • tests/unit/test_system.py (13 hunks)
  • tests/unit/test_utils_data.py (1 hunks)
  • tests/unit/test_utils_misc.py (1 hunks)
💤 Files with no reviewable changes (5)
  • tests/unit/test_engine_schema.py
  • lighter/utils/collate.py
  • lighter/callbacks/utils.py
  • lighter/utils/patches.py
  • projects/cifar10/experiments/monai_bundle_prototype.yaml
✅ Files skipped from review due to trivial changes (5)
  • docs/advanced/callbacks.md
  • lighter/utils/dynamic_imports.py
  • lighter/utils/model.py
  • tests/unit/test_utils_data.py
  • tests/unit/test_callbacks_freezer.py

🔇 Additional comments (121)
lighter/utils/misc.py (2)

50-55: Renaming _callable to fn for improved clarity.
No issues spotted. This rename makes the parameter name more descriptive.


61-61: Retaining the same internal logic.
Even though the parameter name changed, the function’s logic and returned result remain untouched.

tests/unit/test_utils_misc.py (1)

7-7: Removing apply_fns from the import impacts dependent code.
With the function gone from lighter.utils.misc, ensure that no other code references apply_fns.

As a follow-up, here is a quick script to check for references across the entire codebase:

✅ Verification successful

No remaining references to apply_fns found in the codebase
The removal of apply_fns from lighter.utils.misc is safe as there are no remaining references to this function in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Searching all references to `apply_fns` in the codebase.
rg -A 3 'apply_fns'

Length of output: 19


Script:

#!/bin/bash
# Search for any remaining references to apply_fns through various patterns
echo "=== Searching direct imports ==="
rg "from lighter.utils.misc import.*apply_fns"

echo -e "\n=== Searching module-level access ==="
rg "misc\.apply_fns"

echo -e "\n=== Searching in all Python files for any mention of apply_fns ==="
fd -e py -x grep -l "apply_fns" {} \;

echo -e "\n=== Checking AST for function calls ==="
ast-grep --pattern 'apply_fns($_)'

Length of output: 559

lighter/engine/config.py (3)

3-5: Imports look appropriate for configuration parsing and validation.
No immediate issues found with these imports. Integration with cerberus and ConfigParser is clear and well-contained.


9-14: Custom exception class is well-defined and meaningful.
Raising a ConfigurationException when validation fails is an effective approach to surface configuration errors in a user-friendly manner.


51-60: Clarify usage between get and get_parsed_content.
The distinction between raw vs. parsed content is sensible, but it could be confusing for maintainers or new contributors. Consider adding docstring examples or references to possible use cases.

lighter/utils/types/containers.py (3)

10-31: nested decorator is a clever way to instantiate nested dataclasses automatically.
This preserves readability in user-facing code and keeps your nested structures consistent. No issues noted.
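A minimal sketch of that pattern, assuming fields annotated with a dataclass type may be passed as plain dicts (Lighter's actual decorator may differ, e.g. around string annotations):

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any

def nested(cls):
    """Illustrative decorator: coerce dicts into dataclass-typed fields."""
    original_init = cls.__init__

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        original_init(self, *args, **kwargs)
        for f in fields(self):
            value = getattr(self, f.name)
            # Works when the annotation is the class object itself
            # (i.e. no `from __future__ import annotations`).
            if is_dataclass(f.type) and isinstance(value, dict):
                setattr(self, f.name, f.type(**value))

    cls.__init__ = __init__
    return cls

@dataclass
class Inner:
    x: int = 0

@nested
@dataclass
class Outer:
    inner: Inner = None  # a dict passed here is coerced to Inner by @nested

assert Outer(inner={"x": 3}).inner.x == 3
```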


50-57: DataLoaders placeholders are clear, but test coverage is recommended.
Given their central role, it’s crucial to confirm that these placeholders work as expected for different or atypical loading processes.


95-103: Adapters class effectively unifies all modes under a nested configuration.
The composition approach is straightforward. No major concerns here; keep maintaining type hints and thorough testing to ensure correctness.

lighter/engine/runner.py (3)

26-39: run method handles seeding, config parsing, and stage resolution in a linear flow.
This design is clear. Double-check that calling run multiple times re-initializes properly, especially regarding seeds or global states, in case you ever need repeated runs in the same process.


40-47: _run_stage elegantly delegates to Tuner or Trainer.
The approach is straightforward. Just ensure robust error handling if stage names deviate from the known list.


81-111: cli function offers a convenient entry point for multiple commands.
Mapping fire.Fire subcommands to internal methods is clean. No further comments; just ensure your docstrings align with each subcommand for clarity.
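As an illustration of the pattern (python-fire does accept a dict of command names; the Runner below is a stand-in, not Lighter's actual class):

```python
import fire

class Runner:  # minimal stand-in for Lighter's Runner
    def run(self, stage: str, config: str) -> None:
        print(f"Running stage={stage!r} with config={config!r}")

def cli() -> None:
    runner = Runner()
    # fire.Fire with a dict exposes each key as a subcommand,
    # e.g. `python cli.py fit --config=experiment.yaml`.
    fire.Fire({
        stage: (lambda config, _stage=stage: runner.run(_stage, config))
        for stage in ("fit", "validate", "test", "predict")
    })

if __name__ == "__main__":
    cli()
```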

lighter/adapters.py (3)

6-42: Well-structured transforms application
The _TransformsAdapter class cleanly handles multiple functions and applies them in sequence. No concerns regarding functionality or clarity.
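A sketch of that sequential application, under the assumption that transforms are plain callables applied in declaration order (the private `_TransformsAdapter` itself may differ):

```python
from typing import Any, Callable

class TransformsAdapter:
    """Illustrative: normalize a single callable or a list, apply in order."""

    def __init__(self, transforms: "Callable | list[Callable] | None" = None) -> None:
        if transforms is None:
            transforms = []
        elif callable(transforms):
            transforms = [transforms]
        self.transforms = transforms

    def __call__(self, value: Any) -> Any:
        for transform in self.transforms:
            value = transform(value)
        return value

scale_then_shift = TransformsAdapter([lambda x: x * 2, lambda x: x + 1])
assert scale_then_shift(3) == 7  # (3 * 2) + 1
```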


153-157: Proper validation of transform usage
Good use of ValueError to prevent misconfiguration (e.g., transform supplied but no corresponding argument). This is an appropriate safeguard, ensuring consistent usage.


240-243: Static analysis recommendation is likely not beneficial
A tool suggests combining these branches with or, but each branch handles a distinct type of accessor (index vs. key). Merging them could reduce clarity.

🧰 Tools
🪛 Ruff (0.8.2)

240-243: Combine if branches using logical or operator

Combine if branches

(SIM114)

tests/integration/test_cifar.py (1)

16-16: Confirm the new config file path is accurate
It appears the configuration file reference has changed to "./projects/cifar10/experiments/example.yaml" for all three method calls (fit, test, and predict). Please ensure that:

  1. The new file path exists and is spelled correctly.
  2. The file contains all necessary configuration sections for each operation.
  3. The tests accurately reflect the project’s current settings.

Also applies to: 21-21, 26-26

lighter/utils/types/enums.py (4)

1-2: Consider backward compatibility or third-party usage
The direct reference to Python’s built-in Enum is fine. If your project is used by third-party code that relies on a different approach to enumerations, consider documenting possible constraints or usage patterns.


14-23: Validate data labeling usage
Enums like IDENTIFIER, INPUT, TARGET, etc., clarify key data pipeline elements. Ensure these string constants are used consistently across the codebase (e.g., in custom adapters, dataloaders, or writer classes) to avoid “magic strings.”


25-32: Stages enumeration seems comprehensive
The enumeration for possible pipeline stages is well-structured. Double-check usage in your new resolver or adapters to ensure that changing or omitting any custom stage won't break downstream logic.


34-38: Confirm Mode alignment with Stage
Mode enumerations and their usage appear consistent with the Stage definitions. Verify that any future expansions (for instance, an auxiliary mode for additional transformations) remain properly mapped in the resolver.

tests/unit/test_callbacks_utils.py (2)

4-4: Ensure consistent imports across the codebase
The new import for preprocess_image looks good. Confirm that other modules referencing preprocess_image have also been updated to reflect its location in lighter.callbacks.utils.


Line range hint 23-42: Comprehensive batch-3D reshape test
Testing a batch of 3D images is a great way to ensure correctness at scale. If the code allows for varied batch sizes, consider parameterizing the test to ensure consistent reshaping across different batch dimensions.

🧰 Tools
🪛 Ruff (0.8.2)

1-1: pytest imported but unused

Remove unused import: pytest

(F401)

lighter/engine/resolver.py (2)

1-2: Imports are aligned with the new architecture
Using Config from lighter.engine.config and the new enums from lighter.utils.types.enums reflects the revised structure. No issues found.


19-21: Initializer
The constructor neatly stores the Config. You might add optional type checking or ensure that config is not None to avoid runtime errors in the subsequent method calls.

lighter/callbacks/writer/table.py (6)

2-2: Docstring addition looks good.
This top-level docstring concisely describes the module’s purpose and is consistent with the new naming conventions.


14-15: Imports align with the refactoring from LighterSystem to System.
These imports are consistent with the new structure and class names in the refactor.


18-18: Renaming class to TableWriter is consistent with the new BaseWriter usage.
The inheritance from BaseWriter aligns well with the overall refactoring strategy.


37-45: Param rename from id to identifier improves clarity.
Renaming the parameter to identifier and storing it in csv_records clarifies the record’s purpose.


47-53: Updated reference to System is consistent with class renaming.
Switching the pl_module parameter type from LighterSystem to System maintains consistency across the refactor.


66-69: Sorting and setting index by identifier might cause collisions if duplicates exist.
While the logic appears correct for unique identifiers, having multiple rows with the same identifier will overwrite or merge indices in the DataFrame. Consider verifying uniqueness if that’s a requirement.
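A hedged pandas sketch of that uniqueness check (the record layout here is illustrative):

```python
import pandas as pd

records = [{"identifier": "a", "pred": 0.9}, {"identifier": "a", "pred": 0.1}]
df = pd.DataFrame(records)

# Fail fast before sorting/indexing if identifiers collide.
if df["identifier"].duplicated().any():
    dupes = df.loc[df["identifier"].duplicated(), "identifier"].tolist()
    raise ValueError(f"Duplicate identifiers in records: {dupes}")

df = df.sort_values("identifier").set_index("identifier")
```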

lighter/utils/logging.py (2)

1-2: Top-level docstring provides clear overview of the logging module.
This helps maintainable documentation and quickly orients new contributors to the logging utilities.


80-81: Switch to importlib.import_module aids clarity and maintainability.
This is a preferable replacement to direct __import__ calls and keeps the code more consistent.

tests/unit/test_callbacks_writer_base.py (6)

8-8: Direct import of BaseWriter aligns with the new class naming convention.
This import change is consistent with the project-wide refactoring.


22-24: MockWriter derivation from BaseWriter is a clear test approach.
Adhering to BaseWriter ensures tests remain aligned with the abstract writer interface.


40-46: Parameter rename to identifier in the write method matches production code changes.
Maintaining consistent naming in test mocks reduces confusion and prevents mismatches during integration.


69-69: Testing instantiation constraints of BaseWriter.
Raising TypeError on direct BaseWriter instantiation is the correct behavior for an abstract base class.


95-95: Initializing outputs with identifier key clarifies usage expectations.
This ensures test alignment with the refactored writer logic.


101-101: Verifying identifier assignment in on_predict_batch_end.
Asserting the predicted IDs remain tracked is a correct approach to ensure the callback logic.

lighter/callbacks/writer/file.py (6)

2-2: Docstring clearly states FileWriter’s purpose.
This aids discoverability and understanding of its file-based writing approach.


16-16: Importing BaseWriter aligns with the refactored writer hierarchy.


20-20: FileWriter consistently inherits from BaseWriter.
Renaming from LighterFileWriter to FileWriter follows the new naming scheme.


51-57: Renaming id to identifier ensures param clarity.
It better communicates the purpose for file naming.


60-60: RuntimeError emphasizes FileWriter directory requirement.
This explicit error message helps users diagnose invalid paths quickly.


63-63: Constructing the file path from identifier is a logical approach.
Appending the parameter to the directory path helps produce a clear naming structure for outputs.

lighter/callbacks/writer/base.py (7)

17-18: Consider verifying usage vs. overhead of specialized imports.
These imports look correct and are actively used throughout this file. However, reconfirm that both System and Data, Stage provide genuine benefit here, and avoid extraneous imports if usage evolves.


21-21: Good renaming for clarity.
Renaming the class to BaseWriter aligns well with its purpose and the rename conventions seen across the codebase.


Line range hint 57-67: Method signature alignment looks good.
Switching from id to identifier is more descriptive and consistent with the docstring. The parameter’s usage for naming or record-keeping is clearly documented.


70-79: Practical stage check.
The early return for stage != Stage.PREDICT is sensible to bypass unnecessary setup steps in other phases. The docstring references pl_module: System correctly, matching the new naming.


106-113: Consistent callback method signature.
Using System in place of the older LighterSystem is coherent with the rename efforts, aligning the type hints with the actual code usage.


120-129: Global ID generation logic seems correct.
The approach ensures a unique global range in multi-process scenarios. Consider verifying that outputs[Data.PRED] always has the expected size to avoid potential edge cases with an empty batch.


132-133: Ensure lengths match between predictions and identifiers.
When zipping outputs[Data.PRED] and outputs[Data.IDENTIFIER], a mismatch would cause a runtime error. Confirm each batch’s identifier list precisely matches its predictions list.
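One way to surface that mismatch immediately (Python 3.10+), shown on toy data rather than the writer's actual outputs dict:

```python
preds = [0.2, 0.7, 0.9]
identifiers = ["a", "b"]  # deliberately one short: the bug we want surfaced

# strict=True raises ValueError on a length mismatch instead of silently
# dropping the unpaired prediction.
try:
    pairs = list(zip(preds, identifiers, strict=True))
except ValueError as err:
    print(f"Batch mismatch caught early: {err}")
```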

tests/unit/test_callbacks_writer_table.py (11)

28-28: Clear test docstring.
“Test proper initialization of TableWriter” is concise and informative. No issues found.


34-34: TableWriter instantiation is straightforward.
Good practice: verifying that the path is converted to a Path object.


40-40: Descriptive test docstring.
This clarifies the custom writer functionality test well.


47-49: Custom writer integration works.
Populating csv_records with the processed data is aligned with the writer’s purpose.


55-59: Multifunction docstring.
The multi-case coverage in these lines is thorough, ensuring the test scenario is well-documented.


71-76: Handled multiple record formats.
Creating an array of different identifier types is a robust approach to testing varied data.


79-82: Incremental assertion pattern.
Asserting intermediate results ensures swift detection of unexpected changes. The approach is well-structured.

Also applies to: 85-86


93-97: Identifier column cast logic.
Casting to string before re-checking each record is a reliable approach to match actual vs. expected data.


108-108: Well-labeled test scenario for multi-process usage.
The docstring clarifies distributed environment tests. This is a solid pattern to ensure data consistency under concurrency.


130-135: Collective record gathering simulation.
Using mocked rank data accurately tests multi-process record merging. This is beneficial for distributed setups.


159-164: Comprehensive data verification.
Casting identifiers and evaluating predictions ensures the final CSV is correct for every rank’s records.

lighter/callbacks/freezer.py (7)

2-2: Updated documentation refers to new class name.
Renaming to Freezer is consistent with the overall rename strategy. The docstring matches the new class name.


11-11: Import usage validated.
System is indeed used to differentiate logic in _set_model_requires_grad. Keep the import.


15-15: Class rename clarity.
Transition from LighterFreezer to Freezer improves naming uniformity.


60-66: Train batch start logic.
Switching pl_module to System is consistent with refactoring. The docstring and parameter usage match.


73-73: Consistent invocation for validation, test, predict.
Repeating _on_batch_start ensures a uniform freezing mechanism across all stages.

Also applies to: 78-78, 83-83


87-93: Encapsulated freeze/unfreeze logic.
Centrally handling the freeze logic in _on_batch_start allows consistent behavior. No issues found.


113-113: Sensible fallback to the underlying model.
Unwrapping pl_module: System into its .model field is consistent and ensures the correct parameters are frozen. Use caution for any advanced or nested model structures.

Also applies to: 121-122

tests/unit/test_callbacks_writer_file.py (14)

10-10: Imported writer references match new naming.
FileWriter reference is properly aligned with the rename. Import usage is valid.


14-17: Self-documenting tests.
Mentioning the exact checks for path, writer, and directory creation helps future maintainers quickly identify what’s tested.


27-27: FileWriter instantiation with “tensor”.
Straightforward approach for verifying correct writer function association.


35-35: Clear test rationale.
Ensures coverage of essential tensor writing behavior, including file existence and data integrity upon load.


48-50: Direct saving & re-loading.
This is a robust approach to confirm that the saved file can be correctly interpreted by torch later.


64-64: Comprehensive image test docstring.
Explicitly stating shape, format, and dimension checks ensures clarity for new contributors.


77-79: Saving as PNG.
Using "image" writer for an RGB tensor is consistent. The approach is verified by subsequent shape checks.


94-94: Video test coverage.
It’s good that multiple frames, dimension checks, and existence checks are included in the docstring.


106-108: Video writer usage.
Properly verifying file presence after writing. The test ensures the output is indeed an .mp4.


118-118: Grayscale video scenario.
Highlighting the single-channel input format is valuable for testing robust dimension handling.


131-134: Confirming grayscale to RGB conversion.
Ensures edge-case coverage. If future logic changes for single-channel data, this test will catch regressions.


144-144: ITK image docstring.
Providing explicit type constraints (MONAI MetaTensor) is especially helpful to new readers.


157-157: Distinguishing MONAI MetaTensor usage.
The test ensures that invalid formats raise an error, while valid MetaTensors are stored. This fosters clarity around custom data workflows.

Also applies to: 162-166


176-176: Error handling tests.
Verifying runtime errors when file paths are mistaken for directories or when directories don’t exist. Clear messages guide users effectively.

Also applies to: 189-191, 196-197

tests/unit/test_engine_runner.py (10)

7-7: Rename from LighterSystem to System looks good.
This import aligns correctly with the updated class name and references.


13-18: Docstring updated to reference the new class name.
The docstring and return statement now consistently refer to System, matching the rename.


74-74: Docstring parameter reference updated.
Ensure the mention of mock_system aligns with standard naming conventions. Currently, the naming is consistent with usage.


123-123: Docstring parameter reference updated for new System.
No issues noted, good consistency throughout the tests.


151-151: Docstring for mock_system updated.
All references to LighterSystem are removed.


184-184: Docstring clarifies ValueError condition for invalid System.
This helps maintain clarity for test expectations.


194-194: Exception message updated to reference System.
The raised ValueError message is now accurate and consistent with the new class name.


203-203: Docstring parameter reference for mock_system is now coherent.
No concerns here; it remains consistent with the rest of the file.


225-225: Docstring parameter reference for mock_system.
All naming changes appear correct.


253-253: Docstring parameter reference for mock_system.
No additional issues.

tests/unit/test_system.py (14)

12-12: Import of System in place of LighterSystem.
This import aligns with the project-wide rename effort.


56-56: Instantiating System with new naming and batch size.
This ensures consistency with the refactored usage. Looks good.


68-68: Refactoring to System with a dict-based loss.
The logic for using a dictionary-based criterion is consistent with the approach in other tests.


85-85: Refactoring to System with an invalid dict criterion.
The test properly handles the invalid scenario by raising a ValueError.


105-106: Additional check on identifier in the batch.
Now that identifier is a recognized field, this error-handling improves robustness.


181-181: Test for step and epoch injection in forward with System.
Ensures advanced usage of partial arguments in the forward pass. Nicely handled.


204-204: Minimal System instantiation.
Empty datasets dict triggers an error, which clarifies the need for at least a 'train' dataset.


221-221: Criterion using only predictions (System).
The test covers an edge case where target might be absent. Good coverage.


237-237: Batch type validation updated for System.
This enforces stronger checks on the input batch, aligning with the new approach.


258-258: Postprocessing dictionary for System.
The approach is consistent with the newly introduced adapters concept.


287-287: Multiple optimizer parameter groups error.
Clear error message prevents ambiguous learning rate usage.


298-298: No optimizer scenario for System.
The test logic ensures a ValueError clarifies the missing optimizer.


304-304: Refactored metric usage with System.
Integration with PyTorch Lightning metrics indicates consistent usage.


29-29: Changed key name to identifier in the dataset.
Be sure all references in code and tests are updated to identifier to avoid mismatches.

✅ Verification successful

All references to identifier are consistent across the codebase

The key name change has been properly implemented throughout the codebase. The key is defined as an enum Data.IDENTIFIER and is consistently used in all relevant files, including tests and callbacks. No remaining references to the old id key were found.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Verify that `id_` references are updated across the codebase
rg 'data\["id"\]' -A2 -B2
rg 'data\.get\("id' -A2 -B2

Length of output: 57


Script:

#!/bin/bash
# Search for both old and new key patterns with broader context
echo "=== Searching for 'id' key patterns ==="
rg -i "[\"\']id[\"\']" -A3 -B3

echo -e "\n=== Searching for 'identifier' key patterns ==="
rg -i "[\"\']identifier[\"\']" -A3 -B3

echo -e "\n=== Searching in Python files for dictionary access patterns ==="
rg "\.get\(['\"](?:id|identifier)['\"]|[\[\'](?:id|identifier)[\]\']" --type py -A3 -B3

Length of output: 873


Script:

#!/bin/bash
# Search for dictionary access patterns with simpler regex
echo "=== Searching for 'id' key in dictionary access ==="
rg '"id"' --type py -A2 -B2
rg "'id'" --type py -A2 -B2

echo -e "\n=== Searching for 'identifier' key in dictionary access ==="
rg '"identifier"' --type py -A2 -B2
rg "'identifier'" --type py -A2 -B2

echo -e "\n=== Searching for .get() patterns ==="
rg '\.get\(' --type py -A2 -B2

Length of output: 12375

projects/cifar10/experiments/example_overrides.yaml (1)

1-2: Added trainer and system overrides in YAML.
Specifying a single epoch and a new optimizer target for CIFAR-10 is straightforward. Ensure usage aligns with any updated config schema.

lighter/engine/schema.py (3)

4-5: project and vars fields.
No immediate concerns; ensure they match the actual usage in the code and documentation.


6-16: args schema definitions.
Splitting subcommands like fit, validate, test, etc., under args is a good approach for clarity.


18-53: system schema with nested fields.
This structure is significantly more flexible than Pydantic-based classes. Carefully confirm runtime validation to avoid silent misconfigurations.

✅ Verification successful

Schema validation is properly implemented and enforced

The configuration system uses Cerberus for strict runtime validation with proper error handling and detailed error messages. The implementation includes:

  • Mandatory schema validation during config initialization
  • Custom exception handling with formatted validation errors
  • Comprehensive real-world examples demonstrating correct usage
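For reference, a toy illustration of the Cerberus flow described in the bullets above (this schema is a stand-in, not Lighter's actual SCHEMA):

```python
from cerberus import Validator

schema = {
    "trainer": {"type": "dict", "required": True},
    "system": {"type": "dict", "required": True},
}

validator = Validator(schema, allow_unknown=False)
config = {"trainer": {"max_epochs": 1}}  # "system" missing on purpose

if not validator.validate(config):
    # validator.errors maps field -> list of messages,
    # e.g. {'system': ['required field']}
    print(validator.errors)
```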
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Confirm references to the new dictionary schema usage.
rg 'SCHEMA' -A 10 -B 10

Length of output: 2413


Script:

#!/bin/bash
# Check Config class implementation and tests
ast-grep --pattern 'def validate' 
rg -g '*.py' 'def validate' -A 5 -B 5

# Look for test files
fd test_config.py

# Look for example configurations
fd -e yml -e yaml

Length of output: 923


Script:

#!/bin/bash
# Check test and example configurations
cat tests/configs/test1.yaml
cat projects/cifar10/experiments/example.yaml

# Look for Config class validation implementation
ast-grep --pattern 'class Config {
  $$$
  def __init__($$$) {
    $$$
  }
  $$$
}'

rg -g 'config.py' 'class Config' -A 50

Length of output: 7256

.gitignore (1)

153-153: LGTM! Improved ignore patterns.

The changes enhance the ignore patterns by:

  • Using **/.DS_Store to ignore .DS_Store files at any directory level
  • Adding .code-workspace to ignore VS Code workspace files
  • Adding .scale_batch_size* to ignore batch size scaling artifacts

Also applies to: 156-157

projects/cifar10/experiments/example.yaml (3)

31-51: LGTM! Comprehensive metrics configuration.

Excellent setup of metrics with consistent configuration across train, val, and test phases. The use of macro averaging and proper task specification for multiclass classification is appropriate.


96-104: LGTM! Clear adapter configuration.

The batch adapter configuration is well-structured with proper input and target accessors.


52-95: Verify memory requirements for the batch size.

The batch size of 512 with CIFAR10 images might require significant memory. Consider implementing a batch size finder or memory estimation.
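As a back-of-the-envelope check (input tensors only; activations, gradients, and optimizer state dominate real memory use):

```python
# CIFAR-10 inputs: 3 channels x 32 x 32 pixels, float32 (4 bytes each).
batch_size = 512
input_bytes = batch_size * 3 * 32 * 32 * 4
print(f"Input batch alone: {input_bytes / 2**20:.1f} MiB")  # ~6.0 MiB
```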

pyproject.toml (1)

196-197: Verify the necessity of redefining 'input' built-in.

The addition of 'input' to allowed-redefined-builtins suggests that the code redefines the built-in input function. This could lead to confusion and potential issues.

✅ Verification successful

The redefinition of 'input' built-in is justified

The codebase consistently uses 'input' as a parameter name in data processing methods, following a clear pattern (input, target, pred) that enhances code readability. The redefinitions in adapters.py are part of data transformation logic where 'input' naturally describes the processed data.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for redefinitions of 'input'
echo "Searching for redefinitions of 'input'..."
rg -n "(?:^|\s+)input\s*=[^=]" --type py

echo "Searching for 'input' as parameter name..."
rg -n "def\s+\w+\s*\([^)]*\binput\b[^)]*\)" --type py

Length of output: 1415

docs/index.md (1)

31-31: LGTM! Consistent naming updates.

Documentation updates correctly reflect the rename from LighterSystem to System, maintaining consistency with the codebase changes.

Also applies to: 110-110

docs/basics/projects.md (1)

18-18: LGTM! Documentation alignment with new naming.

The documentation correctly reflects the system class renaming, maintaining consistency across the project.

Also applies to: 106-106

docs/basics/quickstart.md (1)

38-41: 🛠️ Refactor suggestion

Documentation needs update to reflect architectural changes.

The System component description needs to be updated to align with the PR's architectural changes, particularly regarding the replacement of datasets with dataloaders.

README.md (1)

46-46: LGTM!

The renaming from "LighterSystem" to "System" is consistent with the overall refactoring effort.

🧰 Tools
🪛 Markdownlint (0.37.0)

46-46: Expected: 0; Actual: 1
Unordered list indentation

(MD007, ul-indent)

docs/basics/config.md (1)

74-77: 🛠️ Refactor suggestion

Update System description to reflect new architecture.

The description needs to be updated to reflect the architectural changes mentioned in the PR objectives, particularly:

  1. The replacement of datasets with dataloaders
  2. The use of adapters instead of postprocessing
  3. The removal of Pydantic in favor of containers and manual validation

coderabbitai bot (Contributor) left a comment:

Actionable comments posted: 3

🧹 Nitpick comments (1)
lighter/engine/schema.py (1)

24-24: Add validation for optimizer configuration.

The optimizer dictionary should validate common parameters like learning rate and weight decay.

-"optimizer": {"type": "dict"},
+"optimizer": {
+    "type": "dict",
+    "schema": {
+        "_target_": {"type": "string", "required": True},
+        "lr": {"type": "float", "min": 0, "required": True},
+        "weight_decay": {"type": "float", "min": 0},
+        "momentum": {"type": "float", "min": 0}
+    }
+},
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 0252fd2 and bf93a05.

📒 Files selected for processing (2)
  • lighter/engine/config.py (1 hunks)
  • lighter/engine/schema.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • lighter/engine/config.py
🧰 Additional context used
📓 Learnings (1)
lighter/engine/schema.py (1)
Learnt from: ibro45
PR: project-lighter/lighter#142
File: lighter/engine/schema.py:0-0
Timestamp: 2025-01-27T02:17:12.626Z
Learning: The adapters schema in Lighter requires:
- batch and logging keys for all modes (train, val, test, predict)
- criterion key only for train and val modes
- metrics key for train, val, and test modes but not for predict mode
🔇 Additional comments (3)
lighter/engine/schema.py (3)

17-17: Add schema validation for trainer configuration.

The trainer configuration is marked as required but lacks schema validation for critical parameters.


35-43: Documentation needs updating to reflect dataloader configuration.


44-81: 🛠️ Refactor suggestion

Update adapters schema to enforce required fields.

Based on the retrieved learnings, the schema should enforce required fields for each mode:

  • batch and logging keys for all modes
  • criterion key only for train and val modes
  • metrics key for train, val, and test modes
 "adapters": {
     "type": "dict",
     "schema": {
         "train": {
             "type": "dict",
             "schema": {
-                "batch": {"type": "dict"},
-                "criterion": {"type": "dict"},
-                "metrics": {"type": "dict"},
-                "logging": {"type": "dict"}
+                "batch": {"type": "dict", "required": True},
+                "criterion": {"type": "dict", "required": True},
+                "metrics": {"type": "dict", "required": True},
+                "logging": {"type": "dict", "required": True}
             }
         },
         "val": {
             "type": "dict",
             "schema": {
-                "batch": {"type": "dict"},
-                "criterion": {"type": "dict"},
-                "metrics": {"type": "dict"},
-                "logging": {"type": "dict"}
+                "batch": {"type": "dict", "required": True},
+                "criterion": {"type": "dict", "required": True},
+                "metrics": {"type": "dict", "required": True},
+                "logging": {"type": "dict", "required": True}
             }
         },
         "test": {
             "type": "dict",
             "schema": {
-                "batch": {"type": "dict"},
-                "metrics": {"type": "dict"},
-                "logging": {"type": "dict"}
+                "batch": {"type": "dict", "required": True},
+                "metrics": {"type": "dict", "required": True},
+                "logging": {"type": "dict", "required": True}
             }
         },
         "predict": {
             "type": "dict",
             "schema": {
-                "batch": {"type": "dict"},
-                "logging": {"type": "dict"}
+                "batch": {"type": "dict", "required": True},
+                "logging": {"type": "dict", "required": True}
             }
         }
     }
 }

Likely invalid or redundant comment.

Comment on lines 6 to 16:

```python
"args": {
    "type": "dict",
    "schema": {
        "fit": {"type": "dict"},
        "validate": {"type": "dict"},
        "test": {"type": "dict"},
        "predict": {"type": "dict"},
        "lr_find": {"type": "dict"},
        "scale_batch_size": {"type": "dict"}
    }
},
```
coderabbitai bot (Contributor) commented Jan 29, 2025:

🛠️ Refactor suggestion

Add validation for args operation modes.

The args schema allows any dictionary for operation modes (fit, validate, test, etc.). Consider adding validation for common parameters:

  • Batch size limits
  • Number of devices/workers
  • Other mode-specific parameters
 "args": {
     "type": "dict",
     "schema": {
-        "fit": {"type": "dict"},
+        "fit": {
+            "type": "dict",
+            "schema": {
+                "batch_size": {"type": "integer", "min": 1},
+                "num_workers": {"type": "integer", "min": 0},
+                "accelerator": {"type": "string", "allowed": ["cpu", "gpu", "tpu"]},
+                "devices": {"type": ["integer", "list"], "min": 1}
+            }
+        },
         "validate": {"type": "dict"},
         "test": {"type": "dict"},
         "predict": {"type": "dict"},
         "lr_find": {"type": "dict"},
         "scale_batch_size": {"type": "dict"}
     }
 }

Collaborator:

@ibro45
I think over time, if we populate this schema, this would allow users to know the config system possibilities at a glance.

coderabbitai bot (Contributor) replied:

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

@surajpaib (Collaborator) commented Jan 29, 2025

Pre-commit hooks, e.g., check the style when you try to commit. We had that, but with the uv merge something went wrong for me; not sure if it's just me

I'm not sure about it. I haven't used it and I don't see it in the repo either. Maybe you'll need to look into the Makefile changes.

Correction: I do see it now. It's based on poetry; that's why it doesn't work. I vote to remove this unless you think it's useful.

@ibro45
Collaborator Author

ibro45 commented Jan 30, 2025

  • I will start writing the tests now on a separate branch. I'll merge it into this one when done.
  • Thanks for the fixes and improvements!
  • I liked the pre-commit hook where it would apply the style, but it's not really necessary: .pre-commit-config.yaml
  • Docs are being updated on a different branch

Btw, there are poetry remnants throughout some files.

coderabbitai[bot]

This comment was marked as resolved.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (6)
.pre-commit-config.yaml (1)

Line range hint 1-26: Consider reinstating the pyupgrade hook.

The removal of the pyupgrade hook might impact Python version compatibility checks. Consider keeping it to ensure code stays compatible with the supported Python versions.

Add this configuration to reinstate the pyupgrade hook:

 - repo: local
   hooks:
+    - id: pyupgrade
+      name: pyupgrade
+      entry: uvx pyupgrade --py310-plus
+      types: [python]
+      language: system
CONTRIBUTING.md (3)

38-38: Add missing comma after introductory phrase.

Add a comma after "After installation" for better readability.

-After installation you may execute code formatting:
+After installation, you may execute code formatting:
🧰 Tools
🪛 LanguageTool

[uncategorized] ~38-~38: A comma is probably missing here.
Context: ...commit install ## Codestyle After installation you may execute code formatting: ba...

(MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)


63-75: Improve the submission checklist formatting.

The submission steps should use infinitive form for consistency.

 Before submitting your code:

-1. Add any changes you want
-2. Add tests for the new changes  
-3. Edit documentation if you have changed something significant
-4. Format all your code
+1. To add your changes
+2. To add tests for the new changes  
+3. To edit documentation for significant changes
+4. To format all your code
 ```bash
 make codestyle
 ```
-5. Run all checks:
+5. To run all checks:
 ```bash
 make lint
 ```

🧰 Tools
🪛 LanguageTool

[grammar] ~65-~65: The verb “Add” needs to be in the to-infinitive form.
Context: ...ting your code:  1. Add any changes you want 2. Add tests for the new changes   3. Edit doc...

(MISSING_TO_BEFORE_A_VERB)

---

92-92: Consider using a requirements file for documentation dependencies.

Instead of installing documentation dependencies directly, consider using a `docs/requirements.txt` file for better maintainability.

1. Create `docs/requirements.txt`:

mkdocs-material
mkdocs-autorefs
mkdocstrings
mkdocs-gen-files
mkdocs-literate-nav
mkdocs-section-index


2. Update the installation command:
-uv pip install mkdocs-material mkdocs-autorefs mkdocstrings mkdocs-gen-files mkdocs-literate-nav mkdocs-section-index
+uv pip install -r docs/requirements.txt
README.md (2)

46-46: Fix list indentation.

The unordered list item should not be indented.

- - [PyTorch Lightning](https://github.com/Lightning-AI/lightning) - Our [`System`](https://project-lighter.github.io/lighter/reference/system/) is based on the PyTorch Lightning [`LightningModule`](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html) and implements all the necessary training logic for you. Couple it with the PyTorch Lightning [Trainer](https://lightning.ai/docs/pytorch/stable/common/trainer.html) and you're good to go.
+ - [PyTorch Lightning](https://github.com/Lightning-AI/lightning) - Our [`System`](https://project-lighter.github.io/lighter/reference/system/) is based on the PyTorch Lightning [`LightningModule`](https://lightning.ai/docs/pytorch/stable/common/lightning_module.html) and implements all the necessary training logic for you. Couple it with the PyTorch Lightning [Trainer](https://lightning.ai/docs/pytorch/stable/common/trainer.html) and you're good to go.
🧰 Tools
🪛 markdownlint-cli2 (0.17.2)

46-46: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)


71-71: Add a comment about uv installation.

Since the installation now uses uv, it would be helpful to add a comment about installing uv first.

make setup
-make install             # Install lighter via uv
+make install             # Install lighter via uv (requires uv to be installed first)
make pre-commit-install  # Set up the pre-commit hook for code formatting
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 39820ab and e80feb7.

📒 Files selected for processing (4)
  • .github/workflows/ci.yml (1 hunks)
  • .pre-commit-config.yaml (1 hunks)
  • CONTRIBUTING.md (2 hunks)
  • README.md (2 hunks)
🧰 Additional context used
🪛 LanguageTool
CONTRIBUTING.md

[uncategorized] ~38-~38: A comma is probably missing here.
Context: ...commit install ## Codestyle After installation you may execute code formatting: ba...

(MISSING_COMMA_AFTER_INTRODUCTORY_PHRASE)


[grammar] ~65-~65: The verb “Add” needs to be in the to-infinitive form.
Context: ...ting your code: 1. Add any changes you want 2. Add tests for the new changes 3. Edit doc...

(MISSING_TO_BEFORE_A_VERB)

🪛 markdownlint-cli2 (0.17.2)
README.md

46-46: Unordered list indentation
Expected: 0; Actual: 1

(MD007, ul-indent)

🔇 Additional comments (2)
.pre-commit-config.yaml (1)

15-15: Verify the availability of uvx in the development environment.

The switch from poetry run to uvx run requires that uvx is properly installed and available in the development environment.

Run this script to check if the development environment is properly configured:

Also applies to: 23-23

✅ Verification successful

uvx setup is properly documented in CONTRIBUTING.md

The switch from poetry run to uvx run is valid as the project has clear setup instructions in CONTRIBUTING.md that guide developers to:

  1. Create a virtual environment with uv venv
  2. Sync dependencies with uv sync
  3. Install pre-commit hooks with uvx pre-commit install
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify uvx installation and configuration

# Check if uvx is available
command -v uvx >/dev/null 2>&1 || { echo "uvx not found. Please install it first."; exit 1; }

# Check uvx version
uvx --version

# List installed packages to verify isort and black are available
uv pip list | grep -E "isort|black"

Length of output: 114


Script:

#!/bin/bash
# Search for uvx-related configuration and documentation
echo "=== Searching for uvx in documentation and configuration files ==="
rg -i "uvx|uv " -g "!*.pyc" -g "!*.git*" -g "!*.ipynb" -g "README*" -g "*.md" -g "*.toml" -g "*.yaml" -g "*.yml" -A 3

echo -e "\n=== Checking for uv/uvx in requirements files ==="
fd -e txt -e in requirements | xargs cat

echo -e "\n=== Checking CI configurations ==="
fd -e yml -e yaml -g ".github/workflows/*" | xargs cat

Length of output: 1709

.github/workflows/ci.yml (1)

32-32: Verify the presence and generation of uv.lock file.

The cache key now depends on uv.lock instead of poetry.lock. Ensure that:

  1. uv.lock is being properly generated during dependency installation
  2. The file is committed to the repository

Run this script to verify the lock file:

✅ Verification successful

Lock file verification successful

The uv.lock file exists and is properly tracked in the repository, making it suitable for use in the cache key.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check if uv.lock exists and is properly tracked

# Check if uv.lock exists
if [ ! -f "uv.lock" ]; then
    echo "uv.lock not found"
    exit 1
fi

# Check if uv.lock is tracked by git
if ! git ls-files --error-unmatch uv.lock >/dev/null 2>&1; then
    echo "uv.lock is not tracked by git"
    exit 1
fi

# Show last modification date
stat uv.lock

Length of output: 174


Script:

#!/bin/bash
# Search for uv.lock file and check git status
echo "=== Searching for uv.lock ==="
fd uv.lock

echo -e "\n=== Checking git tracked files ==="
git ls-files | grep -i "uv.lock"

echo -e "\n=== Checking pyproject.toml for uv configuration ==="
if [ -f "pyproject.toml" ]; then
    cat pyproject.toml
fi

echo -e "\n=== Checking .gitignore for uv.lock ==="
if [ -f ".gitignore" ]; then
    cat .gitignore | grep -i "uv.lock"
fi

Length of output: 4629

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (19)
tests/integration/test_overrides.yaml (1)

4-8: LGTM! Consider DRY-ing up common values.

The new structure provides better organization and flexibility by allowing different configurations for train and val dataloaders. Since both dataloaders currently share the same values, you could consider extracting these common settings to reduce duplication.

Here's a suggestion to make it DRY-er:

-system#dataloaders#train#batch_size: 16
-system#dataloaders#train#num_workers: 2
-
-system#dataloaders#val#batch_size: 16
-system#dataloaders#val#num_workers: 2
+system#dataloaders#defaults: &defaults
+  batch_size: 16
+  num_workers: 2
+system#dataloaders#train:
+  <<: *defaults
+system#dataloaders#val:
+  <<: *defaults
tests/integration/test_cifar.py (1)

27-36: Update docstring to reflect Stage enum usage.

The implementation looks good, but the docstring should be updated to reflect that stage is now a Stage enum rather than a string.

Apply this diff to fix the docstring:

     """
     Test the specified stage using the given configuration.
     Args:
-        stage: The stage to run (e.g., "fit", "test", "predict").
+        stage: The stage to run (e.g., Stage.FIT, Stage.TEST, Stage.PREDICT).
         config: Path to the configuration file.
     """
tests/unit/test_engine_config.py (3)

8-20: Consider adding edge cases to the initialization test.

While the basic initialization test is good, consider adding edge cases such as:

  • Empty dictionary
  • Nested empty dictionaries
  • Dictionary with None values
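
For instance, a parametrized sketch of those cases; the Config name, its import path, and the ValueError behavior are assumptions to adapt to the actual API:

import pytest

@pytest.mark.parametrize(
    "raw",
    [
        {},                 # empty dictionary
        {"system": {}},     # nested empty dictionary
        {"trainer": None},  # None where a dict is expected
    ],
)
def test_config_init_edge_cases(raw):
    # Hypothetical: Config and its import path are assumptions.
    from lighter.engine.config import Config
    with pytest.raises(ValueError):
        Config(raw)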

28-67: Add test cases for partial validation.

The validation test covers complete validation well, but consider adding test cases for:

  • Partial validation of specific sections
  • Validation of optional fields

158-196: Enhance error message validation.

The error formatting tests could be more comprehensive. Consider adding test cases for:

  • Deeply nested validation errors (>3 levels)
  • Multiple errors at the same path
  • Unicode characters in error messages
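
A small helper along these lines could back such assertions. This is a generic sketch, not the project's actual error formatter:

def flatten_errors(errors, prefix=""):
    """Flatten nested validation errors into dotted-path keys."""
    flat = {}
    for key, value in errors.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_errors(value, path))
        else:
            flat[path] = value
    return flat

# Deep nesting and multiple errors at the same path stay addressable:
nested = {"system": {"model": ["required field"], "optimizer": {"lr": ["min value is 0"]}}}
assert flatten_errors(nested) == {
    "system.model": ["required field"],
    "system.optimizer.lr": ["min value is 0"],
}
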
tests/unit/test_engine_resolver.py (2)

10-38: Enhance test configuration fixture.

The test configuration could be more realistic by:

  • Including actual model class paths instead of string placeholders
  • Adding more complex nested configurations
  • Including all possible configuration options

47-56: Add edge cases for resolver initialization.

Consider adding test cases for:

  • Empty configuration
  • Configuration with missing optional sections
  • Configuration with only required fields
tests/unit/test_utils_misc.py (1)

143-181: Enhance optimizer stats test coverage.

The test for optimizers with betas is good but could be improved by:

  1. Testing with other optimizers that use betas (e.g., AdamW, Adamax)
  2. Adding edge cases:
    • Zero learning rate
    • Default betas
    • Custom beta ranges
  3. Testing error handling for invalid beta values

Example test cases to add:

def test_get_optimizer_stats_with_invalid_betas():
    model = torch.nn.Linear(10, 1)
    # Test with invalid beta range
    with pytest.raises(ValueError):
        optimizer = Adam(model.parameters(), lr=0.001, betas=(1.1, 0.999))
        get_optimizer_stats(optimizer)

def test_get_optimizer_stats_with_different_optimizers():
    model = torch.nn.Linear(10, 1)
    # Test AdamW
    optimizer = torch.optim.AdamW(model.parameters(), lr=0.001, betas=(0.9, 0.999))
    stats = get_optimizer_stats(optimizer)
    assert stats["optimizer/AdamW/momentum"] == 0.9
tests/unit/test_adapters.py (2)

37-37: Replace lambda functions with regular functions.

Lambda functions are used for transforms in multiple test cases. While they work, using regular functions would improve readability and follow Python's style guidelines.

Apply this diff to replace the lambda functions with regular functions:

-    transform = lambda x: x + 1
+    def transform(x):
+        return x + 1

-    transform1 = lambda x: x * 2
+    def transform1(x):
+        return x * 2

-    transform2 = lambda x: x - 1
+    def transform2(x):
+        return x - 1

Also applies to: 47-47, 48-48, 279-279, 289-289, 290-290

🧰 Tools
🪛 Ruff (0.8.2)

37-37: Do not assign a lambda expression, use a def

Rewrite transform as a def

(E731)


161-167: Add assertions for callable access test.

The test_callable_access function could benefit from additional assertions to verify the behavior of the callable accessors.

Add assertions to verify that the callable accessors are invoked correctly:

     adapter = BatchAdapter(input_accessor=lambda b: b["data"], target_accessor=lambda b: b["label"])
     batch = {"data": create_tensor(), "label": create_tensor()}
     input, target, identifier = adapter(batch)
     assert input.equal(batch["data"])
     assert target.equal(batch["label"])
     assert identifier is None
+    # Add assertions to verify callable behavior
+    assert isinstance(input, torch.Tensor), "Input accessor should return a Tensor"
+    assert isinstance(target, torch.Tensor), "Target accessor should return a Tensor"
tests/unit/test_utils_model.py (2)

363-378: Enhance test assertions for perfect match case.

The test for perfect match could benefit from additional assertions to verify that all weights are loaded correctly.

Add assertions to verify the weights after loading:

     # Load state dict with perfect match
     adjust_prefix_and_load_state_dict(dummy_model, perfect_match_state_dict_file)
-    # Note: We can't directly test the logger output, but the function should complete without raising exceptions
+    # Load the state dict to compare weights
+    state_dict = torch.load(perfect_match_state_dict_file)
+    
+    # Verify all weights are loaded correctly
+    assert torch.equal(dummy_model.layer1.weight, state_dict["layer1.weight"])
+    assert torch.equal(dummy_model.layer1.bias, state_dict["layer1.bias"])
+    assert torch.equal(dummy_model.layer2.weight, state_dict["layer2.weight"])
+    assert torch.equal(dummy_model.layer2.bias, state_dict["layer2.bias"])

380-400: Clarify test behavior for no matching prefix.

The test for no matching prefix could be more explicit about the expected behavior when no keys match the prefix mapping.

Add comments and assertions to clarify the behavior:

     # Load state dict with prefix mapping that won't match any keys
     ckpt_to_model_prefix = {"nonexistent_prefix": ""}
+    # Store original weights for comparison
+    original_state_dict = {
+        "layer1.weight": dummy_model.layer1.weight.clone(),
+        "layer1.bias": dummy_model.layer1.bias.clone(),
+        "layer2.weight": dummy_model.layer2.weight.clone(),
+        "layer2.bias": dummy_model.layer2.bias.clone(),
+    }
+
     adjust_prefix_and_load_state_dict(dummy_model, path, ckpt_to_model_prefix=ckpt_to_model_prefix)

-    # Verify weights match the original state dict
+    # Verify weights are updated from the original state dict, not preserved
     assert torch.equal(dummy_model.layer1.weight, state_dict["layer1.weight"])
     assert torch.equal(dummy_model.layer1.bias, state_dict["layer1.bias"])
+    assert not torch.equal(dummy_model.layer1.weight, original_state_dict["layer1.weight"])
+    assert not torch.equal(dummy_model.layer1.bias, original_state_dict["layer1.bias"])
tests/unit/test_utils_logging.py (2)

28-28: Add stacklevel to warning.

The warning is raised without a stacklevel, which could make it harder to trace the source of the warning.

Add stacklevel to the warning:

-        warnings.warn("Test warning")
+        warnings.warn("Test warning", stacklevel=2)
🧰 Tools
🪛 Ruff (0.8.2)

28-28: No explicit stacklevel keyword argument found

(B028)


29-31: Verify warning message content.

The test verifies that the warning was captured but doesn't check the content of the warning message.

Add assertions to verify the warning message:

         # Verify the warning was captured by loguru
         mock_logger.opt.assert_called_with(depth=2)
-        mock_opt.warning.assert_called_once()
+        mock_opt.warning.assert_called_once_with("Test warning")
tests/unit/test_callbacks_writer_base.py (2)

226-228: Combine nested with statements.

The nested with statements could be combined into a single statement for better readability.

Combine the with statements:

-    with patch.object(Path, "exists", return_value=False), patch("torch.distributed.get_rank", return_value=1):
-        with pytest.raises(RuntimeError, match="Rank 1 does not share storage with rank 0"):
-            writer.setup(trainer, pl_module, stage="predict")
+    with (
+        patch.object(Path, "exists", return_value=False),
+        patch("torch.distributed.get_rank", return_value=1),
+        pytest.raises(RuntimeError, match="Rank 1 does not share storage with rank 0")
+    ):
+        writer.setup(trainer, pl_module, stage="predict")
🧰 Tools
🪛 Ruff (0.8.2)

226-227: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)


181-204: Enhance test assertions for existing path case.

The test for existing path could benefit from additional assertions to verify the warning behavior.

Add assertions to verify warning behavior:

     with patch.object(Path, "exists", return_value=True):
         writer.setup(trainer, pl_module, stage="predict")
+        # Verify warning messages
         assert any("already exists" in msg for msg in warning_messages)
         assert any("existing predictions will be overwritten" in msg for msg in warning_messages)
+        # Verify warning count
+        assert len([msg for msg in warning_messages if "already exists" in msg]) == 1
+        assert len([msg for msg in warning_messages if "existing predictions will be overwritten" in msg]) == 1
+        # Verify writer is still set up correctly
+        assert writer._pred_counter == 0
+        assert writer.path == target_path
tests/unit/test_callbacks_writer_table.py (1)

10-10: Remove unused import.

The System import is not used in this file.

-from lighter.system import System
🧰 Tools
🪛 Ruff (0.8.2)

10-10: lighter.system.System imported but unused

Remove unused import: lighter.system.System

(F401)

tests/unit/test_engine_runner.py (1)

7-7: Remove unused import.

The Tuner import from pytorch_lightning.tuner is not used in this file.

-from pytorch_lightning.tuner import Tuner
🧰 Tools
🪛 Ruff (0.8.2)

7-7: pytorch_lightning.tuner.Tuner imported but unused

Remove unused import: pytorch_lightning.tuner.Tuner

(F401)

tests/unit/test_system.py (1)

3-3: Remove unused import.

The patch import from unittest.mock is not used in this file.

-from unittest.mock import patch
🧰 Tools
🪛 Ruff (0.8.2)

3-3: unittest.mock.patch imported but unused

Remove unused import: unittest.mock.patch

(F401)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e80feb7 and d0f7399.

📒 Files selected for processing (17)
  • pyproject.toml (2 hunks)
  • tests/integration/test_cifar.py (1 hunks)
  • tests/integration/test_overrides.yaml (1 hunks)
  • tests/unit/test_adapters.py (1 hunks)
  • tests/unit/test_callbacks_freezer.py (8 hunks)
  • tests/unit/test_callbacks_writer_base.py (7 hunks)
  • tests/unit/test_callbacks_writer_file.py (8 hunks)
  • tests/unit/test_callbacks_writer_table.py (7 hunks)
  • tests/unit/test_engine_config.py (1 hunks)
  • tests/unit/test_engine_resolver.py (1 hunks)
  • tests/unit/test_engine_runner.py (1 hunks)
  • tests/unit/test_system.py (1 hunks)
  • tests/unit/test_utils_logging.py (1 hunks)
  • tests/unit/test_utils_misc.py (2 hunks)
  • tests/unit/test_utils_model.py (2 hunks)
  • tests/unit/test_utils_patches.py (0 hunks)
  • tests/unit/test_utils_types_containers.py (1 hunks)
💤 Files with no reviewable changes (1)
  • tests/unit/test_utils_patches.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • pyproject.toml
  • tests/unit/test_callbacks_writer_file.py
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/unit/test_utils_logging.py

28-28: No explicit stacklevel keyword argument found

(B028)

tests/unit/test_callbacks_writer_base.py

226-227: Use a single with statement with multiple contexts instead of nested with statements

(SIM117)

tests/unit/test_engine_runner.py

7-7: pytorch_lightning.tuner.Tuner imported but unused

Remove unused import: pytorch_lightning.tuner.Tuner

(F401)

tests/unit/test_callbacks_writer_table.py

10-10: lighter.system.System imported but unused

Remove unused import: lighter.system.System

(F401)

tests/unit/test_adapters.py

37-37: Do not assign a lambda expression, use a def

Rewrite transform as a def

(E731)


47-47: Do not assign a lambda expression, use a def

Rewrite transform1 as a def

(E731)


48-48: Do not assign a lambda expression, use a def

Rewrite transform2 as a def

(E731)


279-279: Do not assign a lambda expression, use a def

Rewrite transform as a def

(E731)


289-289: Do not assign a lambda expression, use a def

Rewrite transform1 as a def

(E731)


290-290: Do not assign a lambda expression, use a def

Rewrite transform2 as a def

(E731)

tests/unit/test_system.py

3-3: unittest.mock.patch imported but unused

Remove unused import: unittest.mock.patch

(F401)

tests/unit/test_utils_types_containers.py

1-1: lighter.utils.types.containers.nested imported but unused

Remove unused import: lighter.utils.types.containers.nested

(F401)

🪛 GitHub Actions: Lighter CI
tests/unit/test_engine_runner.py

[medium] 268-275: [B102:exec_used] Use of exec detected. This could potentially allow arbitrary code execution.


[medium] 280-287: [B102:exec_used] Use of exec detected. This could potentially allow arbitrary code execution.

🔇 Additional comments (6)
tests/integration/test_cifar.py (2)

4-4: LGTM! Enhanced type safety with Runner class and Stage enum.

The change from importing run to using Runner class and Stage enum improves type safety and follows better object-oriented principles.


10-24: LGTM! Improved test parameterization with Stage enum.

The changes enhance type safety by using Stage enum instead of strings, making the tests more maintainable and less prone to errors. The updated configuration path aligns with the PR's refactoring goals.

tests/unit/test_engine_config.py (1)

1-7: LGTM! Well-structured test file organization.

The imports and module docstring are clear and appropriate.

tests/unit/test_engine_runner.py (1)

268-275: Consider safer alternatives to exec() for testing main guard.

The use of exec() for testing the __main__ guard pattern could potentially allow arbitrary code execution. While this is a test file and the code being executed is controlled, consider a safer alternative that simulates the guard's condition against the patched entry point instead.

Here's a safer alternative approach:

def test_main_guard():
    """Test the __main__ guard without exec()."""
    with patch("lighter.engine.runner.cli") as mock_cli:
        # Simulate the guard's condition with a controlled name. While the
        # patch is active, `from lighter.engine.runner import cli` binds
        # the mock, so the call below is recorded on mock_cli.
        module_name = "__main__"
        if module_name == "__main__":
            from lighter.engine.runner import cli

            cli()
        mock_cli.assert_called_once()

        # Test when module is not main
        mock_cli.reset_mock()
        module_name = "not_main"
        if module_name == "__main__":
            from lighter.engine.runner import cli

            cli()
        mock_cli.assert_not_called()

Also applies to: 280-287

🧰 Tools
🪛 GitHub Actions: Lighter CI

[medium] 268-275: [B102:exec_used] Use of exec detected. This could potentially allow arbitrary code execution.

tests/unit/test_callbacks_freezer.py (1)

Line range hint 1-570: Well-structured and comprehensive test coverage!

The test suite thoroughly covers various aspects of the Freezer callback, including:

  • Initialization with different parameter combinations
  • Basic functionality for freezing/unfreezing layers
  • Edge cases with step and epoch limits
  • Exception handling
  • Complex scenarios with multiple layer combinations
tests/unit/test_system.py (1)

1-613: Excellent test coverage aligning with PR objectives!

The test suite comprehensively covers the System class functionality and aligns well with the PR's objectives of:

  • Testing dataloader integration
  • Verifying adapter functionality
  • Validating the new container-based configuration approach
🧰 Tools
🪛 Ruff (0.8.2)

3-3: unittest.mock.patch imported but unused

Remove unused import: unittest.mock.patch

(F401)

Comment on lines +58 to +198
# Should remove optimizer and scheduler but keep criterion
assert "optimizer" not in system_config
assert "scheduler" not in system_config
assert "criterion" in system_config

# Should keep only validate args
assert list(config.get("args").keys()) == ["validate"]


def test_test_stage_config(resolver):
"""Test configuration for test stage."""
config = resolver.get_stage_config(Stage.TEST)
system_config = config.get("system")

# Should keep only test dataloader
assert "test" in system_config["dataloaders"]
assert "train" not in system_config["dataloaders"]
assert "val" not in system_config["dataloaders"]
assert "predict" not in system_config["dataloaders"]

# Should keep only test metrics
assert "test" in system_config["metrics"]
assert "train" not in system_config["metrics"]
assert "val" not in system_config["metrics"]

# Should remove optimizer, scheduler, and criterion
assert "optimizer" not in system_config
assert "scheduler" not in system_config
assert "criterion" not in system_config

# Should keep only test args
assert list(config.get("args").keys()) == ["test"]


def test_predict_stage_config(resolver):
"""Test configuration for predict stage."""
config = resolver.get_stage_config(Stage.PREDICT)
system_config = config.get("system")

# Should keep only predict dataloader
assert "predict" in system_config["dataloaders"]
assert "train" not in system_config["dataloaders"]
assert "val" not in system_config["dataloaders"]
assert "test" not in system_config["dataloaders"]

# Should remove all metrics
assert "metrics" in system_config
assert not system_config["metrics"]

# Should remove optimizer, scheduler, and criterion
assert "optimizer" not in system_config
assert "scheduler" not in system_config
assert "criterion" not in system_config

# Should keep only predict args
assert list(config.get("args").keys()) == ["predict"]


def test_lr_find_stage_config(resolver):
"""Test configuration for lr_find stage."""
config = resolver.get_stage_config(Stage.LR_FIND)
system_config = config.get("system")

# Should keep train and val dataloaders
assert "train" in system_config["dataloaders"]
assert "val" in system_config["dataloaders"]
assert "test" not in system_config["dataloaders"]
assert "predict" not in system_config["dataloaders"]

# Should keep train and val metrics
assert "train" in system_config["metrics"]
assert "val" in system_config["metrics"]
assert "test" not in system_config["metrics"]

# Should keep optimizer, scheduler, and criterion
assert "optimizer" in system_config
assert "scheduler" in system_config
assert "criterion" in system_config


def test_scale_batch_size_stage_config(resolver):
"""Test configuration for scale_batch_size stage."""
config = resolver.get_stage_config(Stage.SCALE_BATCH_SIZE)
system_config = config.get("system")

# Should keep train and val dataloaders
assert "train" in system_config["dataloaders"]
assert "val" in system_config["dataloaders"]
assert "test" not in system_config["dataloaders"]
assert "predict" not in system_config["dataloaders"]

# Should keep train and val metrics
assert "train" in system_config["metrics"]
assert "val" in system_config["metrics"]
assert "test" not in system_config["metrics"]

# Should keep optimizer, scheduler, and criterion
assert "optimizer" in system_config
assert "scheduler" in system_config
assert "criterion" in system_config
Contributor

🛠️ Refactor suggestion

Reduce test duplication using parametrization.

The stage-specific tests follow similar patterns. Consider using pytest's parametrize to reduce code duplication:

@pytest.mark.parametrize("stage,expected_config", [
    (Stage.FIT, {
        "dataloaders": ["train", "val"],
        "metrics": ["train", "val"],
        "has_optimizer": True,
        "has_scheduler": True,
        "has_criterion": True,
        "args": ["fit"]
    }),
    # Add other stages...
])
def test_stage_config(resolver, stage, expected_config):
    config = resolver.get_stage_config(stage)
    system_config = config.get("system")
    
    for loader in expected_config["dataloaders"]:
        assert loader in system_config["dataloaders"]
    # Add other assertions...

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
tests/unit/test_engine_runner.py (3)

35-51: LGTM! Well-structured mock configuration.

The mock configuration fixture provides a clear example of the expected configuration structure.

Consider adding a docstring that explains the structure and purpose of each configuration key to help other developers understand the expected format.


106-116: Consider adding more configuration validation test cases.

While the tests cover invalid system and trainer types well, consider adding test cases for:

  • Invalid configuration structure
  • Missing required fields
  • Type validation for configuration values

This aligns with the PR's goal of moving from Pydantic to custom validation.

Also applies to: 119-129
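
A sketch of one such case; the fixtures and the exception type are assumptions standing in for the actual implementation:

import pytest

def test_missing_required_trainer(runner, mock_config):
    # Assumes the runner/mock_config fixtures from this module, the Stage
    # enum imported at the top of the file, and that schema validation
    # surfaces a missing required key as a ValueError.
    bad_config = {k: v for k, v in mock_config.items() if k != "trainer"}
    with pytest.raises(ValueError):
        runner.run(Stage.FIT, config=bad_config)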


227-261: LGTM! Well-structured CLI command tests.

The tests effectively verify the CLI interface and command handling.

Consider adding test cases for:

  • CLI command help text
  • Command argument validation
  • Error handling for invalid configuration files

This would improve the user experience when using the CLI.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d0f7399 and 6abba8a.

📒 Files selected for processing (1)
  • tests/unit/test_engine_runner.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
tests/unit/test_engine_runner.py

7-7: pytorch_lightning.tuner.Tuner imported but unused

Remove unused import: pytorch_lightning.tuner.Tuner

(F401)

🔇 Additional comments (2)
tests/unit/test_engine_runner.py (2)

60-66: LGTM! Comprehensive initialization tests.

The tests thoroughly verify that the Runner initializes with correct default values.


170-224: LGTM! Comprehensive stage handling tests.

The tests effectively cover:

  • All normal stages (fit, validate, test, predict)
  • Tuner stages (lr_find, scale_batch_size)
  • Invalid stage error handling

This commit adds test coverage for:
1. Optional module imports with error handling
2. Nested dataclass initialization and behavior
3. Metrics and training stage dataclass default factories
@ibro45
Collaborator Author

ibro45 commented Jan 31, 2025

@coderabbitai review

Contributor

coderabbitai bot commented Jan 31, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (10)
tests/unit/test_utils_dynamic_imports.py (3)

118-130: Enhance docstring to match existing documentation style.

The test function would benefit from a more detailed docstring that follows the established pattern in the file.

Apply this diff to improve the docstring:

 def test_optional_import_success():
     """
-    Tests that an available module is imported successfully
-    and is stored in the 'imports' dictionary.
+    Test successful import and storage of an available module.
+
+    This test verifies that:
+    1. A built-in module (sys) can be imported successfully
+    2. The imported module is identical to the one from direct import
+    3. The module is properly stored in the OPTIONAL_IMPORTS.imports dictionary
+
+    Setup:
+        - Uses the built-in 'sys' module as a guaranteed available import
     """

133-145: Enhance docstring to match existing documentation style.

The test function would benefit from a more detailed docstring that follows the established pattern in the file.

Apply this diff to improve the docstring:

 def test_optional_import_already_imported():
     """
-    Tests that requesting the same module a second time
-    does not re-import it but just returns the stored instance.
+    Test caching behavior of optional imports.
+
+    This test verifies that:
+    1. The first import request properly imports the module
+    2. Subsequent requests return the cached instance
+    3. The same object reference is maintained across requests
+
+    Setup:
+        - Uses the built-in 'sys' module for consistent availability
     """

148-158: Enhance docstring to match existing documentation style.

The test function would benefit from a more detailed docstring that follows the established pattern in the file.

Apply this diff to improve the docstring:

 def test_optional_import_failure():
     """
-    Tests that attempting to import a non-existent module
-    raises an ImportError.
+    Test error handling for non-existent optional imports.
+
+    This test verifies that attempting to import a non-existent module:
+    1. Raises an ImportError
+    2. Includes the module name in the error message
+
+    Raises:
+        ImportError: Expected to be raised when importing non-existent module
     """
tests/unit/test_system.py (2)

3-3: Remove unused import.

The patch import from unittest.mock is not used in the code.

-from unittest.mock import MagicMock, patch
+from unittest.mock import MagicMock
🧰 Tools
🪛 Ruff (0.8.2)

3-3: unittest.mock.patch imported but unused

Remove unused import: unittest.mock.patch

(F401)


97-613: Consider adding more edge cases to strengthen test coverage.

While the existing tests are well-implemented and provide good coverage, consider adding the following test cases to make the test suite more robust:

  1. Test with empty dataloaders
  2. Test with invalid batch sizes
  3. Test with malformed data structures
  4. Test boundary conditions for learning rates
  5. Test error cases for metrics computation

Would you like me to help generate these additional test cases?
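
One of the listed cases needs no custom handling to demonstrate, since PyTorch itself rejects it. A quick sketch reusing this module's DummyDataset:

import pytest
from torch.utils.data import DataLoader

def test_invalid_batch_size():
    # torch raises ValueError for non-positive batch sizes, so a
    # misconfigured dataloader fails before ever reaching System.
    with pytest.raises(ValueError):
        DataLoader(DummyDataset(size=8), batch_size=0)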

tests/integration/test_cifar.py (1)

28-34: Update docstring to match the actual parameter type.

The docstring describes the stage parameter as a string (e.g., "fit", "test", "predict"), but it's actually a Stage enum.

-        stage: The stage to run (e.g., "fit", "test", "predict").
+        stage: The stage to run (e.g., Stage.FIT, Stage.TEST, Stage.PREDICT).
tests/unit/test_utils_types_containers.py (3)

1-1: Remove unused import ‘field’.

dataclasses.field is imported but never used in this file.

Consider removing it with the following diff:

-from dataclasses import dataclass, field, is_dataclass
+from dataclasses import dataclass, is_dataclass
🧰 Tools
🪛 Ruff (0.8.2)

1-1: dataclasses.field imported but unused

Remove unused import: dataclasses.field

(F401)


4-4: Remove unused import ‘Metric’.

torchmetrics.Metric is imported but never used in this file.

Consider removing it with the following diff:

-from torchmetrics import Accuracy, Metric, MetricCollection
+from torchmetrics import Accuracy, MetricCollection
🧰 Tools
🪛 Ruff (0.8.2)

4-4: torchmetrics.Metric imported but unused

Remove unused import: torchmetrics.Metric

(F401)


77-138: Consolidate repeated test patterns for default factories.

The tests for Train, Val, Test, and Predict each verify identical default factories. Consider extracting common verification logic into a shared helper function to minimize duplication and promote maintainability.

lighter/engine/schema.py (1)

1-17: Add documentation for schema structure.

Consider adding docstrings or comments to explain:

  • Purpose and usage of _meta_ and _requires_ fields
  • Expected structure of vars dictionary
  • Common parameters for each operation mode in args
+# Configuration schema for Lighter project
+# _meta_: Metadata about the configuration (version, description, etc.)
+# _requires_: Dependencies or required configurations
+# project: Project name or identifier
+# vars: Variables that can be referenced throughout the configuration
+# args: Operation-specific arguments (fit, validate, test, etc.)
+# trainer: Training-specific configuration (required)
 SCHEMA = {
     "_meta_": {"type": "dict"},
     "_requires_": {"type": ["string", "list", "dict"]},
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6abba8a and bd25862.

📒 Files selected for processing (6)
  • lighter/engine/schema.py (1 hunks)
  • tests/integration/test_cifar.py (1 hunks)
  • tests/unit/test_callbacks_freezer.py (8 hunks)
  • tests/unit/test_system.py (1 hunks)
  • tests/unit/test_utils_dynamic_imports.py (1 hunks)
  • tests/unit/test_utils_types_containers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/unit/test_callbacks_freezer.py
🧰 Additional context used
📓 Learnings (1)
lighter/engine/schema.py (2)
Learnt from: ibro45
PR: project-lighter/lighter#142
File: lighter/engine/schema.py:44-81
Timestamp: 2025-01-30T19:31:55.980Z
Learning: Adapter fields in the configuration schema should not be marked as required, allowing for optional configuration of batch, criterion, metrics, and logging adapters across different modes.
Learnt from: ibro45
PR: project-lighter/lighter#142
File: lighter/engine/schema.py:0-0
Timestamp: 2025-01-27T02:17:12.626Z
Learning: The adapters schema in Lighter requires:
- batch and logging keys for all modes (train, val, test, predict)
- criterion key only for train and val modes
- metrics key for train, val, and test modes but not for predict mode
🪛 Ruff (0.8.2)
tests/unit/test_utils_types_containers.py

1-1: dataclasses.field imported but unused

Remove unused import: dataclasses.field

(F401)


4-4: torchmetrics.Metric imported but unused

Remove unused import: torchmetrics.Metric

(F401)

tests/unit/test_system.py

3-3: unittest.mock.patch imported but unused

Remove unused import: unittest.mock.patch

(F401)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
🔇 Additional comments (11)
tests/unit/test_system.py (2)

17-33: LGTM! Well-structured test classes.

The test classes (DummyDataset, SimpleModel, and DictLossNoTotal) are well-implemented and serve their testing purposes effectively.

Also applies to: 41-49, 260-267


52-94: LGTM! Well-designed test fixtures.

The fixtures dummy_dataloaders and simple_system are well-documented, properly set up, and provide reusable test components following pytest best practices.

tests/integration/test_cifar.py (3)

5-5: LGTM! Good use of type-safe enums.

The change from string literals to the Stage enum improves type safety and IDE support.


35-37: LGTM! Clean implementation with good error handling.

The test implementation is clear and provides helpful feedback on failure.


11-25: Verify the new configuration file path.

The test now uses example.yaml instead of monai_bundle_prototype.yaml. Please ensure this file exists and contains the necessary configuration.

✅ Verification successful

Configuration file verification successful

The new configuration file example.yaml exists and contains all necessary components for training, testing, and prediction stages. No references to the old configuration file remain in the codebase.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and content of the new configuration file

# Check if the file exists
if [ -f "./projects/cifar10/experiments/example.yaml" ]; then
    echo "File exists. Checking content..."
    cat "./projects/cifar10/experiments/example.yaml"
else
    echo "ERROR: Configuration file not found!"
    exit 1
fi

# Check if there are any remaining references to the old file
rg "monai_bundle_prototype.yaml"

Length of output: 3256

tests/unit/test_utils_types_containers.py (2)

10-53: Great job adding tests for the nested decorator.

This addresses the previous request to implement tests for the nested decorator. The coverage is thorough and verifies initialization, default values, and error handling.
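
For readers unfamiliar with the decorator, a minimal sketch of the behavior these tests exercise; an illustration only, not lighter's actual implementation:

from dataclasses import dataclass, fields, is_dataclass

def nested(cls):
    # Sketch: after normal init, coerce plain dicts into the dataclass
    # types declared on the fields. Assumes annotations are real classes
    # (i.e. no `from __future__ import annotations` string annotations).
    original_init = cls.__init__

    def __init__(self, *args, **kwargs):
        original_init(self, *args, **kwargs)
        for f in fields(self):
            value = getattr(self, f.name)
            if is_dataclass(f.type) and isinstance(value, dict):
                setattr(self, f.name, f.type(**value))

    cls.__init__ = __init__
    return cls

@nested
@dataclass
class Inner:
    value: int = 0

@nested
@dataclass
class Outer:
    inner: Inner = None

assert Outer(inner={"value": 3}).inner.value == 3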


55-66: Comprehensive validation of metric conversion.

It’s good that you verify single metrics are automatically converted to MetricCollection and remain None when appropriate in Metrics. This ensures proper handling of metric objects throughout the code.

lighter/engine/schema.py (4)

27-34: LGTM!

The metrics schema correctly:

  • Supports both list and dict configurations
  • Includes train, val, and test modes
  • Excludes predict mode as metrics aren't needed for predictions
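
Concretely, both of these shapes would pass validation. The entries follow the _target_ convention used elsewhere in the config and are illustrative only:

# A metrics section as a list of definitions...
metrics_as_list = {
    "train": [
        {"_target_": "torchmetrics.Accuracy", "task": "multiclass", "num_classes": 10},
    ]
}
# ...or as a named dict of definitions.
metrics_as_dict = {
    "val": {
        "accuracy": {"_target_": "torchmetrics.Accuracy", "task": "multiclass", "num_classes": 10},
    }
}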

35-43: Reference the documentation task for dataloaders.

The dataloader schema is correctly defined but needs documentation. A separate task exists to update docs/basics/config.md and docs/basics/workflows.md with dataloader configuration examples.


44-74: LGTM!

The adapters schema correctly implements the requirements:

  • All adapter fields are optional as specified
  • Mode-specific adapter keys are correctly defined:
    • train/val: batch, criterion, metrics, logging
    • test: batch, metrics, logging
    • predict: batch, logging

18-26: Verify system target class existence.

The _target_ field is required but there's no validation of the target class existence. This could lead to runtime errors if the specified class doesn't exist.
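
If that check is desired, Cerberus-style schemas support check_with callables. A hedged sketch, assuming the schema is consumed by such a validator:

import importlib

def check_target_importable(field, value, error):
    # Custom rule: verify that a `_target_` dotted path resolves. Wire it
    # in via {"_target_": {"type": "string", "required": True,
    # "check_with": check_target_importable}}.
    module_path, _, attr = value.rpartition(".")
    if not module_path:
        error(field, "expected a dotted import path like 'package.module.Class'")
        return
    try:
        module = importlib.import_module(module_path)
    except ImportError:
        error(field, f"cannot import module '{module_path}'")
        return
    if attr and not hasattr(module, attr):
        error(field, f"'{attr}' not found in '{module_path}'")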

This commit includes:
- Adding a comprehensive docstring to the schema module explaining its purpose and structure
- Minor import cleanup in test files
- Updating stage parameter documentation in test_cifar
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (11)
tests/unit/test_utils_types_containers.py (4)

24-53: Consider adding more edge cases to the nested decorator tests.

While the current test coverage is good, consider adding tests for:

  • Deeply nested structures (>2 levels)
  • Multiple nested fields in a single class
  • Optional nested fields (using Optional[Type])

Example structure:

from typing import Optional  # needed for the Optional fields below

@nested
@dataclass
class DeepInner:
    value: int

@nested
@dataclass
class MultipleInner:
    deep1: DeepInner
    deep2: Optional[DeepInner]
    value: str

@nested
@dataclass
class MultipleOuter:
    inner1: MultipleInner
    inner2: Optional[MultipleInner]
    name: str = "default"

55-75: Enhance metrics conversion test coverage.

Consider adding tests for:

  • Multiple metrics in a collection
  • Invalid metric types

Example test cases:

from torchmetrics import Precision  # assumed alongside the existing Accuracy import

def test_metrics_multiple_metrics():
    """Test conversion of multiple metrics to MetricCollection."""
    accuracy = Accuracy(task="binary")
    precision = Precision(task="binary")
    metrics_instance = Metrics(train=[accuracy, precision])
    
    assert isinstance(metrics_instance.train, MetricCollection)
    assert len(metrics_instance.train) == 2

def test_metrics_invalid_type():
    """Test that invalid metric types raise appropriate errors."""
    with pytest.raises(TypeError):
        Metrics(train="not a metric")

77-113: Reduce code duplication in Train and Val tests.

The test cases for Train and Val are nearly identical. Consider using a parametrized test to reduce duplication.

Example refactor:

@pytest.mark.parametrize("factory_class", [Train, Val])
def test_train_val_default_factory(factory_class):
    """Test that Train/Val dataclasses use correct default factories."""
    instance = factory_class()

    assert isinstance(instance.batch, BatchAdapter)
    assert instance.batch.input_accessor == 0
    assert instance.batch.target_accessor == 1

    assert isinstance(instance.criterion, CriterionAdapter)
    assert instance.criterion.pred_argument == 0
    assert instance.criterion.target_argument == 1

    assert isinstance(instance.metrics, MetricsAdapter)
    assert instance.metrics.pred_argument == 0
    assert instance.metrics.target_argument == 1

    assert isinstance(instance.logging, LoggingAdapter)

130-138: Enhance Predict adapter input_accessor testing.

The current test only verifies that the input_accessor is callable and handles a simple string. Consider testing more complex batch structures.

Example test cases to add:

def test_predict_batch_adapter_complex_inputs():
    """Test that Predict batch adapter handles various input types."""
    predict_instance = Predict()
    
    # Test with tuple
    assert predict_instance.batch.input_accessor((1, 2)) == (1, 2)
    
    # Test with dict
    batch_dict = {"input": "data", "meta": "info"}
    assert predict_instance.batch.input_accessor(batch_dict) == batch_dict
    
    # Test with custom object
    class CustomBatch:
        def __init__(self, data):
            self.data = data
    
    custom = CustomBatch("test")
    assert predict_instance.batch.input_accessor(custom) == custom
tests/integration/test_cifar.py (1)

28-37: Consider adding more specific assertions.

While the basic completion check is good, consider adding stage-specific assertions to verify the expected outcomes:

  • For FIT: Check if model weights were updated
  • For TEST: Verify metrics were computed
  • For PREDICT: Ensure predictions were generated

Example enhancement:

 def test_trainer_stage(stage: Stage, config: str):
     """
     Test the specified stage using the given configuration.
     Args:
         stage: The stage to run (e.g., Stage.FIT, Stage.TEST, Stage.PREDICT).
         config: Path to the configuration file.
     """
     runner = Runner()
     runner.run(stage, config=f"{config},{test_overrides}")
     assert runner.trainer.state.finished, f"Stage {stage} did not finish successfully."
+    if stage == Stage.FIT:
+        assert runner.trainer.state.model is not None, "Model should be updated after training"
+    elif stage == Stage.TEST:
+        assert hasattr(runner.trainer.state, 'metrics'), "Test metrics should be computed"
+    elif stage == Stage.PREDICT:
+        assert hasattr(runner.trainer.state, 'predictions'), "Predictions should be generated"
lighter/engine/schema.py (2)

2-13: Consider enhancing documentation with examples.

The documentation clearly explains the schema structure. Consider adding example configurations for each section to improve user understanding.

 """
 Defines the schema for configuration validation in the Lighter framework.

 The schema ensures user configurations are correctly structured and typed. It includes:
 - `_meta_`: Metadata as a dictionary.
 - `_requires_`: Runs first, primarily to be used for imports.
 - `project`: Project name as a string.
 - `vars`: Variables as a dictionary.
 - `args`: Arguments to pass to Trainer stage methods like `fit`, `validate`, `test`.
 - `trainer`: Trainer setup.
 - `system`: System setup, encapsulates model, criterion, optimizer, scheduler, inferer, metrics, dataloaders, and adapters.

+Example:
+```yaml
+_meta_:
+  name: example-config
+project: my-project
+vars:
+  batch_size: 32
+args:
+  fit:
+    max_epochs: 10
+trainer:
+  accelerator: gpu
+  devices: 1
+system:
+  _target_: lighter.System
+  model:
+    _target_: torchvision.models.resnet18
+```
 """

21-31: Add validation for common args parameters.

The args schema could benefit from validation of common parameters like max_epochs, batch_size, etc.

     "args": {
         "type": "dict",
         "schema": {
-            "fit": {"type": "dict"},
+            "fit": {
+                "type": "dict",
+                "schema": {
+                    "max_epochs": {"type": "integer", "min": 1},
+                    "max_steps": {"type": "integer", "min": -1},
+                    "limit_train_batches": {"type": ["float", "integer"], "min": 0},
+                    "limit_val_batches": {"type": ["float", "integer"], "min": 0}
+                }
+            },
             "validate": {"type": "dict"},
             "test": {"type": "dict"},
             "predict": {"type": "dict"},
             "lr_find": {"type": "dict"},
             "scale_batch_size": {"type": "dict"},
         },
     },
tests/unit/test_system.py (4)

17-33: Add type hints and improve documentation for helper classes.

The helper classes would benefit from:

  1. Type hints for method parameters and return values
  2. More detailed docstrings explaining the purpose and behavior
  3. Examples in docstrings showing usage patterns

Apply these improvements:

 class DummyDataset(Dataset):
-    """Dataset returning (input_tensor, target_int)"""
+    """A dummy dataset for testing purposes.
+    
+    Returns tuples of (input_tensor, target_int) when with_target=True,
+    or just input_tensor when with_target=False.
+    
+    Args:
+        size (int): Number of samples in the dataset. Defaults to 8.
+        with_target (bool): Whether to include target values. Defaults to True.
+    """

-    def __init__(self, size=8, with_target=True):
+    def __init__(self, size: int = 8, with_target: bool = True) -> None:

-    def __getitem__(self, idx):
+    def __getitem__(self, idx: int) -> Union[Tuple[torch.Tensor, int], torch.Tensor]:

-    def __len__(self):
+    def __len__(self) -> int:

Similar improvements for SimpleModel and DictLossNoTotal.

Also applies to: 41-49, 260-267


52-94: Enhance test fixtures with error handling and validation.

The fixtures could be more robust by:

  1. Validating input parameters
  2. Handling edge cases
  3. Adding cleanup code

Apply these improvements:

 @pytest.fixture
-def dummy_dataloaders():
+def dummy_dataloaders(request):
     """Provides train/val/test/predict DataLoaders"""
+    # Add validation for batch size
+    batch_size = getattr(request, 'param', {}).get('batch_size', 2)
+    if batch_size < 1:
+        raise ValueError("batch_size must be positive")
+
     return {
-        "train": DataLoader(DummyDataset(size=8, with_target=True), batch_size=2),
+        "train": DataLoader(DummyDataset(size=8, with_target=True), batch_size=batch_size),
         # ... similar changes for other dataloaders
     }

 @pytest.fixture
-def simple_system(dummy_dataloaders):
+def simple_system(dummy_dataloaders, request):
     """Creates a System instance with a mock trainer and mocked log method"""
+    # Add cleanup
+    yield system
+    # Clean up any resources
+    system.trainer = None
+    system._log_stats = None

97-188: Enhance test coverage with parameterized tests.

Several test functions could be parameterized to cover more test cases efficiently:

  1. Learning rate tests
  2. Optimizer configuration tests
  3. Error handling tests

Example improvement:

+@pytest.mark.parametrize("learning_rate,expected", [
+    (0.01, 0.01),
+    (0.001, 0.001),
+    (1e-5, 1e-5),
+])
-def test_learning_rate_property(simple_system):
+def test_learning_rate_property(simple_system, learning_rate, expected):
     """Check that learning_rate getter/setter works properly with a single param group."""
-    initial_lr = simple_system.learning_rate
-    assert initial_lr == 0.01
+    simple_system.learning_rate = learning_rate
+    assert simple_system.learning_rate == expected

Also applies to: 247-316


572-613: Enhance logging tests with more explicit assertions.

The logging tests verify call counts but could be more thorough by checking the actual logged values and their formats.

Example improvement:

 def test_log_stats_with_logger(simple_system):
     """Test _log_stats with a logger."""
     simple_system._log_stats = System._log_stats.__get__(simple_system)
     simple_system.trainer.logger = MagicMock()
     simple_system.log = MagicMock()
 
     # Test single loss value
     simple_system.mode = Mode.TRAIN
     simple_system._log_stats(torch.tensor(1.0), None, 0)
-    # Twice (on step, on epoch) for the loss, twice for the SGD optimizer
-    assert simple_system.log.call_count == 4
+    # Verify exact calls with their arguments
+    from unittest.mock import call  # `call` is needed for these checks
+    expected_calls = [
+        call('train/loss', 1.0, on_step=True, on_epoch=True),
+        call('train/loss', 1.0, on_step=False, on_epoch=True),
+        call('train/lr', 0.01, on_step=True, on_epoch=False),
+        call('train/momentum', 0.0, on_step=True, on_epoch=False)
+    ]
+    assert simple_system.log.call_args_list == expected_calls
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd25862 and e404068.

📒 Files selected for processing (4)
  • lighter/engine/schema.py (1 hunks)
  • tests/integration/test_cifar.py (1 hunks)
  • tests/unit/test_system.py (1 hunks)
  • tests/unit/test_utils_types_containers.py (1 hunks)
🧰 Additional context used
📓 Learnings (1)
lighter/engine/schema.py (2)
Learnt from: ibro45
PR: project-lighter/lighter#142
File: lighter/engine/schema.py:44-81
Timestamp: 2025-01-30T19:31:55.980Z
Learning: Adapter fields in the configuration schema should not be marked as required, allowing for optional configuration of batch, criterion, metrics, and logging adapters across different modes.
Learnt from: ibro45
PR: project-lighter/lighter#142
File: lighter/engine/schema.py:0-0
Timestamp: 2025-01-27T02:17:12.626Z
Learning: The adapters schema in Lighter requires:
- batch and logging keys for all modes (train, val, test, predict)
- criterion key only for train and val modes
- metrics key for train, val, and test modes but not for predict mode
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: build (3.12)
🔇 Additional comments (5)
tests/unit/test_utils_types_containers.py (1)

1-22: LGTM! Well-structured imports and class definitions.

The imports are logically organized and the nested dataclass definitions are clean and follow best practices.

tests/integration/test_cifar.py (2)

5-5: LGTM! Good improvement in type safety.

The change from importing run to importing Runner and Stage improves type safety by using an enum instead of strings for stages.


11-24: Verify the configuration file changes.

The test now uses example.yaml instead of monai_bundle_prototype.yaml. Please ensure that:

  1. The new configuration file exists and contains all necessary settings
  2. Using the same configuration file for all stages (FIT, TEST, PREDICT) is intentional
✅ Verification successful

Configuration changes verified successfully

The new example.yaml configuration file exists and contains all necessary stage-specific settings. The reuse of configurations across stages (FIT, TEST, PREDICT) is intentional through YAML references.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the existence and content of the configuration files

# Check if the new config file exists
if [ -f "./projects/cifar10/experiments/example.yaml" ]; then
    echo "New config file exists"
    # Check if it contains necessary stage-specific configurations
    echo "Checking for stage-specific configurations:"
    grep -A 5 "fit\|test\|predict" "./projects/cifar10/experiments/example.yaml"
else
    echo "New config file not found"
fi

# Check if the old config file is still referenced elsewhere
echo "Checking for other references to the old config:"
rg "monai_bundle_prototype\.yaml"

Length of output: 884

lighter/engine/schema.py (2)

59-86: LGTM! Adapters schema aligns with requirements.

The adapters schema correctly:

  • Makes adapter fields optional (not required)
  • Includes mode-specific adapter configurations:
    • batch and logging for all modes
    • criterion for train and val only
    • metrics for train, val, and test only

50-58: 🛠️ Refactor suggestion

Add validation for PyTorch DataLoader parameters.

The dataloaders schema should validate common PyTorch DataLoader parameters.

             "dataloaders": {
                 "type": "dict",
                 "schema": {
-                    "train": {"type": "dict"},
+                    "train": {
+                        "type": "dict",
+                        "schema": {
+                            "_target_": {"type": "string", "required": True},
+                            "batch_size": {"type": "integer", "min": 1},
+                            "shuffle": {"type": "boolean"},
+                            "num_workers": {"type": "integer", "min": 0},
+                            "pin_memory": {"type": "boolean"},
+                            "dataset": {"type": "dict", "required": True}
+                        }
+                    },
                     "val": {"type": "dict"},
                     "test": {"type": "dict"},
                     "predict": {"type": "dict"},
                 },
             },

Likely invalid or redundant comment.
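
To see what the suggested sub-schema would actually catch, here is a hypothetical run through a Cerberus-style validator (assuming a Cerberus-like engine behind schema.py; the same sub-schema would presumably be repeated for val, test, and predict):

from cerberus import Validator

train_schema = {
    "_target_": {"type": "string", "required": True},
    "batch_size": {"type": "integer", "min": 1},
    "num_workers": {"type": "integer", "min": 0},
    "dataset": {"type": "dict", "required": True},
}
validator = Validator({"train": {"type": "dict", "schema": train_schema}})

# Fails twice: batch_size violates min=1 and dataset is missing
validator.validate({"train": {"_target_": "torch.utils.data.DataLoader", "batch_size": 0}})
print(validator.errors)
# {'train': [{'batch_size': ['min value is 1'], 'dataset': ['required field']}]}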

Comment on lines +381 to +570
    # Test batch processing
    batch = {"input": torch.randn(2, 3, 32, 32), "target": torch.randint(0, 10, (2,))}
    _ = system._base_step(batch, 0, "train")
    assert count_calls.calls == 1


def test_multiple_param_groups():
    model = DummyModel()
    optimizer = Adam([{"params": model.net[0].parameters(), "lr": 0.001}, {"params": model.net[1].parameters(), "lr": 0.002}])

    system = LighterSystem(model=model, batch_size=8, optimizer=optimizer, datasets={"train": DummyDataset()})

    with pytest.raises(ValueError, match="multiple optimizer parameter groups"):
        _ = system.learning_rate

    with pytest.raises(ValueError, match="multiple optimizer parameter groups"):
        system.learning_rate = 0.1


def test_configure_optimizers_no_optimizer():
    model = DummyModel()
    system = LighterSystem(model=model, batch_size=8, criterion=nn.CrossEntropyLoss(), datasets={"train": DummyDataset()})

    with pytest.raises(ValueError, match="Please specify 'system.optimizer' in the config."):
        system.configure_optimizers()

    model = DummyModel()
    system = LighterSystem(
    # Assert that only test hooks are overridden, other hooks remain as default
    assert system.training_step == super(System, system).training_step
    assert system.train_dataloader == super(System, system).train_dataloader
    assert system.on_train_start == super(System, system).on_train_start
    assert system.on_train_end == super(System, system).on_train_end

    assert system.validation_step == super(System, system).validation_step
    assert system.val_dataloader == super(System, system).val_dataloader
    assert system.on_validation_start == super(System, system).on_validation_start
    assert system.on_validation_end == super(System, system).on_validation_end

    assert system.test_step != super(System, system).test_step
    assert system.test_dataloader != super(System, system).test_dataloader
    assert system.on_test_start != super(System, system).on_test_start
    assert system.on_test_end != super(System, system).on_test_end

    assert system.predict_step == super(System, system).predict_step
    assert system.predict_dataloader == super(System, system).predict_dataloader
    assert system.on_predict_start == super(System, system).on_predict_start
    assert system.on_predict_end == super(System, system).on_predict_end

    # Test case 5: Only predict dataloader is provided
    model = SimpleModel()
    optimizer = SGD(model.parameters(), lr=0.01)
    system = System(
        model=model,
        dataloaders={"predict": DataLoader(DummyDataset())},
        optimizer=optimizer,
        scheduler=None,
        criterion=nn.CrossEntropyLoss(),
        metrics=None,
        adapters=None,
        inferer=None,
    )

    # Mock trainer with logger
    system.trainer = Mock()
    system.trainer.logger = Mock()

    # Test logging loss
    loss = torch.tensor(1.0)
    metrics = {"accuracy": torch.tensor(0.8)}
    system._log_stats(loss, metrics, "train", 0)

    # Test dict loss logging
    dict_loss = {"total": torch.tensor(1.0), "component": torch.tensor(0.5)}
    system._log_stats(dict_loss, metrics, "train", 0)
    # Assert that only predict hooks are overridden, other hooks remain as default
    assert system.training_step == super(System, system).training_step
    assert system.train_dataloader == super(System, system).train_dataloader
    assert system.on_train_start == super(System, system).on_train_start
    assert system.on_train_end == super(System, system).on_train_end

    assert system.validation_step == super(System, system).validation_step
    assert system.val_dataloader == super(System, system).val_dataloader
    assert system.on_validation_start == super(System, system).on_validation_start
    assert system.on_validation_end == super(System, system).on_validation_end

    assert system.test_step == super(System, system).test_step
    assert system.test_dataloader == super(System, system).test_dataloader
    assert system.on_test_start == super(System, system).on_test_start
    assert system.on_test_end == super(System, system).on_test_end

    assert system.predict_step != super(System, system).predict_step
    assert system.predict_dataloader != super(System, system).predict_dataloader
    assert system.on_predict_start != super(System, system).on_predict_start
    assert system.on_predict_end != super(System, system).on_predict_end


🛠️ Refactor suggestion

Refactor dynamic mode hooks tests to reduce code duplication.

The test cases for different dataloader combinations contain significant duplication. Consider refactoring using parameterization and helper functions.

Example refactoring:

+def create_system_with_dataloaders(dataloader_keys):
+    """Helper function to create system with specific dataloaders."""
+    model = SimpleModel()
+    optimizer = SGD(model.parameters(), lr=0.01)
+    dataloaders = {
+        key: DataLoader(DummyDataset())
+        for key in dataloader_keys
+    }
+    return System(
+        model=model,
+        dataloaders=dataloaders,
+        optimizer=optimizer,
+        scheduler=None,
+        criterion=nn.CrossEntropyLoss(),
+        metrics=None,
+        adapters=None,
+        inferer=None,
+    )
+
+@pytest.mark.parametrize("dataloader_keys,expected_overridden", [
+    (["train", "val", "test", "predict"], ["train", "val", "test", "predict"]),
+    (["train"], ["train"]),
+    (["val"], ["val"]),
+    (["test"], ["test"]),
+    (["predict"], ["predict"]),
+])
+def test_dynamic_mode_hooks(dataloader_keys, expected_overridden):
+    """Test the dynamic attachment of mode-specific hooks."""
+    system = create_system_with_dataloaders(dataloader_keys)
+    
+    # Helper function to check if hooks are overridden
+    def check_hooks_override(mode):
+        is_overridden = mode in expected_overridden
+        # "val" uses the "validation" prefix for its step/start/end hook names
+        prefix = "validation" if mode == "val" else mode
+        parent = super(System, system)
+        for attr in (f"{prefix}_step", f"{mode}_dataloader", f"on_{prefix}_start", f"on_{prefix}_end"):
+            assert (getattr(system, attr) != getattr(parent, attr)) == is_overridden
+    
+    for mode in ["train", "val", "test", "predict"]:
+        check_hooks_override(mode)

Committable suggestion skipped: line range outside the PR's diff.

Labels
size:XXL This PR changes 1000+ lines, ignoring generated files.