FSDPConfig #3170

Merged
salmanmohammadi merged 5 commits into main from fsdp_config on Oct 10, 2025

@salmanmohammadi salmanmohammadi commented Sep 22, 2025

Contributor

Summary by CodeRabbit

  • New Features

    • Introduced a dedicated, typed FSDP configuration with clear options (e.g., activation checkpointing, parameter offloading, state dict types, auto-wrap policies, mixed precision).
    • Improved validation by using structured fields for safer, clearer config handling.
  • Bug Fixes

    • Normalization/migration now consistently renames FSDP keys and removes legacy/conflicting entries, improving reliability across versioned configs.
  • Tests

    • Expanded coverage to validate FSDP config migration, renamed fields, and preservation of non-versioned settings.

@coderabbitai

coderabbitai Bot commented Sep 22, 2025

Contributor

Walkthrough

Introduces a dedicated FSDPConfig Pydantic model and switches AxolotlInputConfig.fsdp_config to use it. Updates validation logic to access FSDP settings via attributes instead of dict keys. Adjusts tests to reflect renamed/mapped FSDP fields during config normalization/migration.

Changes

| Cohort / File(s) | Summary of changes |
| --- | --- |
| Schema typing update: src/axolotl/utils/schemas/config.py | Imports FSDPConfig and changes the AxolotlInputConfig.fsdp_config type from dict[str, Any] to FSDPConfig (optional in both cases). |
| New FSDP schema model: src/axolotl/utils/schemas/fsdp.py | Adds FSDPConfig(BaseModel) with optional fields for checkpointing, offloading, sync, CPU-efficient loading, orig params, state dict types, auto-wrap policy, transformer layer class, resharding, and mixed precision; uses Literal for constrained options. |
| Validation access pattern change: src/axolotl/utils/schemas/validation.py | Replaces dict-style access with attribute access for fsdp_config fields (cpu_ram_efficient_loading, offload_params) in validation checks. |
| Tests updated for migration output: tests/test_normalize_config.py | Updates expectations for the migrated fsdp_config: presence of renamed keys (auto_wrap_policy, offload_params, cpu_ram_efficient_loading), removal of legacy keys, and removal of regular_param in the versioned path. |
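To make the schema change concrete, here is a minimal sketch of what a typed FSDP model with Literal-constrained options and attribute access could look like. It assumes Pydantic v2. The class name FSDPConfigSketch is hypothetical, only a subset of the fields listed above is shown, and the allowed Literal values are assumptions rather than the values in src/axolotl/utils/schemas/fsdp.py.

```python
from typing import Literal, Optional

from pydantic import BaseModel, ValidationError


class FSDPConfigSketch(BaseModel):
    """Illustrative stand-in for FSDPConfig; NOT the actual Axolotl schema."""

    # Every field is optional, mirroring the "optional fields" in the summary.
    activation_checkpointing: Optional[bool] = None
    offload_params: Optional[bool] = None
    cpu_ram_efficient_loading: Optional[bool] = None
    transformer_layer_cls_to_wrap: Optional[str] = None
    # Literal restricts a field to a fixed set of strings.
    # The value sets below are assumptions made for this sketch.
    state_dict_type: Optional[
        Literal["FULL_STATE_DICT", "SHARDED_STATE_DICT", "LOCAL_STATE_DICT"]
    ] = None
    auto_wrap_policy: Optional[
        Literal["TRANSFORMER_BASED_WRAP", "SIZE_BASED_WRAP"]
    ] = None


cfg = FSDPConfigSketch(state_dict_type="FULL_STATE_DICT", offload_params=True)

# Attribute access replaces the dict-style cfg["offload_params"] lookup.
offload_enabled = bool(cfg.offload_params)

# A value outside the Literal set fails at parse time instead of later in training.
try:
    FSDPConfigSketch(state_dict_type="FULL")
    rejected = False
except ValidationError:
    rejected = True
```

With a plain dict, the misspelled state dict type above would only surface when the value was eventually consumed. With the typed model it is rejected as soon as the config is parsed.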

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |
| Title Check | ✅ Passed | The title FSDPConfig is concise and directly reflects the PR's primary change: adding an FSDPConfig schema and switching AxolotlInputConfig.fsdp_config to that type. It is specific and relevant for a reviewer scanning the project's history. |

@github-actions

github-actions Bot commented Sep 22, 2025

Contributor

📖 Documentation Preview: https://68e8f210ee7e3e94afb37b19--resonant-treacle-0fd729.netlify.app

Deployed on Netlify from commit 7a8672e

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/axolotl/utils/schemas/validation.py (1)

885-896: Use int comparison for fsdp_version; keep attribute access

Minor consistency/safety: elsewhere fsdp_version is treated as int post-parse. Comparing as string is brittle.

Apply:

-            and str(self.fsdp_version) != "2"
+            and self.fsdp_version != 2
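As a small illustration of the trade-off behind this nitpick, the sketch below compares the two checks on a few input types. The helper names are hypothetical and exist only for this example. It shows that the integer comparison is the safer form only once fsdp_version has been parsed to an int, which is what the comment assumes.

```python
def differs_from_v2_as_str(fsdp_version) -> bool:
    """String-based check, as in the original line."""
    return str(fsdp_version) != "2"


def differs_from_v2_as_int(fsdp_version) -> bool:
    """Integer-based check, as in the suggested line."""
    return fsdp_version != 2


# Compare both checks across the kinds of values a config field might hold.
for value in (2, "2", 2.0, 1):
    print(
        repr(value),
        differs_from_v2_as_str(value),
        differs_from_v2_as_int(value),
    )
```

The two checks agree for the ints 2 and 1. They disagree for the string "2" (only the integer check reports a mismatch) and for the float 2.0 (only the string check reports a mismatch), so which form is correct depends on the type the field has at the time the validator runs.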
src/axolotl/utils/schemas/fsdp.py (1)

10-14: Forbid unknown keys to catch typos early

Given we normalize legacy fsdp_* keys before model parsing, rejecting unknown fields here helps prevent silent misconfigurations.

Apply:

 class FSDPConfig(BaseModel):
     """
     FSDP Configuration Schema
     """
+    model_config = {"extra": "forbid"}
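The effect of this suggestion can be sketched with two toy models, assuming Pydantic v2, where unknown keys are ignored by default. Both class names and the misspelled key are hypothetical and are not part of the Axolotl codebase.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class LenientFSDPSketch(BaseModel):
    """Default behaviour: unknown keys are silently ignored."""

    offload_params: Optional[bool] = None


class StrictFSDPSketch(BaseModel):
    """Same field, but unknown keys raise a validation error."""

    model_config = {"extra": "forbid"}

    offload_params: Optional[bool] = None


# A typo in the key name: "offload_parms" instead of "offload_params".
typo_config = {"offload_parms": True}

# The lenient model drops the misspelled key, so the setting is silently lost.
lenient = LenientFSDPSketch(**typo_config)

# The strict model refuses the misspelled key outright.
try:
    StrictFSDPSketch(**typo_config)
    typo_caught = False
except ValidationError:
    typo_caught = True
```

In the lenient case offload_params stays None and training would proceed without offloading, which is the silent misconfiguration the comment warns about.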
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7be8740 and 80a7e3e.

📒 Files selected for processing (4)
  • src/axolotl/utils/schemas/config.py (2 hunks)
  • src/axolotl/utils/schemas/fsdp.py (1 hunks)
  • src/axolotl/utils/schemas/validation.py (2 hunks)
  • tests/test_normalize_config.py (0 hunks)
💤 Files with no reviewable changes (1)
  • tests/test_normalize_config.py
🧰 Additional context used
🧬 Code graph analysis (1)
src/axolotl/utils/schemas/config.py (1)
src/axolotl/utils/schemas/fsdp.py (1)
  • FSDPConfig (10-67)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: PyTest from Source Dist (3.11, 2.8.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.8.0)
  • GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
  • GitHub Check: PyTest (3.11, 2.6.0)
  • GitHub Check: PyTest (3.11, 2.7.1)
  • GitHub Check: preview
🔇 Additional comments (2)
src/axolotl/utils/schemas/config.py (1)

27-27: LGTM: new FSDPConfig import

Import is correct and localizes FSDP schema cleanly.

src/axolotl/utils/schemas/validation.py (1)

821-831: Switch to attribute access (FSDPConfig) — correct

Reading cpu_ram_efficient_loading via attribute on the typed model is correct for an after validator.

Comment thread src/axolotl/utils/schemas/config.py
@salmanmohammadi salmanmohammadi changed the title from "[WIP] FSDPConfig" to "FSDPConfig" Sep 22, 2025
@codecov

codecov Bot commented Sep 22, 2025


Codecov Report

✅ All modified and coverable lines are covered by tests.

@salmanmohammadi salmanmohammadi requested a review from a team September 22, 2025 16:52
)
# TODO @SalmanMohammadi strongly type this as its own schema
- fsdp_config: dict[str, Any] | None = Field(
+ fsdp_config: FSDPConfig | None = Field(

Collaborator

Would this mean we're now dropping fsdp1 support? Given that the fsdp_config schema only has fsdp2 configs, any prior fsdp1 configs would be dropped.

Contributor Author

fsdp_config should also include FSDP1 fields, am I missing some?

Contributor Author

There is also the old way of configuring FSDP1 through cfg.fsdp which remains.

Collaborator

For example:

fsdp_config:
  fsdp_state_dict_type: FULL_STATE_DICT
  fsdp_transformer_layer_cls_to_wrap: MistralDecoderLayer
  fsdp_activation_checkpointing: true

With the new pydantic schema, would these be ignored?

Contributor Author

No, we strip fsdp_ from any fields which have it and emit a deprecation warning to encourage users to just use state_dict_type etc. But those fields are common to both FSDP1 and FSDP2.
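The prefix stripping described in this reply can be sketched roughly as follows. This is a minimal illustration, not Axolotl's normalization code: the helper name strip_fsdp_prefix is hypothetical, and letting an explicitly set unprefixed key win over its legacy twin is an assumption made for the sketch.

```python
import warnings


def strip_fsdp_prefix(fsdp_config: dict) -> dict:
    """Hypothetical helper: rename legacy `fsdp_*` keys to their unprefixed form."""
    prefix = "fsdp_"
    normalized = {}
    for key, value in fsdp_config.items():
        if key.startswith(prefix):
            new_key = key[len(prefix):]
            warnings.warn(
                f"`{key}` is deprecated; use `{new_key}` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            # Assumption: an explicitly set unprefixed key takes precedence,
            # so the legacy value only fills the slot if it is still empty.
            normalized.setdefault(new_key, value)
        else:
            normalized[key] = value
    return normalized


# The legacy config from the question above, expressed as a dict.
legacy = {
    "fsdp_state_dict_type": "FULL_STATE_DICT",
    "fsdp_transformer_layer_cls_to_wrap": "MistralDecoderLayer",
    "fsdp_activation_checkpointing": True,
}
normalized = strip_fsdp_prefix(legacy)
```

After this step the keys are state_dict_type, transformer_layer_cls_to_wrap, and activation_checkpointing, so a typed schema that defines only unprefixed fields still receives the legacy settings.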

Contributor Author

There are tests which cover this in test_normalize_config.py.

Collaborator

Thanks for clarifying. I forgot the prefix gets stripped before Pydantic validation.

@salmanmohammadi

Contributor Author

Please hold off on merging until #3167 lands.

@salmanmohammadi salmanmohammadi added the "hold" (don't merge this yet) label Sep 23, 2025
@salmanmohammadi salmanmohammadi merged commit 143dea4 into main Oct 10, 2025
18 checks passed
@salmanmohammadi salmanmohammadi deleted the fsdp_config branch October 10, 2025 13:44
flaviusburca pushed a commit to invergent-ai/axolotl that referenced this pull request Oct 18, 2025
(cherry picked from commit 143dea4)