Config doc autogen#2718
Conversation
WalkthroughThis update introduces a new script to auto-generate Quarto documentation from Pydantic models, restructures configuration schemas to use rich field-level descriptions, and moves complex validation logic into modular mixin classes. Configuration and documentation-related files are updated to reflect these changes, enhancing clarity, introspection, and maintainability of the configuration system. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Quarto
participant PreRenderScript as generate_config_docs.py
participant PydanticModels
User->>Quarto: Render documentation
Quarto->>PreRenderScript: Run as pre-render step
PreRenderScript->>PydanticModels: Introspect config models
PydanticModels-->>PreRenderScript: Provide field metadata, including nested/inherited fields
PreRenderScript-->>Quarto: Output config-reference.qmd
Quarto-->>User: Serve updated documentation
sequenceDiagram
participant ConfigLoader
participant AxolotlInputConfig
participant ValidationMixin
ConfigLoader->>AxolotlInputConfig: Load config data
AxolotlInputConfig->>ValidationMixin: Run all validators (from mixins)
ValidationMixin-->>AxolotlInputConfig: Raise errors/warnings or adjust fields as needed
AxolotlInputConfig-->>ConfigLoader: Return validated config instance
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used🪛 Ruff (0.11.9)docs/scripts/generate_config_docs.py278-281: Use a single (SIM102) 531-535: Use a single Combine (SIM102) 630-632: Use a single (SIM102) ⏰ Context from checks skipped due to timeout of 90000ms (9)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
1b80549 to
705b078
Compare
Codecov ReportAttention: Patch coverage is
📢 Thoughts on this report? Let us know! |
9242fb6 to
bc4ecc0
Compare
|
Moving to ready for review so I can generate + share the website preview. |
There was a problem hiding this comment.
Actionable comments posted: 4
🔭 Outside diff range comments (2)
src/axolotl/utils/schemas/quantization.py (2)
36-44: Validator rejects default &Nonevalues
validate_dtyperuns in mode="before" for optional fields.
When the user omitsactivation_dtype, the raw value isNone; when they omitweight_dtype, the raw value is the Enum default.
Both fall through to the finalraise ValueError, makingQATConfig()impossible.- if v == "int4": + if v is None or isinstance(v, TorchIntDType): + return v + if v == "int4": return TorchIntDType.int4Apply the same guard to
PTQConfig.
67-74:PTQConfigvalidator ignores the documenteduintXdtypesThe description states that
weight_dtypemay be anyuint[1-7], but the validator only whitelists"int4"/"int8". Extend the validator to handle theTorchIntDType.uint*variants or remove them from the description.
🧹 Nitpick comments (8)
_quarto.yml (1)
239-239: Sidebar entry could break local preview if the file hasn’t been generated yetDuring a first-time
quarto previewdevelopers will see a 404 until the pre-render step finishes successfully.
Consider committing a very small placeholder file (e.g. an HTML comment) so the link resolves even when the generation step is skipped.src/axolotl/utils/schemas/trl.py (1)
11-16: Truncated description stringThe
betafield’s description ends abruptly with “Use”.- "description": "Beta parameter for the RL training. Same as `rl_beta`. Use" + "description": "Beta (KL-penalty) coefficient for RL training. Same semantics as `rl_beta`.",Without a complete sentence the autogenerated docs look unpolished.
src/axolotl/utils/schemas/datasets.py (1)
56-61: Field nametypeshadows Python built-inWhile Pydantic permits this, accessing the attribute (
dataset.type) inside model methods can be confusing and hampers IDE auto-completion.
Consider renaming toprompt_type(update validators accordingly) to avoid the shadow.src/axolotl/utils/schemas/integrations.py (2)
15-18: Documentuse_mlflowwithFieldfor parityAll other MLFlowConfig fields now use
Fieldwithjson_schema_extra.use_mlflowis still a bare annotation, so it will be absent from the autogenerated descriptions. Wrap it inField(..., json_schema_extra={"description": "Enable or disable MLflow integration."})to keep documentation uniform.
160-178: Inconsistent metadata key (helpvsdescription)
RayConfigusesjson_schema_extra={"help": ...}whereas every other model now standardises on thedescriptionkey. The generator script only looks fordescription, so these three texts will silently disappear from the docs. Rename the key todescription.src/axolotl/utils/schemas/training.py (1)
81-86:optim_argsacceptsstrbut doc says “Dictionary”Type annotation allows
str | dict[str, Any], yet the description mentions only a dictionary.
Either dropstrfrom the type or expand the description (e.g. “str (YAML/JSON) or dict”).src/axolotl/utils/schemas/model.py (1)
45-50: Typo: “mistral-common tokenizer” wordingMinor wording polish – consider “the Mistral common tokenizer” for clarity.
src/axolotl/utils/schemas/peft.py (1)
520-523: Error message lists the same flag three timesTypo makes the message misleading:
-"lora_mlp_kernel, lora_mlp_kernel, and lora_mlp_kernel are not compatible with 8-bit LoRA" +"lora_mlp_kernel, lora_qkv_kernel, and lora_o_kernel are not compatible with 8-bit LoRA"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (14)
_quarto.yml(2 hunks)docs/.gitignore(1 hunks)docs/scripts/generate_config_docs.py(1 hunks)src/axolotl/utils/schemas/config.py(9 hunks)src/axolotl/utils/schemas/datasets.py(2 hunks)src/axolotl/utils/schemas/deprecated.py(1 hunks)src/axolotl/utils/schemas/enums.py(2 hunks)src/axolotl/utils/schemas/integrations.py(3 hunks)src/axolotl/utils/schemas/model.py(2 hunks)src/axolotl/utils/schemas/peft.py(3 hunks)src/axolotl/utils/schemas/quantization.py(2 hunks)src/axolotl/utils/schemas/training.py(2 hunks)src/axolotl/utils/schemas/trl.py(3 hunks)src/axolotl/utils/schemas/validation.py(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
src/axolotl/utils/schemas/quantization.py (1)
src/axolotl/utils/schemas/enums.py (1)
TorchIntDType(10-21)
src/axolotl/utils/schemas/validation.py (1)
src/axolotl/utils/schemas/enums.py (3)
ChatTemplate(35-67)RingAttnFunc(82-90)RLType(24-32)
🪛 Ruff (0.11.9)
src/axolotl/utils/schemas/validation.py
133-134: Use a single if statement instead of nested if statements
(SIM102)
406-414: Use a single if statement instead of nested if statements
(SIM102)
501-506: Use a single if statement instead of nested if statements
(SIM102)
515-520: Use a single if statement instead of nested if statements
(SIM102)
582-587: Use a single if statement instead of nested if statements
(SIM102)
744-750: Use a single if statement instead of nested if statements
(SIM102)
docs/scripts/generate_config_docs.py
28-28: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks
(B019)
181-184: Use a single if statement instead of nested if statements
(SIM102)
424-428: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
523-525: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.6.0)
🔇 Additional comments (6)
docs/.gitignore (1)
5-5: Ensure the ignored filename matches the generator output exactlyThe generator script writes to
docs/config-reference.qmd(note thedocs/prefix) while the ignore rule omits the directory prefix.
If the script ever runs from a different working directory (e.g., CI) it could emit the file underdocs/, which would then be tracked by Git.- config-reference.qmd + docs/config-reference.qmd_quarto.yml (1)
3-3: Makepre-renderscript path portable & executable
pre-render: docs/scripts/generate_config_docs.pyis interpreted by Quarto as a command, not a Python module.
Ensure the file is:
chmod +xand starts with the correct she-bang (#!/usr/bin/env python3), or- Invoked explicitly via the interpreter:
pre-render: python docs/scripts/generate_config_docs.pyOtherwise the site build will fail on environments that don’t automatically treat “*.py” files as executables.
src/axolotl/utils/schemas/enums.py (1)
3-3: Module-level pylint disable is 👍Moving the directive here reduces noise in the enum definitions themselves.
src/axolotl/utils/schemas/deprecated.py (2)
62-75: Alias direction may break attribute accessThe field name is
overrides_of_model_configwith alias"model_config".
Inside pythoncfg.overrides_of_model_configworks, but existing code that doescfg.model_configwill no longer compile unlessmodel_configis also defined as a property ormodel_configis accessed throughcfg["model_config"].Please grep the codebase for
.model_configattribute access or addpopulate_by_name=True/ an explicit property.
76-89: Same alias concern formodel_type&model_revisionThe same caveat applies to
type_of_model/model_typeandrevision_of_model/model_revision. Verify call-sites or flip the alias if the attribute name should staymodel_type.src/axolotl/utils/schemas/config.py (1)
1-10: Nothing to flag on this file.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (2)
docs/scripts/generate_config_docs.py (2)
63-88:_extract_nested_typediscards multi-type unionsThe helper returns only the first non-
Nonemember (int | str | None→int), which loses information and can mis-document fields. Consider:• Returning
Union[int, str](stringified) when multiple concrete types exist, or
• Falling back toAnywith an explanatory comment.This keeps the generated reference accurate.
638-641: Ensure output directory exists before writingThe script assumes
docs/is present. A fresh checkout or altered build path
will raiseFileNotFoundError.+ from pathlib import Path ... - with open("docs/config-reference.qmd", "w", encoding="utf-8") as f: + Path("docs").mkdir(parents=True, exist_ok=True) + with open("docs/config-reference.qmd", "w", encoding="utf-8") as f:Minor but avoids brittle CI failures.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/scripts/generate_config_docs.py(1 hunks)
🧰 Additional context used
🪛 Ruff (0.11.9)
docs/scripts/generate_config_docs.py
179-182: Use a single if statement instead of nested if statements
(SIM102)
422-426: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
521-523: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: pre-commit
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: pre-commit
🔇 Additional comments (1)
docs/scripts/generate_config_docs.py (1)
27-44: 👍 Cached-method issue resolvedThe problematic
@lru_cachedecorator that previously broke instance binding has been removed. The method now works correctly with the explicit_class_fields_cache.
|
Caution Failed to replace (edit) comment. This is likely due to insufficient permissions or the comment being deleted. Error details |
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
docs/scripts/generate_config_docs.py (2)
82-90: Still using class names for recursion guard – collisions remain
Thevisitedset keys onmodel_class.__name__; different classes that share the same name will collide and either skip models or trigger wrong recursion termination. Use the class object itself.
462-465: Raw default value interpolation can emit invalid YAML
field_line += f" = {field_info['default']}"directly stringifies Python objects; complex types or strings containing:will break the generated YAML.- if field_info.get("default") is not None: - field_line += f" = {field_info['default']}" + default_val = field_info.get("default") + if default_val is not None: + import yaml + field_line += f" = {yaml.safe_dump(default_val, default_flow_style=True).strip()}"Same issue appears later when building
linefor the non-expanded path.Also applies to: 595-597
🧹 Nitpick comments (2)
src/axolotl/utils/schemas/validation.py (1)
395-400: Mismatch between comment, validation message, and supported-metrics setThe in-code comment says only 4 metrics are supported, yet
SUPPORTED_METRICSincludes"perplexity"and the runtime check enforces the larger set.
Please reconcile to avoid confusing users and reviewers.docs/scripts/generate_config_docs.py (1)
45-48: Guardissubclassagainst non-class inputs
issubclassraisesTypeErrorwhentype_objis not a class (e.g. atypingconstruct).
Wrap the call:- return inspect.isclass(type_obj) and issubclass(type_obj, BaseModel) + try: + return inspect.isclass(type_obj) and issubclass(type_obj, BaseModel) + except TypeError: + return False
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/scripts/generate_config_docs.py(1 hunks)src/axolotl/utils/schemas/datasets.py(2 hunks)src/axolotl/utils/schemas/validation.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/axolotl/utils/schemas/datasets.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/utils/schemas/validation.py (2)
src/axolotl/utils/schemas/enums.py (3)
ChatTemplate(35-67)RingAttnFunc(82-90)RLType(24-32)src/axolotl/utils/schemas/datasets.py (1)
check_chat_template_config(196-216)
🪛 Ruff (0.11.9)
docs/scripts/generate_config_docs.py
165-168: Use a single if statement instead of nested if statements
(SIM102)
408-412: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
507-509: Use a single if statement instead of nested if statements
(SIM102)
src/axolotl/utils/schemas/validation.py
133-134: Use a single if statement instead of nested if statements
(SIM102)
406-414: Use a single if statement instead of nested if statements
(SIM102)
501-506: Use a single if statement instead of nested if statements
(SIM102)
515-520: Use a single if statement instead of nested if statements
(SIM102)
582-587: Use a single if statement instead of nested if statements
(SIM102)
744-750: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
| def check_kto_config(cls, data): | ||
| if data.get("rl") == "kto": | ||
| if data.get("sample_packing") or data.get("eval_sample_packing"): | ||
| raise ValueError("sample_packing is not supported with kto") | ||
|
|
||
| if data.get("remove_unused_columns") is not False: | ||
| raise ValueError("Set `remove_unused_columns: False` when using kto") | ||
| return data |
There was a problem hiding this comment.
Enum comparison will silently fail
check_kto_config compares data.get("rl") to the string "kto".
After Pydantic parsing this field will usually hold the RLType.KTO enum member, so the equality test will be False and the validation never runs.
- if data.get("rl") == "kto":
+ if data.get("rl") in (RLType.KTO, "kto"):Apply the same pattern for other enum checks ("grpo", etc.).
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def check_kto_config(cls, data): | |
| if data.get("rl") == "kto": | |
| if data.get("sample_packing") or data.get("eval_sample_packing"): | |
| raise ValueError("sample_packing is not supported with kto") | |
| if data.get("remove_unused_columns") is not False: | |
| raise ValueError("Set `remove_unused_columns: False` when using kto") | |
| return data | |
| def check_kto_config(cls, data): | |
| - if data.get("rl") == "kto": | |
| + if data.get("rl") in (RLType.KTO, "kto"): | |
| if data.get("sample_packing") or data.get("eval_sample_packing"): | |
| raise ValueError("sample_packing is not supported with kto") | |
| if data.get("remove_unused_columns") is not False: | |
| raise ValueError("Set `remove_unused_columns: False` when using kto") | |
| return data |
🤖 Prompt for AI Agents
In src/axolotl/utils/schemas/validation.py around lines 619 to 626, the method
check_kto_config compares data.get("rl") to the string "kto", but after Pydantic
parsing this field will be an enum member RLType.KTO, causing the equality check
to fail silently. To fix this, update the comparison to check against the enum
member RLType.KTO instead of the string "kto". Apply the same enum member
comparison pattern for other similar enum checks like "grpo" elsewhere in the
code.
|
@djsaunde We should update the workflows/preview-docs.yml to use |
Great catch! I was wondering where the preview was 😅 |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
docs/scripts/generate_config_docs.py (1)
462-466: Default value emission still unsafe for complex literalsInterpolating
field_info['default']directly can output invalid YAML (e.g. strings with:).
Serialize withrepr()oryaml.safe_dump()instead — this is the same issue flagged earlier.src/axolotl/utils/schemas/validation.py (1)
620-626: Enum comparison will silently fail
check_kto_configcomparesdata.get("rl")to the string"kto".
After Pydantic parsing this field is usuallyRLType.KTO, so the check is bypassed.- if data.get("rl") == "kto": + if data.get("rl") in (RLType.KTO, "kto"):Apply the same enum-aware pattern for other RLType comparisons.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
.github/workflows/preview-docs.yml(1 hunks)docs/scripts/generate_config_docs.py(1 hunks)src/axolotl/utils/schemas/datasets.py(2 hunks)src/axolotl/utils/schemas/validation.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/preview-docs.yml
🚧 Files skipped from review as they are similar to previous changes (1)
- src/axolotl/utils/schemas/datasets.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/utils/schemas/validation.py (2)
src/axolotl/utils/schemas/enums.py (3)
ChatTemplate(35-67)RingAttnFunc(82-90)RLType(24-32)src/axolotl/utils/schemas/datasets.py (1)
check_chat_template_config(196-216)
🪛 Ruff (0.11.9)
docs/scripts/generate_config_docs.py
165-168: Use a single if statement instead of nested if statements
(SIM102)
408-412: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
507-509: Use a single if statement instead of nested if statements
(SIM102)
src/axolotl/utils/schemas/validation.py
133-134: Use a single if statement instead of nested if statements
(SIM102)
406-414: Use a single if statement instead of nested if statements
(SIM102)
501-506: Use a single if statement instead of nested if statements
(SIM102)
515-520: Use a single if statement instead of nested if statements
(SIM102)
582-587: Use a single if statement instead of nested if statements
(SIM102)
744-750: Use a single if statement instead of nested if statements
(SIM102)
🪛 GitHub Actions: Preview
docs/scripts/generate_config_docs.py
[error] 16-16: ModuleNotFoundError: No module named 'torch' when importing in generate_config_docs.py at line 16.
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: pre-commit
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: pre-commit
bc0846b to
52a9d4f
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
docs/scripts/generate_config_docs.py (2)
462-466: Default value emission still unsafe for complex YAML – reuse earlier fixPast review already pointed out that directly concatenating
field_info['default']
produces invalid YAML for strings with:or multiline lists. The code is
unchanged.Re-emit with
repr()oryaml.safe_dump():- if field_info.get("default") is not None: - field_line += f" = {field_info['default']}" + default_val = field_info.get("default") + if default_val is not None: + field_line += f" = {repr(default_val)}"
14-17: ImportingAxolotlInputConfigat module import time drags heavy deps into the docs job
AxolotlInputConfigtransitively importstorchand other GPU-specific
packages. A docs-only CI environment (or local Quarto preview) rarely has
these installed, causing immediateModuleNotFoundError.Lazy-load the model inside
main()or stub-inject missing heavies before
import:-import inspect +import inspect +import sys, types ... -from axolotl.utils.schemas.config import AxolotlInputConfig + +# Guard heavy deps so docs build on slim environments +try: + from axolotl.utils.schemas.config import AxolotlInputConfig +except ModuleNotFoundError as exc: + if exc.name == "torch": + sys.modules["torch"] = types.ModuleType("torch") # minimal stub + from axolotl.utils.schemas.config import AxolotlInputConfig + else: + raisesrc/axolotl/utils/schemas/validation.py (1)
619-626: Use enum comparison forrlfield incheck_kto_config.
After Pydantic parsing,data.get("rl")will be anRLTypemember rather than a raw string. Comparing to the string"kto"may never trigger. Update the check to:- if data.get("rl") == "kto": + if data.get("rl") == RLType.KTO:or support both:
- if data.get("rl") == "kto": + if data.get("rl") in (RLType.KTO, "kto"):
🧹 Nitpick comments (3)
docs/scripts/generate_config_docs.py (1)
52-69: Minor: combine nestedifstatements for clarityRuff SIM102 raised this – the nested
ifblocks for filteringUnioncould be
collapsed into a single condition withand, reducing indentation noise.
Low-impact, optional.src/axolotl/utils/schemas/validation.py (2)
95-100: Remove or address TODO incheck_eval_packing.
The TODO suggests additional dataset checks but is unimplemented. Please implement the intended validation or remove the comment to reduce technical debt.
132-140: Flatten nested conditions for clarity.
Incheck_mm_prepare, the nestedifblocks can be combined:- if data.get("skip_prepare_dataset"): - if data.get("remove_unused_columns") is None: + if data.get("skip_prepare_dataset") and data.get("remove_unused_columns") is None:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
.github/workflows/preview-docs.yml(2 hunks)_quarto.yml(2 hunks)docs/.gitignore(1 hunks)docs/scripts/generate_config_docs.py(1 hunks)src/axolotl/utils/schemas/config.py(9 hunks)src/axolotl/utils/schemas/datasets.py(2 hunks)src/axolotl/utils/schemas/deprecated.py(1 hunks)src/axolotl/utils/schemas/enums.py(2 hunks)src/axolotl/utils/schemas/integrations.py(3 hunks)src/axolotl/utils/schemas/model.py(2 hunks)src/axolotl/utils/schemas/peft.py(3 hunks)src/axolotl/utils/schemas/quantization.py(2 hunks)src/axolotl/utils/schemas/training.py(2 hunks)src/axolotl/utils/schemas/trl.py(3 hunks)src/axolotl/utils/schemas/validation.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- src/axolotl/utils/schemas/integrations.py
🚧 Files skipped from review as they are similar to previous changes (11)
- docs/.gitignore
- .github/workflows/preview-docs.yml
- src/axolotl/utils/schemas/trl.py
- _quarto.yml
- src/axolotl/utils/schemas/enums.py
- src/axolotl/utils/schemas/datasets.py
- src/axolotl/utils/schemas/training.py
- src/axolotl/utils/schemas/model.py
- src/axolotl/utils/schemas/deprecated.py
- src/axolotl/utils/schemas/quantization.py
- src/axolotl/utils/schemas/peft.py
🧰 Additional context used
🧬 Code Graph Analysis (3)
docs/scripts/generate_config_docs.py (1)
src/axolotl/utils/schemas/config.py (1)
AxolotlInputConfig(55-832)
src/axolotl/utils/schemas/config.py (8)
src/axolotl/utils/schemas/validation.py (1)
ValidationMixin(1060-1073)src/axolotl/utils/schemas/vllm.py (1)
VllmConfig(8-55)src/axolotl/utils/logging.py (1)
get_logger(53-62)src/axolotl/prompt_strategies/dpo/chat_template.py (1)
default(9-124)src/axolotl/utils/schemas/enums.py (3)
RLType(24-32)RingAttnFunc(82-90)ChatTemplate(35-67)src/axolotl/utils/schemas/datasets.py (5)
SFTDataset(45-216)DPODataset(244-252)KTODataset(278-286)StepwiseSupervisedDataset(255-264)PretrainingDataset(219-229)src/axolotl/utils/schemas/model.py (1)
SpecialTokensConfig(97-104)src/axolotl/utils/mistral_tokenizer.py (1)
chat_template(139-141)
src/axolotl/utils/schemas/validation.py (2)
src/axolotl/utils/schemas/enums.py (3)
ChatTemplate(35-67)RingAttnFunc(82-90)RLType(24-32)src/axolotl/utils/schemas/datasets.py (1)
check_chat_template_config(196-216)
🪛 Ruff (0.11.9)
docs/scripts/generate_config_docs.py
165-168: Use a single if statement instead of nested if statements
(SIM102)
408-412: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
507-509: Use a single if statement instead of nested if statements
(SIM102)
src/axolotl/utils/schemas/validation.py
133-134: Use a single if statement instead of nested if statements
(SIM102)
406-414: Use a single if statement instead of nested if statements
(SIM102)
501-506: Use a single if statement instead of nested if statements
(SIM102)
515-520: Use a single if statement instead of nested if statements
(SIM102)
582-587: Use a single if statement instead of nested if statements
(SIM102)
744-750: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (6)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.5.1)
🔇 Additional comments (2)
src/axolotl/utils/schemas/config.py (1)
290-300: Docstring/default mismatch foreval_causal_lm_metricsThe description lists a non-empty default
['sacrebleu', 'comet', 'ter', 'chrf', 'perplexity']
but the field’s runtime default isNone.Either drop the default list from the description or set it via
default_factory=list(...)so behaviour and docs stay in sync.src/axolotl/utils/schemas/validation.py (1)
329-337: Validation logic corrected for deprecated parameters.
The unreachable branch incheck_neftunehas been fixed to properly detect when both parameters are set. This resolves the prior duplicate condition issue.
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (5)
src/axolotl/utils/schemas/validation.py (3)
327-337: Unreachable branch incheck_neftune– duplicate conditionThe
elifbranch tests the same condition as theifbranch, making it unreachable. The intended logic should check if both parameters are present.- elif data.get("noisy_embedding_alpha") and not data.get("neftune_noise_alpha"): + elif data.get("noisy_embedding_alpha") and data.get("neftune_noise_alpha"):
512-524: Fix typo in error messageThe error message incorrectly repeats
lora_mlp_kernelthree times.- "lora_mlp_kernel, lora_mlp_kernel, and lora_mlp_kernel are not compatible with 8-bit LoRA" + "lora_mlp_kernel, lora_qkv_kernel, and lora_o_kernel are not compatible with 8-bit LoRA"
619-626: Fix enum comparison that will silently failThe method compares
data.get("rl")to the string"kto", but after Pydantic parsing this field will be the enum memberRLType.KTO.- if data.get("rl") == "kto": + if data.get("rl") in (RLType.KTO, "kto"):docs/scripts/generate_config_docs.py (2)
8-16: Handle missing torch dependency in docs CIThe import of
AxolotlInputConfigpulls in torch, which is not available in the docs CI environment.Add this before the import to stub torch:
+import sys +import types + +# Stub torch for docs generation +try: + import torch # noqa: F401 +except ModuleNotFoundError: + sys.modules["torch"] = types.ModuleType("torch") + sys.modules["torch"].__dict__["__getattr__"] = lambda *_: None + from axolotl.utils.schemas.config import AxolotlInputConfig
462-467: Unsafe default value emission for complex typesDirectly interpolating the default value can produce invalid YAML for complex types like strings with colons or multiline lists.
- if field_info.get("default") is not None: - field_line += f" = {field_info['default']}" + default_val = field_info.get("default") + if default_val is not None: + field_line += f" = {repr(default_val)}"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
docs/quantize.qmd(1 hunks)docs/scripts/generate_config_docs.py(1 hunks)src/axolotl/utils/schemas/validation.py(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/quantize.qmd
🧰 Additional context used
🧬 Code Graph Analysis (1)
src/axolotl/utils/schemas/validation.py (2)
src/axolotl/utils/schemas/enums.py (3)
ChatTemplate(35-67)RingAttnFunc(82-90)RLType(24-32)src/axolotl/utils/schemas/datasets.py (1)
check_chat_template_config(196-216)
🪛 Ruff (0.11.9)
src/axolotl/utils/schemas/validation.py
133-134: Use a single if statement instead of nested if statements
(SIM102)
406-414: Use a single if statement instead of nested if statements
(SIM102)
501-506: Use a single if statement instead of nested if statements
(SIM102)
515-520: Use a single if statement instead of nested if statements
(SIM102)
582-587: Use a single if statement instead of nested if statements
(SIM102)
744-750: Use a single if statement instead of nested if statements
(SIM102)
docs/scripts/generate_config_docs.py
165-168: Use a single if statement instead of nested if statements
(SIM102)
408-412: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
507-509: Use a single if statement instead of nested if statements
(SIM102)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: preview
- GitHub Check: PyTest from Source Dist (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.7.1)
- GitHub Check: PyTest from Source Dist (3.11, 2.7.1)
- GitHub Check: PyTest (3.11, 2.5.1)
- GitHub Check: PyTest (3.11, 2.6.0)
- GitHub Check: PyTest from Source Dist (3.11, 2.6.0)
- GitHub Check: pre-commit
- GitHub Check: pre-commit
70bca22 to
ba84445
Compare
ba84445 to
3b0685d
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/axolotl/utils/schemas/validation.py (2)
620-637: Enum comparison will still silently fail – useRLTypenot raw strings
data.get("rl") == "kto"/"grpo"will beFalseafter Pydantic casts the field toRLType, so these validations are skipped.
Replace the checks with enum-aware variants:- if data.get("rl") == "kto": + if data.get("rl") in (RLType.KTO, "kto"): ... - if ( - data.get("rl") == "grpo" + if ( + data.get("rl") in (RLType.GRPO, "grpo")
579-591: Duplicate validator – keep one, drop the other
check_lora_kernel_8bit(this block) repeats the logic already implemented incheck_lora_8bit(514-524).
Remove this copy to avoid diverging fixes and needless execution.
🧹 Nitpick comments (1)
docs/scripts/generate_config_docs.py (1)
97-99: Inconsistent field declaration – drops schema metadata
mean_resizing_embeddingsis the only boolean option defined withoutField(...).
As a result it will be missing from the autogenerated JSON-schema and docs.
Declare it viaField(default=False, json_schema_extra={...})to stay in sync with the rest of the config.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (23)
.github/workflows/preview-docs.yml(2 hunks).runpod/README.md(1 hunks)README.md(1 hunks)_quarto.yml(2 hunks)docs/.gitignore(1 hunks)docs/config.qmd(0 hunks)docs/dataset-formats/conversation.qmd(2 hunks)docs/dataset-formats/inst_tune.qmd(1 hunks)docs/dataset_loading.qmd(1 hunks)docs/getting-started.qmd(2 hunks)docs/quantize.qmd(1 hunks)docs/scripts/generate_config_docs.py(1 hunks)src/axolotl/utils/schemas/config.py(9 hunks)src/axolotl/utils/schemas/datasets.py(2 hunks)src/axolotl/utils/schemas/deprecated.py(1 hunks)src/axolotl/utils/schemas/enums.py(2 hunks)src/axolotl/utils/schemas/integrations.py(3 hunks)src/axolotl/utils/schemas/model.py(2 hunks)src/axolotl/utils/schemas/peft.py(3 hunks)src/axolotl/utils/schemas/quantization.py(2 hunks)src/axolotl/utils/schemas/training.py(2 hunks)src/axolotl/utils/schemas/trl.py(3 hunks)src/axolotl/utils/schemas/validation.py(1 hunks)
💤 Files with no reviewable changes (1)
- docs/config.qmd
✅ Files skipped from review due to trivial changes (6)
- docs/getting-started.qmd
- docs/dataset-formats/inst_tune.qmd
- README.md
- .runpod/README.md
- docs/dataset-formats/conversation.qmd
- docs/dataset_loading.qmd
🚧 Files skipped from review as they are similar to previous changes (14)
- docs/quantize.qmd
- docs/.gitignore
- .github/workflows/preview-docs.yml
- _quarto.yml
- src/axolotl/utils/schemas/enums.py
- src/axolotl/utils/schemas/trl.py
- src/axolotl/utils/schemas/integrations.py
- src/axolotl/utils/schemas/deprecated.py
- src/axolotl/utils/schemas/training.py
- src/axolotl/utils/schemas/config.py
- src/axolotl/utils/schemas/datasets.py
- src/axolotl/utils/schemas/quantization.py
- src/axolotl/utils/schemas/model.py
- src/axolotl/utils/schemas/peft.py
🧰 Additional context used
🪛 Ruff (0.11.9)
docs/scripts/generate_config_docs.py
924-925: Use a single if statement instead of nested if statements
(SIM102)
1044-1045: Use a single if statement instead of nested if statements
(SIM102)
1084-1089: Use a single if statement instead of nested if statements
Combine if statements using and
(SIM102)
src/axolotl/utils/schemas/validation.py
133-134: Use a single if statement instead of nested if statements
(SIM102)
406-414: Use a single if statement instead of nested if statements
(SIM102)
501-506: Use a single if statement instead of nested if statements
(SIM102)
515-520: Use a single if statement instead of nested if statements
(SIM102)
582-587: Use a single if statement instead of nested if statements
(SIM102)
744-750: Use a single if statement instead of nested if statements
(SIM102)
|
Follow-up work: Reorder / rethink grouping of config classes / fields s.t. the most important bits are displayed near the top of the generated config reference. |
Description
Motivation and Context
We have a large, manually maintained config reference in our docs. Why not autogen from our pydantic models? This enables us to have a single source of truth.
How has this been tested?
Local quarto build + render.
Screenshots (if appropriate)
Once docs preview is live, check that out.
Types of changes
Docs, python script for autogen.
Summary by CodeRabbit
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores
Refactor
Style