Skip to content

[studio] Fix VLM detection for transformers v5#17

Closed
danielhanchen wants to merge 8 commits into
mainfrom
pr-4868-code
Closed

[studio] Fix VLM detection for transformers v5#17
danielhanchen wants to merge 8 commits into
mainfrom
pr-4868-code

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Staging mirror of unslothai#4868

Original PR: unslothai#4868
Author: Datta0

This is a staging copy for review and editing. Once finalized, changes will be pushed back to the original PR.


Original description

Fixes: unslothai#4859


This PR contains code changes only (1 files). Test changes are in a separate PR.

Changed files:

  • studio/backend/utils/models/model_config.py

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the Vision-Language Model (VLM) detection logic within the is_vision_model function. It introduces a more robust multi-stage detection process that includes in-process configuration checking, a version-specific subprocess check, and a final fallback to raw metadata inspection via _load_model_config_metadata. Additionally, the _is_vlm_config helper was added to unify the identification of VLM characteristics across different configuration formats, and error handling in the subprocess check was updated to return None to facilitate the new fallback mechanism. I have no feedback to provide.

@chatgpt-codex-connector

Copy link
Copy Markdown

Codex Review: Didn't find any major issues. More of your lovely PRs please.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@danielhanchen danielhanchen left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! The goal of this PR is to fix VLM detection for models that require transformers v5 (like Qwen3.5 and Gemma4) in Unsloth Studio. As a summary, this PR refactors the vision detection logic by introducing a unified _is_vlm_config() helper, a raw config.json fallback via _load_model_config_metadata(), and changes _is_vision_model_subprocess() to return Optional[bool] (None on transient failures) to enable the multi-tier fallback chain.

Review Loop 1 - Consolidated Findings

Reviewers Severity Finding
7/7 High Merge conflict resolution dropped main's vision-detection cache (_vision_detection_cache, _token_fingerprint, _vision_cache_lock, _is_vision_model_uncached()) from unslothai#4853/unslothai#4855. Every repeated is_vision_model() call now re-runs load_model_config() and may re-spawn the 60s subprocess.
2/7 Medium For transformers-5-only models (Qwen3.5, Gemma4), the new flow tries the known-failing in-process load_model_config() before falling back to subprocess/raw-config. This adds avoidable latency and warning log noise for the exact models being fixed.
1/7 High _is_vlm_config() false-positives on seq2seq models (T5, BART, mBART, Pegasus, etc.) because ForConditionalGeneration suffix matches both VLMs and seq2seq. Only csm and whisper are excluded.
1/7 Medium _VISION_CHECK_SCRIPT subprocess still uses old suffix-only heuristic without audio-only exclusion, so subprocess and in-process detection can disagree (e.g. Whisper returns is_vision: true in subprocess).
1/7 Medium _load_model_config_metadata() does not call resolve_cached_repo_id_case(), so different casings of the same repo can bypass HF cache and produce duplicate downloads.
1/7 Low Type annotation dict[str, Any] (Python 3.9+) used in _load_model_config_metadata while rest of file uses Dict[str, Any] from typing.

Concrete suggestions for each finding below.

Fix 1: Restore caching layer (7/7 consensus)

The PR's new detection logic should be placed inside _is_vision_model_uncached() while keeping main's cache wrapper. This is the correct merge resolution.

Fix 2: Subprocess/fallback ordering for t5 models (2/7)

For needs_transformers_5 models, use subprocess first (as main did originally), then fall back to raw config.json. Do not try the known-failing in-process AutoConfig load.

Fix 3: Narrow _is_vlm_config() to exclude seq2seq (1/7)

Add _NON_VLM_CONDITIONAL_MODEL_TYPES set for t5, bart, mbart, mt5, pegasus, blenderbot to prevent false positives.

Fix 4: Update _VISION_CHECK_SCRIPT (1/7)

Add audio-only exclusion logic to the subprocess script to match _is_vlm_config().

Fix 5: Use resolve_cached_repo_id_case() in _load_model_config_metadata (1/7)

Fix 6: Use Dict[str, Any] instead of dict[str, Any] for consistency

Properly combines PR unslothai#4868's new detection logic with main's caching layer:
- Add _is_vlm_config() helper for unified VLM detection across config types
- Add _load_model_config_metadata() as raw config.json fallback when both
  AutoConfig and subprocess fail for transformers-5-only models
- For needs_transformers_5 models: subprocess first, then raw config fallback
- Add _NON_VLM_CONDITIONAL_MODEL_TYPES to prevent seq2seq false positives
- Update _VISION_CHECK_SCRIPT with audio-only and seq2seq exclusion
- Preserve vision detection cache and error classification from main
@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors VLM detection logic by centralizing it into a helper function and adding a fallback to raw config metadata when subprocess checks fail. The reviewer identified logic duplication between the module and subprocess script that should be addressed to ensure consistency, and recommended restoring detailed logging in the main detection path to maintain observability.

Comment on lines +536 to 557
audio_only_types = {"csm", "whisper"}
non_vlm_seq2seq_types = {"bart", "mbart", "t5", "mt5", "pegasus",
"blenderbot", "blenderbot-small"}

is_vlm = False
if hasattr(config, "architectures"):
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in config.architectures
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and hasattr(config, "model_type"):
vlm_types = {"phi3_v","llava","llava_next","llava_onevision",
"internvl_chat","cogvlm2","minicpmv"}
if config.model_type in vlm_types:
if model_type not in audio_only_types and model_type not in non_vlm_seq2seq_types:
if hasattr(config, "architectures"):
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in config.architectures
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and model_type in {
"phi3_v", "llava", "llava_next", "llava_onevision",
"internvl_chat", "cogvlm2", "minicpmv",
}:
is_vlm = True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The VLM detection logic and model type sets are duplicated inside the _VISION_CHECK_SCRIPT string. This creates a maintenance risk where updates to the module-level constants (like _AUDIO_ONLY_MODEL_TYPES, _NON_VLM_CONDITIONAL_MODEL_TYPES, or _VLM_MODEL_TYPES) might not be reflected in the subprocess check. Consider using string formatting to inject these constants into the script template to ensure consistency across both detection paths.


try:
config = load_model_config(model_name, use_auth = True, token = hf_token)
return _is_vlm_config(config)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The refactoring of _is_vision_model_uncached removed detailed logging that explained why a model was detected as a VLM. While the new _is_vlm_config helper improves code reuse, restoring logging here would maintain observability for the standard detection path, matching the logging added to the subprocess fallback path at line 796.

        is_vlm = _is_vlm_config(config)
        if is_vlm:
            logger.info(
                "Model %s detected as VLM: model_type=%s architectures=%s",
                model_name,
                getattr(config, "model_type", None),
                getattr(config, "architectures", []),
            )
        return is_vlm

…ion markers

- Remove ForConditionalGeneration from VLM arch suffix heuristic. This
  caused false positives for seq2seq models (T5, BART, mBART, UMT5,
  M2M100, Switch Transformers, LongT5, Pegasus, Blenderbot)
- Rely on explicit vision signals: vision_config, img_processor,
  image_token_index, ForVisionText2Text suffix, and known model types
- Remove _NON_VLM_CONDITIONAL_MODEL_TYPES (no longer needed)
- Add _classify_detection_error helper for permanent vs transient error
  classification, now used in both t5 fallback and non-t5 except paths
- _load_model_config_metadata now returns (data, error) tuple so callers
  can classify permanent errors (repo not found, gated) for caching
- Update _VISION_CHECK_SCRIPT subprocess to match new detection logic

@danielhanchen danielhanchen left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Loop 2 - Consolidated Findings

After loop 1 fixes, ran 7 reviewer.py reviews + 3 Sonnet subagents + gemini_review. Results: 4/7 APPROVE, 3/7 REQUEST_CHANGES.

Reviewers Severity Finding
1/7 High ForConditionalGeneration heuristic still false-positives on UMT5, M2M100, Switch Transformers, LongT5 -- the exclusion set approach is fundamentally fragile
1/7 High Seq2seq exclusion can suppress real multimodal configs that use a base model_type (e.g., T5-based VLMs with vision_config)
2/7 Medium t5 raw-config fallback path returns None for permanent errors (RepositoryNotFoundError, GatedRepoError), preventing caching

Fixes Applied (committed)

  1. Removed ForConditionalGeneration from VLM detection entirely. Instead of maintaining an ever-growing exclusion list, we now rely on explicit vision signals only:

    • vision_config (most VLMs: LLaVA, Gemma-3/4, Qwen-VL, etc.)
    • img_processor (Phi-3.5 Vision)
    • image_token_index (common in VLMs)
    • ForVisionText2Text architecture suffix (VLM-specific)
    • Known _VLM_MODEL_TYPES set

    This eliminates all seq2seq false positives while still detecting all real VLMs. Models with T5 base but vision capabilities (like Pix2Struct) are still correctly detected via their vision_config.

  2. Removed _NON_VLM_CONDITIONAL_MODEL_TYPES -- no longer needed since we don't use ForConditionalGeneration.

  3. Added _classify_detection_error() helper to classify permanent vs transient errors in both the t5 fallback and non-t5 paths. _load_model_config_metadata now returns (data, error) tuple so the t5 path can cache permanent failures.

  4. Updated _VISION_CHECK_SCRIPT subprocess to match the new detection logic.

Test Results

  • 33 simulation tests covering VLMs, audio-only, seq2seq, edge cases: all pass
  • 8 subprocess script simulation tests: all pass
  • UMT5, M2M100, Switch Transformers, LongT5: all correctly classified as non-VLM
  • T5 with vision_config: correctly detected as VLM (vision signals take priority)

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

…ration

Since ForConditionalGeneration was removed as a VLM signal, known VLM
model types that use it need to be in _VLM_MODEL_TYPES as a safety net.
In practice, all these models expose vision_config in their real HF
configs so they would be detected regardless, but the model_type list
ensures compatibility with simplified mock configs (test_vision_cache.py)
and edge cases.

Added: gemma3, gemma4, qwen2_vl, qwen2_5_vl, qwen3_5, paligemma,
pix2struct, video_llava, blip-2, blip_2, idefics2, idefics3, mllama,
chameleon. Also updated subprocess script to match.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors VLM detection logic in model_config.py by adding a fallback to raw config metadata and implementing error classification for improved caching. Review feedback recommends simplifying the detection logic's boolean structure for better maintainability and restoring docstring details regarding the function's return values and transient failure handling.

I am having trouble creating individual review comments. Click here to see my feedback.

studio/backend/utils/models/model_config.py (524-543)

medium

This logic for VLM detection can be improved for readability and maintainability.

  1. The chain of if not is_vlm checks can be simplified into a single boolean expression.
  2. The audio_only_types set and the VLM model types set are duplicates of module-level constants (_AUDIO_ONLY_MODEL_TYPES and _VLM_MODEL_TYPES). To avoid inconsistencies, consider making _VISION_CHECK_SCRIPT an f-string and injecting these constants.

Here's a suggestion that addresses the first point:

    audio_only_types = {"csm", "whisper"}

    is_vlm = False
    if model_type not in audio_only_types:
        is_vlm = (
            hasattr(config, "vision_config") or
            hasattr(config, "img_processor") or
            hasattr(config, "image_token_index") or
            (hasattr(config, "architectures") and any(
                x.endswith("ForVisionText2Text")
                for x in config.architectures
            )) or
            model_type in {
                "phi3_v", "llava", "llava_next", "llava_onevision",
                "internvl_chat", "cogvlm2", "minicpmv",
            }
        )

studio/backend/utils/models/model_config.py (560-563)

medium

The removed part of the docstring explaining the return value (Optional[bool]) is important for understanding the function's contract. It clarifies that None indicates a transient failure, which is crucial for the caller's logic. Please consider adding this information back to maintain clarity.

- Remove FileNotFoundError from permanent error classification to avoid
  stale negative caching for local model dirs still being set up
- Return (None, None) instead of (None, FileNotFoundError) for local
  paths missing config.json -- absence is unknown state, not an error
- Add xgenmm and qwen3_vl to _VLM_MODEL_TYPES for completeness

@danielhanchen danielhanchen left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Loop 3 - Consolidated Findings

Ran 7 reviewer.py reviews + 3 Sonnet subagents + gemini_review. Results: 4/7 APPROVE, 3/7 REQUEST_CHANGES.

Reviewers Severity Finding
3/7 High FileNotFoundError in _classify_detection_error was caching local path misses as permanent False -- stale for in-progress model dirs
2/7 High _VLM_MODEL_TYPES missing xgenmm and qwen3_vl

Fixes Applied

  1. Removed FileNotFoundError from permanent error classification. Local paths without config.json now return (None, None) instead of (None, FileNotFoundError), treating absence as unknown rather than a cacheable failure.

  2. Added xgenmm and qwen3_vl to _VLM_MODEL_TYPES and subprocess script.

All 3 loops complete. Summary of changes from original PR:

  1. Properly merged with main's caching layer (is_vision_model -> cache -> _is_vision_model_uncached)
  2. Removed ForConditionalGeneration as a VLM signal (caused seq2seq false positives)
  3. Added comprehensive _VLM_MODEL_TYPES as safety net for known VLMs
  4. Added _is_vlm_config() for unified detection across config types (dict and object)
  5. Added _load_model_config_metadata() as raw config.json fallback for t5 models
  6. Added _classify_detection_error() for consistent permanent/transient error handling
  7. Updated _VISION_CHECK_SCRIPT subprocess to match
  8. Preserved all error classification and caching semantics from main

@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the VLM detection logic by centralizing it into a new _is_vlm_config helper and introducing a fallback mechanism that retrieves raw config.json metadata when AutoConfig fails. It also expands the list of supported VLM model types and improves error classification for caching purposes. Review feedback suggests removing qwen3_5 from the VLM list to prevent incorrect detection of standard LLMs and recommends using the centralized helper within the detection script to eliminate code duplication.

"gemma4",
"qwen2_vl",
"qwen2_5_vl",
"qwen3_5",

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

qwen3_5 appears to be the model type for the Qwen 3.5 LLM family (e.g., the MoE models), not specifically a VLM. Including it in _VLM_MODEL_TYPES will cause standard LLMs to be incorrectly detected as vision models. The VLM variant is typically identified as qwen3_vl (which is already in the list).

Suggested change
"qwen3_5",
"paligemma",

Comment on lines +542 to 566
model_type = getattr(config, "model_type", None)
audio_only_types = {"csm", "whisper"}

is_vlm = False
if hasattr(config, "architectures"):
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in config.architectures
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and hasattr(config, "model_type"):
vlm_types = {"phi3_v","llava","llava_next","llava_onevision",
"internvl_chat","cogvlm2","minicpmv"}
if config.model_type in vlm_types:
if model_type not in audio_only_types:
if hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and hasattr(config, "architectures"):
is_vlm = any(
x.endswith("ForVisionText2Text")
for x in config.architectures
)
if not is_vlm and model_type in {
"phi3_v", "llava", "llava_next", "llava_onevision",
"internvl_chat", "cogvlm2", "minicpmv", "gemma3", "gemma4",
"qwen2_vl", "qwen2_5_vl", "qwen3_5", "paligemma",
"pix2struct", "video_llava", "blip-2", "blip_2",
"idefics2", "idefics3", "mllama", "chameleon",
"xgenmm", "qwen3_vl",
}:
is_vlm = True

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The VLM detection logic and model lists are duplicated inside _VISION_CHECK_SCRIPT. Since backend_dir is added to sys.path, the script can import and use the centralized _is_vlm_config helper. This improves maintainability and ensures consistency between in-process and subprocess detection.

Suggested change
model_type = getattr(config, "model_type", None)
audio_only_types = {"csm", "whisper"}
is_vlm = False
if hasattr(config, "architectures"):
is_vlm = any(
x.endswith(("ForConditionalGeneration", "ForVisionText2Text"))
for x in config.architectures
)
if not is_vlm and hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and hasattr(config, "model_type"):
vlm_types = {"phi3_v","llava","llava_next","llava_onevision",
"internvl_chat","cogvlm2","minicpmv"}
if config.model_type in vlm_types:
if model_type not in audio_only_types:
if hasattr(config, "vision_config"):
is_vlm = True
if not is_vlm and hasattr(config, "img_processor"):
is_vlm = True
if not is_vlm and hasattr(config, "image_token_index"):
is_vlm = True
if not is_vlm and hasattr(config, "architectures"):
is_vlm = any(
x.endswith("ForVisionText2Text")
for x in config.architectures
)
if not is_vlm and model_type in {
"phi3_v", "llava", "llava_next", "llava_onevision",
"internvl_chat", "cogvlm2", "minicpmv", "gemma3", "gemma4",
"qwen2_vl", "qwen2_5_vl", "qwen3_5", "paligemma",
"pix2struct", "video_llava", "blip-2", "blip_2",
"idefics2", "idefics3", "mllama", "chameleon",
"xgenmm", "qwen3_vl",
}:
is_vlm = True
from utils.models.model_config import _is_vlm_config
is_vlm = _is_vlm_config(config)
model_type = getattr(config, "model_type", None)

- Add image_token_id as an explicit vision signal in _is_vlm_config and
  subprocess script (used by SmolVLM-style configs)
- Add gemma3n, qwen3_vl_moe, smolvlm to _VLM_MODEL_TYPES
- Keep subprocess script model type list in sync

@danielhanchen danielhanchen left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Loop 4 - Consolidated Findings

Ran 7 reviewer.py reviews + 3 Sonnet subagents + gemini_review. Results: 5/7 APPROVE, 2/7 REQUEST_CHANGES.

Reviewers Severity Finding
2/7 High Missing VLM model types (gemma3n, qwen3_vl_moe, smolvlm) and missing image_token_id as a vision signal

Fixes Applied

  1. Added image_token_id as an explicit vision signal in both _is_vlm_config() and _VISION_CHECK_SCRIPT. This is used by SmolVLM-style configs that have image_token_id instead of image_token_index.

  2. Added gemma3n, qwen3_vl_moe, smolvlm to _VLM_MODEL_TYPES and subprocess script.

Verification

  • SmolVLM2 (model_type=smolvlm) has both vision_config and image_token_id -- detected via vision_config, image_token_id now also serves as a secondary signal.
  • All previously verified models (Qwen3.5, Gemma4, Whisper, T5, etc.) remain correctly classified.

These VLMs have no explicit vision signals (vision_config, img_processor,
image_token_index, image_token_id) in their config.json, relying solely
on model_type for detection.
@danielhanchen

Copy link
Copy Markdown
Member Author

/gemini review

@danielhanchen

Copy link
Copy Markdown
Member Author

@codex review

@gemini-code-assist

Copy link
Copy Markdown

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@chatgpt-codex-connector

Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, add credits to your account and enable them for code reviews in your settings.

…ation

- _is_vlm_config now checks values are not None, not just key presence.
  Prevents false positives from configs with "vision_config": null
- _VISION_CHECK_SCRIPT updated to use getattr with None checks
- _classify_detection_error now treats EntryNotFoundError and
  RevisionNotFoundError as permanent (cacheable as False)

@danielhanchen danielhanchen left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Loop 5 - Consolidated Findings

Ran 7 reviewer.py reviews + 3 Sonnet subagents + gemini_review. Results: 5/7 APPROVE, 2/7 REQUEST_CHANGES.

Reviewers Severity Finding
1/7 High Null placeholder vision fields (e.g. "vision_config": null) treated as positive VLM signal -- key presence check should verify value is not None
1/7 Medium EntryNotFoundError/RevisionNotFoundError not in permanent error classification -- causes repeated probes for repos missing config.json

Fixes Applied

  1. Null-safe vision field checks: _is_vlm_config() now retrieves field values and checks is not None instead of just key/attribute presence. Prevents false positives from configs with placeholder null fields. Updated both in-process and subprocess detection.

  2. Added EntryNotFoundError and RevisionNotFoundError to _classify_detection_error() as permanent errors. Repos that exist but don't have config.json are now cached as False instead of retrying on every call.

Use getattr with None default instead of hasattr to prevent TypeError
when config.architectures is None (hasattr returns True for None values,
but iterating None raises TypeError).
@danielhanchen

Copy link
Copy Markdown
Member Author

Fixes pushed to original PR unslothai#4868. Closing staging copy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Can't train Qwen3.5 or Gemma4 on multimodal datasets in Unsloth Studio

1 participant