studio: improve onboarding UX, tooltips, and training defaults by danielhanchen · Pull Request #4355 · unslothai/unsloth

danielhanchen · 2026-03-17T10:16:53Z

Summary

Change splash text to "Train and run LLMs locally"
Add "Chat Only" card with BubbleChatIcon on step 1 to skip directly to /chat
Add "Skip"/"Skip to Chat" buttons to the sidebar (step 1 shows "Skip to Chat", steps 2+ show "Skip") and the footer (on step 1 the Back button becomes "Skip" and returns to the splash screen)
Make all tooltips clickable in addition to hover (via React context in tooltip component)
Strip surrounding quotes from pasted HF tokens (fixes "invalid or expired token" when pasting "hf_...")
Rename "Eval Split" to "Evaluation Split"
Add SparklesIcon to "Auto Detect" format option in dataset step
Change step 4 heading from "Training" to "Choose your training parameters"
Default max_steps changed to 60
Learning rate displayed in scientific notation (e.g. 2e-4), with +/- steppers that follow the natural sequence (1e-4, 2e-4, ..., 9e-4, 1e-3, ...)
Context length options capped by the model's max_position_embeddings (fetched via AutoConfig), falling back to 64K if unavailable
Fix "QLORA"/"LORA" to "QLoRA"/"LoRA" in summary step
Change "Watch video guide" to "Get started with our guide" linking to https://unsloth.ai/docs/new/studio/start
Update intro text to "Choose a path - fine-tune LLMs, vision, embedding, audio models or just chat"
Backend: add max_position_embeddings field to model config endpoint response (via AutoConfig with text_config fallback)
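The learning-rate stepper's "natural sequence" can be sketched as below. This is a hypothetical Python port for illustration only (the PR's formatLR and stepLR live in the TypeScript frontend and are not shown here). It assumes the stepper works on an integer mantissa from 1 to 9 and rolls the exponent over at either end; snapping a non-integer mantissa to the nearest integer before stepping is our own assumption.

```python
def _split(lr: float) -> tuple[float, int]:
    """Split lr into (mantissa, exponent) using Python's own e-notation."""
    mantissa, exponent = f"{lr:.6e}".split("e")
    return float(mantissa), int(exponent)


def format_lr(lr: float) -> str:
    """Render a learning rate compactly, e.g. 2e-4 or 1.5e-4."""
    mantissa, exponent = _split(lr)
    return f"{mantissa:g}e{exponent}"


def step_lr(lr: float, direction: int) -> float:
    """Step up (+1) or down (-1) along 1e-4, 2e-4, ..., 9e-4, 1e-3, ..."""
    mantissa, exponent = _split(lr)
    m = round(mantissa) + direction  # assumption: snap to an integer mantissa first
    if m > 9:  # e.g. 9e-4 -> 1e-3
        m, exponent = 1, exponent + 1
    elif m < 1:  # e.g. 1e-4 -> 9e-5
        m, exponent = 9, exponent - 1
    return float(f"{m}e{exponent}")
```

Building the result from a string such as "3e-4" avoids accumulating floating-point error from repeated additions.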

Test plan

Verify onboarding splash screen shows "Train and run LLMs locally"
Click "Chat Only" card on step 1 -- should navigate to /chat
Verify Skip/Skip to Chat buttons appear correctly per step
Click tooltip info icons -- should toggle on click and hover
Paste HF token with quotes -- should strip quotes and validate
Verify context length dropdown is capped by model's max context
Verify learning rate shows scientific notation and +/- works
Verify summary step shows QLoRA/LoRA instead of QLORA/LORA

- Change splash text to "Train and run LLMs locally"
- Add "Chat Only" card with BubbleChatIcon to skip directly to chat
- Add Skip/Skip to Chat buttons in sidebar and footer
- Back button on step 1 returns to splash screen instead of being disabled
- Change "Watch video guide" to "Get started with our guide" with new URL
- Update intro text to mention all model types + chat
- Make all tooltips clickable (in addition to hover) via React context
- Strip surrounding quotes from pasted HF tokens
- Rename "Eval Split" to "Evaluation Split"
- Add SparklesIcon to "Auto Detect" format option
- Change step 4 heading to "Choose your training parameters"
- Default max_steps to 60
- Learning rate displayed in scientific notation with +/- stepper
- Context length options capped by model's max_position_embeddings (via AutoConfig)
- Fix "QLORA"/"LORA" to "QLoRA"/"LoRA" in summary step
- Backend: add max_position_embeddings to model config endpoint


for more information, see https://pre-commit.ci

gemini-code-assist · 2026-03-17T10:17:33Z

Summary of Changes


This pull request significantly enhances the user experience of the Unsloth Studio onboarding process. It introduces clearer navigation options, improves the usability of interactive elements like tooltips and input fields, and refines the presentation of training parameters. Additionally, it adds backend support for model context length, allowing for more intelligent default settings and user guidance during model configuration.

Highlights

Onboarding UX Improvements: Updated splash screen text, introduced a 'Chat Only' card for direct navigation, and added 'Skip'/'Skip to Chat' buttons in the sidebar and footer.
Tooltip Functionality: Made all tooltips clickable in addition to their existing hover functionality.
Hugging Face Token Handling: Implemented automatic stripping of surrounding quotes from pasted Hugging Face tokens to prevent validation errors.
Training Parameter Refinements: Renamed 'Eval Split' to 'Evaluation Split', added a SparklesIcon to 'Auto Detect' dataset format, changed the step 4 heading, and defaulted max_steps to 60.
Learning Rate Display and Input: Updated learning rate display to scientific notation and introduced steppers that follow a natural sequence (e.g., 1e-4, 2e-4, ..., 1e-3).
Context Length Configuration: Capped context length options based on the model's max_position_embeddings, with a fallback to 64K if not available.
Terminology Consistency: Corrected 'QLORA'/'LORA' to 'QLoRA'/'LoRA' in the summary step.
Documentation and Guidance: Updated the introductory text and changed 'Watch video guide' to 'Get started with our guide' with a new link.
Backend Model Configuration: Added max_position_embeddings to the model config endpoint response, fetched via AutoConfig.

Changelog

studio/backend/models/models.py
- Added max_position_embeddings field to ModelDetails for storing the maximum context length.
studio/backend/routes/models.py
- Implemented logic to extract max_position_embeddings from ModelConfig or AutoConfig and include it in the ModelDetails response.
studio/frontend/src/components/ui/tooltip.tsx
- Modified the Tooltip component to support toggling visibility on click, in addition to hover.
studio/frontend/src/config/training.ts
- Updated the default maxSteps hyperparameter from 0 to 60.
studio/frontend/src/features/onboarding/components/splash-screen.tsx
- Changed the splash screen subtitle text to "Train and run LLMs locally".
studio/frontend/src/features/onboarding/components/steps/dataset-step.tsx
- Imported SparklesIcon and added it next to the "Auto Detect" option in the dataset format selection.
studio/frontend/src/features/onboarding/components/steps/hyperparameters-step.tsx
- Removed unused Input component import.
- Added formatLR and stepLR utility functions for scientific notation learning rate handling.
- Updated the FieldLegend text for the training parameters section.
- Implemented logic to filter CONTEXT_LENGTHS based on the model's maxPositionEmbeddings.
- Replaced the learning rate input field with custom stepper buttons that display and adjust values in scientific notation.
studio/frontend/src/features/onboarding/components/steps/model-type-step.tsx
- Imported BubbleChatIcon, markOnboardingDone, and useNavigate.
- Updated the introductory text and the link for the video guide.
- Added a new "Chat Only" card that allows users to skip directly to the chat interface.
studio/frontend/src/features/onboarding/components/steps/summary-step.tsx
- Corrected the display of training methods from "QLORA"/"LORA" to "QLoRA"/"LoRA".
- Changed the learning rate display to use exponential notation (toExponential()).
studio/frontend/src/features/onboarding/components/wizard-footer.tsx
- Modified the "Back" button to act as a "Skip" button, returning to the splash screen when on the first step.
studio/frontend/src/features/onboarding/components/wizard-layout.tsx
- Passed the onBackToSplash callback to WizardFooter.
studio/frontend/src/features/onboarding/components/wizard-sidebar.tsx
- Added "Skip" and "Skip to Chat" buttons to the sidebar, visible on both mobile and desktop.
studio/frontend/src/features/training/api/models-api.ts
- Added max_position_embeddings as an optional field to the ModelConfigResponse interface.
studio/frontend/src/features/training/components/hf-dataset-subset-split-selectors.tsx
- Renamed the "Eval Split" label to "Evaluation Split" in the dataset selectors.
studio/frontend/src/features/training/stores/training-config-store.ts
- Added maxPositionEmbeddings to the initial state and non-persisted keys.
- Updated loadAndApplyModelDefaults to store max_position_embeddings from model details.
- Modified setHfToken to automatically strip surrounding quotes from the input token.
studio/frontend/src/features/training/types/config.ts
- Added maxPositionEmbeddings to the TrainingConfigState interface.
studio/frontend/src/hooks/use-hf-token-validation.ts
- Implemented logic to strip surrounding quotes from the debounced Hugging Face token before validation.
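A minimal sketch of the quote stripping applied to pasted tokens. The helper name strip_surrounding_quotes is hypothetical (the frontend does this in setHfToken and use-hf-token-validation.ts), and two details are assumptions since the diff is not shown: only one matching pair of straight single or double quotes is removed, and surrounding whitespace is trimmed first.

```python
def strip_surrounding_quotes(token: str) -> str:
    """Remove one matching pair of quotes around a pasted token (hypothetical helper)."""
    token = token.strip()  # assumption: leading/trailing whitespace is also dropped
    if len(token) >= 2 and token[0] == token[-1] and token[0] in ("'", '"'):
        return token[1:-1].strip()
    return token
```

Mismatched quotes (e.g. `"hf_...` with no closing quote) are left untouched in this sketch, so validation still reports the malformed value instead of silently "fixing" it.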


gemini-code-assist

Code Review

This pull request introduces several valuable user experience enhancements to the Studio onboarding process. The changes, which span both the frontend and backend, include making tooltips clickable, improving training parameter controls with scientific notation for learning rates, and adding more flexible navigation options like skipping to chat. The backend now correctly determines and exposes the model's maximum context length. The code is generally well-structured, but I've noted a couple of areas for improvement: one regarding code duplication in the backend that could be refactored for better maintainability, and a minor UX inconsistency in a navigation flow.

gemini-code-assist · 2026-03-17T10:20:45Z

+        if max_position_embeddings is None:
+            try:
+                from transformers import AutoConfig as _AutoConfig
+


This logic for extracting max_position_embeddings is a duplicate of the logic in lines 388-391. To improve maintainability and reduce redundancy, consider extracting this into a helper function.

For example:

def _get_max_pos_embeddings(config_obj):
    if hasattr(config_obj, "max_position_embeddings"):
        return config_obj.max_position_embeddings
    if hasattr(config_obj, "text_config") and hasattr(config_obj.text_config, "max_position_embeddings"):
        return config_obj.text_config.max_position_embeddings
    return None

You could then call this helper in both places to keep the code DRY.

gemini-code-assist · 2026-03-17T10:20:45Z

+        className="mt-3 hidden text-xs text-muted-foreground md:flex"
+        onClick={() => {
+          markOnboardingDone();
+          navigate({ to: "/studio" });


There's a minor UX inconsistency here. The button's label is "Skip to Chat" on the first step, but it navigates to /studio. For a more intuitive experience, it should navigate to /chat to match the label, similar to how the "Chat Only" card on the same step behaves.

Suggested change

- navigate({ to: "/studio" });
+ navigate({ to: currentStep === 1 ? "/chat" : "/studio" });


- Change Qwen3.5 thinking threshold from <=2B to <9B (0.8B, 2B, 4B all disable thinking by default; 9B+ enables it)
- Always pass enable_thinking=False in AI Assist helper calls (_run_with_helper and _generate_with_backend) regardless of chat thinking settings
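The size threshold can be illustrated with a small helper. This sketch is hypothetical: it assumes the parameter count can be parsed from a "-<N>B" segment of the repo name (e.g. unsloth/Qwen3.5-4B), returns None for names it cannot classify, and does not model the separate rule that AI Assist calls always pass enable_thinking=False.

```python
import re
from typing import Optional

# "-4B", "-0.8B", "-35B" (the first size segment in the name wins)
_SIZE = re.compile(r"-(\d+(?:\.\d+)?)B\b")


def qwen35_thinking_default(model_name: str) -> Optional[bool]:
    """Default enable_thinking for a Qwen3.5 repo name, or None if not applicable."""
    if "qwen3.5" not in model_name.lower():
        return None  # not Qwen3.5: leave the decision to other config
    match = _SIZE.search(model_name)
    if match is None:
        return None  # size not parseable from the name
    return float(match.group(1)) >= 9  # <9B: thinking off; 9B and up: on
```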

- Extract _get_max_position_embeddings helper to DRY config extraction
- Fix "Skip to Chat" to navigate to /chat on step 1 (was /studio)


While streaming SVG content, the syntax highlighter (Shiki) re-parses the entire growing SVG on every token, blocking the main thread and freezing the code area until the fence closes. Show a plain-text preview for incomplete SVG fences instead, similar to how Mermaid diagrams show a placeholder while streaming.

Per Qwen3.5 docs (unsloth.ai/docs/models/qwen3.5), top_k should be 20 for both thinking and non-thinking modes. The model-specific config in inference_defaults.json already had top_k=20 for Qwen3.5, but the generic fallback defaults were wrong:
- Frontend DEFAULT_INFERENCE_PARAMS.topK: 50 -> 20
- Backend generate_chat_completion top_k: 40 -> 20
- Backend generate_chat_completion_with_tools top_k: 40 -> 20
- Frontend title generation top_k: 40 -> 20

Default params for any model without specific config: temperature=0.6, top_p=0.95, top_k=20, min_p=0.01, presence_penalty=0.0, repetition_penalty=1.0.
Models with entries in inference_defaults.json (Qwen3.5, Gemma-3, Llama, etc.) override these with their recommended values.
Updated in: frontend DEFAULT_INFERENCE_PARAMS, backend Pydantic request models, and backend generate_chat_completion defaults.
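The fallback behaviour amounts to a dictionary merge: start from the universal defaults, then let any model-specific entry override them. In this sketch the model_overrides mapping is a hypothetical stand-in for inference_defaults.json, keyed by exact repo name for simplicity (the real lookup may match model families instead). The only override used in the example, presence_penalty=1.5 for Qwen3.5, is the value stated elsewhere in this PR.

```python
from typing import Any, Dict, Mapping

UNIVERSAL_DEFAULTS: Dict[str, float] = {
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.01,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0,
}


def resolve_inference_params(
    model_name: str, model_overrides: Mapping[str, Mapping[str, Any]]
) -> Dict[str, Any]:
    """Universal defaults, overridden by any model-specific entry (hypothetical helper)."""
    params: Dict[str, Any] = dict(UNIVERSAL_DEFAULTS)  # copy: never mutate the shared defaults
    params.update(model_overrides.get(model_name, {}))
    return params
```

Usage: `resolve_inference_params("unsloth/Qwen3.5-4B", {"unsloth/Qwen3.5-4B": {"presence_penalty": 1.5}})` keeps top_k=20 from the defaults and takes presence_penalty=1.5 from the override.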

Only set trust_remote_code=True when the model name starts with "unsloth/". All other models default to False for safety.

The "Generating" spinner was below the send message bar, causing the bar to jump up and down. Move it above the composer in both the regular thread view and the welcome/empty view.

Move the X close button on toasts (like "Starting model...") from top-1.5 to top-3 and add right-3, giving more breathing room from the top-right corner.

Reduce gap from 1.5 to 0.5, padding from px-2.5/py-1 to px-2/py-0.5, and icon from size-3.5 to size-3.

- Move Generating spinner above composer (fixes jumping send bar)
- Make Think button smaller with tighter icon-text gap
- Chat card now inside grid (same size as Audio/Embeddings cards)
- Rename "Chat Only" to "Chat"
- Chat card requires Continue to proceed (no auto-advance)
- Continue on Chat selection skips onboarding and goes to /chat
- Tooltip (i) click on Chat card doesn't trigger navigation
- Step 1 footer Back button goes back to splash (label is "Back")
- Splash "Skip Onboarding" renamed to "Skip to Chat", navigates to /chat
- Toast close button moved away from edge

- Sidebar "Skip to Chat" now uses primary (green) Button style with arrow icon, full width, aligned like step items. Shows on all steps.
- Footer: added "Skip" outline button next to Continue that goes directly to /studio with progress saved (markOnboardingDone)

The DEFAULT_MAX_STEPS in use-max-steps-epochs-toggle.ts was still 30, used as fallback when toggling from epochs back to max steps.

CONTEXT_LENGTHS now includes 65536, 131072, 262144 in addition to the existing 512-32768 range. The onboarding step filters these by the model's max_position_embeddings (e.g. Nemotron-3-Nano-4B has 262144), showing powers of 2 up to the model's maximum.
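A sketch of the filtering, assuming CONTEXT_LENGTHS is the full run of powers of two from 512 to 262144 and that a missing max_position_embeddings falls back to the 64K (65536) cap mentioned in the PR summary. The function name context_length_options is hypothetical.

```python
from typing import List, Optional

CONTEXT_LENGTHS: List[int] = [2**n for n in range(9, 19)]  # 512, 1024, ..., 262144
FALLBACK_MAX_CONTEXT = 65536  # the "64K" fallback from the PR summary


def context_length_options(max_position_embeddings: Optional[int] = None) -> List[int]:
    """Offer only the lengths the model supports; fall back to 64K when unknown."""
    cap = max_position_embeddings or FALLBACK_MAX_CONTEXT
    return [n for n in CONTEXT_LENGTHS if n <= cap]
```

A model whose maximum is not a power of two (say 40960) is offered options up to the largest power of two below it, 32768.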

After selecting a model in onboarding, detect the total model weight file size from HF Hub (safetensors/bin files). Then estimate memory needed: model_size_gb * 1.5 * context_scale, where context_scale is:
- <=8192 tokens: 1.0x
- >8192 tokens: 1.7x
- >=16384 tokens: 2.0x
- >=32768 tokens: 4.0x

If the estimate fits in free GPU VRAM, default to LoRA (16-bit). Otherwise default to QLoRA (4-bit).

Backend changes:
- Add model_size_bytes to ModelDetails (models.py)
- Add _get_model_size_bytes() using HfApi.repo_info (routes/models.py)
- Add vram_free_gb to get_gpu_summary (hardware.py)

Frontend changes:
- Add autoSelectTrainingMethod() in training-config-store.ts
- Called after model defaults are loaded
- Add model_size_bytes to ModelConfigResponse type
- Add vramFreeGb to HardwareInfo hook
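The heuristic can be written out directly. This is a hypothetical Python rendering of autoSelectTrainingMethod (the real one is TypeScript). It assumes "GB" means 1024**3 bytes and that "fits" means the estimate is less than or equal to free VRAM. The thresholds are checked from the largest down, so the overlapping ranges in the description resolve to the highest matching multiplier.

```python
def context_scale(context_length: int) -> float:
    """Context multiplier from the PR description; larger thresholds checked first."""
    if context_length >= 32768:
        return 4.0
    if context_length >= 16384:
        return 2.0
    if context_length > 8192:
        return 1.7
    return 1.0


def auto_select_training_method(
    model_size_bytes: int, context_length: int, vram_free_gb: float
) -> str:
    """Return "lora" (16-bit) if the estimate fits in free VRAM, else "qlora" (4-bit)."""
    model_size_gb = model_size_bytes / 1024**3  # assumption: GB here means GiB
    estimate_gb = model_size_gb * 1.5 * context_scale(context_length)
    return "lora" if estimate_gb <= vram_free_gb else "qlora"
```

For example, 8 GiB of weights at a 2048-token context gives an estimate of 12 GB and picks LoRA on a card with 24 GB free. The same model at 32768 tokens gives 48 GB and falls back to QLoRA.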


For GGUF repos, the trash icon now appears on each downloaded variant row inside the quantization expander instead of on the repo-level row. Backend accepts optional variant param to delete specific GGUF files (blob + symlink) rather than the entire repo cache.

The Max Tokens slider was capped at 32768 on page refresh because ggufContextLength was not restored from the status response. Now set it from statusRes.context_length on reconnect.

The train-on-responses-only feature uses template markers to find where the assistant response starts. The Qwen3.5 response marker included '<think>\n' which is only present when thinking mode is enabled. With thinking disabled (default for <9B), the marker never matched, causing 100% of samples to be dropped. Changed response marker from '<|im_start|>assistant\n<think>\n' to '<|im_start|>assistant\n' which works regardless of thinking mode.
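Why the old marker dropped every sample can be shown with plain string search. The rendered samples below are hypothetical and follow the description above (no <think> block when thinking is disabled). response_start is an illustrative helper, not the train-on-responses-only implementation.

```python
OLD_MARKER = "<|im_start|>assistant\n<think>\n"
NEW_MARKER = "<|im_start|>assistant\n"


def response_start(rendered: str, marker: str) -> int:
    """Index where the assistant response begins, or -1 if the marker is absent."""
    idx = rendered.find(marker)
    return -1 if idx == -1 else idx + len(marker)


# Hypothetical rendered conversations, with thinking disabled and enabled.
no_think = "<|im_start|>user\nHi<|im_end|>\n<|im_start|>assistant\nHello!<|im_end|>\n"
with_think = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nGreet.\n</think>\nHello!<|im_end|>\n"
)
```

With OLD_MARKER, `no_think` has no match, so the sample would be dropped. NEW_MARKER matches both samples. In this sketch, a thinking block, when present, is treated as part of the response.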

Register python and terminal tools alongside web search. Python executor validates imports (stdlib only) via unsloth_zoo rl_environments, runs code in a subprocess sandbox with 5-min timeout and cancel support. Terminal executor blocks dangerous commands (rm, sudo, etc.) and runs in a temp directory. Update llama_cpp tool loop to show tool-specific status messages and pass cancel_event through to executors. Rename composer toggle from "Search" to "Tools" and show TerminalIcon for execution status pills.


… port binding

Backend:
- Dynamic transformers 5.x detection via tokenizer_config.json fetch (checks for TokenizersBackend class, cached per-model)
- Bump transformers 5.x version from 5.2.0 to 5.3.0 across all workers, setup scripts (setup.sh, setup.ps1)
- Auto-enable trust_remote_code for unsloth/* models needing transformers 5.x (workaround for NemotronH config parsing bug in transformers)
- Auto-install mamba-ssm/causal-conv1d for SSM models (NemotronH, Falcon-H1) with --no-build-isolation --no-deps to avoid torch version conflicts
- Add SO_REUSEADDR to port check in run.py (fixes Colab proxy stale connection falsely reporting port as in-use)

Frontend:
- Fix "Skip to Chat" navigation: use window.location.href instead of React Router navigate() to bypass useEffect redirect race
- Fix "Skip Onboarding" on splash: navigates to /studio (not /chat)
- Fix onboarding guard: only check isOnboardingDone() on initial mount
- Fix Chat card on step 1: add sr-only spacer for consistent alignment
- Fix Chat+Text both selected: clear RadioGroup value when Chat is selected


Replace the single "Tools" toggle with two independent toggles: - "Search" (globe icon) enables web search only - "Code" (terminal icon) enables Python and terminal execution Add enabled_tools list field to the inference payload so the backend only registers the tools the user has toggled on. Both toggles appear in the main composer and the compare composer.


Replace unsloth_zoo-dependent import checker with a standalone ast-based validator using sys.stdlib_module_names. This properly blocks non-stdlib imports (numpy, requests, etc.) and returns a clear error message to the model so it can rewrite using only stdlib. Add full traceback to tool streaming error logs for debugging.


gpt-oss models emit multi-channel output via harmony protocol tokens (<|channel|>analysis<|message|>... and <|channel|>final<|message|>...). TextIteratorStreamer with skip_special_tokens=True strips the special tokens but leaves channel names concatenated with content, producing garbled output like "analysisWe need to...assistantfinalHello!". Add HarmonyTextStreamer that decodes with skip_special_tokens=False, parses harmony markup via regex, and emits <think>analysis</think> for the analysis channel and plain text for the final channel -- reusing the existing frontend reasoning UI. Also expose supports_reasoning=True for non-GGUF gpt-oss models in the /status endpoint so the frontend enables the Think toggle.


Set UNSLOTH_IS_PRESENT=1 and import check_python_modules and check_signal_escape_patterns directly from unsloth_zoo instead of a standalone fallback. This gives us the full Unsloth validation including stdlib-only import checks and signal/timeout escape pattern detection.


Remove stdlib-only import restriction. Keep signal escape pattern detection via unsloth_zoo for safety.

The 0.5s read timeout used for cancel-checking during streaming also fires when waiting for the first response from llama-server (e.g. reasoning model thinking for 15+ seconds). Add _stream_with_retry() context manager that retries on ReadTimeout while checking cancel_event, so the model has unlimited time to think before producing the first token. Applied to both the regular streaming path and the tool-calling final pass.

The delta-on-transformed approach had two critical bugs:
1. Before the full <|channel|>X<|message|> pattern was complete, the strip-tokens fallback emitted "analysis" as plain text. Then when the regex matched, _transform returned a completely different format (<think>...</think>) and the delta was computed against the wrong base string, producing fragments like "think>", "nk>", ">".
2. Even with full matches, the closing </think> tag shifted position as content grew, so text[prev_len:] produced garbled deltas.

Replace with stateful incremental parsing that:
- Buffers until a complete channel+message pair is seen
- Emits <think> once when analysis channel first appears
- Streams analysis content deltas (computed on channel content directly)
- Emits </think> once when final channel first appears
- Streams final content deltas
- Closes open think tags in end()

Also skip the generic all_special_tokens stripping in _clean_generated_text for gpt-oss since HarmonyTextStreamer already produces clean output and the generic stripping was mangling <think> tags.
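The stateful approach can be sketched in isolation. This HarmonyDeltaParser is hypothetical and much simpler than the PR's HarmonyTextStreamer. It takes already-decoded text (special tokens kept), assumes at most one analysis and one final segment, ignores other channels, and re-scans the whole buffer on each chunk, trading efficiency for clarity. Deltas are computed on each channel's own content, so the emitted <think>/</think> tags never shift under them.

```python
import re

# A complete channel header, then that channel's content up to the next
# pipe-delimited special token or the end of the buffer.
_SEGMENT = re.compile(r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|[a-z_]+\|>|\Z)", re.DOTALL)
# A trailing fragment that may still grow into a special token: "<", "<|", "<|en", "<|end|".
_PARTIAL_TOKEN = re.compile(r"<(?:\|[a-z_]*\|?)?\Z")


class HarmonyDeltaParser:
    def __init__(self) -> None:
        self.buffer = ""
        self.emitted: dict[str, int] = {}  # channel -> characters already emitted
        self.think_emitted = False
        self.think_open = False

    def put(self, text: str) -> str:
        """Feed a decoded chunk; return the new display text it produces."""
        self.buffer += text
        return self._drain(at_end=False)

    def end(self) -> str:
        """Flush held-back text and close an unterminated <think> block."""
        out = self._drain(at_end=True)
        if self.think_open:
            out += "</think>"
            self.think_open = False
        return out

    def _drain(self, at_end: bool) -> str:
        out = []
        matches = list(_SEGMENT.finditer(self.buffer))
        for i, m in enumerate(matches):
            channel, content = m.group(1), m.group(2)
            if i == len(matches) - 1 and not at_end:
                partial = _PARTIAL_TOKEN.search(content)
                if partial:  # hold back a possible half-received special token
                    content = content[: partial.start()]
            if channel == "analysis":
                if not self.think_emitted:
                    out.append("<think>")
                    self.think_emitted = self.think_open = True
            elif channel == "final":
                if self.think_open:
                    out.append("</think>")
                    self.think_open = False
            else:
                continue  # other channels are ignored in this sketch
            start = self.emitted.get(channel, 0)
            out.append(content[start:])  # delta on the channel's own content
            self.emitted[channel] = max(start, len(content))
        return "".join(out)


sample = (
    "<|channel|>analysis<|message|>We need to greet.<|end|>"
    "<|start|>assistant<|channel|>final<|message|>Hello!<|return|>"
)
```

Feeding `sample` one character at a time or all at once yields the same display text, `<think>We need to greet.</think>Hello!`.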


Integrates generalized model comparison into the onboarding-improvements branch. Resolves import conflict in shared-composer.tsx and fixes unused variable in compare flow.

…bset The gpt-oss tokenizer has added tokens like <|return|> (id=200002) that are not part of the harmony channel protocol but can leak into output. The previous regex only stripped channel|message|start|end tokens. Broaden the _clean_generated_text regex for gpt-oss to <\|[a-z_]+\|> which catches all pipe-delimited tokens (return, constrain, reserved, etc.) without matching <think>/<\/think> tags. Verified: gpt-oss all_special_tokens are only <|return|>, <|reserved_200017|>, <|startoftext|> -- none overlap with <think>. The harmony tokens (channel, message, start, end) are added_tokens but not in all_special_tokens.
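The broadened pattern can be checked quickly. This snippet only runs the regex quoted above on made-up strings, and strip_pipe_tokens is a hypothetical wrapper. The snippet also shows one caveat: [a-z_]+ does not match digits, so a token such as <|reserved_200017|> is not removed by this exact pattern. A class like [a-z0-9_]+ would cover it.

```python
import re

PIPE_TOKEN = re.compile(r"<\|[a-z_]+\|>")


def strip_pipe_tokens(text: str) -> str:
    """Remove <|...|> tokens made of lowercase letters/underscores; <think> tags survive."""
    return PIPE_TOKEN.sub("", text)
```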

Repos that only have metadata/config files cached (no .safetensors or .bin weight files) were showing up in the Downloaded list with tiny sizes like "1.8 KB" or "24 KB". These are just leftover config snapshots from architecture checks, not usable models. Filter the cached-models endpoint to only include repos that contain actual model weight files (.safetensors or .bin).
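The filter reduces to a suffix check over each cached repo's files. In this sketch the repo-to-filenames mapping is a hypothetical input. In the backend, the file listing would come from the Hugging Face cache, for example via huggingface_hub's scan_cache_dir, which this PR already uses for deletion.

```python
WEIGHT_SUFFIXES = (".safetensors", ".bin")


def repos_with_weights(cached: dict[str, list[str]]) -> list[str]:
    """Keep only repos with at least one actual weight file (hypothetical helper)."""
    return sorted(
        repo
        for repo, files in cached.items()
        if any(name.endswith(WEIGHT_SUFFIXES) for name in files)
    )
```

A repo holding only config.json and tokenizer_config.json is excluded, even though it appears in the cache.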

Add explicit !text-muted-foreground to toast description classNames so secondary text (e.g. "Releases VRAM and resets inference state.") is readable in dark mode.

Replace sr-only span (takes no space) with a size-4 shrink-0 div matching the RadioGroupItem dimensions in other cards, so the Chat icon aligns vertically with Text/Audio/Vision/Embeddings icons.

…thai#4355)

* studio: improve onboarding UX, tooltips, and training defaults
  - Change splash text to "Train and run LLMs locally"
  - Add "Chat Only" card with BubbleChatIcon to skip directly to chat
  - Add Skip/Skip to Chat buttons in sidebar and footer
  - Back button on step 1 returns to splash screen instead of being disabled
  - Change "Watch video guide" to "Get started with our guide" with new URL
  - Update intro text to mention all model types + chat
  - Make all tooltips clickable (in addition to hover) via React context
  - Strip surrounding quotes from pasted HF tokens
  - Rename "Eval Split" to "Evaluation Split"
  - Add SparklesIcon to "Auto Detect" format option
  - Change step 4 heading to "Choose your training parameters"
  - Default max_steps to 60
  - Learning rate displayed in scientific notation with +/- stepper
  - Context length options capped by model's max_position_embeddings (via AutoConfig)
  - Fix "QLORA"/"LORA" to "QLoRA"/"LoRA" in summary step
  - Backend: add max_position_embeddings to model config endpoint
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* compare for 2 diff models
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* resolving gemini comments
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* studio: disable thinking for Qwen3.5 <9B and always for AI Assist
  - Change Qwen3.5 thinking threshold from <=2B to <9B (0.8B, 2B, 4B all disable thinking by default; 9B+ enables it)
  - Always pass enable_thinking=False in AI Assist helper calls (_run_with_helper and _generate_with_backend) regardless of chat thinking settings
* studio: address PR review comments
  - Extract _get_max_position_embeddings helper to DRY config extraction
  - Fix "Skip to Chat" to navigate to /chat on step 1 (was /studio)
* fix: comment out debug print statements
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* studio: skip Shiki highlighting for incomplete SVG code fences
  While streaming SVG content, the syntax highlighter (Shiki) re-parses the entire growing SVG on every token, blocking the main thread and freezing the code area until the fence closes. Show a plain-text preview for incomplete SVG fences instead, similar to how Mermaid diagrams show a placeholder while streaming.
* studio: fix default top_k from 50/40 to 20 for chat inference
  Per Qwen3.5 docs (unsloth.ai/docs/models/qwen3.5), top_k should be 20 for both thinking and non-thinking modes. The model-specific config in inference_defaults.json already had top_k=20 for Qwen3.5, but the generic fallback defaults were wrong:
  - Frontend DEFAULT_INFERENCE_PARAMS.topK: 50 -> 20
  - Backend generate_chat_completion top_k: 40 -> 20
  - Backend generate_chat_completion_with_tools top_k: 40 -> 20
  - Frontend title generation top_k: 40 -> 20
* studio: set universal inference defaults for unknown models
  Default params for any model without specific config: temperature=0.6, top_p=0.95, top_k=20, min_p=0.01, presence_penalty=0.0, repetition_penalty=1.0. Models with entries in inference_defaults.json (Qwen3.5, Gemma-3, Llama, etc.) override these with their recommended values. Updated in: frontend DEFAULT_INFERENCE_PARAMS, backend Pydantic request models, and backend generate_chat_completion defaults.
* studio: only trust_remote_code for unsloth/ models in AutoConfig
  Only set trust_remote_code=True when the model name starts with "unsloth/". All other models default to False for safety.
* studio: move Generating spinner above the composer
  The "Generating" spinner was below the send message bar, causing the bar to jump up and down. Move it above the composer in both the regular thread view and the welcome/empty view.
* studio: adjust toast close button position away from edge
  Move the X close button on toasts (like "Starting model...") from top-1.5 to top-3 and add right-3, giving more breathing room from the top-right corner.
* studio: make Think button smaller with tighter icon-text gap
  Reduce gap from 1.5 to 0.5, padding from px-2.5/py-1 to px-2/py-0.5, and icon from size-3.5 to size-3.
* studio: multiple onboarding and chat UX improvements
  - Move Generating spinner above composer (fixes jumping send bar)
  - Make Think button smaller with tighter icon-text gap
  - Chat card now inside grid (same size as Audio/Embeddings cards)
  - Rename "Chat Only" to "Chat"
  - Chat card requires Continue to proceed (no auto-advance)
  - Continue on Chat selection skips onboarding and goes to /chat
  - Tooltip (i) click on Chat card doesn't trigger navigation
  - Step 1 footer Back button goes back to splash (label is "Back")
  - Splash "Skip Onboarding" renamed to "Skip to Chat", navigates to /chat
  - Toast close button moved away from edge
* studio: align Skip to Chat button, add Skip to footer
  - Sidebar "Skip to Chat" now uses primary (green) Button style with arrow icon, full width, aligned like step items. Shows on all steps.
  - Footer: added "Skip" outline button next to Continue that goes directly to /studio with progress saved (markOnboardingDone)
* studio: change default max steps from 30 to 60 in toggle hook
  The DEFAULT_MAX_STEPS in use-max-steps-epochs-toggle.ts was still 30, used as fallback when toggling from epochs back to max steps.
* studio: extend context length options to 262K
  CONTEXT_LENGTHS now includes 65536, 131072, 262144 in addition to the existing 512-32768 range. The onboarding step filters these by the model's max_position_embeddings (e.g. Nemotron-3-Nano-4B has 262144), showing powers of 2 up to the model's maximum.
* studio: auto-select LoRA vs QLoRA based on model size and GPU memory
  After selecting a model in onboarding, detect the total model weight file size from HF Hub (safetensors/bin files). Then estimate memory needed: model_size_gb * 1.5 * context_scale, where context_scale is:
  - <=8192 tokens: 1.0x
  - >8192 tokens: 1.7x
  - >=16384 tokens: 2.0x
  - >=32768 tokens: 4.0x
  If the estimate fits in free GPU VRAM, default to LoRA (16-bit). Otherwise default to QLoRA (4-bit).
  Backend changes:
  - Add model_size_bytes to ModelDetails (models.py)
  - Add _get_model_size_bytes() using HfApi.repo_info (routes/models.py)
  - Add vram_free_gb to get_gpu_summary (hardware.py)
  Frontend changes:
  - Add autoSelectTrainingMethod() in training-config-store.ts
  - Called after model defaults are loaded
  - Add model_size_bytes to ModelConfigResponse type
  - Add vramFreeGb to HardwareInfo hook
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* studio: rename "Importing ML libraries..." to "Importing Unsloth..."
* studio: show model/dataset in training status, fix LoRA/QLoRA casing
  - Training status now shows 'Training "model_name"' and 'Dataset = ...' instead of generic "Starting training..."
  - Fix Studio progress section to show QLoRA/LoRA instead of QLORA/LORA
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* studio: rename 'Skip to Chat' to 'Skip Onboarding' on splash screen
* studio: add presence_penalty support for chat inference
  Add presence_penalty as a parameter across the full stack:
  - Backend: llama_cpp.py generate_chat_completion/with_tools, Pydantic models (inference.py), routes/inference.py pass-through
  - Frontend: InferenceParams type, DEFAULT_INFERENCE_PARAMS (0.0), chat-adapter.ts payload, chat-settings-sheet.tsx slider (0-2), model defaults loading from inference_defaults.json
  - Set Qwen3.5 default presence_penalty to 1.5 per official docs
  - Default for unknown models is 0.0 (off)
* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* studio: fix Chat card deselecting Text and aligning with other cards
* studio: fix presence_penalty not loading from inference defaults
  The inference_config.py load_inference_config() was not including presence_penalty in the returned config dict, so the Qwen3.5 default of 1.5 from inference_defaults.json never reached the frontend. Added it to the config builder.
* studio: add delete button for cached models in model selector
  Add trash icon on each downloaded model row (GGUF and safetensors) with confirmation dialog. Backend DELETE /api/models/delete-cached endpoint uses huggingface_hub scan_cache_dir + delete_revisions to cleanly remove cached repos, refusing if the model is currently loaded.
* studio: restore inference defaults, reasoning, and tools on page refresh
  On page refresh with a model already loaded, the frontend was not re-applying model-specific inference defaults (presence_penalty, temperature, etc.) or restoring reasoning/tools support flags.
  Backend: Add inference config, supports_reasoning, supports_tools, and context_length to InferenceStatusResponse.
  Frontend: In the refresh callback, when an active model is detected, apply mergeRecommendedInference and restore reasoning/tools flags with proper Qwen3.5 size-based defaults.
* studio: fix delete dialog closing before async completes
  Prevent AlertDialogAction's default close behavior with e.preventDefault() so the dialog stays open during deletion. Also block onOpenChange dismiss while deleting is in progress.
* fix: add Dict and Any imports to inference models
* studio: fix Qwen3.5 reasoning threshold in frontend load path
  The frontend loadModel handler had the old threshold (<=2) for disabling reasoning on small Qwen3.5 models. Changed to <9 to match the backend. This was causing 4B to not properly disable thinking by default when auto-loaded.
* studio: move GGUF delete to per-variant level
  For GGUF repos, the trash icon now appears on each downloaded variant row inside the quantization expander instead of on the repo-level row. Backend accepts optional variant param to delete specific GGUF files (blob + symlink) rather than the entire repo cache.
* studio: restore ggufContextLength on page refresh
  The Max Tokens slider was capped at 32768 on page refresh because ggufContextLength was not restored from the status response. Now set it from statusRes.context_length on reconnect.
* fix: remove <think> from Qwen3.5 response template marker
  The train-on-responses-only feature uses template markers to find where the assistant response starts. The Qwen3.5 response marker included '<think>\n' which is only present when thinking mode is enabled. With thinking disabled (default for <9B), the marker never matched, causing 100% of samples to be dropped. Changed response marker from '<|im_start|>assistant\n<think>\n' to '<|im_start|>assistant\n' which works regardless of thinking mode.

* studio: fix sloth ASCII art alignment in training overlay

* fix: correct sloth ASCII art alignment to match Unsloth banner

* studio: add Python and terminal tool calling to chat
  Register python and terminal tools alongside web search. Python executor validates imports (stdlib only) via unsloth_zoo rl_environments, runs code in a subprocess sandbox with 5-min timeout and cancel support. Terminal executor blocks dangerous commands (rm, sudo, etc.) and runs in a temp directory.
  Update llama_cpp tool loop to show tool-specific status messages and pass cancel_event through to executors. Rename composer toggle from "Search" to "Tools" and show TerminalIcon for execution status pills.

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* studio: fix Nemotron/transformers 5.x support, onboarding navigation, port binding
  Backend:
  - Dynamic transformers 5.x detection via tokenizer_config.json fetch (checks for TokenizersBackend class, cached per-model)
  - Bump transformers 5.x version from 5.2.0 to 5.3.0 across all workers, setup scripts (setup.sh, setup.ps1)
  - Auto-enable trust_remote_code for unsloth/* models needing transformers 5.x (workaround for NemotronH config parsing bug in transformers)
  - Auto-install mamba-ssm/causal-conv1d for SSM models (NemotronH, Falcon-H1) with --no-build-isolation --no-deps to avoid torch version conflicts
  - Add SO_REUSEADDR to port check in run.py (fixes Colab proxy stale connection falsely reporting port as in-use)
  Frontend:
  - Fix "Skip to Chat" navigation: use window.location.href instead of React Router navigate() to bypass useEffect redirect race
  - Fix "Skip Onboarding" on splash: navigates to /studio (not /chat)
  - Fix onboarding guard: only check isOnboardingDone() on initial mount
  - Fix Chat card on step 1: add sr-only spacer for consistent alignment
  - Fix Chat+Text both selected: clear RadioGroup value when Chat is selected

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* studio: split tools toggle into Search and Code buttons
  Replace the single "Tools" toggle with two independent toggles:
  - "Search" (globe icon) enables web search only
  - "Code" (terminal icon) enables Python and terminal execution
  Add enabled_tools list field to the inference payload so the backend only registers the tools the user has toggled on. Both toggles appear in the main composer and the compare composer.

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* studio: fix tool calling import validation and error logging
  Replace unsloth_zoo-dependent import checker with a standalone ast-based validator using sys.stdlib_module_names. This properly blocks non-stdlib imports (numpy, requests, etc.) and returns a clear error message to the model so it can rewrite using only stdlib. Add full traceback to tool streaming error logs for debugging.

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* fix: parse gpt-oss harmony channels for clean safetensors chat output
  gpt-oss models emit multi-channel output via harmony protocol tokens (<|channel|>analysis<|message|>... and <|channel|>final<|message|>...). TextIteratorStreamer with skip_special_tokens=True strips the special tokens but leaves channel names concatenated with content, producing garbled output like "analysisWe need to...assistantfinalHello!".
  Add HarmonyTextStreamer that decodes with skip_special_tokens=False, parses harmony markup via regex, and emits <think>analysis</think> for the analysis channel and plain text for the final channel -- reusing the existing frontend reasoning UI.
  Also expose supports_reasoning=True for non-GGUF gpt-oss models in the /status endpoint so the frontend enables the Think toggle.

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* studio: use unsloth_zoo for Python sandbox validation
  Set UNSLOTH_IS_PRESENT=1 and import check_python_modules and check_signal_escape_patterns directly from unsloth_zoo instead of a standalone fallback. This gives us the full Unsloth validation including stdlib-only import checks and signal/timeout escape pattern detection.

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* studio: allow all imports in Python tool sandbox
  Remove stdlib-only import restriction. Keep signal escape pattern detection via unsloth_zoo for safety.

* studio: fix ReadTimeout on tool streaming final pass
  The 0.5s read timeout used for cancel-checking during streaming also fires when waiting for the first response from llama-server (e.g. reasoning model thinking for 15+ seconds). Add _stream_with_retry() context manager that retries on ReadTimeout while checking cancel_event, so the model has unlimited time to think before producing the first token. Applied to both the regular streaming path and the tool-calling final pass.

* fix: rewrite HarmonyTextStreamer with stateful incremental parsing
  The delta-on-transformed approach had two critical bugs:
  1. Before the full <|channel|>X<|message|> pattern was complete, the strip-tokens fallback emitted "analysis" as plain text. Then when the regex matched, _transform returned a completely different format (<think>...</think>) and the delta was computed against the wrong base string, producing fragments like "think>", "nk>", ">".
  2. Even with full matches, the closing </think> tag shifted position as content grew, so text[prev_len:] produced garbled deltas.
  Replace with stateful incremental parsing that:
  - Buffers until a complete channel+message pair is seen
  - Emits <think> once when analysis channel first appears
  - Streams analysis content deltas (computed on channel content directly)
  - Emits </think> once when final channel first appears
  - Streams final content deltas
  - Closes open think tags in end()
  Also skip the generic all_special_tokens stripping in _clean_generated_text for gpt-oss since HarmonyTextStreamer already produces clean output and the generic stripping was mangling <think> tags.

* [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci

* fix: strip all <|...|> tokens in gpt-oss cleanup, not just harmony subset
  The gpt-oss tokenizer has added tokens like <|return|> (id=200002) that are not part of the harmony channel protocol but can leak into output. The previous regex only stripped channel|message|start|end tokens.
  Broaden the _clean_generated_text regex for gpt-oss to <\|[a-z_]+\|> which catches all pipe-delimited tokens (return, constrain, reserved, etc.) without matching <think>/<\/think> tags.
  Verified: gpt-oss all_special_tokens are only <|return|>, <|reserved_200017|>, <|startoftext|> -- none overlap with <think>. The harmony tokens (channel, message, start, end) are added_tokens but not in all_special_tokens.

* fix: hide config-only model repos from cached models list
  Repos that only have metadata/config files cached (no .safetensors or .bin weight files) were showing up in the Downloaded list with tiny sizes like "1.8 KB" or "24 KB". These are just leftover config snapshots from architecture checks, not usable models.
  Filter the cached-models endpoint to only include repos that contain actual model weight files (.safetensors or .bin).

* studio: fix toast description text contrast in dark mode
  Add explicit !text-muted-foreground to toast description classNames so secondary text (e.g. "Releases VRAM and resets inference state.") is readable in dark mode.

* studio: fix Chat card icon alignment with size-4 spacer
  Replace sr-only span (takes no space) with a size-4 shrink-0 div matching the RadioGroupItem dimensions in other cards, so the Chat icon aligns vertically with Text/Audio/Vision/Embeddings icons.

---------

Co-authored-by: workspace <user@workspace.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Manan17 <shahmanan170602@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>

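The memory heuristic from the "studio: auto-select LoRA vs QLoRA based on model size and GPU memory" commit in the squashed message above can be sketched in TypeScript. This is a minimal sketch, not the code in training-config-store.ts: the function names are hypothetical, "GB" is assumed to mean 1024^3 bytes, and "fits in free GPU VRAM" is assumed to mean the estimate is less than or equal to the free VRAM reported by the backend.

```typescript
// Minimal sketch of the LoRA/QLoRA auto-selection heuristic (hypothetical names).
// Assumption: GB = 1024^3 bytes; "fits" = estimate <= free VRAM.

type TrainingMethod = "lora" | "qlora";

/** context_scale thresholds as listed in the commit message. */
function contextScale(contextLength: number): number {
  if (contextLength >= 32768) return 4.0;
  if (contextLength >= 16384) return 2.0;
  if (contextLength > 8192) return 1.7;
  return 1.0; // <= 8192 tokens
}

/** Estimated training memory: model_size_gb * 1.5 * context_scale. */
function estimateTrainingMemoryGb(modelSizeBytes: number, contextLength: number): number {
  const modelSizeGb = modelSizeBytes / 1024 ** 3;
  return modelSizeGb * 1.5 * contextScale(contextLength);
}

/** LoRA (16-bit) if the estimate fits in free VRAM, otherwise QLoRA (4-bit). */
function pickTrainingMethod(
  modelSizeBytes: number,
  contextLength: number,
  vramFreeGb: number,
): TrainingMethod {
  return estimateTrainingMemoryGb(modelSizeBytes, contextLength) <= vramFreeGb ? "lora" : "qlora";
}
```

For an 8 GB checkpoint with 24 GB of free VRAM, a 4096-token context gives 8 * 1.5 * 1.0 = 12 GB (LoRA), while a 32768-token context gives 8 * 1.5 * 4.0 = 48 GB (QLoRA). Keeping the thresholds in one pure function makes it easy to re-run the choice whenever the context length or the selected model changes.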
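The stateful approach from the "fix: rewrite HarmonyTextStreamer with stateful incremental parsing" commit above can be illustrated with a small TypeScript port. This is an illustrative sketch, not the Python HarmonyTextStreamer: the class name HarmonyStreamParser is hypothetical, and it assumes decoded text arrives with special tokens intact (skip_special_tokens=False), that a channel's content ends at the next <|...|> token, and that only the analysis and final channels are rendered. Deltas are computed on raw channel content, so the output never depends on where a closing </think> tag happens to sit.

```typescript
// Illustrative port of stateful harmony parsing (hypothetical class name).
// Assumes special tokens are kept in the decoded text (skip_special_tokens=False).

// A complete "<|channel|>NAME<|message|>" header; partial headers simply do not match yet.
const HARMONY_HEADER = /<\|channel\|>([a-z_]+)<\|message\|>/g;

class HarmonyStreamParser {
  private buffer = "";
  private emitted: Array<number | undefined> = []; // chars of each segment already emitted
  private thinkOpen = false;
  private thinkClosed = false;

  /** Append newly decoded text and return only the newly displayable output. */
  push(chunk: string): string {
    this.buffer += chunk;
    return this.drain(false);
  }

  /** Flush held-back text and close a <think> block that was never closed. */
  end(): string {
    let out = this.drain(true);
    if (this.thinkOpen && !this.thinkClosed) {
      out += "</think>";
      this.thinkClosed = true;
    }
    return out;
  }

  private drain(flush: boolean): string {
    let out = "";
    const headers = Array.from(this.buffer.matchAll(HARMONY_HEADER));
    headers.forEach((m, i) => {
      const channel = m[1];
      const start = (m.index ?? 0) + m[0].length;
      // Channel content runs until the next "<|" token, or to the end of the buffer.
      const stop = this.buffer.indexOf("<|", start);
      let content = stop === -1 ? this.buffer.slice(start) : this.buffer.slice(start, stop);
      // Hold back a trailing "<" that may be the first half of a split "<|" token.
      if (stop === -1 && !flush && content.endsWith("<")) content = content.slice(0, -1);

      if (this.emitted[i] === undefined) {
        // First time this header is seen: emit each think tag exactly once.
        if (channel === "analysis" && !this.thinkOpen) {
          out += "<think>";
          this.thinkOpen = true;
        } else if (channel === "final" && this.thinkOpen && !this.thinkClosed) {
          out += "</think>";
          this.thinkClosed = true;
        }
      }
      // Other channels are ignored in this sketch.
      if (channel === "analysis" || channel === "final") {
        out += content.slice(this.emitted[i] ?? 0);
      }
      this.emitted[i] = content.length;
    });
    return out;
  }
}
```

Feeding the chunks "<|channel|>analy", "sis<|message|>We need", " to<", "|end|><|start|>assistant<|channel|>final<|message|>Hel" and "lo!<|return|>", then calling end(), yields "<think>We need to</think>Hello!". The first chunk produces no output because its header is incomplete, which matches the "buffers until a complete channel+message pair is seen" rule.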
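The broadened cleanup pattern from the "fix: strip all <|...|> tokens in gpt-oss cleanup" commit above can be checked with a quick TypeScript port. The real cleanup lives in Python's _clean_generated_text; the function name here is hypothetical, and this pattern behaves the same in JavaScript and Python regex syntax. One detail worth noting: the character class [a-z_] contains no digits, so if the pattern is exactly as quoted, a token such as <|reserved_200017|> is left in place, and a class like [a-z0-9_] would be needed to remove it.

```typescript
// TypeScript port of the pattern quoted in the commit (hypothetical function name).
// Matches "<|" + one or more lowercase letters/underscores + "|>"; digits are not matched.
const PIPE_TOKEN = /<\|[a-z_]+\|>/g;

function stripPipeTokens(text: string): string {
  return text.replace(PIPE_TOKEN, "");
}
```

Applied to raw harmony text such as "<|start|>assistant<|channel|>final<|message|>Hi", this returns "assistantfinalHi", which is the garbling the harmony commit describes and the reason channels are parsed before any token stripping.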
danielhanchen requested review from Manan17, rolandtannous and wasimysaid as code owners March 17, 2026 10:16

[pre-commit.ci] auto fixes from pre-commit.com hooks

aec1729

for more information, see https://pre-commit.ci

Manan17 and others added 2 commits March 17, 2026 10:17

compare for 2 diff models

23c08f3

[pre-commit.ci] auto fixes from pre-commit.com hooks

ad3deda

for more information, see https://pre-commit.ci

gemini-code-assist Bot reviewed Mar 17, 2026

Manan17 and others added 20 commits March 17, 2026 10:28

resolving gemini comments

c0398ed

[pre-commit.ci] auto fixes from pre-commit.com hooks

fd03415

for more information, see https://pre-commit.ci

studio: address PR review comments

6d67301

- Extract _get_max_position_embeddings helper to DRY config extraction - Fix "Skip to Chat" to navigate to /chat on step 1 (was /studio)
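The PR summary says context-length options are capped by the model's max_position_embeddings (read via AutoConfig, with a text_config fallback) and fall back to 64K when it is unavailable. The extraction happens in the Python helper named above; a TypeScript sketch of the same idea might look like this. ModelConfigLike, getMaxPositionEmbeddings and cappedContextOptions are hypothetical names, the config is assumed to be a plain JSON object with those field names, and 64K is assumed to mean 65536 tokens.

```typescript
// Hypothetical config shape: only the two fields this sketch reads.
interface ModelConfigLike {
  max_position_embeddings?: number;
  text_config?: { max_position_embeddings?: number };
}

// "64K" fallback from the PR summary, assumed here to mean 65536 tokens.
const FALLBACK_MAX_CONTEXT = 65536;

/** Top-level max_position_embeddings, else the nested text_config value, else null. */
function getMaxPositionEmbeddings(config: ModelConfigLike | null | undefined): number | null {
  const candidates = [config?.max_position_embeddings, config?.text_config?.max_position_embeddings];
  for (const value of candidates) {
    if (typeof value === "number" && value > 0) return value;
  }
  return null;
}

/** Keep only the context-length options that do not exceed the model's limit. */
function cappedContextOptions(options: number[], config: ModelConfigLike | null | undefined): number[] {
  const cap = getMaxPositionEmbeddings(config) ?? FALLBACK_MAX_CONTEXT;
  return options.filter((n) => n <= cap);
}
```

With options [4096, 8192, 16384, 32768, 65536, 131072], a config reporting 32768 keeps the first four entries, and a missing config keeps everything up to 65536.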

fix: comment out debug print statements

c85f51d

[pre-commit.ci] auto fixes from pre-commit.com hooks

e135b64

for more information, see https://pre-commit.ci

studio: only trust_remote_code for unsloth/ models in AutoConfig

25bc41c

Only set trust_remote_code=True when the model name starts with "unsloth/". All other models default to False for safety.

studio: move Generating spinner above the composer

5917c72

The "Generating" spinner was below the send message bar, causing the bar to jump up and down. Move it above the composer in both the regular thread view and the welcome/empty view.

studio: adjust toast close button position away from edge

94312c3

Move the X close button on toasts (like "Starting model...") from top-1.5 to top-3 and add right-3, giving more breathing room from the top-right corner.

studio: make Think button smaller with tighter icon-text gap

3188cd8

Reduce gap from 1.5 to 0.5, padding from px-2.5/py-1 to px-2/py-0.5, and icon from size-3.5 to size-3.

studio: change default max steps from 30 to 60 in toggle hook

531d61b

The DEFAULT_MAX_STEPS in use-max-steps-epochs-toggle.ts was still 30, used as fallback when toggling from epochs back to max steps.

[pre-commit.ci] auto fixes from pre-commit.com hooks

64c7164

for more information, see https://pre-commit.ci

studio: rename "Importing ML libraries..." to "Importing Unsloth..."

29b2e8c

danielhanchen and others added 26 commits March 17, 2026 12:28

studio: restore ggufContextLength on page refresh

40fb338

The Max Tokens slider was capped at 32768 on page refresh because ggufContextLength was not restored from the status response. Now set it from statusRes.context_length on reconnect.

studio: fix sloth ASCII art alignment in training overlay

0855f61

fix: correct sloth ASCII art alignment to match Unsloth banner

4d3176a

[pre-commit.ci] auto fixes from pre-commit.com hooks

3282ff4

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

b1a0d84

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

557bdd0

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

4e484a2

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

6f0ee3a

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

4f6127e

for more information, see https://pre-commit.ci

studio: allow all imports in Python tool sandbox

4c50608

Remove stdlib-only import restriction. Keep signal escape pattern detection via unsloth_zoo for safety.

[pre-commit.ci] auto fixes from pre-commit.com hooks

7e55cbe

for more information, see https://pre-commit.ci

Merge feature/compare-two-diff-models (PR #4356)

6221deb

Integrates generalized model comparison into the onboarding-improvements branch. Resolves import conflict in shared-composer.tsx and fixes unused variable in compare flow.

studio: fix toast description text contrast in dark mode

63dcee9

Add explicit !text-muted-foreground to toast description classNames so secondary text (e.g. "Releases VRAM and resets inference state.") is readable in dark mode.

studio: fix Chat card icon alignment with size-4 spacer

23c659a

Replace sr-only span (takes no space) with a size-4 shrink-0 div matching the RadioGroupItem dimensions in other cards, so the Chat icon aligns vertically with Text/Audio/Vision/Embeddings icons.

danielhanchen merged commit 0acd1c7 into main Mar 17, 2026
5 checks passed

danielhanchen deleted the studio/onboarding-improvements branch March 17, 2026 14:46

rolandtannous mentioned this pull request Apr 6, 2026

split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code #4878

Merged

6 tasks

studio: improve onboarding UX, tooltips, and training defaults #4355

danielhanchen merged 64 commits into main from studio/onboarding-improvements

danielhanchen commented Mar 17, 2026

chatgpt-codex-connector Bot commented Mar 17, 2026

gemini-code-assist Bot commented Mar 17, 2026

gemini-code-assist Bot left a comment

gemini-code-assist Bot Mar 17, 2026

gemini-code-assist Bot Mar 17, 2026

3 participants

Review suggestion (Skip to Chat target on step 1):
- navigate({ to: "/studio" });
+ navigate({ to: currentStep === 1 ? "/chat" : "/studio" });
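The review suggestion above chooses the skip destination from the current onboarding step. Combined with the button labels in the PR summary (step 1 shows "Skip to Chat", steps 2 and later show "Skip"), the mapping can be sketched as a pure function. SkipAction and sidebarSkipAction are hypothetical names, not the component's actual code.

```typescript
// Hypothetical type and helper; only the labels and routes come from the PR.
type SkipAction = { label: "Skip to Chat" | "Skip"; to: "/chat" | "/studio" };

/** Step 1 skips straight to chat; later steps skip to the Studio. */
function sidebarSkipAction(currentStep: number): SkipAction {
  return currentStep === 1
    ? { label: "Skip to Chat", to: "/chat" }
    : { label: "Skip", to: "/studio" };
}
```

The squashed commit message also notes that "Skip to Chat" navigation uses window.location.href instead of React Router's navigate() to avoid a useEffect redirect race, so in practice the returned path may be assigned to window.location.href rather than passed to navigate().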
