Skip to content

fix(studio): harden sandbox security for terminal and python tools#4827

Merged
danielhanchen merged 22 commits into
mainfrom
fix/studio-sandbox-hardening
Apr 3, 2026
Merged

fix(studio): harden sandbox security for terminal and python tools#4827
danielhanchen merged 22 commits into
mainfrom
fix/studio-sandbox-hardening

Conversation

@danielhanchen
Copy link
Copy Markdown
Member

@danielhanchen danielhanchen commented Apr 3, 2026

Summary

Fixes #4818. The terminal and python tool sandboxing in Studio could be bypassed to run privileged commands like sudo and chmod (as reported and verified by the issue author).

The root cause is the command blocklist using naive str.split(), which is trivially bypassed via quoting, full paths, nested shells, shell metacharacters, subshells, process substitution, and cross-tool pivoting through Python os.system/subprocess.

This PR hardens sandbox security with multiple independent defense layers, all cross-platform (Linux, macOS, Windows) and with zero measurable performance overhead.

Changes

  1. Better command tokenization -- shlex.split + os.path.basename + regex scanning at shell command boundaries (subshells, process substitution, optional path prefixes). Targeted nested shell detection for bash/sh/zsh -c arguments.

  2. Sanitized subprocess environment -- build_safe_env strips all credentials (HF_TOKEN, WANDB_API_KEY, GH_TOKEN, AWS, LD_PRELOAD, DYLD_) while preserving the active Python interpreter and virtualenv directories in PATH.

  3. OS-level privilege restriction -- PR_SET_NO_NEW_PRIVS via prctl on Linux makes sudo/su/pkexec fail at the kernel level. RLIMIT_NPROC (256) prevents fork bombs. RLIMIT_FSIZE (100MB) prevents disk filling.

  4. Extended AST safety checker -- Detects os.system, os.popen, subprocess.run/Popen/call, os.exec*, os.spawn*, os.posix_spawn* with blocked commands or dynamic args. Tracks aliased imports (import os as myos) and from-imports (from subprocess import run as r). Inspects keyword arguments. Handles list and tuple literals. Includes exception_catching detection for timeout evasion.

  5. Cross-platform shell support -- cmd /c on Windows, bash -c on Unix. CREATE_NO_WINDOW on Windows. Windows-specific blocked commands (rmdir, takeown, icacls, runas, powershell, pwsh).

  6. Expanded blocklist -- From 7 to 14+ commands.

Test plan

  • 151 tests covering all bypass vectors, normal command pass-through, env isolation, resource limits, and false positive prevention
  • Normal commands work: ls, pwd, cat, echo, python, pip, git
  • No false positives on quoted text, filenames with blocked words, pip install
  • Environment variables not leaked to subprocesses
  • Active Python interpreter and virtualenv remain accessible in sandbox PATH
  • 3 rounds of parallel reviews (7x reviewer.py + 3x Sonnet subagents + Gemini Code Assist), all findings addressed

The existing command blocklist used naive str.split() which is trivially
bypassable via quoting, full paths, nested shells, variable expansion,
and cross-tool pivoting through Python os.system/subprocess. Fixes #4818.

Changes:
- Replace str.split() blocklist with shlex.split() + os.path.basename()
  tokenization and regex scanning at shell command boundaries
- Add sanitized subprocess environment (_build_safe_env) that strips
  credentials (HF_TOKEN, WANDB_API_KEY, GH_TOKEN, AWS_*, etc.) and
  restricts PATH to /usr/local/bin:/usr/bin:/bin
- Add PR_SET_NO_NEW_PRIVS via prctl on Linux so sudo/su/pkexec fail
  at the kernel level regardless of how they are invoked
- Add RLIMIT_NPROC (256) and RLIMIT_FSIZE (100MB) to prevent fork
  bombs and disk filling attacks
- Extend AST safety checker to detect os.system(), os.popen(),
  subprocess.run/Popen/call/check_output, os.exec*, os.spawn* calls
  containing blocked commands or dynamic (non-literal) arguments
- Add cross-platform support: cmd.exe on Windows, bash on Unix;
  CREATE_NO_WINDOW flag on Windows, preexec_fn on Unix
- Expand blocklist from 7 to 14 commands: add su, chown, passwd,
  mount, umount, fdisk, kill, killall, pkill
- Apply all layers to both _bash_exec and _python_exec

Zero measurable performance overhead -- shlex parsing and a single
prctl syscall per subprocess fork.
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly strengthens the security of the code execution sandbox by implementing robust command filtering via shlex and regex, stripping sensitive environment variables, and applying kernel-level privilege restrictions and resource limits. It also introduces AST-based detection of dangerous os and subprocess functions to prevent shell escapes. Feedback suggests expanding the blocked function list to include os.posix_spawn variants and using dynamic library discovery for libc to improve portability across different Linux distributions.

I am having trouble creating individual review comments. Click here to see my feedback.

studio/backend/core/inference/tools.py (553-562)

security-high high

The _SHELL_EXEC_FUNCS list is missing os.posix_spawn and os.posix_spawnp, which can also be used to execute external commands and should be included in the security blocklist for consistency.

studio/backend/core/inference/tools.py (115)

medium

Hardcoding the path to 'libc.so.6' is fragile and may fail on systems where libc is named differently (e.g., musl-based systems like Alpine Linux). Consider using ctypes.util.find_library('c') to locate the standard C library dynamically.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: df7d0a5746

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +856 to +860
item.get("description", "") for item in info.get("shell_escapes", [])
]
all_reasons = [r for r in reasons + shell_reasons if r]
if all_reasons:
return (
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Include exception-catching findings in unsafe code rejection

_check_signal_escape_patterns() still marks loop-level except Exception patterns as unsafe, but _check_code_safety() now builds all_reasons from only signal_tampering and shell_escapes. When only exception_catching is present, all_reasons is empty and the function returns None, so code that was previously rejected is now allowed (e.g., infinite-loop patterns guarded by broad exception handlers).

Useful? React with 👍 / 👎.

Comment on lines +746 to +750
if isinstance(func, ast.Attribute):
if isinstance(func.value, ast.Name):
if func.value.id in self.os_aliases:
shell_func = f"os.{func.attr}"
elif func.value.id in self.subprocess_aliases:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle subprocess functions imported with from-import

Shell-escape detection only resolves calls where node.func is an attribute (subprocess.run), so from subprocess import run followed by run([...]) is not checked at all. This bypasses the new Python-side blocked-command filter and still allows dangerous commands through bare-name calls.

Useful? React with 👍 / 👎.

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +633 to +636
for s in strs:
base = os.path.basename(s).lower()
if base in _BLOCKED_COMMANDS:
found.add(base)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Tokenize list elements before blocked-command matching

In _check_args_for_blocked, list arguments are validated by os.path.basename() equality only, so strings like "rm -rf x" inside a list are not parsed and are treated as safe. A call such as subprocess.run(["bash", "-c", "rm -rf x"]) therefore evades detection even though it executes a blocked command via a nested shell.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits April 3, 2026 13:29
…rocess substitution

- Include exception_catching reasons in _check_code_safety so bare
  except-in-loop timeout evasion is actually blocked (was computed in
  _check_signal_escape_patterns but never read by the caller)
- Remove base.split() inner loop that caused false positives on quoted
  text arguments containing blocked words (e.g. echo "kill this process")
- Add targeted nested shell detection for bash/sh/zsh -c arguments
  instead, which catches bash -c 'sudo whoami' without false positives
- Add <() process substitution to the regex character class so
  diff <(rm -rf /path) is also caught
- Fix error message to say "unsafe patterns" instead of specifically
  mentioning signal manipulation when other categories trigger
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the security of the code execution sandbox by introducing robust blocked command detection using shlex and regex, environment variable stripping, and resource limits via pre-execution hooks. It also extends the AST-based safety checker to flag dangerous os and subprocess calls. Feedback focuses on addressing critical bypass vectors, including the need to inspect keyword arguments in the AST visitor, ensuring recursive literal checks for list elements, and updating regex patterns to account for absolute or relative paths to blocked binaries.

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +765 to +782
blocked_in_args = _check_args_for_blocked(node.args)
if blocked_in_args:
shell_escapes.append(
{
"type": "shell_escape",
"line": node.lineno,
"description": (
f"{shell_func}() invokes blocked command(s): "
f"{', '.join(sorted(blocked_in_args))}"
),
}
)
else:
has_non_literal = any(
_extract_string_from_node(a) is None
and not isinstance(a, ast.List)
for a in node.args
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The safety check for dynamic arguments and blocked commands in os/subprocess calls has two significant bypass vectors: 1. Keyword Argument Bypass: The check only inspects positional arguments (node.args), ignoring keyword arguments. 2. Dynamic List Bypass: The has_non_literal check explicitly ignores ast.List nodes, but does not verify that the contents of the list are literals.

            if shell_func and shell_func in _SHELL_EXEC_FUNCS:
                all_args = list(node.args) + [kw.value for kw in node.keywords]
                blocked_in_args = _check_args_for_blocked(all_args)
                if blocked_in_args:
                    shell_escapes.append(
                        {
                            "type": "shell_escape",
                            "line": node.lineno,
                            "description": (
                                f"{shell_func}() invokes blocked command(s): "
                                f"{', '.join(sorted(blocked_in_args))}"
                            ),
                        }
                    )
                else:
                    def _is_literal(n):
                        if _extract_string_from_node(n) is not None:
                            return True
                        if isinstance(n, ast.List):
                            return all(_extract_string_from_node(e) is not None for e in n.elts)
                        return False

                    has_non_literal = any(not _is_literal(a) for a in all_args)
                    if has_non_literal:
                        shell_escapes.append(
                            {
                                "type": "shell_escape_dynamic",
                                "line": node.lineno,
                                "description": (
                                    f"{shell_func}() called with non-literal argument "
                                    f"(potential shell escape)"
                                ),
                            }
                        )

Comment thread studio/backend/core/inference/tools.py Outdated
# (semicolons, pipes, &&, ||, backticks, $(), <(), newlines)
lowered = command.lower()
for word in _BLOCKED_COMMANDS:
pattern = rf"(?:^|[;&|`\n]\s*|[$]\(\s*|<\(\s*){re.escape(word)}\b"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The regex pattern for detecting blocked commands at shell boundaries does not account for absolute or relative paths (e.g., ls; /usr/bin/sudo). This allows users to bypass the blocklist by specifying the full path to a binary when using shell metacharacters.

Suggested change
pattern = rf"(?:^|[;&|`\n]\s*|[$]\(\s*|<\(\s*){re.escape(word)}\b"
pattern = rf"(?:^|[;&|`\n]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*[/\\])?{re.escape(word)}\b"

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +102 to +104
for word in tokens[j].lower().split():
if word in _BLOCKED_COMMANDS:
blocked.add(word)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The nested shell detection logic uses a simple split() and direct membership test, which fails to catch blocked commands invoked via paths (e.g., bash -c "/bin/rm -rf /"). It should use os.path.basename to normalize the command name before checking against the blocklist.

Suggested change
for word in tokens[j].lower().split():
if word in _BLOCKED_COMMANDS:
blocked.add(word)
for word in tokens[j].lower().split():
base = os.path.basename(word)
if base in _BLOCKED_COMMANDS:
blocked.add(base)

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +645 to +647
base = os.path.basename(s).lower()
if base in _BLOCKED_COMMANDS:
found.add(base)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The logic for checking list elements in subprocess calls is less robust than the one used for string arguments. It only checks the basename of each element, missing blocked commands embedded in shell-like strings (e.g., subprocess.run(["bash", "-c", "sudo whoami"])). Using _find_blocked_commands for list elements would provide consistent protection across both positional and list-based tool invocations.

Suggested change
base = os.path.basename(s).lower()
if base in _BLOCKED_COMMANDS:
found.add(base)
found |= _find_blocked_commands(s)

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06531267b4

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
shell_func = f"subprocess.{func.attr}"

if shell_func and shell_func in _SHELL_EXEC_FUNCS:
blocked_in_args = _check_args_for_blocked(node.args)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Inspect keyword arguments for subprocess command strings

The shell-escape checker only analyzes node.args, so subprocess calls that provide the command via keyword arguments bypass safety checks entirely. In this path, subprocess.run(args="rm -rf /tmp/x", shell=True) (or args=cmd) produces no blocked_in_args and no dynamic-arg warning, allowing blocked commands through _check_code_safety in the Python tool.

Useful? React with 👍 / 👎.

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +778 to +781
has_non_literal = any(
_extract_string_from_node(a) is None
and not isinstance(a, ast.List)
for a in node.args
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Flag non-literal list elements as dynamic command input

The dynamic-argument check explicitly excludes list arguments, but list inspection only reads constant string elements. As a result, a call like cmd = "rm"; subprocess.run([cmd, "-rf", target]) is treated as safe: the variable command element is neither token-checked nor marked dynamic, so blocked executables can still be launched through the Python sandbox.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits April 3, 2026 13:36
…ning

- Regex now matches blocked commands after optional path prefix at shell
  boundaries (catches ls; /usr/bin/sudo and similar)
- Nested shell detection uses os.path.basename so bash -c "/bin/rm" is
  caught
- AST checker now inspects keyword arguments (not just positional) so
  subprocess.run(args="sudo ...", shell=True) is detected
- List elements in subprocess calls are now checked via
  _find_blocked_commands for consistency (catches subprocess.run(["bash",
  "-c", "rm -rf /"]))
- Dynamic argument check uses _is_safe_literal that validates list
  contents are all string literals
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 32c7d34d3c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
shell_func = f"subprocess.{func.attr}"

if shell_func and shell_func in _SHELL_EXEC_FUNCS:
all_call_args = list(node.args) + [kw.value for kw in node.keywords]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Exclude non-command keyword values from dynamic checks

The shell-escape analyzer appends every keyword value to all_call_args, so control flags (for example check=True, text=True, capture_output=True) are treated as command inputs. Those boolean nodes fail _is_safe_literal, which makes has_non_literal true and rejects otherwise safe literal calls like subprocess.run(['echo', 'hi'], check=True) as shell_escape_dynamic. This is a functional regression that blocks common subprocess usage in the Python tool.

Useful? React with 👍 / 👎.

Comment on lines +78 to +79
base = os.path.basename(token).lower()
if base in _BLOCKED_COMMANDS:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Match blocked commands only at executable positions

_find_blocked_commands currently checks os.path.basename(token) for every token, not just command positions, so benign arguments whose basename matches a blocked command are denied. For example, commands like echo /tmp/rm or cat /data/kill are flagged even though they do not invoke rm/kill. Because this helper is reused across terminal and Python checks, it creates broad false positives and unnecessary command rejection.

Useful? React with 👍 / 👎.

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +686 to +687
elif node.module == "os":
self.os_aliases.add("os")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Track imported os callables in ImportFrom analysis

The ImportFrom handler for os only records the module name and does not capture callable aliases imported directly. Since visit_Call shell detection only handles attribute calls, code like from os import system followed by system('sudo whoami') is not recognized as a shell escape and passes _check_code_safety. This leaves a straightforward bypass for blocked-command execution via direct os imports.

Useful? React with 👍 / 👎.

bash -c 'script' arg0 arg1 -- only tokens[i+1] is the script body;
subsequent tokens are $0, $1 positional parameters passed to the script
and are not executed as shell commands. Scanning all remaining tokens
caused false positives.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the security of the sandbox environment by implementing more robust command-injection detection and restricting subprocess execution. Key changes include the introduction of a comprehensive blocklist for dangerous commands, improved shell-escape detection using shlex and regex, and the application of stricter environment and privilege-dropping configurations for subprocesses. My review highlights several bypass opportunities: the Windows-specific tokenization fails to strip quotes, nested shell command detection is currently unreliable, the AST checker misses direct imports of dangerous functions, and the regex pattern for shell boundaries is incomplete. I recommend addressing these security gaps to ensure the sandbox remains effective.

tokens = (
shlex.split(command)
if sys.platform != "win32"
else shlex.split(command, posix = False)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

On Windows, shlex.split(posix=False) preserves quotes around tokens. For example, "sudo" becomes ['"sudo"']. This causes the subsequent os.path.basename(token).lower() in _BLOCKED_COMMANDS check to fail because "sudo" is not in the blocklist. You should strip quotes from the token before checking it against the blocklist.

Suggested change
else shlex.split(command, posix = False)
else shlex.split(command, posix = False)
)
except ValueError:
tokens = command.split()
for token in tokens:
# Strip quotes for Windows (posix=False) and malformed POSIX tokens
token = token.strip("\"'")
base = os.path.basename(token).lower()

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +104 to +107
if i + 1 < len(tokens):
for word in tokens[i + 1].lower().split():
base = os.path.basename(word)
if base in _BLOCKED_COMMANDS:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

Using split() on the nested shell command string is unreliable as it doesn't handle shell metacharacters or complex quoting. For example, bash -c "sudo;ls" would result in ['sudo;ls'], which bypasses the blocklist check. Instead, you should recursively call _find_blocked_commands on the nested command string to leverage the full suite of detection logic (shlex + regex).

            # The next token(s) are the shell command string
            for j in range(i + 1, len(tokens)):
                blocked |= _find_blocked_commands(tokens[j])

Comment on lines +758 to +763
shell_func = None
if isinstance(func, ast.Attribute):
if isinstance(func.value, ast.Name):
if func.value.id in self.os_aliases:
shell_func = f"os.{func.attr}"
elif func.value.id in self.subprocess_aliases:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The AST checker currently only detects dangerous functions when called as attributes (e.g., os.system()). It fails to detect them if they are imported directly (e.g., from os import system; system()). You should track imported names in visit_ImportFrom and check ast.Name nodes in visit_Call to close this bypass.

shell_func = f"os.{func.attr}"
elif func.value.id in self.subprocess_aliases:
shell_func = f"subprocess.{func.attr}"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The AST safety checker should also flag eval() and exec() as they are common vectors for bypassing static analysis by dynamically constructing and executing dangerous code strings.

Comment thread studio/backend/core/inference/tools.py Outdated
lowered = command.lower()
for word in _BLOCKED_COMMANDS:
pattern = (
rf"(?:^|[;&|`\n]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?{re.escape(word)}\b"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The regex prefix group misses several common shell command boundaries such as quotes (for commands inside strings), parentheses (subshells), and braces. For example, bash -c "sudo ls" might bypass this regex because sudo is preceded by a double quote. Consider adding quotes and other shell metacharacters to the prefix set.

        pattern = (
            rf"(?:^|[;\u0026|`\n\s\"'({!]\s*|[$]\(\s*|\u003c\(\s*)(?:[\w./\\-]*/)?{re.escape(word)}\b"
        )

(sudo whoami) was not caught because ( was not in the regex character
class for shell command boundaries. Add ( to the set alongside ;, &,
|, backtick, newline.
- Track from-imports of dangerous functions (from os import system,
  from subprocess import run as r, etc.) via shell_exec_aliases dict
  so bare-name calls are detected by the AST checker
- Include the active Python interpreter and virtualenv directories
  in the sanitized PATH so pip, uv, and Studio packages remain
  accessible in the sandbox
- Add Windows-specific blocked commands (rmdir, takeown, icacls,
  runas, powershell, pwsh) only on win32 platform
- Add os.posix_spawn and os.posix_spawnp to _SHELL_EXEC_FUNCS
- Handle tuple literals same as list literals in AST argument
  inspection (both _extract_strings_from_list and _is_safe_literal)
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1befb8fb17

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +818 to +820
# Check from-import aliases: from os import system; system(...)
shell_func = self.shell_exec_aliases.get(func.id)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Detect assigned aliases to os/subprocess exec callables

visit_Call resolves bare-name calls only via self.shell_exec_aliases, which is populated from import statements, so aliases created by assignment are missed. For example, runner = subprocess.run followed by runner(['rm','-rf','/tmp/x']) is treated as safe and bypasses the blocked-command check in _check_code_safety, allowing dangerous commands through the Python tool. Please also track assignment-derived aliases (or conservatively treat such bare calls as unsafe).

Useful? React with 👍 / 👎.

Comment on lines +1006 to +1007
if sys.platform != "win32":
popen_kwargs["preexec_fn"] = _sandbox_preexec
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid preexec_fn in threaded tool execution paths

This now injects _sandbox_preexec via preexec_fn, but Python’s preexec_fn path is unsafe in multi-threaded parents and can deadlock before exec when other threads hold runtime/import locks. The backend runs with background threads, so this can cause intermittent hangs for tool calls rather than clean execution/timeout behavior. Consider moving the sandbox setup to a non-Python launcher or another mechanism that does not execute Python in preexec_fn.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly strengthens the security of the code execution sandbox by introducing multi-layered protections. It implements a more sophisticated command blocklist using shlex and regex, a minimal environment configuration to prevent credential leakage, and a pre-execution hook to enforce resource limits and prevent privilege escalation. Additionally, the AST-based safety check is expanded to detect dangerous os and subprocess calls. Review feedback correctly identifies a vulnerability in the AST visitor where direct function imports (e.g., from os import system) could bypass detection and suggests using recursion to more effectively scan nested shell commands.

Comment on lines +687 to +690
s = _extract_string_from_node(arg)
if s is not None:
found |= _find_blocked_commands(s)
strs = _extract_strings_from_list(arg)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-critical critical

The current AST analysis for shell escapes has a critical vulnerability: it doesn't handle functions imported directly with from os import system. An attacker could use from os import system; system('sudo ...') to bypass the sandbox, as the visitor only checks for attribute access like os.system(...).

Additionally, the logic here is incorrect. It assumes that any from os import ... statement makes os a valid alias for the module, which is not true and can lead to false positives (e.g., for from os import path).

To fix this, the SignalEscapeVisitor needs to be updated to correctly track imported functions and their aliases.

  1. In SignalEscapeVisitor.__init__, add a dictionary to track aliased functions:

    self.shell_func_aliases = {}
  2. Replace the current logic in visit_ImportFrom for os and subprocess with the following to correctly populate the alias dictionary:

    elif node.module in ("os", "subprocess"):
        for alias in node.names:
            full_name = f"{node.module}.{alias.name}"
            if full_name in _SHELL_EXEC_FUNCS:
                # Store the alias for the dangerous function
                self.shell_func_aliases[alias.asname or alias.name] = full_name
  3. Update visit_Call to check for these aliased functions when the called function is an ast.Name node:

    # In visit_Call, around line 758
    shell_func = None
    if isinstance(func, ast.Attribute):
        if isinstance(func.value, ast.Name):
            if func.value.id in self.os_aliases:
                shell_func = f"os.{func.attr}"
            elif func.value.id in self.subprocess_aliases:
                shell_func = f"subprocess.{func.attr}"
    elif isinstance(func, ast.Name):
        # Check for directly imported functions like `from os import system`
        shell_func = self.shell_func_aliases.get(func.id)
    
    if shell_func and shell_func in _SHELL_EXEC_FUNCS:
        # ... rest of the logic is fine

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +105 to +108
blocked.add(word)

# 3. Check for nested shell invocations (bash -c 'sudo whoami').
# When a token is an argument to a shell -c flag, scan its words
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The use of str.split() to parse the nested command string reintroduces a vulnerability that shlex.split() is intended to prevent throughout the rest of the function. An attacker could bypass this check with a quoted command, for example: bash -c '"sudo" whoami'. The .split() call would produce ['"sudo"', 'whoami'], and os.path.basename('"sudo"') would not match sudo in the blocklist.

A more robust approach would be to recursively call _find_blocked_commands on the nested command string. This ensures all the same parsing logic (shlex, regex, etc.) is applied consistently.

                # Recursively check the nested command string to handle
                # quoted commands and other shell metacharacters.
                blocked.update(_find_blocked_commands(tokens[i + 1]))

danielhanchen and others added 2 commits April 3, 2026 18:01
…anning

- Only inspect command-carrying keyword arguments (args, command,
  executable, path, file) in the AST checker, not control flags like
  check=True, text=True, capture_output=True which are booleans and
  were incorrectly flagged as non-literal dynamic arguments
- Replace split() in nested shell detection with recursive call to
  _find_blocked_commands so that quoted commands (bash -c '"sudo"
  whoami') and semicolons (bash -c "sudo;ls") within nested shells
  are properly detected through the full shlex + regex pipeline
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 03249794e3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +113 to +115
token == "-c"
and i > 0
and os.path.basename(tokens[i - 1]).lower() in _SHELLS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Scan Windows /c payloads for blocked commands

The nested-shell branch only recurses when the flag token is -c, but Windows commands are executed via _get_shell_cmd() as cmd /c .... That means inputs like cmd /c "powershell -Command rm ..." never have their inner command string re-scanned, so blocked executables can pass through when they are inside the /c payload. This is a sandbox bypass specifically on Windows paths.

Useful? React with 👍 / 👎.

Comment on lines +93 to +94
base = os.path.basename(token).lower()
if base in _BLOCKED_COMMANDS:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Unquote Windows tokens before blocked-command matching

In the Windows tokenization path (shlex.split(..., posix=False)), quoted executables keep their quote characters, so os.path.basename(token) on a token like "powershell" does not equal powershell. Because the regex path also expects command-boundary forms without a leading quote, commands such as "powershell" -Command ... evade the blocklist while still executing correctly in cmd.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly hardens the sandbox environment for executing code and shell commands by implementing robust command blocking, environment sanitization, and privilege dropping. The changes introduce regex-based command detection, a secure environment builder, and pre-execution hooks to limit system resources and prevent privilege escalation. I have identified two critical areas for improvement: the use of preexec_fn in subprocess.Popen is known to be unsafe in multi-threaded applications due to potential deadlocks, and the hardcoded path to libc.so.6 should be replaced with a dynamic lookup using ctypes.util.find_library('c') to ensure cross-distribution compatibility.

env = safe_env,
)
if sys.platform != "win32":
popen_kwargs["preexec_fn"] = _sandbox_preexec
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The preexec_fn argument in subprocess.Popen is not thread-safe and can cause deadlocks in multi-threaded applications if the child process attempts to acquire locks held by the parent. Since this is a server environment, consider using start_new_session=True or preexec_fn with extreme caution, or preferably, move the privilege dropping logic to a wrapper script executed by the subprocess.

Comment thread studio/backend/core/inference/tools.py Outdated
try:
import ctypes

libc = ctypes.CDLL("libc.so.6", use_errno = True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Hardcoding the path to 'libc.so.6' is fragile and may fail on systems where libc is located elsewhere (e.g., musl-based systems like Alpine, or different Linux distributions). Use ctypes.util.find_library('c') to locate the standard C library dynamically.

Addresses two Gemini review findings:

1. preexec_fn thread safety: _sandbox_preexec previously imported ctypes
   and resource inside the function body, which runs between fork() and
   exec() in the child process. In a multi-threaded server, this could
   deadlock if the import machinery locks were held by another thread at
   fork time. Now all imports and the libc handle are resolved once at
   module load time, so _sandbox_preexec only calls C-level functions
   (prctl, setrlimit) with no Python import activity.

2. Hardcoded libc.so.6 path: replaced with ctypes.util.find_library("c")
   which works on glibc (libc.so.6), musl (libc.musl-*.so.1), and other
   Linux distributions where libc has a different soname.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the security of the sandbox environment by implementing robust command blocking, environment sanitization, and privilege restriction for subprocesses. It introduces a more sophisticated detection mechanism for blocked commands using shlex and regex, adds a safe environment builder to strip sensitive credentials, and applies pre-execution hooks to limit system resources and prevent privilege escalation. Additionally, it extends the AST-based static analysis to detect potential shell escapes in Python code. My feedback suggests optimizing the regex-based command detection by compiling a single pattern, simplifying the path deduplication logic using dict.fromkeys, and moving the _CMD_KWARGS constant to a higher scope to avoid redundant redefinitions.

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +121 to +127
lowered = command.lower()
for word in _BLOCKED_COMMANDS:
pattern = (
rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?{re.escape(word)}\b"
)
if re.search(pattern, lowered):
blocked.add(word)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This loop iterates through each blocked command, compiling and running a regex for each one. For better performance, you can combine all blocked words into a single regex pattern. This avoids repeated regex compilation and searching within the loop.

Suggested change
lowered = command.lower()
for word in _BLOCKED_COMMANDS:
pattern = (
rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?{re.escape(word)}\b"
)
if re.search(pattern, lowered):
blocked.add(word)
lowered = command.lower()
if _BLOCKED_COMMANDS:
blocked_words_pattern = "|".join(re.escape(w) for w in _BLOCKED_COMMANDS)
pattern = rf"(?:^|[;&|\u0060\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?({blocked_words_pattern})\b"
blocked.update(re.findall(pattern, lowered))

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +177 to +182
seen = set()
deduped = []
for p in path_entries:
if p and p not in seen:
seen.add(p)
deduped.append(p)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This pattern for deduplicating a list while preserving order can be expressed more concisely using dict.fromkeys(), which is order-preserving since Python 3.7. This improves code readability.

Suggested change
seen = set()
deduped = []
for p in path_entries:
if p and p not in seen:
seen.add(p)
deduped.append(p)
deduped = list(dict.fromkeys(p for p in path_entries if p))

Comment thread studio/backend/core/inference/tools.py Outdated
if shell_func and shell_func in _SHELL_EXEC_FUNCS:
# Only inspect keyword args that carry command content, not
# control flags like check=True, text=True, capture_output=True.
_CMD_KWARGS = {"args", "command", "executable", "path", "file"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The _CMD_KWARGS set is a constant and is being redefined on every call to visit_Call. To improve performance and adhere to style conventions for constants, it should be defined at a higher scope, for example as a class attribute on SignalEscapeVisitor.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e2deca7a8c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +135 to +137
token == "-c"
and i > 0
and os.path.basename(tokens[i - 1]).lower() in _SHELLS
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Handle combined shell options in nested command scanning

The nested-shell recursion only triggers when the option token is exactly -c and the immediately previous token is the shell name, so commands like bash -lc 'rm -rf /tmp/x' or bash --login -c 'sudo ...' are treated as safe and pass the blocklist. This bypasses the new nested-shell protection in both terminal execution (_bash_exec uses _find_blocked_commands directly) and Python static checks (_check_args_for_blocked also relies on _find_blocked_commands), allowing blocked commands to execute via common shell flag combinations.

Useful? React with 👍 / 👎.

…nt hoisting

- Combine per-word regex loop into a single re.findall with alternation
  pattern, avoiding repeated regex compilation and searching
- Replace manual dedup loop with dict.fromkeys for PATH entries
- Hoist _CMD_KWARGS frozenset out of visit_Call to avoid recreating it
  on every AST node visit
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request significantly enhances the security of the code execution sandbox by introducing robust command blocking, environment sanitization, and resource limits. Key additions include a more sophisticated blocked command detector using shlex and regex, a minimal environment builder, and AST-based analysis to prevent shell escapes in Python code. Feedback was provided to improve the nested shell detection logic by making it cross-platform, specifically addressing Windows-specific shell invocations like 'cmd /c' which were previously overlooked.

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +131 to +137
_SHELLS = {"bash", "sh", "zsh", "dash", "ksh", "csh", "tcsh", "fish"}
for i, token in enumerate(tokens):
if (
token == "-c"
and i > 0
and os.path.basename(tokens[i - 1]).lower() in _SHELLS
):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The nested shell detection logic is specific to Unix-like shells and the -c flag. It will not detect nested commands on Windows that use cmd /c, which is a potential security gap as cmd is not in the blocklist.

To make this check cross-platform, you should use platform-specific shell names and command flags.

Suggested change
_SHELLS = {"bash", "sh", "zsh", "dash", "ksh", "csh", "tcsh", "fish"}
for i, token in enumerate(tokens):
if (
token == "-c"
and i > 0
and os.path.basename(tokens[i - 1]).lower() in _SHELLS
):
if sys.platform == "win32":
_SHELLS, c_flag = {"cmd"}, "/c"
else:
_SHELLS, c_flag = {"bash", "sh", "zsh", "dash", "ksh", "csh", "tcsh", "fish"}, "-c"
for i, token in enumerate(tokens):
# On Windows, flags can be case-insensitive.
is_c_flag = token.lower() == c_flag if sys.platform == "win32" else token == c_flag
if (
is_c_flag
and i > 0
and os.path.basename(tokens[i - 1]).lower() in _SHELLS
):

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 57ca43792c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
lowered = command.lower()
if _BLOCKED_COMMANDS:
words_alt = "|".join(re.escape(w) for w in sorted(_BLOCKED_COMMANDS))
pattern = rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?({words_alt})\b"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Block commands reconstructed from command substitutions

The blocklist scan only matches blocked executables when they appear literally in the command string (or immediately after $(), but bash lets callers synthesize the executable name from substitution output. For example, $(printf rm) -rf /tmp/x executes rm while _find_blocked_commands returns no hit, so _bash_exec allows the command. This is a practical bypass for destructive commands in Unix terminal execution.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits April 3, 2026 18:52
The nested shell scan only checked for Unix shells (bash -c, sh -c, etc).
Add cmd /c and cmd.exe /c detection so that Windows nested shell
invocations are also recursively scanned for blocked commands. The token
scan already catches blocked commands at any position, so this is
defense-in-depth for consistency across platforms.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d029e2f674

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +114 to +116
for token in tokens:
base = os.path.basename(token).lower()
if base in _BLOCKED_COMMANDS:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Block variable-expanded commands in shell checks

_find_blocked_commands only matches literal command tokens/regex words and never accounts for shell variable expansion, so blocked executables can be reconstructed at runtime and evade the filter. For example, cmd=rm; $cmd -rf /tmp/x produces no hit here, yet _bash_exec will execute rm; the same helper is also reused by the Python safety checker, so this bypass applies there too.

Useful? React with 👍 / 👎.

Comment on lines +832 to +836
if isinstance(func, ast.Attribute):
if isinstance(func.value, ast.Name):
if func.value.id in self.os_aliases:
shell_func = f"os.{func.attr}"
elif func.value.id in self.subprocess_aliases:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Flag indirect os/subprocess call targets as unsafe

The AST call resolution only recognizes attribute calls rooted at a simple name (e.g., os.system) or tracked bare-name aliases, so indirect forms such as getattr(os, 'system')(...) and importlib.import_module('subprocess').run(...) are skipped entirely. In those cases _check_code_safety returns None, which allows blocked commands to execute through the Python tool despite this hardening pass.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the security of the execution sandbox by introducing a robust command blocklist, a minimal environment builder, and a pre-execution hook to drop privileges and set resource limits. It also extends the AST-based safety checker to identify dangerous shell invocations. Reviewer feedback suggests improving the command detection logic by including Python interpreters in nested shell checks, refining token matching to avoid false positives, fixing Windows path compatibility in regex patterns, and pre-compiling regexes for better performance.

Comment on lines +131 to +132
_SHELLS = {"bash", "sh", "zsh", "dash", "ksh", "csh", "tcsh", "fish"}
_SHELLS_WIN = {"cmd", "cmd.exe"}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The nested shell detection logic currently only covers standard Unix shells and cmd.exe. However, a common bypass for terminal blocklists is to use python -c or perl -e to execute arbitrary code (e.g., python -c 'import os; os.system("sudo ...")'). Since the terminal tool allows running the Python interpreter, these should be included in the recursion list to ensure the nested command string is also scanned.

Suggested change
_SHELLS = {"bash", "sh", "zsh", "dash", "ksh", "csh", "tcsh", "fish"}
_SHELLS_WIN = {"cmd", "cmd.exe"}
_SHELLS = {"bash", "sh", "zsh", "dash", "ksh", "csh", "tcsh", "fish", "python", "python3"}
_SHELLS_WIN = {"cmd", "cmd.exe", "python.exe"}

Comment on lines +114 to +117
for token in tokens:
base = os.path.basename(token).lower()
if base in _BLOCKED_COMMANDS:
blocked.add(base)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The current logic in Step 1 iterates over all tokens and blocks the command if any token's basename matches a blocked command. This leads to false positives where a blocked word is used as an argument or within a string, e.g., echo sudo or git commit -m "fixed sudo bug". The blocklist should ideally only apply to tokens in command positions. Since Step 2 (regex scanning) already attempts to identify command boundaries, consider making Step 1 more selective (e.g., only checking the first token) or relying more on the regex/tokenization boundaries.

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +124 to +125
words_alt = "|".join(re.escape(w) for w in sorted(_BLOCKED_COMMANDS))
pattern = rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?({words_alt})\b"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The regex pattern for detecting blocked commands at shell boundaries only accounts for forward slashes in path prefixes. On Windows, commands may be invoked with backslash separators (e.g., C:\Windows\System32\sudo). The character class [\w./\-]* correctly includes the backslash, but the trailing / in the non-capturing group (?:[\w./\-]*/)? prevents it from matching paths that use backslashes as the final separator before the command name.

Suggested change
words_alt = "|".join(re.escape(w) for w in sorted(_BLOCKED_COMMANDS))
pattern = rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?({words_alt})\b"
words_alt = "|".join(re.escape(w) for w in sorted(_BLOCKED_COMMANDS))
pattern = rf"(?:^|[;&|\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*[\\/])?({words_alt})\b"

Comment on lines +123 to +126
if _BLOCKED_COMMANDS:
words_alt = "|".join(re.escape(w) for w in sorted(_BLOCKED_COMMANDS))
pattern = rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?({words_alt})\b"
blocked.update(re.findall(pattern, lowered))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The regex pattern for blocked commands is re-compiled on every call to _find_blocked_commands. Since _BLOCKED_COMMANDS is a constant, the regex should be pre-compiled at the module level to improve performance, especially given that this function may be called recursively.

danielhanchen and others added 2 commits April 3, 2026 19:10
… -c)

The nested shell scan only matched token == "-c" with the immediately
preceding token being a shell name. This missed:
- Combined flags: bash -lc 'rm ...' (-lc ends with c, is a valid
  combined flag meaning -l -c)
- Interleaved flags: bash --login -c 'sudo ...' (--login sits between
  bash and -c)

Now matches any short flag ending in 'c' (e.g. -lc, -xc, -ic) and
walks backwards past intermediate flags to find the shell binary.
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 539dc648db

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +149 to +150
if prev.startswith("-") or prev.startswith("/"):
continue # skip flags
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Recognize absolute shell paths in nested -c scanning

In _find_blocked_commands, the backward scan skips any token starting with / as if it were a flag, which causes absolute shell executables like /bin/bash and /bin/sh to be ignored. As a result, commands such as /bin/bash -c "rm -rf /tmp/x" are not recursively scanned and pass the blocklist, creating a direct bypass in both terminal execution and Python argument checks that rely on this helper.

Useful? React with 👍 / 👎.

Comment on lines +771 to +773
fq = f"{node.module}.{alias.name}"
if fq in _SHELL_EXEC_FUNCS:
self.shell_exec_aliases[alias.asname or alias.name] = fq
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject wildcard os/subprocess imports in safety analysis

visit_ImportFrom only records explicit imported names, so wildcard imports are treated as safe even when they expose dangerous callables. With from subprocess import * (or from os import *), later calls like run([...])/system(...) are unresolved and _check_code_safety returns None, which bypasses the new shell-escape guard for blocked commands.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request strengthens the execution sandbox by introducing environment sanitization, resource limits, and enhanced command blocking using shlex and regex. It also adds an AST-based safety checker to detect dangerous os and subprocess calls in Python code. Review feedback identifies several security improvements, including accounting for Windows drive letters in regex patterns, tracking local variable assignments in the AST checker to prevent aliasing bypasses, and including the Python interpreter in the recursive shell scanning to prevent command execution pivots.

Comment thread studio/backend/core/inference/tools.py Outdated
lowered = command.lower()
if _BLOCKED_COMMANDS:
words_alt = "|".join(re.escape(w) for w in sorted(_BLOCKED_COMMANDS))
pattern = rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?({words_alt})\b"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The regex pattern for detecting blocked commands with path prefixes does not account for Windows drive letters (e.g., C:). This could allow a bypass on Windows if a blocked command is invoked using an absolute path starting with a drive letter in a multi-command string (e.g., ls & C:\Windows\System32\runas.exe).

Suggested change
pattern = rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-]*/)?({words_alt})\b"
pattern = rf"(?:^|[;&|`\n(]\s*|[$]\(\s*|<\(\s*)(?:[\w./\\-:]*/|[a-zA-Z]:\\)?({words_alt})\b"

# Check from-import aliases: from os import system; system(...)
shell_func = self.shell_exec_aliases.get(func.id)

if shell_func and shell_func in _SHELL_EXEC_FUNCS:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The AST safety checker for Python code execution does not track local variable assignments. An attacker could bypass the check by assigning a dangerous function to a local variable before calling it (e.g., s = os.system; s('sudo ...')). While PR_SET_NO_NEW_PRIVS provides a strong kernel-level defense on Linux, this bypass remains a risk on other platforms like macOS where such kernel-level restrictions are not applied in this PR.

tokens = set(command.lower().split())
blocked = tokens & _BASH_BLOCKED_WORDS
# Block dangerous commands (shlex + regex based)
blocked = _find_blocked_commands(command)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-medium medium

The terminal tool's blocklist can be bypassed by pivoting through the Python interpreter. Since python is not in the _SHELLS list for recursive scanning, an LLM could execute a blocked command via python -c "import os; os.system('sudo ...')". Although the subprocess environment is sanitized and restricted by PR_SET_NO_NEW_PRIVS on Linux, this bypass allows executing commands that the blocklist specifically intends to prevent, especially on non-Linux platforms.

danielhanchen and others added 2 commits April 3, 2026 19:49
Addresses three high-consensus findings from 20-reviewer pass:

1. /bin/bash -c 'sudo whoami' bypassed nested shell scan because the
   backwards flag-skip logic treated paths starting with / as flags.
   Now only skips tokens starting with - as Unix flags; on Windows
   only skips short /X flags (not /bin/bash style paths). [9/20]

2. RLIMIT_NPROC=256 caused subprocess.run to fail with EAGAIN because
   Linux enforces NPROC per real UID, not per process tree. Removed
   RLIMIT_NPROC entirely; RLIMIT_FSIZE and PR_SET_NO_NEW_PRIVS remain
   as the primary resource and privilege controls. [5/20]

3. AST checker rejected safe dynamic subprocess usage like
   cmd=["git","status"]; subprocess.run(cmd) as shell_escape_dynamic.
   Now only flags dynamic args for shell-string functions (os.system,
   os.popen, subprocess.getoutput, etc.) or when shell=True is
   explicitly set. List-based subprocess calls with shell=False (the
   default) do not pass through a shell and are not flagged. [12/20]
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e35e8a0990

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +895 to +899
shell_kwarg_true = any(
kw.arg == "shell"
and isinstance(kw.value, ast.Constant)
and kw.value.value is True
for kw in node.keywords
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Treat non-literal shell flags as unsafe

The dynamic-shell guard only activates when shell is a literal True, so calls like s = True; subprocess.run(cmd, shell=s) skip the dynamic-argument check entirely. In that path, _check_code_safety returns None even when cmd contains blocked commands (for example rm -f ...), and _python_exec will run them. This creates a straightforward bypass whenever shell is passed via a variable or expression.

Useful? React with 👍 / 👎.

Comment on lines +860 to +864
cmd_kw_values = [
kw.value
for kw in node.keywords
if kw.arg in _CMD_KWARGS or kw.arg is None # **kwargs
]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Expand kwargs dictionaries before command safety checks

When subprocess/os calls use **kwargs, the analyzer stores the whole kwargs expression as one opaque node instead of inspecting the contained keys/values, and _check_args_for_blocked only understands string/list literals. That means subprocess.run(**{"args": "rm -f x.txt", "shell": True}) is treated as safe and executes blocked commands, because neither the command string nor shell=True is actually extracted from the expanded kwargs payload.

Useful? React with 👍 / 👎.

…ction

Gemini review found that Windows absolute paths (C:\Windows\System32\
shutdown.exe) and executable extensions (.exe, .com, .bat, .cmd) were
not handled:

- Token scan now strips .exe/.com/.bat/.cmd extensions before checking
  the blocklist, so sudo.exe matches sudo, shutdown.bat matches shutdown
- Regex pattern now includes optional Windows drive letter prefix
  ([a-zA-Z]:[/\\]) and optional executable extension suffix, so commands
  after shell metacharacters with full Windows paths are also caught
@danielhanchen
Copy link
Copy Markdown
Member Author

/gemini review

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ac7cb80864

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread studio/backend/core/inference/tools.py Outdated
Comment on lines +912 to +913
if shell_func in _STRING_SHELL_FUNCS or shell_kwarg_true:

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject dynamic command paths in os.exec/spawn calls**

Dynamic-argument rejection is currently gated behind shell_func in _STRING_SHELL_FUNCS or shell_kwarg_true, which excludes direct-exec APIs like os.execvp, os.execvpe, and os.spawn*. As a result, blocked executables can bypass _check_code_safety when provided via variables (for example cmd='rm'; os.execvp(cmd, [cmd, '-f', 'victim.txt'])), and the Python tool then executes them. Since these functions directly launch binaries without shell=True, this condition leaves a practical sandbox bypass for the new blocklist logic.

Useful? React with 👍 / 👎.

danielhanchen and others added 2 commits April 3, 2026 20:32
…ion false positive

Addresses three findings from second 20-reviewer pass:

1. **kwargs dict expansion (9/20): subprocess.run(**{"args": "rm ...",
   "shell": True}) bypassed the AST checker because **kwargs were
   treated as opaque. Now expands literal dict **kwargs to inspect
   their keys, and flags opaque **kwargs (variable dicts) as unsafe.

2. Non-literal shell= values (7/20): shell=variable was treated as
   shell=False (safe). Now any shell= value that is not literally
   False is treated as potentially True (conservative default).

3. except Exception false positive (1/20): except Exception in a loop
   was flagged as timeout evasion, but Exception does not catch
   SystemExit or KeyboardInterrupt which are used for timeout
   enforcement. Narrowed to only flag except BaseException and
   except TimeoutError in loops.
@danielhanchen danielhanchen merged commit 2c73ab7 into main Apr 3, 2026
5 checks passed
@danielhanchen danielhanchen deleted the fix/studio-sandbox-hardening branch April 3, 2026 20:33
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 83de70d651

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

if _libc is not None:
try:
# PR_SET_NO_NEW_PRIVS = 38, arg2 = 1 (enable)
_libc.prctl(38, 1, 0, 0, 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Check prctl result before assuming no-new-privs is active

_sandbox_preexec() calls prctl through ctypes, but syscall failures return -1 and set errno rather than raising OSError; the current try/except therefore treats denied/unsupported PR_SET_NO_NEW_PRIVS calls as success. In environments with older kernels or restrictive seccomp profiles, this silently disables the intended kernel-level privilege guard while execution continues normally, weakening the hardening this commit adds.

Useful? React with 👍 / 👎.

env = {
"PATH": os.pathsep.join(deduped),
"HOME": workdir,
"TMPDIR": workdir,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Set Windows temp variables in sandboxed child environment

The sanitized environment only sets TMPDIR, but Windows tooling (including Python's tempfile behavior) relies on TEMP/TMP, so sandboxed subprocesses on Windows will write temporary files to the global system temp directory instead of the per-session workdir. This is a sandbox-isolation regression specific to Windows runs and can leak artifacts across sessions.

Useful? React with 👍 / 👎.

shibizhao pushed a commit to shibizhao/unsloth-npu that referenced this pull request Apr 7, 2026
…nslothai#4827)

* fix(studio): harden sandbox security for terminal and python tools

The existing command blocklist used naive str.split() which is trivially
bypassable via quoting, full paths, nested shells, variable expansion,
and cross-tool pivoting through Python os.system/subprocess. Fixes unslothai#4818.

Changes:
- Replace str.split() blocklist with shlex.split() + os.path.basename()
  tokenization and regex scanning at shell command boundaries
- Add sanitized subprocess environment (_build_safe_env) that strips
  credentials (HF_TOKEN, WANDB_API_KEY, GH_TOKEN, AWS_*, etc.) and
  restricts PATH to /usr/local/bin:/usr/bin:/bin
- Add PR_SET_NO_NEW_PRIVS via prctl on Linux so sudo/su/pkexec fail
  at the kernel level regardless of how they are invoked
- Add RLIMIT_NPROC (256) and RLIMIT_FSIZE (100MB) to prevent fork
  bombs and disk filling attacks
- Extend AST safety checker to detect os.system(), os.popen(),
  subprocess.run/Popen/call/check_output, os.exec*, os.spawn* calls
  containing blocked commands or dynamic (non-literal) arguments
- Add cross-platform support: cmd.exe on Windows, bash on Unix;
  CREATE_NO_WINDOW flag on Windows, preexec_fn on Unix
- Expand blocklist from 7 to 14 commands: add su, chown, passwd,
  mount, umount, fdisk, kill, killall, pkill
- Apply all layers to both _bash_exec and _python_exec

Zero measurable performance overhead -- shlex parsing and a single
prctl syscall per subprocess fork.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix review findings: exception_catching dead code, false positives, process substitution

- Include exception_catching reasons in _check_code_safety so bare
  except-in-loop timeout evasion is actually blocked (was computed in
  _check_signal_escape_patterns but never read by the caller)
- Remove base.split() inner loop that caused false positives on quoted
  text arguments containing blocked words (e.g. echo "kill this process")
- Add targeted nested shell detection for bash/sh/zsh -c arguments
  instead, which catches bash -c 'sudo whoami' without false positives
- Add <() process substitution to the regex character class so
  diff <(rm -rf /path) is also caught
- Fix error message to say "unsafe patterns" instead of specifically
  mentioning signal manipulation when other categories trigger

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review feedback: regex paths, keyword args, list element scanning

- Regex now matches blocked commands after optional path prefix at shell
  boundaries (catches ls; /usr/bin/sudo and similar)
- Nested shell detection uses os.path.basename so bash -c "/bin/rm" is
  caught
- AST checker now inspects keyword arguments (not just positional) so
  subprocess.run(args="sudo ...", shell=True) is detected
- List elements in subprocess calls are now checked via
  _find_blocked_commands for consistency (catches subprocess.run(["bash",
  "-c", "rm -rf /"]))
- Dynamic argument check uses _is_safe_literal that validates list
  contents are all string literals

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix nested shell scan to only check the script body, not positional args

bash -c 'script' arg0 arg1 -- only tokens[i+1] is the script body;
subsequent tokens are $0, $1 positional parameters passed to the script
and are not executed as shell commands. Scanning all remaining tokens
caused false positives.

* Add subshell parentheses to regex command boundary detection

(sudo whoami) was not caught because ( was not in the regex character
class for shell command boundaries. Add ( to the set alongside ;, &,
|, backtick, newline.

* Address high-priority review findings from 7 parallel reviewers

- Track from-imports of dangerous functions (from os import system,
  from subprocess import run as r, etc.) via shell_exec_aliases dict
  so bare-name calls are detected by the AST checker
- Include the active Python interpreter and virtualenv directories
  in the sanitized PATH so pip, uv, and Studio packages remain
  accessible in the sandbox
- Add Windows-specific blocked commands (rmdir, takeown, icacls,
  runas, powershell, pwsh) only on win32 platform
- Add os.posix_spawn and os.posix_spawnp to _SHELL_EXEC_FUNCS
- Handle tuple literals same as list literals in AST argument
  inspection (both _extract_strings_from_list and _is_safe_literal)

* Fix false positive on check=True kwargs and recursive nested shell scanning

- Only inspect command-carrying keyword arguments (args, command,
  executable, path, file) in the AST checker, not control flags like
  check=True, text=True, capture_output=True which are booleans and
  were incorrectly flagged as non-literal dynamic arguments
- Replace split() in nested shell detection with recursive call to
  _find_blocked_commands so that quoted commands (bash -c '"sudo"
  whoami') and semicolons (bash -c "sudo;ls") within nested shells
  are properly detected through the full shlex + regex pipeline

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Move preexec_fn imports to module level and use find_library for libc

Addresses two Gemini review findings:

1. preexec_fn thread safety: _sandbox_preexec previously imported ctypes
   and resource inside the function body, which runs between fork() and
   exec() in the child process. In a multi-threaded server, this could
   deadlock if the import machinery locks were held by another thread at
   fork time. Now all imports and the libc handle are resolved once at
   module load time, so _sandbox_preexec only calls C-level functions
   (prctl, setrlimit) with no Python import activity.

2. Hardcoded libc.so.6 path: replaced with ctypes.util.find_library("c")
   which works on glibc (libc.so.6), musl (libc.musl-*.so.1), and other
   Linux distributions where libc has a different soname.

* Apply Gemini style suggestions: combined regex, dict.fromkeys, constant hoisting

- Combine per-word regex loop into a single re.findall with alternation
  pattern, avoiding repeated regex compilation and searching
- Replace manual dedup loop with dict.fromkeys for PATH entries
- Hoist _CMD_KWARGS frozenset out of visit_Call to avoid recreating it
  on every AST node visit

* Add cmd /c nested shell detection for Windows parity

The nested shell scan only checked for Unix shells (bash -c, sh -c, etc).
Add cmd /c and cmd.exe /c detection so that Windows nested shell
invocations are also recursively scanned for blocked commands. The token
scan already catches blocked commands at any position, so this is
defense-in-depth for consistency across platforms.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Handle combined shell flags (-lc, -xc) and interleaved flags (--login -c)

The nested shell scan only matched token == "-c" with the immediately
preceding token being a shell name. This missed:
- Combined flags: bash -lc 'rm ...' (-lc ends with c, is a valid
  combined flag meaning -l -c)
- Interleaved flags: bash --login -c 'sudo ...' (--login sits between
  bash and -c)

Now matches any short flag ending in 'c' (e.g. -lc, -xc, -ic) and
walks backwards past intermediate flags to find the shell binary.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix /bin/bash bypass, remove RLIMIT_NPROC, reduce AST false positives

Addresses three high-consensus findings from 20-reviewer pass:

1. /bin/bash -c 'sudo whoami' bypassed nested shell scan because the
   backwards flag-skip logic treated paths starting with / as flags.
   Now only skips tokens starting with - as Unix flags; on Windows
   only skips short /X flags (not /bin/bash style paths). [9/20]

2. RLIMIT_NPROC=256 caused subprocess.run to fail with EAGAIN because
   Linux enforces NPROC per real UID, not per process tree. Removed
   RLIMIT_NPROC entirely; RLIMIT_FSIZE and PR_SET_NO_NEW_PRIVS remain
   as the primary resource and privilege controls. [5/20]

3. AST checker rejected safe dynamic subprocess usage like
   cmd=["git","status"]; subprocess.run(cmd) as shell_escape_dynamic.
   Now only flags dynamic args for shell-string functions (os.system,
   os.popen, subprocess.getoutput, etc.) or when shell=True is
   explicitly set. List-based subprocess calls with shell=False (the
   default) do not pass through a shell and are not flagged. [12/20]

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Handle Windows drive letter paths and .exe extensions in command detection

Gemini review found that Windows absolute paths (C:\Windows\System32\
shutdown.exe) and executable extensions (.exe, .com, .bat, .cmd) were
not handled:

- Token scan now strips .exe/.com/.bat/.cmd extensions before checking
  the blocklist, so sudo.exe matches sudo, shutdown.bat matches shutdown
- Regex pattern now includes optional Windows drive letter prefix
  ([a-zA-Z]:[/\\]) and optional executable extension suffix, so commands
  after shell metacharacters with full Windows paths are also caught

* Handle **kwargs dict expansion, non-literal shell=, and except Exception false positive

Addresses three findings from second 20-reviewer pass:

1. **kwargs dict expansion (9/20): subprocess.run(**{"args": "rm ...",
   "shell": True}) bypassed the AST checker because **kwargs were
   treated as opaque. Now expands literal dict **kwargs to inspect
   their keys, and flags opaque **kwargs (variable dicts) as unsafe.

2. Non-literal shell= values (7/20): shell=variable was treated as
   shell=False (safe). Now any shell= value that is not literally
   False is treated as potentially True (conservative default).

3. except Exception false positive (1/20): except Exception in a loop
   was flagged as timeout evasion, but Exception does not catch
   SystemExit or KeyboardInterrupt which are used for timeout
   enforcement. Narrowed to only flag except BaseException and
   except TimeoutError in loops.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
danielhanchen added a commit to danielhanchen/unsloth-staging-2 that referenced this pull request May 17, 2026
Why the existing sandbox-hardening code is edited (FA3 rationale):

- `_sandbox_preexec` (tools.py:241) and its `_libc.prctl(PR_SET_NO_NEW_PRIVS)`
  block (tools.py:257-260) were introduced by unslothai#4827 / unslothai#5375 to harden tool
  subprocesses. That contract is preserved: the original function is renamed
  to `_sandbox_preexec_impl(apply_no_new_privs)` and a same-named
  `_sandbox_preexec()` wrapper calls it with `apply_no_new_privs=True`, so
  every direct caller still sets NNP. The new
  `_sandbox_preexec_for_bwrap()` skips NNP only on the Linux+bwrap path
  where bwrap itself reapplies NNP to the inner payload (setuid-helper
  bwrap installs cannot acquire helper privileges if NNP is set before
  execve). Net result: the inner sandboxed program still runs with NNP=1.

- The `if sys.platform != "win32": popen_kwargs["preexec_fn"] = ...` /
  `else: popen_kwargs["creationflags"] = subprocess.CREATE_NO_WINDOW`
  blocks in `_python_exec` (tools.py:1615-1618) and `_bash_exec`
  (tools.py:1709-1712) come from unslothai#4827. They are reorganised, not
  removed: the non-Windows path still sets `preexec_fn` and the Windows
  path still sets `CREATE_NO_WINDOW`. The only behavioural delta is the
  bwrap branch (Linux + sandbox_available()) selecting
  `_sandbox_preexec_for_bwrap` per above.

- `_workdirs[key] = workdir` (tools.py:348) from unslothai#4416 is replaced with
  `_workdirs[key] = os.path.realpath(workdir)`. Same caching behaviour;
  the realpath normalization is required because the bwrap argv binds
  `os.path.realpath(workdir)` while `tempfile.mkstemp(dir=workdir)`
  builds tmp_path from the non-realpath value, so on hosts where HOME
  or a parent is a symlink the script path passed via inner_argv is
  not reachable inside the sandbox namespace.

studio/backend/core/inference/sandbox.py
- Narrow Linux /etc bind from the whole directory to a runtime allowlist
  (hosts, resolv.conf, nsswitch.conf, localtime, ld.so.cache, ld.so.conf
  and ld.so.conf.d, ssl, ca-certificates, pki) so sandboxed tools cannot
  read sshd_config, machine-id, cloud-init data, etc.
- Narrow Seatbelt /private/etc the same way; replace the broad subpath
  with literal entries for hosts, resolv.conf, nsswitch.conf, localtime,
  protocols, services and subpaths for ssl and ca-certificates.
- Constrain Seatbelt allow process-exec to the same subpaths already
  used in file-map-executable so sandboxed Python cannot exec arbitrary
  system binaries such as osascript.
- Remove allow file-read and file-write on literal /dev/tty from the
  Seatbelt profile; tool output is captured via PIPE and there is no
  legitimate path that needs to bypass it.
- Gate site.getusersitepackages() behind
  UNSLOTH_STUDIO_SANDBOX_ALLOW_USER_SITE=1 so the sandbox no longer
  binds the real ~/.local/lib subtree by default (HOME is already
  rewritten to workdir for the child).
- Resolve _BWRAP_PROBE_BIN via shutil.which at module load with the
  /usr/bin/true fallback, so the probe works on Alpine and BusyBox
  layouts that place coreutils under /bin instead of /usr/bin.
- Replace the unreachable _linux_bwrap_path-or-bwrap fallback with an
  assertion that names the sandbox_available contract; same for the
  unreachable RuntimeError in build_sandbox_argv.

studio/backend/run.py
- Run sandbox_available on a background thread with a 2s bounded wait.
  On hardened kernels (unprivileged_userns_clone=0) the probe hangs
  for its full 5s timeout; running it inline delayed the access banner.
  The probe still runs (and caches) eagerly; if it has not reported
  back within 2s the banner is printed first and the unavailable notice
  is suppressed for this call.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] [Security] Unsloth Studio: Code execution (terminal) tool sandboxing can be bypassed to allow privileged commands like sudo/chmod

1 participant