Fix `accuracy_reward` crash when called from non-main thread by qgallouedec · Pull Request #5281 · huggingface/trl

qgallouedec · 2026-03-13T03:40:18Z

Summary

math_verify's parse() and verify() use signal.alarm() internally, which raises ValueError when called from a non-main thread (required for asynchronous training)
Detect non-main thread and disable signal-based timeouts (parsing_timeout=None, timeout_seconds=None)
Suppress the resulting "Timeout is disabled" warnings since the timeout removal is intentional
Affects both accuracy_reward and reasoning_accuracy_reward

Reproduce

import threading

from trl.rewards import accuracy_reward

solutions = [r"\frac{1}{3}"]
completions = [[{"role": "assistant", "content": r"The answer is \boxed{\frac{1}{3}}"}]]


def run_in_thread():
    result = accuracy_reward(completions, solutions)
    print("Result:", result)


# Works fine in the main thread
print("Main thread:")
run_in_thread()

# Fails in a worker thread
print("\nWorker thread:")
t = threading.Thread(target=run_in_thread)
t.start()
t.join()

Before:

Main thread:
Result: [1.0]

Worker thread:
Exception in thread Thread-1 (run_in_thread):
Traceback (most recent call last):
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/math_verify/parser.py", line 702, in parse
    return timeout(timeout_seconds=parsing_timeout)(extract_target_from_pred)(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        pred,
        ^^^^^
    ...<2 lines>...
        extraction_mode=extraction_mode,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/math_verify/utils.py", line 60, in wrapper
    signal.signal(signal.SIGALRM, handler)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/signal.py", line 58, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
ValueError: signal only works in main thread of the main interpreter

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/threading.py", line 1044, in _bootstrap_inner
    self.run()
    ~~~~~~~~^^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/threading.py", line 995, in run
    self._target(*self._args, **self._kwargs)
    ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/trl/reproduce_thread_issue.py", line 12, in run_in_thread
    result = accuracy_reward(completions, solutions)
  File "/fsx/qgallouedec/trl/trl/rewards/accuracy_rewards.py", line 57, in accuracy_reward
    gold_parsed = parse(sol)
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/math_verify/parser.py", line 711, in parse
    raise ValueError(
        "Math-Verify 'parse' function doesn't support threaded environment due to usage of signal.alarm() in timeout mechanism. If you need to run in multithreaded environment it's recommended to set the parsing_timeout=None, which will run without timeout (and signal handling). In this case you need to handle the timeouting yourself."
    ) from e
ValueError: Math-Verify 'parse' function doesn't support threaded environment due to usage of signal.alarm() in timeout mechanism. If you need to run in multithreaded environment it's recommended to set the parsing_timeout=None, which will run without timeout (and signal handling). In this case you need to handle the timeouting yourself.

After

Main thread:
Result: [1.0]

Worker thread:
Result: [1.0]

Note

Medium Risk
Changes reward evaluation behavior by disabling math_verify timeouts off the main thread, which avoids crashes but can allow parses/verifications to run unbounded in worker threads.

Overview
Fixes accuracy_reward and reasoning_accuracy_reward crashing in non-main threads by detecting thread context and disabling math_verify’s signal-based parse/verify timeouts when not on the main thread.

Also suppresses the expected "timeout disabled" warnings from math_verify in worker threads and adds a regression test that calls accuracy_reward from a threading.Thread.

^{Written by Cursor Bugbot for commit e0e6a76. This will update automatically on new commits. Configure here.}

chatgpt-codex-connector · 2026-03-13T03:40:28Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

HuggingFaceDocBuilderDev · 2026-03-13T03:42:52Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

cursor · 2026-03-13T03:43:33Z

+    # Suppress the "Timeout is disabled" warnings from math_verify when we intentionally disable timeouts
+    if not is_main_thread:
+        logging.getLogger("math_verify.parser").setLevel(logging.ERROR)
+        logging.getLogger("math_verify.grader").setLevel(logging.ERROR)


Logger level permanently modified as global side effect

Medium Severity

logging.getLogger() returns a process-wide singleton, so calling setLevel(logging.ERROR) permanently suppresses warnings from math_verify.parser and math_verify.grader for the entire process. After any single call from a non-main thread, all subsequent calls — including from the main thread — will have these warnings silenced. The log levels are never restored after the function returns. A scoped approach (e.g., saving and restoring the original level in a try/finally) would avoid this persistent global side effect.

Additional Locations (1)

trl/rewards/accuracy_rewards.py#L166-L169

commit 52cd0cc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:31:26 2026 +0100 Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model (#5295) commit 7b42fc4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:29:11 2026 +0100 Prevent corruption of DPO VLM training if "keep_end" truncation_mode (#5286) commit 3acb8e8 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:27:10 2026 +0100 Support max_length in DPO VLM training (#5284) commit ee339a0 Author: Carlos Miguel Patiño <carlos.patino@huggingface.co> Date: Tue Mar 17 14:01:44 2026 +0100 [GKD] Buffer Implementation for Distillation Trainer (#5137) Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> commit d46131f Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:27:19 2026 +0100 Remove custom get_train/eval_dataloader from OnlineDPO (#5291) commit 85cf8f4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:24:24 2026 +0100 Remove TrainingArguments import from experimental trainers (#5290) commit 91e3da0 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Mar 16 07:19:51 2026 -0600 Fix `accuracy_reward` crash when called from non-main thread (#5281) commit 4996631 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:44:28 2026 +0100 Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string (#5274) commit 5fceaa7 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:43:34 2026 +0100 Simplify structured outputs logic across vLLM versions in scripts/vllm_serve (#5273) commit 406d406 Author: casinca <47400729+casinca@users.noreply.github.com> Date: Sat Mar 14 04:12:49 2026 +0100 feat(`grpo_trainer.py`): Variational Sequence-Level Soft Policy Optimization (VESPO) (#5199) commit d0ac7ef Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sat Mar 14 02:53:33 2026 +0100 Allow nullable logprobs in vLLM serve responses (#5203) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit c0eabc4 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Fri Mar 13 18:19:15 2026 -0600 Change default `vllm_mode` to `"colocate"` and add v0→v1 migration guide (#5255) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> commit 6c0fccd Author: Mario Šaško <mariosasko777@gmail.com> Date: Sat Mar 14 00:19:38 2026 +0100 35% faster packing + rename `bfd-requeue` to `bfd_split` (#5189) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

commit 3972d66 Author: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Date: Wed Mar 18 22:26:44 2026 +0100 Suggest the `Json()` type for tool calling dataset format (#5307) Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> commit 5c6e915 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 14:55:19 2026 -0600 Update `RewardFunc` type annotation to allow `None`values in reward list (#5297) commit ee96845 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Wed Mar 18 17:03:54 2026 +0100 Fix DPOTrainer collators to truncate sequences before padding (#5305) commit 435c2ae Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 08:09:42 2026 -0600 Add guidance to avoid `hasattr` and `getattr` with defaults in `AGENTS.md` (#5294) commit 26ce6a3 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Wed Mar 18 00:44:12 2026 -0600 Apply docstyle (#5296) commit 52cd0cc Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:31:26 2026 +0100 Fix UNEXPECTED lm_head.weight warning when loading a CausalLM as a reward model (#5295) commit 7b42fc4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:29:11 2026 +0100 Prevent corruption of DPO VLM training if "keep_end" truncation_mode (#5286) commit 3acb8e8 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Tue Mar 17 15:27:10 2026 +0100 Support max_length in DPO VLM training (#5284) commit ee339a0 Author: Carlos Miguel Patiño <carlos.patino@huggingface.co> Date: Tue Mar 17 14:01:44 2026 +0100 [GKD] Buffer Implementation for Distillation Trainer (#5137) Co-authored-by: lewtun <lewis.c.tunstall@gmail.com> commit d46131f Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:27:19 2026 +0100 Remove custom get_train/eval_dataloader from OnlineDPO (#5291) commit 85cf8f4 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 15:24:24 2026 +0100 Remove TrainingArguments import from experimental trainers (#5290) commit 91e3da0 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Mon Mar 16 07:19:51 2026 -0600 Fix `accuracy_reward` crash when called from non-main thread (#5281) commit 4996631 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:44:28 2026 +0100 Fix support for model_init_kwargs in MiniLLM when passed as CLI JSON string (#5274) commit 5fceaa7 Author: Albert Villanova del Moral <8515462+albertvillanova@users.noreply.github.com> Date: Mon Mar 16 07:43:34 2026 +0100 Simplify structured outputs logic across vLLM versions in scripts/vllm_serve (#5273) commit 406d406 Author: casinca <47400729+casinca@users.noreply.github.com> Date: Sat Mar 14 04:12:49 2026 +0100 feat(`grpo_trainer.py`): Variational Sequence-Level Soft Policy Optimization (VESPO) (#5199) commit d0ac7ef Author: LeonEricsson <70749762+LeonEricsson@users.noreply.github.com> Date: Sat Mar 14 02:53:33 2026 +0100 Allow nullable logprobs in vLLM serve responses (#5203) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com> commit c0eabc4 Author: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Date: Fri Mar 13 18:19:15 2026 -0600 Change default `vllm_mode` to `"colocate"` and add v0→v1 migration guide (#5255) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> commit 6c0fccd Author: Mario Šaško <mariosasko777@gmail.com> Date: Sat Mar 14 00:19:38 2026 +0100 35% faster packing + rename `bfd-requeue` to `bfd_split` (#5189) Co-authored-by: Quentin Gallouédec <45557362+qgallouedec@users.noreply.github.com> Co-authored-by: Quentin Gallouédec <gallouedec.quentin@gmail.com>

…face#5281)

Fix accuracy_reward crash when called from non-main thread

ae80da9

qgallouedec requested review from AmineDiro and albertvillanova March 13, 2026 03:40

cursor Bot reviewed Mar 13, 2026

View reviewed changes

code style

e0e6a76

AmineDiro approved these changes Mar 16, 2026

View reviewed changes

qgallouedec merged commit 91e3da0 into main Mar 16, 2026
14 checks passed

qgallouedec deleted the fix-math-verify-thread-safety branch March 16, 2026 13:19

songhappy pushed a commit to songhappy/trl that referenced this pull request Apr 20, 2026

Fix accuracy_reward crash when called from non-main thread (hugging…

3a672b1

…face#5281)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `accuracy_reward` crash when called from non-main thread#5281

Fix `accuracy_reward` crash when called from non-main thread#5281
qgallouedec merged 2 commits into
mainfrom
fix-math-verify-thread-safety

qgallouedec commented Mar 13, 2026 •

edited by cursor Bot

Loading

Uh oh!

chatgpt-codex-connector Bot commented Mar 13, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Mar 13, 2026

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Mar 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

qgallouedec commented Mar 13, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Reproduce

Uh oh!

chatgpt-codex-connector Bot commented Mar 13, 2026

Uh oh!

HuggingFaceDocBuilderDev commented Mar 13, 2026

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Mar 13, 2026

Choose a reason for hiding this comment

Logger level permanently modified as global side effect

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

qgallouedec commented Mar 13, 2026 •

edited by cursor Bot

Loading