handle `TypeError("Can only compare inequalities with Expr")` from sympy

During a GRPO run with [huggingface/open-r1](https://github.com/huggingface/open-r1), `verify` raises this `TypeError` from sympy on some inputs. The exception propagates out of `accuracy_reward` and the training run stops:

```
[rank24]:   File "/home/lewis/open-r1/src/open_r1/grpo.py", line 81, in accuracy_reward
[rank24]:     reward = float(verify(answer_parsed, gold_parsed))
...
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 1380, in is_ge
[rank24]:     raise TypeError("Can only compare inequalities with Expr")
```

<details>
<summary>Full trace</summary>
<pre>
[INFO|trainer.py:2347] 2025-02-02 07:39:12,815 >> ***** Running training *****
[INFO|trainer.py:2348] 2025-02-02 07:39:12,815 >>   Num examples = 72,441
[INFO|trainer.py:2349] 2025-02-02 07:39:12,815 >>   Num Epochs = 1
[INFO|trainer.py:2350] 2025-02-02 07:39:12,815 >>   Instantaneous batch size per device = 1
[INFO|trainer.py:2353] 2025-02-02 07:39:12,815 >>   Total train batch size (w. parallel, distributed & accumulation) = 48
[INFO|trainer.py:2354] 2025-02-02 07:39:12,815 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:2355] 2025-02-02 07:39:12,815 >>   Total optimization steps = 1,510
[INFO|trainer.py:2356] 2025-02-02 07:39:12,816 >>   Number of trainable parameters = 1,777,088,000
[INFO|integration_utils.py:817] 2025-02-02 07:39:12,817 >> Automatic Weights & Biases logging enabled, to disable set os.environ["WANDB_DISABLED"] = "true"
wandb: WARNING The `run_name` is currently set to the same value as `TrainingArguments.output_dir`. If this was not intended, please specify a different run name by setting the `TrainingArguments.run_name` parameter.
wandb: Currently logged in as: ctjlewis (spellcraft) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Using wandb-core as the SDK backend.  Please refer to https://wandb.me/wandb-core for more information.
wandb: Tracking run with wandb version 0.19.5
wandb: Run data is saved locally in /home/lewis/open-r1/wandb/run-20250202_073912-ztq8omt8
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run data/Qwen2.5-1.5B-Open-R1-GRPO
wandb: ⭐️ View project at https://wandb.ai/spellcraft/huggingface
wandb: 🚀 View run at https://wandb.ai/spellcraft/huggingface/runs/ztq8omt8
  0%|          | 0/1510 [00:00<?, ?it/s]Invalidate trace cache @ step 0 and module 740: cache has only 0 modules
{'loss': 0.0, 'grad_norm': 0.08911343143040204, 'learning_rate': 1.3245033112582784e-07, 'completion_length': 902.9401245117188, 'rewards/accuracy_reward': 0.1770833432674408, 'rewards/format_reward': 0.0, 'reward': 0.1770833432674408, 'reward_std': 0.18087130784988403, 'kl': 0.0, 'epoch': 0.0}
  0%|          | 1/1510 [00:41<17:28:26, 41.69s/it]Invalidate trace cache @ step 0 and module 1110: cache has only 0 modules
  0%|          | 2/1510 [01:12<14:42:56, 35.13s/it]{'loss': -0.0, 'grad_norm': 0.06913274440244728, 'learning_rate': 2.649006622516557e-07, 'completion_length': 940.0521240234375, 'rewards/accuracy_reward': 0.1953125, 'rewards/format_reward': 0.0, 'reward': 0.1953125, 'reward_std': 0.12511277198791504, 'kl': 0.0, 'epoch': 0.0}
  0%|          | 2/1510 [01:12<14:42:56, 35.13s/it]Invalidate trace cache @ step 0 and module 1480: cache has only 0 modules
{'loss': 0.0, 'grad_norm': 0.09778616677653701, 'learning_rate': 3.973509933774835e-07, 'completion_length': 895.9479370117188, 'rewards/accuracy_reward': 0.1588541716337204, 'rewards/format_reward': 0.0, 'reward': 0.1588541716337204, 'reward_std': 0.18937695026397705, 'kl': 7.486343383789062e-05, 'epoch': 0.0}
  0%|          | 3/1510 [01:42<13:44:19, 32.82s/it]Invalidate trace cache @ step 0 and module 1850: cache has only 0 modules
  0%|          | 4/1510 [02:12<13:17:37, 31.78s/it]{'loss': 0.0, 'grad_norm': 0.08629236272071814, 'learning_rate': 5.298013245033113e-07, 'completion_length': 912.5494995117188, 'rewards/accuracy_reward': 0.1979166716337204, 'rewards/format_reward': 0.0, 'reward': 0.1979166716337204, 'reward_std': 0.1985924243927002, 'kl': 7.963180541992188e-05, 'epoch': 0.0}
  0%|          | 4/1510 [02:12<13:17:37, 31.78s/it]Invalidate trace cache @ step 0 and module 2220: cache has only 0 modules
  0%|          | 5/1510 [02:42<13:02:42, 31.20s/it]{'loss': 0.0, 'grad_norm': 0.09091995954155511, 'learning_rate': 6.622516556291392e-07, 'completion_length': 887.7369995117188, 'rewards/accuracy_reward': 0.1953125, 'rewards/format_reward': 0.0, 'reward': 0.1953125, 'reward_std': 0.20831291377544403, 'kl': 8.7738037109375e-05, 'epoch': 0.0}
  0%|          | 5/1510 [02:42<13:02:42, 31.20s/it]Invalidate trace cache @ step 0 and module 2590: cache has only 0 modules
  0%|          | 6/1510 [03:14<13:07:58, 31.44s/it]{'loss': 0.0, 'grad_norm': 0.08661526046919345, 'learning_rate': 7.94701986754967e-07, 'completion_length': 946.9271240234375, 'rewards/accuracy_reward': 0.1796875, 'rewards/format_reward': 0.0, 'reward': 0.1796875, 'reward_std': 0.19866567850112915, 'kl': 9.298324584960938e-05, 'epoch': 0.0}
  0%|          | 6/1510 [03:14<13:07:58, 31.44s/it]Invalidate trace cache @ step 0 and module 2960: cache has only 0 modules
{'loss': 0.0, 'grad_norm': 0.054895955660683546, 'learning_rate': 9.271523178807948e-07, 'completion_length': 958.1171875, 'rewards/accuracy_reward': 0.0885416716337204, 'rewards/format_reward': 0.0, 'reward': 0.0885416716337204, 'reward_std': 0.09810718148946762, 'kl': 7.486343383789062e-05, 'epoch': 0.0}
  0%|          | 7/1510 [03:47<13:18:19, 31.87s/it]Invalidate trace cache @ step 0 and module 3330: cache has only 0 modules
  1%|          | 8/1510 [04:18<13:08:57, 31.52s/it]{'loss': 0.0, 'grad_norm': 0.09800075149435915, 'learning_rate': 1.0596026490066227e-06, 'completion_length': 871.3463745117188, 'rewards/accuracy_reward': 0.2161458432674408, 'rewards/format_reward': 0.0, 'reward': 0.2161458432674408, 'reward_std': 0.1820903718471527, 'kl': 9.012222290039062e-05, 'epoch': 0.01}
  1%|          | 8/1510 [04:18<13:08:57, 31.52s/it]Invalidate trace cache @ step 0 and module 3700: cache has only 0 modules
{'loss': 0.0, 'grad_norm': 0.09927968702051182, 'learning_rate': 1.1920529801324504e-06, 'completion_length': 922.265625, 'rewards/accuracy_reward': 0.1744791716337204, 'rewards/format_reward': 0.0, 'reward': 0.1744791716337204, 'reward_std': 0.2220773994922638, 'kl': 8.106231689453125e-05, 'epoch': 0.01}
  1%|          | 9/1510 [04:47<12:53:34, 30.92s/it]Invalidate trace cache @ step 0 and module 4070: cache has only 0 modules
  1%|          | 10/1510 [05:16<12:39:42, 30.39s/it]{'loss': 0.0, 'grad_norm': 0.07315654095976738, 'learning_rate': 1.3245033112582784e-06, 'completion_length': 910.578125, 'rewards/accuracy_reward': 0.1666666716337204, 'rewards/format_reward': 0.0, 'reward': 0.1666666716337204, 'reward_std': 0.1368560642004013, 'kl': 8.7738037109375e-05, 'epoch': 0.01}
  1%|          | 10/1510 [05:16<12:39:42, 30.39s/it]Invalidate trace cache @ step 0 and module 4440: cache has only 0 modules
[rank24]: Traceback (most recent call last):
[rank24]:   File "/home/lewis/open-r1/src/open_r1/grpo.py", line 237, in <module>
[rank24]:     main(script_args, training_args, model_args)
[rank24]:   File "/home/lewis/open-r1/src/open_r1/grpo.py", line 189, in main
[rank24]:     train_result = trainer.train(resume_from_checkpoint=checkpoint)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2175, in train
[rank24]:     return inner_training_loop(
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 2490, in _inner_training_loop
[rank24]:     tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/transformers/trainer.py", line 3598, in training_step
[rank24]:     loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/trl/trainer/grpo_trainer.py", line 494, in compute_loss
[rank24]:     output_reward_func = reward_func(prompts=prompts, completions=completions, **reward_kwargs)
[rank24]:   File "/home/lewis/open-r1/src/open_r1/grpo.py", line 81, in accuracy_reward
[rank24]:     reward = float(verify(answer_parsed, gold_parsed))
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/math_verify/grader.py", line 447, in verify
[rank24]:     return any(compare_single_extraction_wrapper(g, t) for g, t in product(gold, target))
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/math_verify/grader.py", line 447, in <genexpr>
[rank24]:     return any(compare_single_extraction_wrapper(g, t) for g, t in product(gold, target))
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/math_verify/grader.py", line 438, in compare_single_extraction_wrapper
[rank24]:     return compare_single_extraction(g, t)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/math_verify/utils.py", line 50, in wrapper
[rank24]:     return func(*args, **kwargs)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/math_verify/grader.py", line 420, in compare_single_extraction
[rank24]:     return sympy_expr_eq(gold, target, precision)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/math_verify/grader.py", line 365, in sympy_expr_eq
[rank24]:     return sympy_compare_sets(gold, pred, precision)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/math_verify/grader.py", line 316, in sympy_compare_sets
[rank24]:     if a_set.symmetric_difference(b_set).is_empty:
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 259, in symmetric_difference
[rank24]:     return SymmetricDifference(self, other)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 2183, in __new__
[rank24]:     return SymmetricDifference.reduce(a, b)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 2189, in reduce
[rank24]:     result = B._symmetric_difference(A)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 262, in _symmetric_difference
[rank24]:     return Union(Complement(self, other), Complement(other, self))
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 1721, in __new__
[rank24]:     return Complement.reduce(a, b)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 1731, in reduce
[rank24]:     if B == S.UniversalSet or A.is_subset(B):
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 413, in is_subset
[rank24]:     ret = self._eval_is_subset(other)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 2056, in _eval_is_subset
[rank24]:     return fuzzy_and(other._contains(e) for e in self.args)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/logic.py", line 142, in fuzzy_and
[rank24]:     for ai in args:
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 2056, in <genexpr>
[rank24]:     return fuzzy_and(other._contains(e) for e in self.args)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/sets/sets.py", line 2053, in _contains
[rank24]:     return Or(*[Eq(e, other, evaluate=True) for e in self.args])
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/operations.py", line 513, in __new__
[rank24]:     _args = frozenset(cls._new_args_filter(args))
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/logic/boolalg.py", line 741, in _new_args_filter
[rank24]:     c = x.canonical
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 333, in canonical
[rank24]:     args = tuple([i.canonical if isinstance(i, Relational) else i for i in self.args])
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 333, in <listcomp>
[rank24]:     args = tuple([i.canonical if isinstance(i, Relational) else i for i in self.args])
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 335, in canonical
[rank24]:     r = self.func(*args)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 852, in __new__
[rank24]:     return cls._eval_relation(lhs, rhs, **options)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 859, in _eval_relation
[rank24]:     val = cls._eval_fuzzy_relation(lhs, rhs)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 1186, in _eval_fuzzy_relation
[rank24]:     return is_lt(lhs, rhs)
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 1265, in is_lt
[rank24]:     return fuzzy_not(is_ge(lhs, rhs, assumptions))
[rank24]:   File "/opt/conda/lib/python3.10/site-packages/sympy/core/relational.py", line 1380, in is_ge
[rank24]:     raise TypeError("Can only compare inequalities with Expr")
[rank24]: TypeError: Can only compare inequalities with Expr
[rank32]:[E202 07:45:01.714262093 ProcessGroupNCCL.cpp:542] [Rank 32] Collective WorkNCCL(SeqNum=23529, OpType=_ALLGATHER_BASE, NumelIn=8, NumelOut=384, Timeout(ms)=1800000) raised the following async exception: NCCL error: remote process exited or there was a network error, NCCL version 2.21.5
ncclRemoteError: A call failed possibly due to a network error or a remote process exiting prematurely.
</pre>
</details>
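
Until this is handled inside `verify`, a caller-side guard along these lines might keep the run alive. This is only a sketch: `safe_reward` is a hypothetical helper (not part of math_verify or open-r1), it takes the verify callable as a parameter so it stands alone, and it assumes that scoring an uncomparable pair as `0.0` is acceptable for training.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def safe_reward(
    verify_fn: Callable[[Any, Any], bool],
    answer_parsed: Any,
    gold_parsed: Any,
    default: float = 0.0,
) -> float:
    """Hypothetical wrapper: return `default` when verification raises TypeError.

    `verify_fn` stands in for `math_verify.verify`; the argument order mirrors
    the call shown in the traceback (`verify(answer_parsed, gold_parsed)`).
    """
    try:
        return float(verify_fn(answer_parsed, gold_parsed))
    except TypeError as exc:
        # e.g. TypeError("Can only compare inequalities with Expr") from sympy
        logger.warning("verify failed, using default reward %s: %s", default, exc)
        return default
```

In `accuracy_reward`, the failing line would then read `reward = safe_reward(verify, answer_parsed, gold_parsed)`. Catching only `TypeError` keeps unrelated failures visible; whether a broader `except` is appropriate depends on which other errors sympy can raise here.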
