FreedomIntelligence/UVM

UVM: Uncertainty-Aware Value Models for Language Model Reasoning

Code for the paper "Robust Search with Uncertainty-Aware Value Models for Language Model Reasoning".

Training

Train the generator

Before running train_generator.sh (under scripts/gsm8k or scripts/math), set WANDB_API_KEY, WANDB_ENTITY, model_name_or_path, and save_dir. The trained generator is identified by save_generator_id.

cd UVM
bash scripts/math/train_generator.sh
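Before launching, it can help to confirm where these settings appear in the script. The sketch below assumes the names occur literally inside the script; the README does not say whether they are read from the environment or assigned in the file. `show_settings` is a hypothetical helper and not part of the repository.

```shell
# Hypothetical helper (not part of the repository): list the lines of a script
# that mention the settings named in this README, so they can be edited.
show_settings() {
    if [ ! -f "$1" ]; then
        echo "not found: $1" >&2
        return 1
    fi
    # -n prints line numbers; -E enables the alternation pattern.
    grep -nE 'WANDB_API_KEY|WANDB_ENTITY|model_name_or_path|save_dir' "$1"
}

# Usage from the repository root; prints a notice if the script is absent.
show_settings scripts/math/train_generator.sh || true
```

The same check applies to the scripts under scripts/gsm8k.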

Train the UVM

Generation

First, use the generator identified by generator_id to generate n_solutions solutions for each question in the training set:

cd UVM
bash scripts/math/generate.sh

Before running the script, set model_name_or_path to the path of your generator checkpoint.

The output is saved to data/math/model_generation/.
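A quick way to confirm that generation produced something is to count the files in the output directory. This is a minimal sketch: the README does not specify the output file names, so the helper only counts regular files. `count_outputs` is a hypothetical name.

```shell
# Hypothetical helper (not part of the repository): print how many regular
# files a directory contains, or 0 if the directory does not exist.
count_outputs() {
    if [ ! -d "$1" ]; then
        echo 0
        return 0
    fi
    # tr strips the padding some wc implementations add.
    find "$1" -type f | wc -l | tr -d ' '
}

echo "files under data/math/model_generation: $(count_outputs data/math/model_generation)"
```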

Training

Train the UVM with train_hyperscorer.sh. First set WANDB_API_KEY, WANDB_ENTITY, save_dir, and model_name_or_path (the path to the generator checkpoint).

cd UVM
bash scripts/math/train_hyperscorer.sh

Inference

Experiments are accelerated with vLLM: both generation and UVM scoring run on vLLM.

UVM-Guided Beam Search

Set the generator checkpoint path model_name_or_path and the UVM checkpoint path scorer_model_name_or_path in search.sh:

cd UVM
bash scripts/math/search.sh

The output is saved to a directory such as eval_results/math/generator_with_scorer/test500.

OVM-Guided Beam Search

First set --action_noise_type none in search.sh, then run:

cd UVM
bash scripts/math/search.sh
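To keep the UVM configuration intact, the flag can be changed in a copy of the script rather than in place. The sketch below assumes search.sh passes the flag literally as `--action_noise_type <value>` (or `--action_noise_type=<value>`) on a single line. `make_ovm_script` and the file name search_ovm.sh are hypothetical.

```shell
# Hypothetical helper (not part of the repository): copy a search script with
# the value of --action_noise_type replaced by "none".
# Assumes the flag and its value are on the same line, separated by
# whitespace or "=".
make_ovm_script() {
    sed -E 's/(--action_noise_type[[:space:]=]+)[^[:space:]]+/\1none/' "$1" > "$2"
}

# Usage from the repository root; skipped if the script is absent.
if [ -f scripts/math/search.sh ]; then
    make_ovm_script scripts/math/search.sh scripts/math/search_ovm.sh
    bash scripts/math/search_ovm.sh
fi
```

Writing to a separate file means both search variants can be rerun without editing the script back and forth.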

Citation

@misc{yu2025robustsearchuncertaintyawarevalue,
      title={Robust Search with Uncertainty-Aware Value Models for Language Model Reasoning}, 
      author={Fei Yu and Yingru Li and Benyou Wang},
      year={2025},
      eprint={2502.11155},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2502.11155}, 
}
