This code repository accompanies the paper "Contextual Bilevel Reinforcement Learning" by Vinzenz Thoma, Barna Pásztor, Andreas Krause, Giorgia Ramponi, and Yifan Hu. The paper was accepted at the Conference on Neural Information Processing Systems (NeurIPS) 2024. A preprint is available on arXiv.
Use Python 3.10 or above. To install the requirements in a clean environment, run the following command:
pip install -r requirements.txt
We recommend using a virtual environment, e.g., Conda, to avoid conflicts with other packages.
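For example, a clean Conda environment can be set up as follows; the environment name bilevel-rl is only an illustration and can be replaced with any name you prefer:
# create and activate a fresh Python 3.10 environment, then install the pinned requirements
conda create -n bilevel-rl python=3.10
conda activate bilevel-rl
pip install -r requirements.txt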
To train HPGD on the Four-Rooms problem, run this command:
python train_stochastic_bilevel_opt.py --experiment_dir <path_to_data>
We provide the config.yaml files required to reproduce the results reported in the paper in the data/experiment_reg_lambda_0_00* directories.
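For instance, a single HPGD run on one of the provided configurations would look like the command below. The directory name data/experiment_reg_lambda_0_001 is hypothetical; substitute any of the directories matching the pattern above.
# hypothetical directory name; replace with one of the data/experiment_reg_lambda_0_00* directories
python train_stochastic_bilevel_opt.py --experiment_dir data/experiment_reg_lambda_0_001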
To train the AMD algorithm from Chen et al. (2022), run the following command:
python train_amd.py --experiment_dir <path_to_data>
and to train the Zero-Order gradient estimator, run the following command:
python train_zero_order.py --experiment_dir <path_to_data>
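As a sketch, assuming each provided directory contains a self-contained config.yaml, all three algorithms can be run back to back over every Four-Rooms configuration with a small shell loop:
# run HPGD, AMD, and Zero-Order on every provided Four-Rooms configuration
for dir in data/experiment_reg_lambda_0_00*; do
    python train_stochastic_bilevel_opt.py --experiment_dir "$dir"
    python train_amd.py --experiment_dir "$dir"
    python train_zero_order.py --experiment_dir "$dir"
done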
The evaluation and visualization scripts are provided in the notebooks/experiment_visualization.ipynb notebook.
Run this notebook to reproduce the Figures and Tables from the paper.
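Assuming Jupyter is installed in your environment (add it with pip install jupyter if it is not already pulled in by requirements.txt), the notebook can be opened with:
jupyter notebook notebooks/experiment_visualization.ipynb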
Our algorithms achieve the following performance on the Four-Rooms problem, reported in Table 1 in Section 4 of the paper:
| MDP Reg. Parameter | Upper-Level Reg. Parameter | HPGD | AMD | Zero-Order |
|---|---|---|---|---|
| 0.001 | 1 | 0.91 ± 0.088 | 0.58 ± 0.000 | 0.59 ± 0.059 |
| 0.001 | 3 | 0.51 ± 0.006 | 0.51 ± 0.000 | 0.50 ± 0.005 |
| 0.001 | 5 | 0.46 ± 0.006 | 0.46 ± 0.003 | 0.46 ± 0.007 |
| 0.003 | 1 | 0.95 ± 0.002 | 1.00 ± 0.000 | 0.91 ± 0.048 |
| 0.003 | 3 | 0.73 ± 0.001 | 0.39 ± 0.000 | 0.40 ± 0.028 |
| 0.003 | 5 | 0.29 ± 0.003 | 0.32 ± 0.000 | 0.32 ± 0.002 |
| 0.005 | 1 | 1.17 ± 0.011 | 1.28 ± 0.003 | 1.15 ± 0.026 |
| 0.005 | 3 | 1.01 ± 0.002 | 1.13 ± 0.004 | 1.02 ± 0.027 |
| 0.005 | 5 | 0.87 ± 0.003 | 0.97 ± 0.009 | 0.79 ± 0.027 |
Performance across hyperparameters for the Four-Rooms problem, averaged over 10 random seeds and reported with standard errors. The algorithms perform on par for most hyperparameter settings, while HPGD outperforms the others in a few. AMD enjoys low variance due to its non-stochastic gradient updates, while Zero-Order exhibits the most variation.
To train HPGD on the tax design problem, run this command:
python train_tax_design.py --experiment_dir <path_to_data>
and to train the Zero-Order gradient estimator, run the following command:
python train_tax_design_zero_order.py --experiment_dir <path_to_data>
We provide the config.yaml files required to reproduce the results reported in the paper in the data/experiment_tax_design_reg_lambda_0_00* directories.
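Analogously to the Four-Rooms experiment, both tax design algorithms can be run over all provided configurations with a shell loop, assuming each directory contains a self-contained config.yaml:
# run HPGD and Zero-Order on every provided tax design configuration
for dir in data/experiment_tax_design_reg_lambda_0_00*; do
    python train_tax_design.py --experiment_dir "$dir"
    python train_tax_design_zero_order.py --experiment_dir "$dir"
done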
The evaluation and visualization scripts are provided in the notebooks/experiment_visualization_tax_design.ipynb notebook.
Run this notebook to reproduce the Figures and Tables from the paper.
For any questions about the code, please reach out to Barna Pásztor. If you find our code useful for your research, please cite our work as follows:
@misc{thoma2024stochasticbileveloptimizationlowerlevel,
  title={Stochastic Bilevel Optimization with Lower-Level Contextual Markov Decision Processes},
  author={Vinzenz Thoma and Barna Pasztor and Andreas Krause and Giorgia Ramponi and Yifan Hu},
  year={2024},
  eprint={2406.01575},
  archivePrefix={arXiv},
  primaryClass={math.OC},
  url={https://arxiv.org/abs/2406.01575},
}