Knowing What Not to Do: Leverage Language Model Insights for Action Space Pruning in Multi-agent Reinforcement Learning
Multi-agent reinforcement learning (MARL) is employed to develop autonomous agents that can learn cooperative or competitive strategies within complex environments. However, a linear increase in the number of agents leads to a combinatorial explosion of the joint action space, which often results in algorithmic instability, difficulty in convergence, or entrapment in local optima. While researchers have designed a variety of effective algorithms to compress the action space, these methods introduce new challenges, such as the need for manually designed prior knowledge or reliance on the structure of a specific problem, which limits their applicability. In this paper, we introduce Evolutionary action SPAce Reduction with Knowledge (eSpark), an exploration-function generation framework driven by large language models (LLMs) that boosts exploration and prunes unnecessary action space in MARL. Using just a basic prompt that outlines the overall task and setting, eSpark generates exploration functions in a zero-shot manner, identifies and prunes redundant or irrelevant state-action pairs, and then improves autonomously from policy feedback. In reinforcement learning tasks involving inventory management and traffic light control, encompassing a total of 15 scenarios, eSpark consistently outperforms the MARL algorithm it is combined with in all scenarios, achieving average performance gains of 34.4% and 9.9% in the two types of tasks respectively. Additionally, eSpark can handle settings with a large number of agents, securing a 29.7% improvement in scalability challenges featuring over 500 agents.
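Conceptually, an eSpark exploration function maps each agent's observation to a boolean mask over its actions, and masked-out actions are removed from exploration. The sketch below only illustrates that interface; the function name, the observation layout, and the pruning rule are illustrative assumptions, not the actual code generated by the LLM.

```python
import numpy as np

def exploration_func(agent_obs: np.ndarray, n_actions: int) -> np.ndarray:
    """Illustrative exploration function: True = action remains available."""
    mask = np.ones(n_actions, dtype=bool)
    # Made-up rule: if the first feature (assumed to be a normalised stock
    # level) is high, prune the larger replenishment actions.
    if agent_obs[0] > 0.8:
        mask[n_actions // 2:] = False
    if not mask.any():
        mask[0] = True  # never prune every action
    return mask
```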
Folder | Description |
---|---|
ReplenishmentEnv | Replenishment environment source code. |
ReplenishmentEnv/config | Configs for building the environment. |
ReplenishmentEnv/data | CSV-based data for SKUs, including the SKU list, info, and other dynamic data. |
ReplenishmentEnv/env | Kernel simulator for the environment. |
Baseline | Algorithm source code. |
Baseline/MARL_algorithm/ | Source code and hyperparameters for MARL baselines and eSpark. |
Baseline/OR_algorithm/ | Source code for OR baselines. |
Baseline/MARL_algorithm/prompts | GPT prompts for eSpark. |
Baseline/MARL_algorithm/config | Algorithm hyperparameters and environment settings. |
Baseline/MARL_algorithm/run | The logic for how the algorithms interact with the environment. |
- Create a new virtual environment (Optional)
conda create -n re_rl_algo python==3.8
conda activate re_rl_algo
- Install required packages
bash setup.sh
To run eSpark, you need to set up your wandb and GPT keys in ./Baseline/MARL_algorithm/config/keys.yaml:
# ======== Wandb Configuration ========
wandb_key: "YOUR_WANDB_KEY"
# ======== GPT Configuration ========
api_key: "YOUR_API_KEY"
Then set up the GPT configuration in ./Baseline/MARL_algorithm/config/algs/{your-chosen-alg-config}.yaml:
# ======== GPT Configuration ========
api_type: "YOUR_API_TYPE"
api_base: "YOUR_API_BASE"
api_version: "YOUR_API_VERSION"
engine: "GPT_ENGINE"
gpt_sample_num: YOUR-PROCESS-NUMBER  # An integer
Finally, run eSpark with:
python main.py --config={your-chosen-alg-config} --env-config={your-chosen-env-config}
You can see all the algorithm configs in ./Baseline/MARL_algorithm/config/algs/ and all environment configs in ./Baseline/MARL_algorithm/config/envs/. For example, to run the algorithm eSpark_base_ippo on the scenario sku100.single_store.standard, use:
python main.py --config=eSpark_base_ippo --env-config=sku100.single_store.standard
eSpark distributes all processes equally across all available GPUs. If you want to choose which GPUs are used, prepend the CUDA_VISIBLE_DEVICES environment variable to the command. For example, if you only want to use GPU 0 and GPU 1:
CUDA_VISIBLE_DEVICES=0,1 python main.py --config=eSpark_base_ippo --env-config=sku100.single_store.standard
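"Distributing processes equally across GPUs" can be thought of as a round-robin assignment by process index. The helper below is only a sketch under that assumption; the repo's actual assignment logic may differ.

```python
import torch

def assign_gpu(process_idx: int) -> torch.device:
    """Round-robin a process onto one of the visible GPUs (sketch only)."""
    n_gpus = torch.cuda.device_count()  # respects CUDA_VISIBLE_DEVICES
    if n_gpus == 0:
        return torch.device("cpu")
    return torch.device(f"cuda:{process_idx % n_gpus}")
```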
To run the MARL baselines, you only need to specify the algorithm config and the environment config.
- IPPO training
- Specify hyperparameters if needed in the algorithm config file, such as ippo.yaml
- Run
python main.py --config=ippo --env-config={your-chosen-env-config}
- QTRAN training
- Specify hyperparameters if needed in the algorithm config file, such as qtran.yaml
- Run
python main.py --config=qtran --env-config={your-chosen-env-config}
- MAPPO training
- Specify hyperparameters if needed in the algorithm config file, such as mappo.yaml
- Run
python main.py --config=mappo --env-config={your-chosen-env-config}
- View the training curves in wandb
To run the heuristic pruning baselines for IPPO, you have to set up the heuristic method and the related hyperparameters in ./Baseline/MARL_algorithm/config/algs/ippo_heuristic_prune.yaml:
# ======== Heuristic pruning hyperparameters ========
mask_func_name: "Ss" # "Ss", "random" or "upbound"
p: 0.3 # Fraction of the action space to be randomly pruned (used by "random")
r0: 1 # Reduction coefficient of s in (S,s) pruning
r1: 2 # Amplification factor of S in (S,s) pruning
r2: 0.5 # Reduction coefficient of S in (S,s) pruning
upbound: 4 # Upper-bound threshold for "upbound" pruning
after which you can run heuristic pruning methods with:
python main.py --config=ippo_heuristic_prune --env-config={your-chosen-env-config}
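For intuition, the snippet below sketches how the "Ss" option and the r0/r1/r2 coefficients above could translate into an action mask. The observation fields, the interpretation of an action as an order quantity, and the exact use of the coefficients are assumptions for illustration only and may differ from the repo's implementation.

```python
import numpy as np

def ss_heuristic_mask(obs: dict, n_actions: int, r0=1, r1=2, r2=0.5) -> np.ndarray:
    """Illustrative (S, s) pruning: keep only order quantities that land the
    inventory position inside a band around S, plus "order nothing" when the
    position is already above the reorder point s."""
    s = r0 * obs["s_level"]       # (possibly reduced) reorder point s
    S_hi = r1 * obs["S_level"]    # amplified order-up-to level S
    S_lo = r2 * obs["S_level"]    # reduced order-up-to level S

    mask = np.zeros(n_actions, dtype=bool)
    for a in range(n_actions):
        position = obs["inventory"] + a     # assume action a orders a units
        if a == 0 and obs["inventory"] > s:
            mask[a] = True                  # above s: allow ordering nothing
        elif S_lo <= position <= S_hi:
            mask[a] = True                  # keep orders inside the (S, s) band
    if not mask.any():
        mask[0] = True                      # never prune every action
    return mask
```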
# First, specify the scenario in ./Baseline/OR_algorithm/base_stock.py, then run the command below:
python Baseline/OR_algorithm/base_stock.py
# First, specify the scenario in ./Baseline/OR_algorithm/search_sS.py, then run the command below:
python Baseline/OR_algorithm/search_sS.py
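For reference, the decision rules behind these two OR baselines are the classic base-stock and (s, S) replenishment policies. The snippet below is a minimal sketch of those rules, not the repo's exact code; base_stock.py and search_sS.py presumably also search over the policy parameters.

```python
def base_stock_order(inventory_position: float, base_stock_level: float) -> float:
    """Base-stock rule: always order up to a fixed base-stock level."""
    return max(0.0, base_stock_level - inventory_position)

def ss_order(inventory_position: float, s: float, S: float) -> float:
    """(s, S) rule: when the position drops to s or below, order up to S."""
    return S - inventory_position if inventory_position <= s else 0.0
```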
You need to fetch results from wandb to evaluate performance. First, set up the project you want to evaluate in ./get_performance_from_wandb.py:
# Replace 'your-entity' with your W&B entity (username or team name)
# Replace 'your-project-name' with your W&B project name
entity = 'your-entity'
project_name = 'your-project-name'
And then run:
python get_performance_from_wandb.py
You can see the performance in ./wandb_averaged_experiment_results.csv.
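For reference, a minimal sketch of what such a script is assumed to do with the wandb public API is shown below. The metric name "test_return_mean", the grouping, and the output columns are assumptions; the actual get_performance_from_wandb.py may aggregate results differently.

```python
import pandas as pd
import wandb

entity = "your-entity"              # your W&B entity (username or team name)
project_name = "your-project-name"  # your W&B project name

api = wandb.Api()
runs = api.runs(f"{entity}/{project_name}")

rows = []
for run in runs:
    metric = run.summary.get("test_return_mean")  # hypothetical metric name
    if metric is not None:
        rows.append({"run": run.name, "metric": metric})

df = pd.DataFrame(rows)
print("Average over runs:", df["metric"].mean())
df.to_csv("wandb_averaged_experiment_results.csv", index=False)
```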