
[ACL 2024] AUTOACT: Automatic Agent Learning from Scratch for QA via Self-Planning


zjunlp/AutoAct


AutoAct

Automatic Agent Learning from Scratch for QA via Self-Planning

📄 arXiv | 🤗 HF Paper | 🌐 Web

License: MIT

🌻Acknowledgement

The code of our training module is adapted from FastChat, and the inference module is built on BOLAA. The baselines are implemented with code from ReAct, Reflexion, BOLAA, Chameleon, ReWOO, and FireAct. We run open models with LangChain via FastChat. Thanks for their great contributions!

🌟Overview

Language agents have achieved considerable performance on various complex tasks. Despite extensive exploration in this field, existing language agent systems still depend on costly, non-reproducible data and face the challenge of forcing a single model to perform multiple functions. To address this, we introduce AutoAct, an automatic agent learning framework that relies neither on large-scale annotated data nor on synthetic trajectories from closed-source models (e.g., GPT-4). Given limited data and a tool library, AutoAct first automatically synthesizes planning trajectories without any assistance from humans or strong closed-source models. It then applies a division-of-labor strategy that automatically differentiates based on the target task information and the synthesized trajectories, producing a group of sub-agents that complete the task together. Comprehensive experiments with different LLMs demonstrate that AutoAct achieves better or comparable performance relative to various strong baselines.

[Figure: overview of the AutoAct method]

🔧Installation

git clone https://github.com/zjunlp/AutoAct
cd AutoAct
pip install -r requirements.txt

Before running the experiments, you need to apply for a Bing Search API key (this is a paid service).

✏️Self-Instruct

We conduct self-instruct on Meta-Agent to acquire a sufficient amount of task data and provide an ample training resource.

python Self_Instruct/data_generation.py \
    --source_data Self_Instruct/Meta_sample/Meta_Hotpotqa.json \
    --target_data Self_Instruct/hotpotqa_metaqa.json \
    --dataset_name hotpotqa \
    --generate_all_num 800 \
    --generate_per_round_num 10 \
    --model_name llama-2-13b-chat

source_data contains example instances of the target task, and target_data stores the data generated through self-instruct. generate_all_num sets the total number of data instances to generate. To improve generation efficiency and avoid duplicates, we generate generate_per_round_num instances per round.
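The round-based generation described above can be sketched in Python as follows. This is a hedged illustration, not the code in data_generation.py: `generate_fn` is a hypothetical stand-in for the Meta-Agent's LLM call, the `question`/`answer` field names are assumptions, and duplicates are detected only by exact matching on the normalised question text.

```python
import itertools
import random
from typing import Callable, Dict, List

Example = Dict[str, str]


def self_instruct(
    seed_examples: List[Example],
    generate_fn: Callable[[List[Example], int], List[Example]],
    generate_all_num: int,
    generate_per_round_num: int,
    num_demos: int = 3,
    max_rounds: int = 1000,
    seed: int = 0,
) -> List[Example]:
    """Grow a pool of task examples round by round, skipping duplicate questions."""
    rng = random.Random(seed)
    pool = list(seed_examples)
    seen = {ex["question"].strip().lower() for ex in pool}
    generated: List[Example] = []
    for _ in range(max_rounds):  # guard against a generator that only repeats itself
        if len(generated) >= generate_all_num:
            break
        # Show the generator a few in-context demonstrations from the current pool.
        demos = rng.sample(pool, min(num_demos, len(pool)))
        for ex in generate_fn(demos, generate_per_round_num):
            key = ex["question"].strip().lower()
            if key in seen:
                continue  # drop exact duplicates
            seen.add(key)
            pool.append(ex)
            generated.append(ex)
            if len(generated) >= generate_all_num:
                break
    return generated


# Hypothetical fake generator standing in for the LLM call.
_counter = itertools.count()


def fake_generate(demos: List[Example], n: int) -> List[Example]:
    # Returns n - 1 fresh questions plus one repeat of a demonstration.
    fresh = [{"question": f"Q{next(_counter)}", "answer": "A"} for _ in range(n - 1)]
    return fresh + [dict(demos[0])]


seeds = [{"question": "Who wrote Hamlet?", "answer": "Shakespeare"}]
data = self_instruct(seeds, fake_generate, generate_all_num=25, generate_per_round_num=10)
print(len(data))  # 25
```

Generating in small rounds keeps each prompt short and lets the duplicate check prune repeats before they are fed back as demonstrations.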

📝Self-Planning

Automatic Tool Selection

With the tool library at hand, we ask the Meta-Agent to select applicable tools for each task automatically.

python Self_Planning/Tool_Selection/tool_selected.py \
    --model_name llama-2-13b-chat \
    --task_name ScienceQA \
    --top_k 40 \
    --top_p 0.75 \
    --max_tokens 1024 \
    --tool_save_path Self_Planning/Tool_Selection/{task_name}_Tools.json

Information about the selected tools is stored in tool_save_path.
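tool_selected.py exposes --top_k and --top_p. Assuming they carry their usual meaning in LLM decoding (how the script forwards them to the model is not shown here), the sketch below illustrates one common formulation: keep the k most likely tokens, renormalise, then keep the smallest set whose cumulative probability reaches top_p. The toy logits are made up for illustration.

```python
import math
from typing import Dict


def top_k_top_p_filter(logits: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Return the renormalised distribution that top-k then top-p sampling draws from."""
    # Softmax over the raw logits (subtract the max for numerical stability).
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    # Top-k: keep only the k most likely tokens, then renormalise.
    ranked = sorted(exps.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    z = sum(e for _, e in ranked)
    probs = [(tok, e / z) for tok, e in ranked]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    total = sum(p for _, p in kept)
    return {tok: p / total for tok, p in kept}


dist = top_k_top_p_filter({"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}, top_k=3, top_p=0.75)
print(sorted(dist))  # ['a', 'b']
```

Lower top_p values shrink the candidate set, which makes tool selection more deterministic at the cost of diversity.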

Trajectory Synthesis

python Self_Plan/Traj_Syn/run_task.py \
    --agent_name ZeroshotThink_HotPotQA_run_Agent \
    --llm_name llama-2-13b-chat \
    --max_context_len 4096 \
    --task Hotpotqa \
    --task_path Self_Instruct/hotpotqa_metaqa.json \
    --save_path Self_Plan/Traj_Syn/output/hotpotqa_train_data.jsonl

To obtain high-quality synthesized trajectories, we filter out all trajectories with $\texttt{reward}<1$ and keep only those with exactly correct answers ($\texttt{reward}=1$) as the training source for self-differentiation. We release the filtered trajectories synthesized by Llama-{13,70}b-chat on Google Drive (you still need to run filter_data.py on them to prepare the data for trajectory differentiation).

python Scripts/filter_data.py \
    --source_path Self_Plan/Traj_Syn/output/hotpotqa_train_data.jsonl \
    --save_path Self_Plan/Traj_Syn/output \
    --task_name HotpotQA \
    --filter_num 200
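The reward-based filtering rule can be sketched as below. This is a minimal illustration, assuming each line of the JSONL file is a JSON object with a numeric `reward` field (the field name follows the notation above but is an assumption about the actual schema). It does not model --filter_num or any per-agent splitting that filter_data.py may perform.

```python
import json
from typing import Dict, Iterable, List


def keep_correct(lines: Iterable[str], threshold: float = 1.0) -> List[Dict]:
    """Keep only trajectories whose reward reaches the threshold (reward = 1)."""
    kept = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        # Rewards are assumed to lie in [0, 1], so ">= 1" means "exactly correct".
        if record.get("reward", 0) >= threshold:
            kept.append(record)
    return kept


sample = [
    '{"question": "q1", "reward": 1.0}',
    '{"question": "q2", "reward": 0.5}',
    "",
    '{"question": "q3", "reward": 1}',
]
kept = keep_correct(sample)
print([r["question"] for r in kept])  # ['q1', 'q3']
```

On a real file, `with open(path) as f: keep_correct(f)` works the same way, since a file object iterates over its lines.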

Self-Differentiation

In order to establish a clear division-of-labor, we leverage synthesized planning trajectories to differentiate the Meta-Agent into three sub-agents with distinct functionalities:

  • Plan-Agent undertakes task decomposition and determines which tool to invoke in each planning loop.
  • Tool-Agent decides how to invoke the chosen tool by determining the parameters of the tool invocation.
  • Reflect-Agent performs reflection by considering the entire historical trajectory and producing a reflection result.
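To make this division of labor concrete, here is a hypothetical sketch of how one synthesized trajectory might be sliced into role-specific training examples. The trajectory schema (`question`, `steps` with `thought`/`tool`/`params`/`observation`, `reflection`), the `BingSearch` tool name, and the prompt layout are all assumptions for illustration; the actual format produced for data_$agent.json may differ.

```python
from typing import Dict, List

Step = Dict[str, str]


def render_history(steps: List[Step]) -> str:
    """Render completed steps in a ReAct-style Thought/Action/Observation layout."""
    return "\n".join(
        f"Thought: {s['thought']}\nAction: {s['tool']}[{s['params']}]\nObservation: {s['observation']}"
        for s in steps
    )


def split_trajectory(traj: Dict) -> Dict[str, List[Dict[str, str]]]:
    """Slice one trajectory into hypothetical plan/tool/reflect training examples."""
    data: Dict[str, List[Dict[str, str]]] = {"plan": [], "tool": [], "reflect": []}
    question, steps = traj["question"], traj["steps"]
    for i, step in enumerate(steps):
        # Context = question plus every step completed before step i.
        context = f"Question: {question}\n{render_history(steps[:i])}".rstrip()
        # Plan-Agent target: given the context, think and pick a tool.
        data["plan"].append({"input": context, "output": f"Thought: {step['thought']}\nTool: {step['tool']}"})
        # Tool-Agent target: given the context and the chosen tool, fill in its parameters.
        data["tool"].append({"input": f"{context}\nTool: {step['tool']}", "output": step["params"]})
    # Reflect-Agent target: given the whole trajectory, produce a reflection.
    data["reflect"].append({"input": f"Question: {question}\n{render_history(steps)}", "output": traj["reflection"]})
    return data


traj = {  # hypothetical example trajectory
    "question": "Which country is the birthplace of the author of Hamlet?",
    "steps": [
        {"thought": "Find the author first.", "tool": "BingSearch", "params": "author of Hamlet", "observation": "William Shakespeare"},
        {"thought": "Now find where he was born.", "tool": "BingSearch", "params": "Shakespeare birthplace", "observation": "Stratford-upon-Avon, England"},
    ],
    "reflection": "The answer England is supported by the observations.",
}
examples = split_trajectory(traj)
print({role: len(items) for role, items in examples.items()})  # {'plan': 2, 'tool': 2, 'reflect': 1}
```

Each sub-agent then sees only the slice of the trajectory relevant to its function, which is the point of differentiating a single Meta-Agent.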

Agent training:

for agent in plan tool reflect
do
    echo "####################"
    echo $agent
    echo "####################"
    CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 deepspeed Self_Plan/Train/train_lora.py \
        --model_name_or_path llama-2-13b-chat \
        --lora_r 8 \
        --lora_alpha 16 \
        --lora_dropout 0.05 \
        --data_path Self_Plan/Traj_Syn/output/data_$agent.json \
        --output_dir Self_Plan/Train/lora/HotpotQA/13b-$agent-5-epoch \
        --num_train_epochs 5 \
        --per_device_train_batch_size 2 \
        --per_device_eval_batch_size 1 \
        --gradient_accumulation_steps 1 \
        --evaluation_strategy "no" \
        --save_strategy "steps" \
        --save_steps 10000 \
        --save_total_limit 1 \
        --learning_rate 1e-4 \
        --weight_decay 0. \
        --warmup_ratio 0.03 \
        --lr_scheduler_type "cosine" \
        --logging_steps 1 \
        --fp16 True \
        --model_max_length 4096 \
        --gradient_checkpointing True \
        --q_lora False \
        --deepspeed Self_Plan/Train/deepspeed_config_s3.json \
        --resume_from_checkpoint False
done
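The hyperparameters above imply a few quantities worth rechecking if you change hardware. Assuming plain data parallelism over the 8 GPUs listed in CUDA_VISIBLE_DEVICES and the standard LoRA formulation (the low-rank update is scaled by lora_alpha / lora_r), the arithmetic is sketched below. The 5120 hidden size is Llama-2-13B's and is used only as an example of one square projection matrix; which modules train_lora.py actually adapts is not stated here.

```python
def effective_batch_size(num_gpus: int, per_device: int, grad_accum: int) -> int:
    # Samples contributing to one optimizer step under data parallelism.
    return num_gpus * per_device * grad_accum


def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    # Standard LoRA multiplies the low-rank update B @ A by alpha / r.
    return lora_alpha / lora_r


def lora_params_per_matrix(d_in: int, d_out: int, r: int) -> int:
    # A is r x d_in and B is d_out x r, so the adapter adds r * (d_in + d_out) weights.
    return r * (d_in + d_out)


gpus = len("0,1,2,3,4,5,6,7".split(","))
print(effective_batch_size(gpus, per_device=2, grad_accum=1))  # 16
print(lora_scaling(16, 8))  # 2.0
print(lora_params_per_matrix(5120, 5120, 8))  # 81920
```

If you train on fewer GPUs, raising --gradient_accumulation_steps proportionally keeps the effective batch size at 16.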

Group Planning

After obtaining the task-specific sub-agents, any new question is processed through group planning among the sub-agents to achieve the desired outcome.

python Self_Planning/Group_Planning/run_eval.py \
    --agent_name ZeroshotThink_HotPotQA_run_Agent \
    --plan_agent plan \
    --tool_agent tool \
    --reflect_agent reflect \
    --max_context_len 4096 \
    --task HotpotQA \
    --task_path Self_Planning/Group_Planning/benchmark_run/data/hotpotqa \
    --save_path Self_Planning/Group_Planning/output/13b
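Conceptually, group planning is a loop over the three sub-agents. The sketch below is a hypothetical illustration with stub functions in place of the LoRA-tuned models; the `Finish` convention, the history tuple format, the `Search` tool, and the stubs are assumptions, not the logic of run_eval.py.

```python
from typing import Callable, Dict, List, Tuple

History = List[Tuple[str, str, str]]  # (tool name, parameters, observation)


def group_plan(
    question: str,
    plan_agent: Callable[[str, History], str],
    tool_agent: Callable[[str, History, str], str],
    reflect_agent: Callable[[str, History, str], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    history: History = []
    answer = ""
    for _ in range(max_steps):
        tool_name = plan_agent(question, history)  # Plan-Agent: which tool next?
        params = tool_agent(question, history, tool_name)  # Tool-Agent: with which arguments?
        if tool_name == "Finish":
            answer = params  # assumed convention: Finish's argument is the answer
            break
        history.append((tool_name, params, tools[tool_name](params)))
    return reflect_agent(question, history, answer)  # Reflect-Agent: review the trajectory


# Hypothetical stubs standing in for the three sub-agents and one tool.
def plan_stub(q: str, history: History) -> str:
    return "Search" if not history else "Finish"


def tool_stub(q: str, history: History, tool: str) -> str:
    return "Hamlet author" if tool == "Search" else history[-1][2]


def reflect_stub(q: str, history: History, answer: str) -> str:
    # Accept the answer only if some observation supports it.
    return answer if any(obs == answer for _, _, obs in history) else "unknown"


tools = {"Search": lambda query: "William Shakespeare"}
print(group_plan("Who wrote Hamlet?", plan_stub, tool_stub, reflect_stub, tools))  # William Shakespeare
```

Keeping each role behind its own callable mirrors the --plan_agent, --tool_agent, and --reflect_agent arguments: each sub-agent can be swapped independently.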

We release the trajectories on the test sets generated by Llama-{7,13,70}b-chat on Google Drive.

The prompts used in our experiments are in the [Prompts](https://github.com/zjunlp/AutoAct/tree/main/Prompts) directory.

🚩Citation

Please cite our repository if you use AutoAct in your work. Thanks!

@article{DBLP:journals/corr/abs-2401-05268,
  author       = {Shuofei Qiao and
                  Ningyu Zhang and
                  Runnan Fang and
                  Yujie Luo and
                  Wangchunshu Zhou and
                  Yuchen Eleanor Jiang and
                  Chengfei Lv and
                  Huajun Chen},
  title        = {{AUTOACT:} Automatic Agent Learning from Scratch via Self-Planning},
  journal      = {CoRR},
  volume       = {abs/2401.05268},
  year         = {2024},
  url          = {https://doi.org/10.48550/arXiv.2401.05268},
  doi          = {10.48550/ARXIV.2401.05268},
  eprinttype   = {arXiv},
  eprint       = {2401.05268},
  timestamp    = {Thu, 25 Jan 2024 15:41:08 +0100},
  biburl       = {https://dblp.org/rec/journals/corr/abs-2401-05268.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

🎉Contributors

We will provide long-term maintenance to fix bugs and resolve issues. If you run into any problems, please open an issue.