[Feature] Loading the same model multiple times in opencompass for evaluation on MMLU dataset #120
-
Describe the feature
I am currently using LLaMA to evaluate perplexity-based (PPL) accuracy on the MMLU dataset, and I have a question about the OpenCompass toolkit. When OpenCompass divides the evaluation into 40 sub-tasks, does that mean the toolkit loads the same model 40 times? From my observation, the time spent loading the model is comparable to the inference time itself, so I am concerned that the repeated loading could make the evaluation take a long time. Additionally, is it possible to configure OpenCompass to support batch inference? This could improve the efficiency of the evaluation process. I would greatly appreciate detailed guidance on these issues. Will you implement it?
-
It's a good catch. OpenCompass's task division system is designed for cluster management systems like Slurm, which dispatch tasks to different nodes for parallel evaluation. However, it can slow the evaluation down when running on a single node, since each task requires a complete reload of the model weights. The simplest fix is to increase the task size to reduce the number of tasks; alternatively, you can dig deeper into the docs on Partitioners and switch the strategy to NaivePartitioner (https://opencompass.readthedocs.io/en/latest/user_guides/evaluation.html#task-partition-partitioner).
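For illustration, here is a minimal config sketch that switches the inference stage to NaivePartitioner, which creates one task per model/dataset pair so the weights are loaded far fewer times. The two imported config paths (`mmlu_ppl`, `hf_llama_7b`) are placeholders for whatever dataset and model configs you actually use, and the worker count is an assumption:

```python
# Minimal sketch, assuming the standard OpenCompass config layout.
from mmengine.config import read_base
from opencompass.partitioners import NaivePartitioner
from opencompass.runners import LocalRunner
from opencompass.tasks import OpenICLInferTask

with read_base():
    from .datasets.mmlu.mmlu_ppl import mmlu_datasets  # placeholder dataset config
    from .models.hf_llama_7b import models             # placeholder model config

datasets = [*mmlu_datasets]

# NaivePartitioner creates one task per (model, dataset) pair instead of
# splitting each dataset into many sub-tasks, so the model is reloaded
# once per dataset rather than once per sub-task.
infer = dict(
    partitioner=dict(type=NaivePartitioner),
    runner=dict(
        type=LocalRunner,
        max_num_workers=4,                 # run up to 4 tasks concurrently
        task=dict(type=OpenICLInferTask),
    ),
)
```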
Batch inference is natively supported: you can specify batch_size in the model's config, as already mentioned in Getting Started.
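For reference, a sketch of a HuggingFace model entry with batch_size set, following the pattern shown in the Getting Started docs; the model label, checkpoint path, and size values below are illustrative assumptions, not recommendations:

```python
from opencompass.models import HuggingFaceCausalLM

models = [
    dict(
        type=HuggingFaceCausalLM,
        abbr='llama-7b-hf',              # illustrative label
        path='huggyllama/llama-7b',      # illustrative HF checkpoint
        max_seq_len=2048,
        max_out_len=100,
        batch_size=16,                   # samples per forward pass (batch inference)
        run_cfg=dict(num_gpus=1),
    ),
]
```

Raising batch_size trades GPU memory for throughput, so pick the largest value that fits on your device.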