Benchmark for evaluating LLM agents on solving real-world standard operating procedures

Norditech-AB/SOP-bench

SOP-bench

SOP-bench is a benchmark for evaluating LLM agents on solving real-world standard operating procedures (SOPs). We will release the dataset and metrics in 2024.
