SOP-bench is a benchmark for evaluating LLM agents on real-world standard operating procedures (SOPs). We will release the dataset and metrics in 2024.
Norditech-AB/SOP-bench