Repository for the paper "SECURE: Benchmarking Generative Large Language Models as a Cyber Advisory," submitted to the 40th Annual Computer Security Applications Conference (ACSAC'24).
This paper introduces SECURE (Security Extraction, Understanding & Reasoning Evaluation), a benchmark designed to assess LLM performance in realistic cybersecurity scenarios. SECURE includes six datasets focused on the Industrial Control Systems (ICS) sector to evaluate knowledge extraction, understanding, and reasoning based on industry-standard sources.
The dataset folder contains six TSV files, one per task; each file provides the prompt to be sent to the large language model and the ground-truth correct answer.
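A minimal sketch of how one of the task files could be loaded and iterated for evaluation. The file name (`dataset/task1.tsv`) and the column names (`Prompt`, `Ground Truth`) are assumptions for illustration; check the actual TSV headers in the dataset folder before use.

```python
import pandas as pd

# Hypothetical path and column names -- verify against the real TSV headers.
df = pd.read_csv("dataset/task1.tsv", sep="\t")

# Iterate over prompt / ground-truth pairs.
for _, row in df.iterrows():
    prompt = row["Prompt"]        # text to send to the LLM
    answer = row["Ground Truth"]  # expected correct answer
    # ... query your LLM with `prompt` and compare its output to `answer`
```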