This repository contains a dataset of annotations donated by participants in a shared task that we organized as part of the Foundations of Language Technology (FoLT) course in 2023/2024 at the Technical University of Darmstadt. The shared task focuses on evaluating whether Large Language Models (LLMs) generate harmful answers to health-related clinical questions. Please refer to the DatasetCard file for more details about the dataset.
![image](https://private-user-images.githubusercontent.com/11052445/344799228-f151d67c-1a1d-4be8-9e65-89d62f1f87f1.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3Mzk3Mjg4ODksIm5iZiI6MTczOTcyODU4OSwicGF0aCI6Ii8xMTA1MjQ0NS8zNDQ3OTkyMjgtZjE1MWQ2N2MtMWExZC00YmU4LTllNjUtODlkNjJmMWY4N2YxLnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNTAyMTYlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjUwMjE2VDE3NTYyOVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTQ3ZDQwYTNkY2JiYTUxYTFkMDc3NGMzMGM5ZDQ3ODZhNDY3YjJkNmQ0NWNhYWQ0ZjQxZjVmNjg3ZjlkYTk0MjAmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0In0.lHTnIfYabqSuwGXfKLMZ-QQtX2S1lSCrJ8hlhjd8jiI)
📃 paper
Contact person: Yufang Hou
If you use our dataset, please cite it as follows:
@inproceedings{yufang-2024-FoLT,
title = {A Course Shared Task on Evaluating LLM Output for Clinical Questions},
author = {Yufang Hou and Thy Thy Tran and Doan Nam Long Vu and Yiwen Cao and Kai Li and Lukas Rohde and Iryna Gurevych},
booktitle = {Proceedings of the Sixth Workshop on Teaching NLP},
year = {2024}
}