EMNLP-2023-Papers

Resources and Evaluation

Title	Repo	Paper	Video
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models			➖
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models			➖
BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification			➖
IDTraffickers: An Authorship Attribution Dataset to Link and Connect Potential Human-Trafficking Operations on Text Escort Advertisements			➖
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models			➖
You Told Me That Joke Twice: A Systematic Investigation of Transferability and Robustness of Humor Detection Models			➖
Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization			➖
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark			➖
It Ain't Over: A Multi-Aspect Diverse Math Word Problem Dataset			➖
Syllogistic Reasoning for Legal Judgment Analysis			➖
TempTabQA: Temporal Question Answering for Semi-Structured Tables			➖
Multilingual Previously Fact-Checked Claim Retrieval			➖