EMNLP-2023-Papers

Resources and Evaluation

Title Repo Paper Video
HaluEval: A Large-Scale Hallucination Evaluation Benchmark for Large Language Models ➖
TRIGO: Benchmarking Formal Mathematical Proof Reduction for Generative Language Models ➖
BanglaAbuseMeme: A Dataset for Bengali Abusive Meme Classification ➖
IDTraffickers: An Authorship Attribution Dataset to Link and Connect Potential Human-Trafficking Operations on Text Escort Advertisements ➖
This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models ➖
You Told Me That Joke Twice: A Systematic Investigation of Transferability and Robustness of Humor Detection Models ➖
Unveiling the Essence of Poetry: Introducing a Comprehensive Dataset and Benchmark for Poem Summarization ➖
Do LLMs Understand Social Knowledge? Evaluating the Sociability of Large Language Models with SocKET Benchmark ➖
It Ain't Over: A Multi-Aspect Diverse Math Word Problem Dataset ➖
Syllogistic Reasoning for Legal Judgment Analysis ➖
TempTabQA: Temporal Question Answering for Semi-Structured Tables ➖
Multilingual Previously Fact-Checked Claim Retrieval ➖