This repository contains the competition data, winning solutions and code, and presentations.
Visit the competition portal to learn how the competition process works and view the leaderboard.
This year, we decided to focus on large language models (LLMs). For more information on LLMs, please visit our LLM primer.
We chose to focus on LLMs because they have demonstrated impressive abilities in natural language processing, understanding, and generation (NLP, NLU, and NLG). Trained on massive text datasets, LLMs can generate human-like text and excel at diverse linguistic tasks. Harnessing that potential for I-O Psychology, however, requires rigorous design and evaluation. This year's competition therefore focused on developing best practices for applying LLMs to I-O tasks: competitors were required to develop LLM workflows, using techniques such as prompt engineering, few-shot learning, and fine-tuning, on standardized datasets relevant to I-O Psychology. Our goal was to benchmark how well current LLMs can assist with I-O workflows on public benchmark datasets, with participants reporting reproducible prompts, results, and analyses to advance best practices for thoughtfully eliciting the strengths of LLMs in professional applications.
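To make the expected workflow concrete, the sketch below shows one way to assemble a few-shot prompt for a binary text-classification task and send it to a hosted LLM. It is a minimal illustration only: the openai Python client, the model name, and the labeled examples are assumptions made for the sake of the example and are not part of the competition materials.

```python
# Minimal few-shot prompting sketch. The openai client, model name, and
# example texts/labels below are illustrative assumptions, not competition code.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A handful of labeled examples supplied as in-context "shots".
FEW_SHOT_EXAMPLES = [
    ("I understand how stressful that deadline must have been for you.", "yes"),
    ("Just get the report done; I don't care how you feel about it.", "no"),
]

def build_messages(text: str) -> list[dict]:
    """Assemble a chat prompt: task instruction, labeled shots, then the new case."""
    messages = [{
        "role": "system",
        "content": (
            "You rate workplace responses. Reply with exactly 'yes' if the "
            "response demonstrates the target behavior, or 'no' if it does not."
        ),
    }]
    for example_text, label in FEW_SHOT_EXAMPLES:
        messages.append({"role": "user", "content": example_text})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": text})
    return messages

def classify(text: str) -> str:
    """Send the few-shot prompt to the model and return its 'yes'/'no' answer."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",   # placeholder model name
        messages=build_messages(text),
        temperature=0,         # keep outputs as repeatable as possible
    )
    return completion.choices[0].message.content.strip().lower()

if __name__ == "__main__":
    print(classify("That sounds really difficult; let's work out a plan together."))
```

Fine-tuning would follow the same pattern of standardized inputs and outputs, but with the labeled examples supplied as training data rather than placed in the prompt.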
Benchmark Datasets
- Predicting Empathy: Job candidates were asked to provide empathetic responses to a difficult workplace situation. Your task is to classify whether or not empathy was demonstrated in each simulated response (a simple local evaluation sketch for this task follows the list).
- Generating Interview Responses: Job candidates responded to 5 common interview questions. You will be given the text of 4 question and response pairs. Your task is to generate a likely text response for the 5th question based on the previous responses.
- Rating Item Clarity: Respondents rated the clarity of personality test items using a 7-point scale from 1 = extremely unclear to 7 = extremely clear. Your task is to predict the average clarity rating for each item based on the responses.
- Identifying Fairness Perceptions: Respondents compared two organizational policies and voted on which was fairer. Your task is to identify which policy received the majority vote as the fairer option.
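For checking predictions locally before submission, a small evaluation sketch for the empathy task is shown below. The CSV path, column names, and the use of plain accuracy are assumptions made for illustration; the official scoring procedure is defined by the competition, not by this sketch.

```python
# Local evaluation sketch for the empathy task. The CSV path, column names,
# and accuracy metric are illustrative assumptions, not the official scoring.
import csv

def load_label_pairs(path: str) -> list[tuple[str, str]]:
    """Read (gold, predicted) label pairs from a CSV with hypothetical
    'empathy_label' and 'predicted_label' columns."""
    pairs = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            pairs.append((row["empathy_label"], row["predicted_label"]))
    return pairs

def accuracy(pairs: list[tuple[str, str]]) -> float:
    """Fraction of responses where the predicted label matches the gold label."""
    if not pairs:
        return 0.0
    return sum(gold == pred for gold, pred in pairs) / len(pairs)

if __name__ == "__main__":
    pairs = load_label_pairs("empathy_predictions.csv")
    print(f"Empathy classification accuracy: {accuracy(pairs):.3f}")
```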
Please visit the competition slide deck for an overview of this year's competition and winners.
Winners
- Zihao Jia
- Mina Son
- Philseok Lee
Final score = .666
- Mustafa Akben
- Aaron Satko
Final score = .652
- Jennifer Gibson
- Shane Halder
- Blake Hoffman
- Hannah Johnson
- Joseph Nicolas Luchman
- Nick McCann
- Selena Tran
Final score = .643
- Guglielmo Menchetti (Wonderlic)
- Lea Cleary (Wonderlic)
- Annie Brinza (Wonderlic)
Final score = .630