Paper list and datasets for the paper: A Survey on Data Selection for LLM Instruction Tuning
Last updated: February 2, 2024 (UTC).
Labels: each entry lists its publisher and year.
Link types: PDF, Codes, Report/Blog, Data.
Self-Instruct: Aligning Language Models with Self-Generated Instructions. ACL 2023
PDF, Data
Alpaca: A Strong, Replicable Instruction-Following Model. Report 2023
Blog, Data
WizardLM: Empowering Large Language Models to Follow Complex Instructions. arXiv 2023
PDF, Data
LIMA: Less Is More for Alignment. arXiv 2023
PDF, Data
Free Dolly: Introducing the World's First Truly Open Instruction-Tuned LLM. Report 2023
Blog, Data
Multitask Prompted Training Enables Zero-Shot Task Generalization. ICLR 2022
PDF, Data
Instruction Mining: High-Quality Instruction Data Selection for Large Language Models. arXiv 2023
PDF
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4. arXiv 2023
PDF, Codes
Dataset Quantization. ICCV 2023
PDF, Codes
From Quantity to Quality: Boosting LLM Performance with Self-Guided Data Selection for Instruction Tuning. arXiv 2023
PDF, Codes
Self-Alignment with Instruction Backtranslation. arXiv 2023
PDF
One Shot Learning as Instruction Data Prospector for Large Language Models. arXiv 2023
PDF, Codes
Self-Evolved Diverse Data Sampling for Efficient Instruction Tuning. arXiv 2023
PDF, Codes
TeGit: Generating High-Quality Instruction-Tuning Data with Text-Grounded Task Design. ICLR 2024
PDF
Active Instruction Tuning: Improving Cross-Task Generalization by Training on Prompt Sensitive Tasks. EMNLP 2023
PDF, Codes
AlpaGasus: Training A Better Alpaca with Fewer Data. arXiv 2023
PDF, Blog
#InsTag: Instruction Tagging for Analyzing Supervised Fine-tuning of Large Language Models. arXiv 2023
PDF, Codes
Rethinking the Instruction Quality: LIFT is What You Need. arXiv 2023
PDF
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning. arXiv 2023
PDF, Codes
A Preliminary Study of the Intrinsic Relationship between Complexity and Alignment. arXiv 2023
PDF
WaveCoder: Widespread And Versatile Enhanced Instruction Tuning with Refined Data Generation. arXiv 2023
PDF
MoDS: Model-oriented Data Selection for Instruction Tuning. arXiv 2023
PDF, Codes
Maybe Only 0.5% Data is Needed: A Preliminary Exploration of Low Training Data Instruction Tuning. arXiv 2023
PDF, Codes