Updates

  • 2024-05-15: The paper was accepted to the ACL 2024 main conference! 🎉

Introduction

This is the official repository for the paper "Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark".

We found that the popular BIRD-Bench text-to-SQL dataset and benchmark contains significant noise in both questions and gold SQL queries, and we investigated the effect of this noise on model performance. Incorrect gold SQL queries, which in turn produce incorrect gold answers, substantially undermine the benchmark's reliability. Surprisingly, when models were evaluated on corrected SQL queries, zero-shot baselines surpassed state-of-the-art prompting methods, as shown in the paper. We conclude that informative noise labels and reliable benchmarks are crucial for developing new text-to-SQL methods that can handle varying types of noise.
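For context, BIRD-Bench scores predictions by execution accuracy: a predicted query counts as correct only if it returns the same result set as the gold query when both are run against the database, which is why an erroneous gold query silently penalizes correct predictions. A minimal sketch of that comparison (the helper name and details are illustrative, not this repository's evaluation code):

```python
import sqlite3

def execution_match(db_path: str, predicted_sql: str, gold_sql: str) -> bool:
    """Illustrative sketch of execution-accuracy scoring: run both queries
    against the same SQLite database and compare their result sets."""
    with sqlite3.connect(db_path) as conn:
        predicted_rows = conn.execute(predicted_sql).fetchall()
        gold_rows = conn.execute(gold_sql).fetchall()
    # Compare as multisets (via sorted string forms) so row order is ignored.
    return sorted(map(repr, predicted_rows)) == sorted(map(repr, gold_rows))
```

Under this metric, a wrong gold query marks a semantically correct prediction as a failure, so correcting the gold queries changes measured accuracy without changing the models themselves.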

Datasets

As part of the study, we curated three datasets based on the identified errors and annotations, all of which can be found in the /datasets folder (a short loading example follows the list):

  1. financial.json: the original financial domain of BIRD-Bench, consisting of 106 question and SQL query pairs.
  2. financial_corrected.json: a version of the financial domain where noise has been removed from both questions and SQL queries.
  3. financial_corrected_sql.json: a version of the financial domain where only noise in the SQL queries has been removed.
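The files can be inspected with a few lines of Python. This sketch assumes the BIRD-Bench field names question and SQL, which is an assumption about the files rather than documented repository behavior:

```python
import json

# Load one of the curated datasets and inspect the first pair.
# The "question" and "SQL" keys follow the BIRD-Bench convention
# and are an assumption about these files.
with open("datasets/financial_corrected.json") as f:
    examples = json.load(f)

print(f"Loaded {len(examples)} question/SQL pairs")
print("Question:", examples[0].get("question"))
print("Gold SQL:", examples[0].get("SQL"))
```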

Annotations & Corrections

The finalized annotations and corrections agreed upon by the annotators are in the /annotations folder. It includes files for the 106 financial data points and for 20 sampled points from each of the other domains (california_schools, superhero, thrombosis_prediction, toxicology), in both Excel and CSV formats; the CSV files can be viewed directly on GitHub. Additionally, there is a UML database schema diagram for the financial domain to aid in understanding the annotations. The annotations were made according to the following categories:

| Category | Annotation label |
| --- | --- |
| Spelling/Syntactical error | 1 |
| Vague/Ambiguous Questions | 2 |
| Incorrect SQL | 3 |
| Synonyms | 4 |
| String Capitalization | 5 |
| Question does not map to DB | 6 |
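When filtering the annotation files programmatically, the numeric labels can be mapped back to category names. The mapping below is taken directly from the table above; anything beyond it (e.g. the column layout of the CSV files) would be an assumption:

```python
# Numeric annotation labels and their category names,
# as defined in the table above.
ANNOTATION_CATEGORIES = {
    1: "Spelling/Syntactical error",
    2: "Vague/Ambiguous Questions",
    3: "Incorrect SQL",
    4: "Synonyms",
    5: "String Capitalization",
    6: "Question does not map to DB",
}
```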

Statistics on the identified noise and errors are shown in the tables below.

Table 1: Number of data points containing errors, and the number of errors in questions and gold queries, across five domains.

| Statistic | Financial | California Schools | Superhero | Toxicology | Thrombosis Prediction |
| --- | --- | --- | --- | --- | --- |
| Data points with noise | 52/106 (49%) | 9/20 (45%) | 3/20 (15%) | 7/20 (35%) | 8/20 (40%) |
| Noisy questions | 44/106 (41.5%) | 5/20 (25%) | 2/20 (10%) | 6/20 (30%) | 3/20 (15%) |
| Erroneous gold queries | 22/106 (20.7%) | 8/20 (40%) | 1/20 (5%) | 2/20 (10%) | 6/20 (30%) |

Table 2: Distribution of different types of noise encountered in the domains.

| Noise Type | Financial | California Schools | Superhero | Toxicology | Thrombosis Prediction |
| --- | --- | --- | --- | --- | --- |
| Spelling/Syntactical Errors | 23 | 2 | 1 | 4 | 2 |
| Vague/Ambiguous Questions | 17 | 1 | 1 | 1 | 1 |
| Incorrect SQL | 22 | 8 | 1 | 2 | 6 |
| Synonyms | 2 | 0 | 0 | 0 | 0 |
| String Capitalization | 7 | 0 | 0 | 0 | 0 |
| Question does not map to DB | 1 | 4 | 1 | 0 | 0 |
| Total number of errors | 72 | 15 | 4 | 7 | 9 |

Prerequisites

Install the dependencies:

```bash
pip install -r requirements.txt
```

Set the OPENAI_API_KEY environment variable:

```bash
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
```
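Before launching a run, it can help to confirm that the key is actually visible to Python; a tiny sketch (not part of the repository):

```python
import os

# Fail fast if the API key was not exported in this shell session.
if not os.environ.get("OPENAI_API_KEY"):
    raise SystemExit("OPENAI_API_KEY is not set; export it before running the models.")
print("OPENAI_API_KEY is set.")
```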

Running the code

This repository contains only the implementations of the zero-shot model and the DIN-SQL model used in the experiments. For the MAC-SQL model, we refer to the model's own repository, which can be found here, together with the datasets in the /datasets folder.

Use the following command format to run a model:

```bash
python run_model.py [--model MODEL_NAME] [--dataset DATASET_NAME] [--llm LLM_NAME]
```

  • --model sets which of the two models to use. The available options are zero_shot and din_sql.
  • --dataset sets which dataset to use. The available options are financial, financial_corrected and financial_corrected_sql.
  • --llm sets which of the OpenAI LLMs to use. See here for the available models.
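For reference, the flags above map onto an argparse interface along these lines. This is a sketch of the documented interface (the defaults shown are assumptions), not the actual contents of run_model.py:

```python
import argparse

# Sketch of the command-line interface described above; the real
# run_model.py may define its arguments differently.
parser = argparse.ArgumentParser(description="Run a text-to-SQL model on a dataset.")
parser.add_argument("--model", choices=["zero_shot", "din_sql"], default="zero_shot",
                    help="Which of the two implemented models to use.")
parser.add_argument("--dataset",
                    choices=["financial", "financial_corrected", "financial_corrected_sql"],
                    default="financial",
                    help="Which dataset in /datasets to evaluate on.")
parser.add_argument("--llm", default="gpt-3.5-turbo",
                    help="Name of the OpenAI model to query (assumed default).")
args = parser.parse_args()
print(args)
```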

Citation

BibTeX:

```bibtex
@misc{wretblad2024understanding,
      title={Understanding the Effects of Noise in Text-to-SQL: An Examination of the BIRD-Bench Benchmark},
      author={Niklas Wretblad and Fredrik Gordh Riseby and Rahul Biswas and Amin Ahmadi and Oskar Holmström},
      year={2024},
      eprint={2402.12243},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
