REDSQL

This README explains how to set up the environment, organize the datasets, and run REDSQL.

Dataset Structure

The dataset is organized into the following directory structure:

.
├── bird
│   ├── database             # Database directory
│   ├── dev_annotation.json  # Generated annotations
│   ├── dev.json            # Development set input
│   ├── dev.sql             # Ground truth SQL queries
│   └── dev_tables.json     # Database schema
├── spider
│   ├── database
│   ├── dev_annotation.json
│   ├── dev_gold.sql
│   ├── dev.json
│   └── dev_tables.json
├── preds
│   └── Predicted_SQLs      # SQL predictions from baseline methods (e.g., PURPLE, CodeS)
...

Dataset Components

bird: The BIRD dataset files: databases, annotations, development set, ground truth SQL queries, and schema information.
spider: The Spider dataset files, organized in the same way as BIRD (the ground truth file is named dev_gold.sql instead of dev.sql).
preds: SQL predictions from baseline methods, used as input to REDSQL.
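
Before running anything, it can be useful to confirm that the layout above is in place. The following is a minimal sketch, not part of REDSQL: the helper name `missing_entries` is hypothetical, and `./datasets` is an assumed dataset root that you should adjust to your checkout.

```python
from pathlib import Path

# Expected entries per dataset, copied from the directory tree above.
EXPECTED = {
    "bird": ["database", "dev_annotation.json", "dev.json", "dev.sql", "dev_tables.json"],
    "spider": ["database", "dev_annotation.json", "dev_gold.sql", "dev.json", "dev_tables.json"],
}


def missing_entries(root):
    """Return the expected dataset paths that do not exist under `root`."""
    root = Path(root)
    missing = []
    for dataset, names in EXPECTED.items():
        for name in names:
            path = root / dataset / name
            if not path.exists():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    # "./datasets" is an assumed location, not confirmed by the README.
    for path in missing_entries("./datasets"):
        print("missing:", path)
```

An empty result means every file and directory listed in the tree was found.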

Environment Setup

1. System Requirements

sudo apt-get update
sudo apt-get install -y openjdk-11-jdk

2. Create Conda Environment

conda create -n red python=3.9
conda activate red

3. Install Dependencies

# Install PyTorch
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch

# Install NMSLib
conda install -c conda-forge nmslib

# Install remaining requirements
pip install -r requirements.txt
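
After installation, a quick check along these lines may help catch a missing package or a missing Java runtime early. This is a sketch under assumptions: `check_environment` is a hypothetical helper, and the module names `torch` and `nmslib` and the executable `java` are inferred from the install commands above, not from the REDSQL code.

```python
import importlib.util
import shutil
import sys


def check_environment(modules=("torch", "nmslib"), executables=("java",),
                      python_version=(3, 9)):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    found = tuple(sys.version_info[:2])
    if found != tuple(python_version):
        problems.append(f"expected Python {python_version}, found {found}")
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"Python module not importable: {name}")
    for exe in executables:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(exe) is None:
            problems.append(f"executable not on PATH: {exe}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Run it inside the activated `red` environment; no output means the checks passed.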

Usage Instructions

1. Directory Setup

mkdir output logs

2. Build Value Search Index

python -m pre_processing.build_contents_index \
    --output_dir=./index/bird/db_contents_index/ \
    --db_dir=./datasets/bird/database/

3. Generate Annotations (Optional)

Note: This step can be skipped. Pre-generated annotations for the BIRD, Science, and Spider datasets are available in our open-source repository.

If you need to generate annotations for a custom dataset, use the following command:

python -m pre_processing.doc \
    --model_name=gpt-4o-2024-08-06 \
    --output_file=./annotation.json \
    --table_file=./datasets/bird/dev_tables.json \
    --db_dir=./datasets/bird/database/
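
For a custom dataset, it can help to see which databases and tables the table file describes before spending LLM calls on annotation. The sketch below assumes the table file follows the Spider-style `tables.json` layout, that is, a JSON list of objects with `db_id` and `table_names_original` keys. Both that assumption and the helper name `summarize_tables` are illustrative and not taken from the REDSQL code.

```python
import json


def summarize_tables(table_file):
    """Map each db_id in a Spider-style table file to its original table names."""
    with open(table_file, encoding="utf-8") as f:
        schemas = json.load(f)
    # Assumed layout: a list of dicts, one per database.
    return {schema["db_id"]: schema["table_names_original"] for schema in schemas}


if __name__ == "__main__":
    import sys

    # Usage: python summarize_tables.py ./datasets/bird/dev_tables.json
    if len(sys.argv) > 1:
        for db_id, tables in summarize_tables(sys.argv[1]).items():
            print(f"{db_id}: {len(tables)} tables")
```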

4. Run REDSQL

python -m main.run \
    --model_name=model_name \
    --batch_size=2 \
    --exp_name=exp_name \
    --bug_fix \
    --consistency_num=30 \
    --stage=dev \
    --preds=/path/to/predicted/sql.txt \
    --db_content_index_path=/path/to/db/content/index \
    --annotation=/path/to/dev_annotation.json \
    --output_dir=./output \
    --dev_file=/path/to/dev.json \
    --table_file=/path/to/dev_tables.json \
    --db_dir=/path/to/database
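
When running several experiments, the same invocation can be assembled programmatically. The following is a minimal sketch: `build_run_command` is a hypothetical helper, the option names are the ones from the command above, and all values are placeholders.

```python
import shlex
import sys


def build_run_command(options, flags=()):
    """Build the argument list for `python -m main.run` from option values and bare flags."""
    cmd = [sys.executable, "-m", "main.run"]
    for name, value in options.items():
        cmd.append(f"--{name}={value}")
    for flag in flags:
        cmd.append(f"--{flag}")
    return cmd


if __name__ == "__main__":
    # Placeholder values mirroring the example command; replace them with real paths.
    options = {
        "model_name": "model_name",
        "batch_size": 2,
        "exp_name": "exp_name",
        "consistency_num": 30,
        "stage": "dev",
        "preds": "/path/to/predicted/sql.txt",
        "db_content_index_path": "/path/to/db/content/index",
        "annotation": "/path/to/dev_annotation.json",
        "output_dir": "./output",
        "dev_file": "/path/to/dev.json",
        "table_file": "/path/to/dev_tables.json",
        "db_dir": "/path/to/database",
    }
    # Only print the command here; nothing is executed.
    print(shlex.join(build_run_command(options, flags=["bug_fix"])))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root once the placeholders are filled in.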

Command Line Arguments

| Argument | Description |
| --- | --- |
| `--model_name` | Name of the language model to use |
| `--batch_size` | Batch size for processing (default: 2) |
| `--exp_name` | Name of the experiment |
| `--bug_fix` | Enable bug fixing functionality |
| `--bug_only` | Only fix SQL when errors are detected |
| `--consistency_num` | Number of consistency checks (default: 30) |
| `--stage` | Processing stage (e.g., 'dev') |
| `--preds` | Path to predicted SQL statements |
| `--db_content_index_path` | Path to database content index |
| `--annotation` | Path to annotation file |
| `--output_dir` | Directory for output files |
| `--dev_file` | Path to development set file |
| `--table_file` | Path to table schema file |
| `--db_dir` | Path to database directory |

Note: Ensure all required files are in place and paths are correctly configured before running the commands.
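
One way to follow the note above is to check every path-valued argument before launching a run. This sketch is illustrative only: `missing_paths` is a hypothetical helper, and the set of path arguments is read off the argument descriptions in this README.

```python
from pathlib import Path

# Arguments whose values are described as paths or directories in this README.
PATH_ARGUMENTS = (
    "preds",
    "db_content_index_path",
    "annotation",
    "output_dir",
    "dev_file",
    "table_file",
    "db_dir",
)


def missing_paths(options):
    """Return {argument: path} for every supplied path argument that does not exist."""
    return {
        name: str(options[name])
        for name in PATH_ARGUMENTS
        if name in options and not Path(options[name]).exists()
    }


if __name__ == "__main__":
    # Placeholder values; replace them with the paths you plan to pass to main.run.
    example = {"output_dir": "./output", "dev_file": "/path/to/dev.json"}
    for name, path in missing_paths(example).items():
        print(f"--{name}: {path} does not exist")
```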

httdty/REDSQL_VLDB
