Cloning the repository and downloading the dataset

This repository stores the dataset "jssp_llm_format_120k.json" with Git Large File Storage (LFS). Follow the steps below to clone the repository and access the large files.

Prerequisites

Git
Git LFS (download and install it from the Git LFS website)

Setup Instructions

Step 1: Install Git LFS

Make sure the Git LFS client is installed on your system (see Prerequisites). Once it is installed, enable it for your user account by running:

git lfs install

Step 2: Clone the Repository

git clone https://github.com/starjob42/datasetjsp.git
cd datasetjsp

Step 3: Pull LFS Objects

After cloning the repository, ensure Git LFS pulls the large files:

git lfs pull
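If the pull did not complete, the dataset file on disk is only a small Git LFS pointer stub rather than the real JSON. The sketch below is one way to check for that. It relies on the fact that LFS pointer files start with the line `version https://git-lfs.github.com/spec/v1`; the helper names `is_lfs_pointer` and `check_dataset` are hypothetical and not part of this repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` holds an LFS pointer stub instead of real data."""
    with Path(path).open("rb") as handle:
        return handle.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def check_dataset(path="jssp_llm_format_120k.json"):
    """Hypothetical helper: report the state of the dataset file."""
    file_path = Path(path)
    if not file_path.exists():
        return "missing"
    if is_lfs_pointer(file_path):
        return "pointer"  # the real file was not fetched; run `git lfs pull` again
    return "ok"
```

Calling `check_dataset()` from the repository root returns "ok" once the real file is present.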

Step 4: View the Datacard

To view the datacard, run:

python read_datacard.py

If Git LFS does not work, the dataset can also be downloaded from: (Google Drive)

Setting Up Your Python Environment

Follow these instructions to create a virtual environment and install the necessary libraries.

Step 1: Create a Virtual Environment

python3 -m venv llm_env

Activate the Virtual Environment

After creating the virtual environment, activate it using the following command:

On Windows

.\llm_env\Scripts\activate

On macOS and Linux

source llm_env/bin/activate
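To confirm that activation worked before installing anything, you can ask the interpreter itself. This is a minimal sketch using only the standard library: inside an environment created with `python3 -m venv`, `sys.prefix` differs from `sys.base_prefix`. The function name `in_virtualenv` is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

When the environment is active, the printed prefix should end in `llm_env`.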

Install the Required Libraries

pip install -r requirements.txt

Training

Make sure to pass the correct path to the training dataset. The default path is './jssp_llm_format_120k.json'.

python train_phi3_lora_jssp.py

Inference

Download checkpoint-1750.zip from (google drive), unzip it, and put the entire folder inside the checkpoints directory. Afterwards the checkpoints directory should look like this: ./checkpoints/checkpoint-1750/ . To run inference, use the following command, which uses the 'test_2000.json' testing dataset:

python infer_phi3.py
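The unzip step can also be scripted. The following is a sketch under one assumption that the text above does not confirm: that the archive contains a top-level folder named checkpoint-1750. The function `install_checkpoint` and its parameters are hypothetical; only the directory layout comes from the instructions above.

```python
import zipfile
from pathlib import Path


def install_checkpoint(zip_path, checkpoints_dir="checkpoints", name="checkpoint-1750"):
    """Extract the archive into checkpoints_dir and return the checkpoint folder.

    Assumes the archive contains a top-level folder called `name`.
    """
    target_root = Path(checkpoints_dir)
    target_root.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_root)
    checkpoint = target_root / name
    # Fail loudly if the expected layout ./checkpoints/checkpoint-1750/ is absent.
    if not checkpoint.is_dir():
        raise FileNotFoundError(f"expected {checkpoint} after extraction")
    return checkpoint
```

For example, `install_checkpoint("checkpoint-1750.zip")` run from the repository root should leave the folder at ./checkpoints/checkpoint-1750/.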

JSSP LLM Format Dataset

Dataset Overview

Dataset Name: jssp_llm_format_120k.json
Number of Entries: 120,000
Number of Fields: 5

Fields Description

num_jobs
- Type: int64
- Number of Unique Values: 12
num_machines
- Type: int64
- Number of Unique Values: 12
prompt_jobs_first
- Type: object
- Number of Unique Values: 120,000
prompt_machines_first
- Type: object
- Number of Unique Values: 120,000
output
- Type: object
- Number of Unique Values: 120,000
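The field list above can be turned into a quick schema check. This sketch makes two assumptions that are not stated in this README: that the file is a JSON array of objects, and that the "object" columns hold strings. The names `EXPECTED_FIELDS`, `validate_records`, and `load_records` are illustrative.

```python
import json

# Field names come from the description above; the Python types are assumptions.
EXPECTED_FIELDS = {
    "num_jobs": int,
    "num_machines": int,
    "prompt_jobs_first": str,
    "prompt_machines_first": str,
    "output": str,
}


def validate_records(records):
    """Return (index, problem) tuples for records that do not match the schema."""
    problems = []
    for index, record in enumerate(records):
        for field, expected_type in EXPECTED_FIELDS.items():
            if field not in record:
                problems.append((index, f"missing field: {field}"))
            elif not isinstance(record[field], expected_type):
                problems.append((index, f"unexpected type for {field}"))
    return problems


def load_records(path="jssp_llm_format_120k.json"):
    """Load the dataset, assuming the file is a JSON array of objects."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

An empty list from `validate_records(load_records())` would indicate that every entry carries the five documented fields.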

Usage

This dataset can be used to train LLMs on job-shop scheduling problems (JSSP). Each entry records the number of jobs and the number of machines, two natural-language prompts (prompt_jobs_first and prompt_machines_first), and the corresponding output.
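As an illustration of how entries might be prepared for fine-tuning, the sketch below pairs one of the two prompt fields with the output field. How train_phi3_lora_jssp.py actually consumes the data is not described here, so treat `to_training_pairs` as a hypothetical helper rather than the repository's method.

```python
PROMPT_FIELDS = ("prompt_jobs_first", "prompt_machines_first")


def to_training_pairs(records, prompt_field="prompt_jobs_first"):
    """Return (prompt, target) tuples using the chosen prompt formulation."""
    if prompt_field not in PROMPT_FIELDS:
        raise ValueError(f"prompt_field must be one of {PROMPT_FIELDS}")
    return [(record[prompt_field], record["output"]) for record in records]
```

Switching `prompt_field` to "prompt_machines_first" yields the same targets with the alternative prompt text.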

License

This dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). For more details, see the license description. The dataset will remain accessible for an extended period.

Citation

If you use this dataset in your research, please cite it as follows:

@dataset{jssp_for_llm,
  author = {Anonymous},
  title = {LLMs can Schedule},
  year = {2024},
  url = {https://github.com/starjob42/datasetjsp.git},
  note = {Submitted to NeurIPS 2024 Datasets and Benchmarks}
}

Job Shop Scheduling Dataset for LLM

General Statistics

total_samples: 120000
unique_sizes: 50
data_size_of_group_size_of_instances: 2400
average_jobs: 8.24
average_machines: 5.64
average_makespan: 1434.3538358961537
makespan_variance: 1292087.1381092232
median_makespan: 1211.0
min_makespan: 5.0
max_makespan: 9852.0

Correlations

correlation_jobs_makespan: 0.5658973838191064
correlation_machines_makespan: 0.5480905138716899
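Two of the reported figures are easier to interpret after a little arithmetic. The sketch below uses only numbers copied from the statistics above. It assumes that data_size_of_group_size_of_instances is the number of samples per problem size, which the README does not state outright but which matches total_samples divided by unique_sizes.

```python
import math

# Values copied from the general statistics above.
total_samples = 120000
unique_sizes = 50
makespan_variance = 1292087.1381092232

# Samples per problem size, assuming an even split across the 50 sizes.
samples_per_size = total_samples // unique_sizes

# Standard deviation is the square root of the variance, in makespan units.
makespan_std = math.sqrt(makespan_variance)

print(samples_per_size)        # 2400
print(round(makespan_std, 1))  # 1136.7
```

A standard deviation of about 1136.7 against a mean of about 1434.4 and a median of 1211.0 suggests a wide, right-skewed makespan distribution.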
