This repository contains the dataset "jssp_llm_format_120k.json", which is stored with Git Large File Storage (LFS). Follow the steps below to clone the repository and access the large files.
- Git
- Git LFS (download and install it from the Git LFS website)
Step 1: Install Git LFS
Ensure Git LFS is installed on your system. Once it is installed, enable it for your user account by running:
git lfs install
Step 2: Clone the Repository
git clone https://github.com/starjob42/datasetjsp.git
cd datasetjsp
Step 3: Pull LFS Objects
After cloning the repository, make sure Git LFS pulls the large files:
git lfs pull
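If the pull did not run, the dataset file on disk is only a small Git LFS pointer rather than the real JSON. The sketch below is one way to check for that. It relies on the fact that Git LFS pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` is hypothetical and not part of this repository.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an un-pulled Git LFS pointer file.

    Hypothetical helper for illustration; it only inspects the first bytes.
    """
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


if __name__ == "__main__":
    dataset = Path("jssp_llm_format_120k.json")
    if not dataset.exists():
        print(f"{dataset} not found; run this from the repository root.")
    elif is_lfs_pointer(dataset):
        print(f"{dataset} is still an LFS pointer; run `git lfs pull`.")
    else:
        print(f"{dataset} looks like real data ({dataset.stat().st_size} bytes).")
```

Run it from the repository root after cloning. If it reports a pointer, repeat `git lfs pull` or use the alternative download.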
Step 4: Viewing the DataCard
To view the datacard, run the following command:
python read_datacard.py
If Git LFS does not work, the dataset can also be downloaded from: (google drive)
Follow these instructions to create a virtual environment and install the necessary libraries.
python3 -m venv llm_env
Activate the Virtual Environment
After creating the virtual environment, activate it with the command for your platform:
On Windows
.\llm_env\Scripts\activate
On macOS and Linux
source llm_env/bin/activate
pip install -r requirements.txt
Make sure to pass the correct path to the training dataset. The default path is './jssp_llm_format_120k.json'.
python train_phi3_lora_jssp.py
Download checkpoint-1750.zip from (google drive), unzip it, and put the entire folder inside the checkpoints directory. The checkpoints directory should then look like this: ./checkpoints/checkpoint-1750/. To run inference, use the following command, which uses the 'test_2000.json' test dataset:
python infer_phi3.py
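Before running inference, it can help to confirm that the checkpoint folder ended up where the instructions expect it. The following is a minimal sketch that assumes only the layout described above (`./checkpoints/checkpoint-1750/`). The helper `find_checkpoint` is hypothetical, and it does not inspect which files the checkpoint contains.

```python
from pathlib import Path


def find_checkpoint(root="checkpoints", name="checkpoint-1750"):
    """Return the checkpoint directory if it exists and is non-empty, else None.

    Hypothetical helper: it checks the folder layout only, not the contents.
    """
    candidate = Path(root) / name
    if candidate.is_dir() and any(candidate.iterdir()):
        return candidate
    return None


if __name__ == "__main__":
    found = find_checkpoint()
    if found is None:
        print("Expected a non-empty ./checkpoints/checkpoint-1750/ directory.")
    else:
        print(f"Checkpoint directory found: {found}")
```

A common cause of a missing directory is unzipping into a nested folder such as `checkpoints/checkpoint-1750/checkpoint-1750/`. In that case, move the inner folder up one level.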
Dataset Name: jssp_llm_format_120k.json
Number of Entries: 120,000
Number of Fields: 5
- num_jobs
  - Type: int64
  - Number of Unique Values: 12
- num_machines
  - Type: int64
  - Number of Unique Values: 12
- prompt_jobs_first
  - Type: object
  - Number of Unique Values: 120,000
- prompt_machines_first
  - Type: object
  - Number of Unique Values: 120,000
- output
  - Type: object
  - Number of Unique Values: 120,000
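The per-field summary above can be approximated with a few lines of standard-library Python. This is a sketch only, not the code in `read_datacard.py`. It assumes the JSON file holds a list of records (one dictionary per entry) whose values are integers or strings, and the function name `summarize_fields` is hypothetical. It reports Python type names (`int`, `str`) rather than the pandas-style `int64` and `object` shown in the datacard.

```python
import json
import os


def summarize_fields(records):
    """Map each field name to (type name of its first value, unique-value count).

    Assumes every record has the same keys and hashable values.
    """
    summary = {}
    for field in records[0]:
        values = [record[field] for record in records]
        summary[field] = (type(values[0]).__name__, len(set(values)))
    return summary


if __name__ == "__main__":
    path = "jssp_llm_format_120k.json"  # default dataset path from this README
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            records = json.load(handle)  # assumption: a JSON list of records
        print(f"Number of Entries: {len(records)}")
        for field, (type_name, unique) in summarize_fields(records).items():
            print(f"{field}: type={type_name}, unique values={unique}")
    else:
        print(f"{path} not found; run this from the repository root.")
```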
This dataset can be used to train LLMs on job-shop scheduling problems (JSSP). Each entry records the number of jobs and the number of machines, along with two prompt variants formatted in natural language (prompt_jobs_first and prompt_machines_first) and the corresponding output.
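As an illustration of how an entry might feed a fine-tuning pipeline, the sketch below pairs one of the prompt fields with the `output` field. The field names come from the datacard. The helper `to_training_pair` and the choice of which prompt variant to use are assumptions for illustration and may differ from what `train_phi3_lora_jssp.py` actually does.

```python
PROMPT_FIELDS = ("prompt_jobs_first", "prompt_machines_first")


def to_training_pair(entry, prompt_field="prompt_jobs_first"):
    """Build a (prompt, target) pair from one dataset entry.

    Hypothetical helper: the real training script may format examples differently.
    """
    if prompt_field not in PROMPT_FIELDS:
        raise ValueError(f"prompt_field must be one of {PROMPT_FIELDS}")
    return entry[prompt_field], entry["output"]


if __name__ == "__main__":
    # Placeholder strings stand in for real dataset text.
    example = {
        "num_jobs": 2,
        "num_machines": 2,
        "prompt_jobs_first": "<jobs-first problem description>",
        "prompt_machines_first": "<machines-first problem description>",
        "output": "<schedule description>",
    }
    prompt, target = to_training_pair(example, "prompt_machines_first")
    print(prompt)
    print(target)
```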
This dataset is licensed under the Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0). For more details, see the license description. The dataset will remain accessible for an extended period.
If you use this dataset in your research, please cite it as follows:
@dataset{jssp_for_llm,
author = {Anonymous},
title = {LLMs can Schedule},
year = {2024},
url = {https://github.com/starjob42/datasetjsp.git},
note = {Submitted to NeurIPS 2024 Datasets and Benchmarks}
}
- total_samples: 120000
- unique_sizes: 50
- data_size_of_group_size_of_instances: 2400
- average_jobs: 8.24
- average_machines: 5.64
- average_makespan: 1434.3538358961537
- makespan_variance: 1292087.1381092232
- median_makespan: 1211.0
- min_makespan: 5.0
- max_makespan: 9852.0
- correlation_jobs_makespan: 0.5658973838191064
- correlation_machines_makespan: 0.5480905138716899
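Statistics of this kind can be computed with the standard library once a makespan is available for each entry. The sketch below assumes the input is a list of hypothetical `(num_jobs, num_machines, makespan)` tuples. Makespan is not one of the five listed fields, so extracting it from each entry is left out here. The sketch uses population variance and Pearson correlation, and whether the reported figures used the same conventions is not stated.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def makespan_stats(rows):
    """Summarize rows of (num_jobs, num_machines, makespan) tuples."""
    jobs, machines, makespans = zip(*rows)
    return {
        "total_samples": len(makespans),
        "average_jobs": statistics.fmean(jobs),
        "average_machines": statistics.fmean(machines),
        "average_makespan": statistics.fmean(makespans),
        # Assumption: population variance; the source may have used sample variance.
        "makespan_variance": statistics.pvariance(makespans),
        "median_makespan": statistics.median(makespans),
        "min_makespan": min(makespans),
        "max_makespan": max(makespans),
        "correlation_jobs_makespan": pearson(jobs, makespans),
        "correlation_machines_makespan": pearson(machines, makespans),
    }


if __name__ == "__main__":
    # Made-up rows for demonstration only; not taken from the dataset.
    demo_rows = [(2, 2, 10.0), (4, 2, 20.0), (6, 4, 30.0)]
    for name, value in makespan_stats(demo_rows).items():
        print(f"{name}: {value}")
```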