TSL LLM Benchmark

Code generation models struggle with long-context generation and have been shown to perform better when code snippets are used to condition their output. We propose a novel approach that uses LLMs to generate TSL (Temporal Stream Logic) specifications, which are then synthesized into code. The LLM then uses the synthesized code as seed code to generate the desired program.

Benchmark Overview

We propose a set of benchmarks to test the capability of LLMs to generate TSL specs for reactive synthesis. Each benchmark uses a simple TSL spec to generate a complex state machine, so the benchmarks test the ability of LLMs to generate specs rather than entire state machines. In this way, the performance of long-context, high-risk code generation applications can be simplified and made more transparent. (In the Snake demo below, the snake's death is due to user error; the final death is caused by illegal keypresses and is not a bug.)

Benchmark demos (animated previews in the repository):

  • Ball
  • Game of Life
  • Vending Machine
  • Snake Game
  • Space Invaders (ship only)
  • Rotating Cube
  • Expanding Cube

Using the Benchmarks

Each benchmark folder contains the set of files that make up that benchmark. Run run.py from the repository root to walk through the process for a benchmark; the benchmark directory can be passed with -d (as shown in the Arbiter example below).

Demo:

File Organization

Root Directory: The files in the root directory are used to generate any state machine. For each particular state machine there is a folder containing its benchmark-specific files, and each generated state machine serves as a benchmark.

  • Impl_template.prompt: The template filled in with wrapper_template.html and Headers.txt to create Impl.prompt.
    • Impl_withoutFunctions_template.prompt: The template used when there are no functions under the 'Functions:' section of Headers.txt.
  • run.py: Runs the pipeline. It takes the handmade files in a benchmark folder (everything outside computed) and creates the files in that folder's computed directory.
  • shotPrompt.txt: A text file that provides TSL documentation and NL->TSL examples. It supports in-context learning (ICL) and improves NL-to-TSL translation.
  • Spec_template.prompt: The prompt template used to query the LLM for the TSL specification (a rough sketch of how the templates are assembled follows this list).
    • Spec_withoutAssumptions_template.prompt: The template used when there are no assumptions under the 'Assumptions:' section of NL.txt.
    • Spec_withoutFunctions_template.prompt: The template used when there are no functions under the 'Functions:' section of Headers.txt.
    • Spec_withoutFA_template.prompt: The template used when there are no functions or assumptions under the 'Functions:' and 'Assumptions:' sections of Headers.txt and NL.txt.
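
As a rough illustration of how these templates might be assembled: the sketch below is an assumption, not the actual logic in run.py. The fill_template helper and the {{NL}}/{{SUMMARY}}/{{HEADERS}} marker convention are hypothetical placeholders.

# Minimal sketch of the template-filling step (assumed placeholder names,
# not the exact logic used by run.py).
from pathlib import Path

def fill_template(template: str, values: dict) -> str:
    # Assumed convention: templates contain markers such as {{NL}} and {{HEADERS}}.
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template

benchmark = Path("Ball")  # any benchmark folder
spec_prompt = fill_template(
    Path("Spec_template.prompt").read_text(),
    {
        "NL": (benchmark / "NL.txt").read_text(),
        "SUMMARY": (benchmark / "NL.summary.txt").read_text(),
        "HEADERS": (benchmark / "Headers.txt").read_text(),
    },
)
(benchmark / "computed" / "Spec.prompt").write_text(spec_prompt)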

Benchmark Folders: The files within each benchmark folder (e.g. Ball, GameOfLife) are used to create that benchmark.

  • NL.summary.txt: The natural language, high-level summary of the benchmark.
  • NL.txt: The natural language description of the benchmark with a clear list of requirements (Assumptions & Guarantees).
  • Headers.txt: The function and predicate term header definitions.
  • wrapper_template.html: The template HTML file that the LLM fills in using the Impl.js and Synth.js files it generates.
  • computed: This folder contains the generated files (this is where the benchmark part comes in).
    • Spec.tsl: The TSL specification that the LLM generates.
    • Spec.prompt: The prompt used to query the LLM for a TSL specification.
    • Impl.prompt: The prompt used to query the LLM for a JavaScript implementation of the functions and predicates.
    • Synth.js: The JavaScript translation of the LLM-generated TSL specification; this translation is performed by the TSL API.
    • <BENCHMARK_NAME>.html: The LLM-generated HTML implementation.

Benchmark Walkthrough

  1. NL.txt, NL.summary.txt, and Headers.txt are handmade and used to fill in Spec_template.prompt and create Spec.prompt, which is fed into the LLM.
  2. The LLM outputs Spec.tsl, its TSL specification.
  3. This TSL specification is passed to the TSL API, and the resulting JavaScript translation is stored in Synth.js.
  4. The handmade Headers.txt is used to fill in Impl_template.prompt and create Impl.prompt, which is fed into the LLM.
  5. The LLM outputs Impl.js, its JavaScript implementation of the functions and predicates.
  6. Finally, using wrapper_template.html, Synth.js, and Impl.js, the LLM fills in <BENCHMARK_NAME>.html for the final benchmark implementation.
  7. OPTIONAL: Combine steps 4-6.
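
Putting the steps together, a rough end-to-end sketch of the walkthrough might look like the following. This is an illustration only: query_llm and synthesize_tsl_to_js are hypothetical stand-ins for the LLM call and the TSL API, not functions provided by this repository.

from pathlib import Path

def query_llm(prompt: str) -> str:
    # Hypothetical stand-in for a call to the LLM of your choice.
    raise NotImplementedError

def synthesize_tsl_to_js(spec: str) -> str:
    # Hypothetical stand-in for a call to the TSL synthesis API.
    raise NotImplementedError

bench = Path("Ball")
computed = bench / "computed"
computed.mkdir(exist_ok=True)

# Steps 1-2: build Spec.prompt (see the fill_template sketch above) and query
# the LLM for a TSL specification.
spec = query_llm((computed / "Spec.prompt").read_text())
(computed / "Spec.tsl").write_text(spec)

# Step 3: synthesize JavaScript from the specification via the TSL API.
synth_js = synthesize_tsl_to_js(spec)
(computed / "Synth.js").write_text(synth_js)

# Steps 4-5: build Impl.prompt from Headers.txt and query the LLM for the
# JavaScript implementation of the functions and predicates.
impl_js = query_llm((computed / "Impl.prompt").read_text())
(computed / "Impl.js").write_text(impl_js)

# Step 6: ask the LLM to fill wrapper_template.html using Synth.js and Impl.js.
wrapper = (bench / "wrapper_template.html").read_text()
html = query_llm("\n\n".join([wrapper, synth_js, impl_js]))
(computed / (bench.name + ".html")).write_text(html)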

Extra Commands

Arbiter

The Arbiter examples are generated via:

cd Arbiter
python3 build_up.py

This generates a folder Arbiter/builtup/{timestamp}, where {timestamp} is the time of the run. Then, from the Arbiter directory, run:

python3 build_nl.py --dir builtup/{timestamp}
python3 ../run.py -d builtup/{timestamp}
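
If you want to process every generated build, a small sketch like the one below (not part of the repository, and assuming the commands above work unchanged) loops over the timestamped folders under Arbiter/builtup:

import subprocess
from pathlib import Path

# Run the two commands above for every timestamped folder under Arbiter/builtup.
for build_dir in sorted(Path("Arbiter/builtup").iterdir()):
    if not build_dir.is_dir():
        continue
    rel = str(build_dir.relative_to("Arbiter"))
    subprocess.run(["python3", "build_nl.py", "--dir", rel], cwd="Arbiter", check=True)
    subprocess.run(["python3", "../run.py", "-d", rel], cwd="Arbiter", check=True)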