Official repository for the ICSE 2025 paper Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation
We quantized two of the candidate OLLMs using the AutoAWQ library.
- Create the conda environment we used by running the following command:

  ```
  conda env create --file ollm.yml
  ```
- Set up the vLLM inference endpoint by referring to the vLLM documentation.
- Create a `.env` file in the root directory with the following content:

  ```
  INFERENCE_PORT=YOUR_INFERENCE_PORT
  INFERENCE_URL=http://<YOUR_INFERENCE_ENDPOINT_IP_ADDRESS>/:${INFERENCE_PORT}/v1
  USE_OPEN_SOURCE=1 # Set to 0 if you want to try GPT-4
  OLLM_SERVER_TYPE=vllm
  MODEL_TEMPERATURE=0
  MODEL_NAME=YOUR_MODEL_NAME # e.g. TechxGenus/Meta-Llama-3-70B-Instruct-AWQ
  UNDERSTAND_PATH=THE_PATH_TO_UNDERSTAND_EXECUTABLE # e.g. /path/to/und
  GITHUB_API_TOKEN=YOUR_GITHUB_API_TOKEN
  OPENAI_API_KEY=YOUR_OPENAI_API_KEY # Only required if you want to use GPT-4
  OPENAI_ORGANIZATION=YOUR_OPENAI_ORGANIZATION # Only required if you want to use GPT-4
  ```
- Download the Java projects locally by running the following command:

  ```
  cd CMG ; python download_projects.py
  ```
- Download and unzip `datasets.zip` to the root directory of this project. (A folder named `data` should be available in the root directory after unzipping.)
- Download `training_data_semantic_embedding.pt` and copy it to the `CMG` folder.
- Download `java-jars.zip` and unzip it to the `CMG` folder. It should place the JavaParser jar files in the `program_contexts` folder of the `CMG` folder.
- Download `MMS.zip` and unzip it to the `CMG` folder. It should place the files required for MMS and CMMS in the `program_contexts` folder of the `CMG` folder.
| Diff Augmentation | OMG_BLEU | OMG_METEOR | OMG_ROUGEL | HUMAN_BLEU | HUMAN_METEOR | HUMAN_ROUGEL |
|---|---|---|---|---|---|---|
| None | 10.86 | 31.67 | 28.28 | 0.58 | 13.96 | 7.16 |
| Diff Narrative | 11.65 | 32.65 | 29.65 | 0.83 | 14.94 | 7.74 |
| FIDEX-generated Diff Summary | 12.29 | 33.85 | 29.76 | 0.67 | 14.79 | 7.37 |
The results for this model were not consistent because the model tends to repeat the same sentence in its output. See the CSV files in the `cmg_results/CodeQwen1.5-7B-Chat-AWQ` folder for the commit messages generated by this model.
| Diff Augmentation | OMG_BLEU | OMG_METEOR | OMG_ROUGEL | HUMAN_BLEU | HUMAN_METEOR | HUMAN_ROUGEL |
|---|---|---|---|---|---|---|
| None | 4.82 | 28.40 | 21.75 | 0.41 | 12.17 | 5.80 |
| Diff Narrative | 5.08 | 28.34 | 22.36 | 0.53 | 13.23 | 6.54 |
| FIDEX-generated Diff Summary | 4.85 | 29.26 | 21.63 | 0.47 | 12.37 | 5.82 |
Reported in the paper.
| Diff Augmentation | OMG_BLEU | OMG_METEOR | OMG_ROUGEL | HUMAN_BLEU | HUMAN_METEOR | HUMAN_ROUGEL |
|---|---|---|---|---|---|---|
| None | 14.19 | 36.44 | 32.06 | 0.95 | 16.38 | 8.16 |
| FIDEX-generated Diff Summary | 15.78 | 38.02 | 33.80 | 0.88 | 17.12 | 8.45 |
- `CMG`: Contains the scripts to download the Java projects and generate the commit messages.
- `cmg_results`: Contains the commit messages generated by each SLM/OLLM.
- `common`: Contains the common scripts used by the different models.
- `evaluation`: Contains the scripts for calculating our automatic evaluation metrics.
- `survey-1` to `survey-3`: Contain the survey data and the analysis scripts for each survey.
- `quantization`: Contains the scripts used to quantize two of the candidate OLLMs.
- `data`: Contains the datasets used in the paper.
- Run the `Meta-Llama-3-70B-Instruct-AWQ` model for inference on your machine using vLLM.
- Uncomment the `answering_instructions` variable that has the `# Original - Used For Survey` comment and comment out any other variable with the same name.
- Run the following command:

  ```
  cd CMG
  REMOVE_COMMENTS=FALSE METHOD_SUMMARIES=OLD python omega.py ../data/omg_data_preprocessed.csv all
  ```
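The `REMOVE_COMMENTS` and `METHOD_SUMMARIES` settings are passed as environment variables on the command line. As a hedged sketch of how such flags are commonly read in Python (the names come from the command above, but the defaults and exact parsing are assumptions, not the repository's actual `omega.py` code):

```python
import os

def read_flags(env=None):
    """Hypothetical parsing of the flags used in the command above.
    Defaults ("TRUE" and "NEW") are guesses for illustration only."""
    if env is None:
        env = os.environ
    # Any value other than FALSE (case-insensitive) keeps comments enabled.
    remove_comments = env.get("REMOVE_COMMENTS", "TRUE").upper() != "FALSE"
    # "OLD" would select the pre-survey method summaries.
    method_summaries = env.get("METHOD_SUMMARIES", "NEW")
    return remove_comments, method_summaries
```

With the command shown above, such a reader would yield `remove_comments=False` and `method_summaries="OLD"`.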
- Run the `Meta-Llama-3-70B-Instruct-AWQ` model for inference on your machine using vLLM.
- Uncomment the `answering_instructions` variable that has the `# Modified - Used After Survey to make it more comprehensive` comment and comment out any other variable with the same name.
- Run the following command:

  ```
  cd CMG
  python omega.py ../data/omg_data_preprocessed.csv all
  ```
- Run the `casperhansen/llama-3-8b-instruct-awq` model for inference on your machine using vLLM.
- Uncomment the `answering_instructions` variable that has the `# Modified - Used After Survey to make it more comprehensive` comment and comment out any other variable with the same name.
- Run the following command to use Diff Narrative:

  ```
  cd CMG
  python omega.py ../data/omg_data_preprocessed.csv all --dn
  ```

- Run the following command to use FIDEX-generated Diff Summary:

  ```
  cd CMG
  python omega.py ../data/omg_data_preprocessed.csv all --fidex
  ```
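The `--dn` and `--fidex` switches select which diff augmentation is used. A hypothetical `argparse` sketch of a command line shaped like the invocations above (an illustration only; the real argument handling in `omega.py` may differ):

```python
import argparse

def build_parser():
    """Hypothetical CLI mirroring the commands above, for illustration."""
    parser = argparse.ArgumentParser(prog="omega.py")
    parser.add_argument("dataset")  # e.g. ../data/omg_data_preprocessed.csv
    parser.add_argument("scope")    # e.g. "all"
    # The two augmentations are alternatives, so model them as exclusive.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--dn", action="store_true",
                       help="augment diffs with a Diff Narrative")
    group.add_argument("--fidex", action="store_true",
                       help="augment diffs with a FIDEX-generated summary")
    return parser
```

Under this sketch, omitting both flags corresponds to the "None" rows in the results tables above.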
If you found this work helpful, please consider citing it using the following:
```bibtex
@misc{imani2024contextconquersparametersoutperforming,
      title={Context Conquers Parameters: Outperforming Proprietary LLM in Commit Message Generation},
      author={Aaron Imani and Iftekhar Ahmed and Mohammad Moshirpour},
      year={2024},
      eprint={2408.02502},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2408.02502},
}
```