This repository contains the code and data for the paper "On Calibration of Pre-trained Code Models".
To reproduce our experiments, machines with GPUs and the NVIDIA CUDA toolkit are required.
The environment dependencies are listed in the file "requirements.txt". You can create a conda environment that installs the required dependencies:
conda create --name <env> --file requirements.txt
We evaluate the calibration of pre-trained code models on different code understanding tasks. The datasets can be downloaded from the following sources:
- Code Classification: we use three datasets in our code classification experiments, namely POJ104, Java250, and Python800. Note that the dataset partitioning follows the CodeNet paper: for each dataset, 20% of the data is used as the test set, while the rest is split 4:1 between training and validation (see the sketch after this list).
- Clone Detection: https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Clone-detection-BigCloneBench
- Defect Detection: https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/Defect-detection
- Exception Type: we directly use the dataset from the original paper.
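The split described for the classification datasets can be reproduced with standard tooling. Below is a minimal, illustrative sketch; the function name, the use of scikit-learn, the stratification, and the seed are our assumptions, not necessarily what the repository scripts do:

# Hypothetical sketch of the split described above: 20% held out for testing,
# the remaining 80% divided 4:1 into training and validation (64%/16% overall).
from sklearn.model_selection import train_test_split

def split_dataset(samples, labels, seed=123456):
    # Carve off the 20% test set first; stratifying on labels is an assumption.
    rest_x, test_x, rest_y, test_y = train_test_split(
        samples, labels, test_size=0.2, random_state=seed, stratify=labels)
    # Split the remaining 80% in a 4:1 ratio (test_size=0.2 of the remainder).
    train_x, valid_x, train_y, valid_y = train_test_split(
        rest_x, rest_y, test_size=0.2, random_state=seed, stratify=rest_y)
    return (train_x, train_y), (valid_x, valid_y), (test_x, test_y)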
We employ Expected Calibration Error (ECE) to measure the calibration of pre-trained code models, and we use reliability diagrams to visualize calibration as the gap between the models' average accuracy and average confidence.
The file "ece_utils" provides the utilities to calculate the ECE and draw the reliability diagrams.
We evaluate the effectiveness of two popular and simple calibration methods on pre-trained code models.
- Temperature Scaling
- Label Smoothing
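Label smoothing is applied during fine-tuning by softening the one-hot training targets. A minimal sketch using PyTorch's built-in support (the smoothing value 0.1 is a common default, not necessarily the value used in our experiments):

import torch
import torch.nn as nn

# CrossEntropyLoss (PyTorch >= 1.10) supports label smoothing natively: the
# one-hot target is mixed with a uniform distribution over the classes.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(4, 2)             # e.g. a batch of 4 binary-classification logits
targets = torch.tensor([0, 1, 1, 0])   # ground-truth class indices
loss = criterion(logits, targets)

Temperature Scaling, by contrast, is post-hoc: it leaves training untouched and rescales the logits of an already fine-tuned model (see the sketch under "train_ts.sh" below).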
For each experiment presented in our paper, the code files and scripts are organized in separate subfolders. For ease of use, we provide a "run.sh" file within each subfolder, which contains the commands needed to train or evaluate the corresponding models. For instance, to fine-tune the CodeBERT model on the code clone detection task, run the following command:
CUDA_VISIBLE_DEVICES=0 python main.py \
--do_train \
--model_name=microsoft/codebert-base \
--train_data_file=../dataset/train_sampled.txt \
--eval_data_file=../dataset/valid_sampled.txt \
--output_dir=./models/codebert/ \
--epoch 5 \
--block_size 512 \
--train_batch_size 12 \
--eval_batch_size 64 \
--learning_rate 5e-5 \
--max_grad_norm 1.0 \
--evaluate_during_training \
--seed 123456 2>&1 | tee ./logs/train_codebert.log
To evaluate the fine-tuned models, run:
CUDA_VISIBLE_DEVICES=0 python main.py \
--do_eval \
--model_name=microsoft/codebert-base \
--train_data_file=../dataset/train_sampled.txt \
--eval_data_file=../dataset/test.txt \
--output_dir=./models/codebert/ \
--epoch 5 \
--block_size 512 \
--train_batch_size 12 \
--eval_batch_size 64 \
--seed 123456 2>&1 | tee ./logs/test_codebert.log
To calibrate the fine-tuned models with Temperature Scaling, follow the instructions in the "train_ts.sh" file:
CUDA_VISIBLE_DEVICES=1 python train_ts.py \
--model_name="microsoft/codebert-base" \
--model_path="models/codebert/checkpoint-best-f1/model.bin" \
--eval_data_file="../dataset/valid_sampled.txt" \
--test_data_file="../dataset/test_sampled.txt" \
--eval_batch_size=16 \
--block_size=512
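Temperature Scaling learns a single scalar T > 0 on the validation set and divides the test-time logits by it before the softmax; predictions are unchanged, only the confidences move. A minimal sketch of the idea (assumed names; not necessarily how "train_ts.py" is implemented):

import torch

def fit_temperature(valid_logits, valid_labels, max_iter=50):
    # Optimize log T so that T stays positive; minimize the NLL of
    # softmax(valid_logits / T) on held-out validation data.
    log_t = torch.zeros(1, requires_grad=True)
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)
    nll = torch.nn.CrossEntropyLoss()

    def closure():
        optimizer.zero_grad()
        loss = nll(valid_logits / log_t.exp(), valid_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Usage: T = fit_temperature(valid_logits, valid_labels)
# Calibrated test-time probabilities are then softmax(test_logits / T).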
We are very grateful to the authors of CodeBERTa, CodeBERT, GraphCodeBERT, CodeT5, UniXcoder, TextCNN, and ASTNN for making their models and code publicly available, which allowed us to build this repository on top of their work.