Code-commenting

  • Design project on Code Comment Generation

Dataset

  • We use the dataset from DeepCom for training. We have trained mostly on Google Colaboratory; we recommend trying the university's HPC (if it is not too busy) or GCloud credits (if any of your cards can get through the sign-up without needing a refund). A loading sketch follows below.
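
A minimal sketch of loading line-aligned (code, comment) pairs for training; the file names and format below are assumptions for illustration and may not match the DeepCom release exactly.

```python
# Hedged sketch: load line-aligned (code, comment) pairs.
# The file names "train.token.code" and "train.token.nl" are assumptions,
# not necessarily the actual layout of the DeepCom release.
def load_pairs(code_path="train.token.code", comment_path="train.token.nl"):
    with open(code_path, encoding="utf-8") as fc, \
         open(comment_path, encoding="utf-8") as fn:
        # Line i of the code file is assumed to pair with line i of the comment file.
        return [(code.strip().split(), comment.strip().split())
                for code, comment in zip(fc, fn)]

if __name__ == "__main__":
    pairs = load_pairs()
    print(len(pairs), "training pairs loaded")
```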

Results

  • Our model did not reach SOTA performance, which we expected. It has, however, managed to produce semantically correct comments that are occasionally more informative than the original human-written comments.
  • Many of the comments generated within the first epoch were repetitive, but the number of meaningful comments increased significantly as training progressed.

NOTE: In each example below, the first line is the comment generated by the model and the second is the ground-truth human comment:
[example image]

The model requires more training for rarer tokens:
[example image]

Here the model fails to produce a grammatically correct word, but it still captures the underlying semantics of the code:
[example image]

Due to teacher forcing, the model gets confused here, but it still tries to produce a meaningful comment even when the human reference comment is poor (a sketch of teacher forcing follows these examples):
[example image]

The model can also substitute words with similar meanings:
[example image]

[example image]
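
The snippet below is a minimal sketch of how teacher forcing works in a seq2seq decoder loop (PyTorch is assumed; the model, layer sizes, and variable names are illustrative and not taken from this repository). With probability `teacher_forcing_ratio`, the next decoder input is the ground-truth token rather than the model's own prediction, which is why a poor reference comment can steer generation off course.

```python
# Illustrative sketch of teacher forcing in a seq2seq decoder (not this repo's model).
import random
import torch
import torch.nn as nn

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.gru = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, hidden):
        # token: (batch, 1) ids of the previous word
        emb = self.embedding(token)                 # (batch, 1, emb_dim)
        output, hidden = self.gru(emb, hidden)      # (batch, 1, hid_dim)
        return self.out(output.squeeze(1)), hidden  # logits: (batch, vocab)

def decode_with_teacher_forcing(decoder, hidden, target, teacher_forcing_ratio=0.5):
    """Generate a comment token by token, sometimes feeding back the
    ground-truth token (teacher forcing) instead of the model's prediction."""
    batch_size, max_len = target.shape
    token = target[:, 0:1]                          # <sos> tokens
    logits_per_step = []
    for t in range(1, max_len):
        logits, hidden = decoder(token, hidden)
        logits_per_step.append(logits)
        use_teacher = random.random() < teacher_forcing_ratio
        # With teacher forcing, the next input is the true token, so a bad or
        # noisy reference comment directly steers the decoder.
        token = target[:, t:t+1] if use_teacher else logits.argmax(1, keepdim=True)
    return torch.stack(logits_per_step, dim=1)      # (batch, max_len-1, vocab)
```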

BLEU Scoring and Loss plots

[METEOR score plot]

[BLEU score plot]

[Training loss plot]

Box Plots

[BLEU box plot]

[METEOR box plot]
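
For reference, sentence-level BLEU and METEOR can be computed with NLTK roughly as follows. This is a hedged sketch of the metrics behind the plots above, not necessarily the exact scoring script used for this project; the example sentences are made up.

```python
# Hedged sketch of sentence-level BLEU and METEOR with NLTK; the scoring
# configuration used for the plots in this repo may differ.
import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)  # METEOR needs WordNet for synonym matching

generated = "returns the index of the first matching element".split()
reference = "return index of the first element that matches".split()

smooth = SmoothingFunction().method4  # avoid zero BLEU on short comments
bleu = sentence_bleu([reference], generated, smoothing_function=smooth)
meteor = meteor_score([reference], generated)  # both arguments are token lists

print(f"BLEU:   {bleu:.4f}")
print(f"METEOR: {meteor:.4f}")
```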
