About big clone bench #3

Open
raven4752 opened this issue Jan 22, 2019 · 3 comments

@raven4752

Hello, thank you for your great work!
Would you release the preprocessing steps for the code snippets in BigCloneBench?

@preesee

preesee commented Feb 17, 2019

Hi, I am also doing research on the BigCloneBench dataset. However, I don't understand how you did the evaluation such that the F1 score is almost 1 in your paper. Did you create negative samples for each clone type, or just classify Type-1, Type-2, Type-3, and Type-4 with a softmax? Any explanation would be helpful, thanks.

@raven4752
Author

From the code and the paper, I guess the high F1 score comes from the dominating positive class: the Type-3/Type-4 clones.
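To illustrate that guess, here is a minimal sketch with made-up numbers (the class sizes are assumptions, not the actual split used in the paper). It shows that when positives dominate the test set, even a trivial classifier that labels every pair as a clone gets an F1 score close to 1.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 for the positive (clone) class from raw confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def all_positive_f1(n_pos: int, n_neg: int) -> float:
    """F1 of a trivial classifier that predicts 'clone' for every pair."""
    # Every positive is a true positive, every negative a false positive,
    # and there are no false negatives.
    return f1_score(tp=n_pos, fp=n_neg, fn=0)


# Hypothetical, heavily imbalanced test set: 9,900 clone pairs, 100 non-clone pairs.
imbalanced = all_positive_f1(n_pos=9900, n_neg=100)

# Hypothetical balanced test set of the same size for comparison.
balanced = all_positive_f1(n_pos=5000, n_neg=5000)

print(f"imbalanced: {imbalanced:.3f}")  # imbalanced: 0.995
print(f"balanced:   {balanced:.3f}")    # balanced:   0.667
```

So a near-perfect F1 on its own says little unless the ratio of positive to negative pairs is reported alongside it.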

@preesee

preesee commented Feb 18, 2019

@raven4752 Sorry to say, but from my limited research I am afraid it is really hard to reproduce the score. Another solid paper (Oreo) reports detecting no more than 1% of the Type-3/Type-4 clones:

| Name | T1 (35,802) | T2 (4,577) | VST3 (4,156) | ST3 (15,031) | MT3 (80,023) | WT3/T4 (7,804,868) |
|---|---|---|---|---|---|---|
| Oreo | 100% (35,798) | 99% (4,547) | 100% (4,139) | 89% (13,391) | 30% (23,834) | 0.7% (57,273) |

Whereas if the dataset is shrunk, it could happen. The matrix representation is really novel and innovative.
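
As a quick sanity check on the Oreo row quoted above, recall per category is just detected clones divided by the total number of clones in that category. The sketch below assumes each count in the row belongs to the percentage next to it; that pairing is my reading of the quoted numbers, not something stated in this thread.

```python
# Total clone pairs per category, taken from the header row.
totals = {
    "T1": 35_802,
    "T2": 4_577,
    "VST3": 4_156,
    "ST3": 15_031,
    "MT3": 80_023,
    "WT3/T4": 7_804_868,
}

# Clone pairs detected by Oreo, taken from the Oreo row
# (assumed pairing of counts to categories, in column order).
detected = {
    "T1": 35_798,
    "T2": 4_547,
    "VST3": 4_139,
    "ST3": 13_391,
    "MT3": 23_834,
    "WT3/T4": 57_273,
}

# Recall = detected / total, per category.
recall = {name: detected[name] / totals[name] for name in totals}

for name, value in recall.items():
    print(f"{name}: {100 * value:.1f}%")
```

Under that pairing the computed values round to the quoted percentages, and the WT3/T4 recall stays below 1%.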
