Many longstanding and emergent problems in software engineering are, at their root, questions of measuring the correspondence between source code and natural language (e.g., query response and feature localization). Word embedding approaches have been proposed as a way to bridge this lexical gap, because they represent relative semantic relatedness in a language-agnostic space. This paper reports an attempted replication of a well-cited publication that addresses bug localization using word embeddings. We train our word embeddings on a novel dataset but test on a common, standardized dataset. We provide insights into the process of experiment replication and offer advice to those wishing to increase the replicability of their publications. We also demonstrate the influence of preprocessing choices, further highlighting the need for extensive experiment reporting as the field of software engineering continues to integrate machine learning tools.
nirtiac/EmbeddingBugs
Source Code for Applying Embeddings to the Bug Localization Task