GitMiner is a Pharo Smalltalk library that helps developers analyze git repositories. With GitMiner, retrieving data about source code, diffs, developer identities, changed files, and commits in a Smalltalk environment will be simpler than ever :)
Originally developed by Stefano Campanella and Carmen Armenti in the REVEAL group.
Note
For the project to work correctly, the cloc submodule must also be running on your machine. To set it up, follow the installation instructions in the gitminer-cloc repository.
If you only want to mine git repositories, load the most recent stable version, v1.0.0:
Metacello new
    baseline: 'GitMiner';
    repository: 'github://USIREVEAL/gitminer:v1.0.0';
    load.
If you want to use the APIs to mine GitHub repositories, you need to load from the github branch instead:
Metacello new
    baseline: 'GitMiner';
    repository: 'github://USIREVEAL/gitminer:github';
    load.
Once the cloc service is running, make sure that the miner points to its URL. To do so, execute the following code before any mining operation:
GMFlagStore uniqueInstance CLOCEndpoint: 'http://localhost:8080/'
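Before setting the endpoint, it can help to confirm that something is actually listening at that URL. The following is a minimal sketch, not part of GitMiner: it uses ZnEasy from the Zinc HTTP library that ships with Pharo, and it assumes the cloc service answers plain HTTP requests at the configured address. Which status code the service returns for its root URL is unknown, so any response at all is treated as "reachable".

```smalltalk
| endpoint reachable |
"Assumed address; use the URL where your cloc service actually runs."
endpoint := 'http://localhost:8080/'.

"Any HTTP response, even an error status, means a server is listening.
 A connection failure raises an error, which is caught here."
reachable := [ ZnEasy get: endpoint. true ]
	on: Error
	do: [ :error | false ].

reachable
	ifTrue: [ GMFlagStore uniqueInstance CLOCEndpoint: endpoint ]
	ifFalse: [ Transcript show: 'No service reachable at ' , endpoint; cr ].
```

This check only shows that a server responds at the address; it does not verify that the server is the cloc service.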
To use GitMiner, start by creating a new GMRepository instance with the path to your local git repository:
repo := GMRepository from: aGitRepoPath.
This creates a new repository instance and starts mining. Once mining is done, you can access the mined data through the repository instance. You can also save the repository to a file for later use:
repo serializeTo: aPath.
Or, if you want to update an existing repository from previously serialized data:
repo := GMRepository from: aGitRepoPath basedOn: aPathWithSerializedData.
To load a previously serialized repository, you can use the following code:
repo := GMRepository deserializeFrom: aPathWithSerializedData.
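The three operations above can be combined into one workflow that reuses serialized data when it exists. This is a minimal sketch using only the selectors shown in this README. The two paths are hypothetical placeholders, and the sketch assumes that GitMiner accepts paths given as strings; adapt it if the methods expect file references instead.

```smalltalk
| gitRepoPath dataPath repo |
"Hypothetical paths; replace them with your own."
gitRepoPath := '/path/to/local/git/repository'.
dataPath := '/path/to/serialized-repository'.

"Reuse earlier results if a serialized file is present; otherwise mine from scratch."
repo := dataPath asFileReference exists
	ifTrue: [ GMRepository from: gitRepoPath basedOn: dataPath ]
	ifFalse: [ GMRepository from: gitRepoPath ].

"Save the (possibly updated) repository for the next session."
repo serializeTo: dataPath.
```

In a later session where no update is needed, `GMRepository deserializeFrom: dataPath` loads the saved data as shown above.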
You can also mine repositories using the command-line interface. For more information, see the GitMiner-CLI package.
Note
The serialization format used by GitMiner is proprietary: although it is based on JSON, it is not compatible with other tools.
GitMiner has been used to support the following scientific research papers:
- C. Armenti and M. Lanza (2025), "Telling Software Evolution Stories With Sonification", International Conference on Program Comprehension (ICPC), pp. 398–402, IEEE. doi: 10.1109/ICPC66645.2025.00050
- C. Armenti and M. Lanza (2024), "Using Animations to Understand Commits", International Conference on Software Maintenance and Evolution (ICSME), pp. 660–665, IEEE. doi: 10.1109/ICSME58944.2024.00069
- C. Armenti and M. Lanza (2024), "Using Interactive Animations to Analyze Fine-grained Software Evolution", Working Conference on Software Visualization (VISSOFT), pp. 36–47, IEEE. doi: 10.1109/VISSOFT64034.2024.00014
- S. Campanella and M. Lanza (2024), "Hidden in the Code: Visualizing True Developer Identities", Working Conference on Software Visualization (VISSOFT), pp. 24–35, IEEE. doi: 10.1109/VISSOFT64034.2024.00013