Limiting history indexing to select repositories only #4667

@aerofeev2k

Hi,

We recently added a few Git repositories to an already checked-out and indexed source tree that previously contained only CVS, and history indexing has slowed down dramatically since then. What used to take an hour now runs for days.

It seems that OpenGrok uses JGit to fetch the change history for every source file, and that takes 3-5 s per file. In lsof/strace I can see java constantly reading huge Git pack files, and in jstack I can see it decompressing them and then walking the revision tree (a typical stack is below).

Not passing -H to the indexer helps things tremendously, but we lose the index of CVS history as well, and that used to work just fine before.

Is there a way to prevent OpenGrok from indexing the history of all Git repositories, or of selected repositories (specified by path), while still indexing the history of the CVS repositories in one big combined tree?

We've tried a few things, but short of dropping -H, nothing seemed to work. For example, we tried using the --repository command-line option to list only the CVS repositories explicitly, and setting historyEnabled to false for the Git repositories in configuration.xml, but OpenGrok still tries to index history for files in the Git part of the checked-out tree.

java.lang.Thread.State: RUNNABLE
        at org.eclipse.jgit.treewalk.TreeWalk.next(TreeWalk.java:820)
        at org.eclipse.jgit.revwalk.TreeRevFilter.include(TreeRevFilter.java:170)
        at org.eclipse.jgit.revwalk.PendingGenerator.next(PendingGenerator.java:108)
        at org.eclipse.jgit.revwalk.BlockRevQueue.<init>(BlockRevQueue.java:40)
        at org.eclipse.jgit.revwalk.FIFORevQueue.<init>(FIFORevQueue.java:37)
        at org.eclipse.jgit.revwalk.StartGenerator.next(StartGenerator.java:133)
        at org.eclipse.jgit.revwalk.RevWalk.next(RevWalk.java:591)
        at org.eclipse.jgit.revwalk.RevWalk.nextForIterator(RevWalk.java:1526)
        at org.eclipse.jgit.revwalk.RevWalk.iterator(RevWalk.java:1550)
        at org.opengrok.indexer.history.GitRepository.getHistory(GitRepository.java:520)
        at org.opengrok.indexer.history.GitRepository.getHistory(GitRepository.java:476)
        at org.opengrok.indexer.history.GitRepository.getHistory(GitRepository.java:443)
        at org.opengrok.indexer.history.GitRepository.getHistory(GitRepository.java:438)
        at org.opengrok.indexer.history.FileHistoryCache.get(FileHistoryCache.java:683)
        at org.opengrok.indexer.history.HistoryGuru.getHistoryFromCache(HistoryGuru.java:248)
        at org.opengrok.indexer.history.HistoryGuru.getHistory(HistoryGuru.java:309)
        at org.opengrok.indexer.history.HistoryGuru.getHistory(HistoryGuru.java:215)
        at org.opengrok.indexer.analysis.AnalyzerGuru.populateDocument(AnalyzerGuru.java:601)
        at org.opengrok.indexer.index.IndexDatabase.addFile(IndexDatabase.java:831)
        at org.opengrok.indexer.index.IndexDatabase.lambda$indexParallel$4(IndexDatabase.java:1361)
        at org.opengrok.indexer.index.IndexDatabase$$Lambda$324/0x00000008002f3440.apply(Unknown Source)
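
For reference, the list of CVS-only repositories for the --repository attempt described above can be gathered with something like the sketch below. It is only an illustration: `find_repositories` is a hypothetical helper, not part of OpenGrok, and it assumes a CVS working copy is recognizable by a `CVS/` subdirectory and a Git working tree by a `.git` entry (directory or file).

```python
import os


def find_repositories(source_root):
    """Return (cvs_repos, git_repos) found under source_root.

    Assumptions (not taken from OpenGrok itself): a Git working tree has a
    `.git` entry, and a CVS working copy has a `CVS/` subdirectory. Only the
    top-most CVS directory of each checkout is reported.
    """
    cvs_repos, git_repos = [], []

    def visit(path, inside_cvs):
        # A Git working tree ends the descent: everything below belongs to it.
        if os.path.exists(os.path.join(path, ".git")):
            git_repos.append(path)
            return
        # Record a CVS checkout once, at its top-most directory.
        if not inside_cvs and os.path.isdir(os.path.join(path, "CVS")):
            cvs_repos.append(path)
            inside_cvs = True
        # Keep descending so Git trees nested inside a CVS checkout are found.
        for entry in sorted(os.listdir(path)):
            child = os.path.join(path, entry)
            if entry != "CVS" and os.path.isdir(child) and not os.path.islink(child):
                visit(child, inside_cvs)

    visit(source_root, False)
    return cvs_repos, git_repos
```

Calling `find_repositories` on the source root returns two lists of absolute paths. The first is what we would want history indexed for, and the second is what we would like to exclude.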
