Plugin causes the build to hang with no other information logged #468

Open
rherrick opened this issue Mar 18, 2021 · 4 comments

@rherrick

What happened?

On a new Jenkins server, a couple of our builds that work locally and on an old Jenkins server were hanging. I looked into it and narrowed the cause down to the com.palantir.git-version plugin: if I remove the references to this plugin, the build works fine. If I include this plugin (even if I don't reference any tasks or methods from the plugin), the build hangs and just cycles on the daemon waiting to acquire a lock, acquiring the lock, then releasing the lock. This happens even when I run the build with the --no-daemon flag specified explicitly and/or with org.gradle.daemon set to false in ~/.gradle/gradle.properties.

After I had started writing this issue up, we discovered the critical factor that causes this failure. I don't know why it causes the failure, so I'm still reporting it. This plugin fails as described below whenever the user running the build is authenticated via an external repo (LDAP) and/or when the project being built is located somewhere that's mounted to the server, in our case from ZFS storage via NFS. Tomorrow I'll try to determine whether the fault is with the externally authenticated user or with the network-mounted storage location (I need some IT help to get a scenario in which I can test this setup) and add to this report.

For reference, the builds that we're having trouble with are the master branches of these two repos:

I've found a few clues that may help and tried a few things that didn't help:

  • We tried different versions of git on the system. The Jenkins server is running on CentOS 7.9, which includes git 1.8.3.1. I thought maybe upgrading git would work, but cloning the repo with git 2.17.1 didn't change anything.
  • We have the same version of Java on different servers where the build works.
  • I spun up a Vagrant VM with CentOS 7.9 (same version of git and Java) and built there with no problems.
  • The issue isn't limited to running within Jenkins: I cloned https://bitbucket.org/xnatdev/xnat-data-models and tried to build the master branch in my own home folder. Same result as running in Jenkins (hang).
  • I tried with daemon on and parallel true and with no daemon or parallel. No change.

I ran these builds both on the Jenkins server (not under Jenkins, just on the same machine) and locally with debug logging on and captured the output. You can find both the raw log files here, as well as the logs filtered down to just show lines that include references to git, palantir, or version, in order to better focus on what's actually happening with the plugin:

https://gist.github.com/rherrick/9894271ad99ba50fc405ae2a4d3d870f

The interesting difference shows up when comparing failing-gitversion-log and working-gitversion-log. In both cases, the build starts to apply the plugin to the project:

Build operation 'Apply plugin com.palantir.git-version to root project 'xnat-data-models'' started
Build operation 'Apply plugin com.palantir.gradle.gitversion.GitVersionRootPlugin to root project 'xnat-data-models'' started
Completing Build operation 'Apply plugin com.palantir.gradle.gitversion.GitVersionRootPlugin to root project 'xnat-data-models''
Build operation 'Apply plugin com.palantir.gradle.gitversion.GitVersionRootPlugin to root project 'xnat-data-models'' completed

When things work, it then proceeds:

Build operation 'Realize task :printVersion' started
Completing Build operation 'Realize task :printVersion'
Build operation 'Realize task :printVersion' completed
Completing Build operation 'Apply plugin com.palantir.git-version to root project 'xnat-data-models''
Build operation 'Apply plugin com.palantir.git-version to root project 'xnat-data-models'' completed

When things don't work, the build process just starts to cycle:

Waiting to acquire shared lock on daemon addresses registry.
Lock acquired on daemon addresses registry.
Releasing lock on daemon addresses registry.
Waiting to acquire shared lock on daemon addresses registry.
Lock acquired on daemon addresses registry.
Releasing lock on daemon addresses registry.

That waiting/acquired/releasing cycle is exactly what the Gradle daemon does when there's no work to do, but in this case it happens even when the build is run without the daemon (or at least the persistent daemon).

What did you want to happen?

Complete the build!

@rherrick
Author

The root cause is actually in jgit, in the class FS, or really the private internal class FileStoreAttributeCache and its constructor:

Path probe = dir.resolve(".probe-" + UUID.randomUUID()); //$NON-NLS-1$
Files.createFile(probe);
try {
    FileTime startTime = Files.getLastModifiedTime(probe);
    FileTime actTime = startTime;
    long sleepTime = 512;
    while (actTime.compareTo(startTime) <= 0) {
        TimeUnit.NANOSECONDS.sleep(sleepTime);
        FileUtils.touch(probe);
        actTime = Files.getLastModifiedTime(probe);
        // limit sleep time to max. 100ms
        if (sleepTime < 100_000_000L) {
            sleepTime = sleepTime * 2;
        }
    }
    fsTimestampResolution = Duration.between(startTime.toInstant(),
            actTime.toInstant());
} catch (AccessDeniedException e) {
    LOG.error(e.getLocalizedMessage(), e);
} finally {
    Files.delete(probe);
}

The problem is that FileUtils.touch(probe) (which just opens an output stream on the file) doesn't actually update the file's timestamp in our particular scenario, so actTime never moves past startTime and the while loop never exits. I pulled the jgit code and built from the HEAD of the master branch (there have been quite a few changes in this area of the code), which produced version 5.12.0-SNAPSHOT of jgit. I then changed 0.12.3 of gradle-git-version to use that version of jgit instead of the older 5.3.2.201906051522-r it currently uses, rebuilt the gradle-git-version plugin, and tested that out. That fixes the problem.
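For anyone who wants to check whether a particular directory (for example an NFS mount) shows this behavior without hanging, the probe loop can be re-run with a timeout. The following is a minimal sketch and not jgit's code: the class name TimestampProbe, the method touchAdvancesMtime, and the 5-second timeout are made up for illustration, and opening and closing an output stream is used as a stand-in for FileUtils.touch, based on the description above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Duration;
import java.util.UUID;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic helper; not part of jgit or gradle-git-version.
class TimestampProbe {

    /**
     * Returns true if re-opening an empty file for writing advances its
     * last-modified time before the timeout expires, false otherwise.
     */
    static boolean touchAdvancesMtime(Path dir, Duration timeout)
            throws IOException, InterruptedException {
        Path probe = dir.resolve(".probe-" + UUID.randomUUID());
        Files.createFile(probe);
        try {
            FileTime start = Files.getLastModifiedTime(probe);
            long deadline = System.nanoTime() + timeout.toNanos();
            long sleepNanos = 512;
            // Same back-off as the quoted loop, but bounded by a deadline.
            while (System.nanoTime() - deadline < 0) {
                TimeUnit.NANOSECONDS.sleep(sleepNanos);
                // Assumed stand-in for the touch: open and close an output stream.
                try (OutputStream out = Files.newOutputStream(probe)) {
                    // nothing is written
                }
                if (Files.getLastModifiedTime(probe).compareTo(start) > 0) {
                    return true;
                }
                if (sleepNanos < 100_000_000L) {
                    sleepNanos *= 2;
                }
            }
            return false;
        } finally {
            Files.delete(probe);
        }
    }

    public static void main(String[] args) throws Exception {
        // Directory to probe; defaults to the current directory.
        Path dir = Path.of(args.length > 0 ? args[0] : ".");
        boolean ok = touchAdvancesMtime(dir, Duration.ofSeconds(5));
        System.out.println(dir.toAbsolutePath() + ": touch advances mtime = " + ok);
    }
}
```

Running this against the project directory on the affected server and against a local directory should show whether the timestamp ever advances on each. A result of false would be consistent with the hang described above, though it would not by itself say whether the mount or the user account is responsible.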

So it looks like the fix is to update to some later release of jgit, although I tried with v5.11.0.202103091610-r and ended up with the same result. We're able to use version 0.12.1 of this plugin to work around the issue.

@mikethecalamity
Contributor

Is this still a problem in 0.13.0?

@mikethecalamity
Contributor

@rherrick I bumped up the jgit version, see if that works for you.

@mikethecalamity
Contributor

It works for us now.
