Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline is stuck on Recording test results #653

Open
hiddenshadow21 opened this issue Oct 4, 2024 · 4 comments
Open

Pipeline is stuck on Recording test results #653

hiddenshadow21 opened this issue Oct 4, 2024 · 4 comments

Comments

@hiddenshadow21
Copy link

Jenkins and plugins versions report

Environment
Jenkins: 2.462.2
OS: Windows Server 2022 - 10.0
Java: 21.0.3 - Oracle Corporation (Java HotSpot(TM) 64-Bit Server VM)
---
ace-editor:1.1
ant:511.v0a_a_1a_334f41b_
antisamy-markup-formatter:162.v0e6ec0fcfcf6
apache-httpcomponents-client-4-api:4.5.14-208.v438351942757
asm-api:9.7-33.v4d23ef79fcc8
authentication-tokens:1.119.v50285141b_7e1
aws-credentials:231.v08a_59f17d742
aws-java-sdk-ec2:1.12.767-467.vb_e93f0c614b_6
aws-java-sdk-minimal:1.12.767-467.vb_e93f0c614b_6
bootstrap4-api:4.6.0-6
bootstrap5-api:5.3.3-1
bouncycastle-api:2.30.1.78.1-248.ve27176eb_46cb_
branch-api:2.1178.v969d9eb_c728e
build-timeout:1.33
caffeine-api:3.1.8-133.v17b_1ff2e0599
checks-api:2.2.1
cloudbees-bitbucket-branch-source:888.v8e6d479a_1730
cloudbees-folder:6.955.v81e2a_35c08d3
command-launcher:115.vd8b_301cc15d0
commons-compress-api:1.26.1-2
commons-lang3-api:3.17.0-84.vb_b_938040b_078
commons-text-api:1.12.0-129.v99a_50df237f7
config-file-provider:978.v8e85886ffdc4
configurationslicing:548.ve92d48e66b_f8
copyartifact:749.vfb_dca_a_9b_6549
credentials:1380.va_435002fa_924
credentials-binding:681.vf91669a_32e45
data-tables-api:2.1.6-1
display-url-api:2.204.vf6fddd8a_8b_e9
docker-commons:443.v921729d5611d
docker-workflow:580.vc0c340686b_54
dtkit-api:3.0.2
durable-task:577.v2a_8a_4b_7c0247
ec2:1688.v8c07e01d657f
echarts-api:5.5.1-1
eddsa-api:0.3.0-4.v84c6f0f4969e
email-ext:1844.v3ea_a_b_842374a_
emailext-template:1.5
envinject:2.919.v009a_a_1067cd0
envinject-api:1.199.v3ce31253ed13
extended-read-permission:53.v6499940139e5
external-monitor-job:215.v2e88e894db_f8
favorite:2.221.v19ca_666b_62f5
file-operations:266.v9d4e1eb_235b_a_
font-awesome-api:6.6.0-2
ghprb:1.42.2
git:5.5.1
git-client:5.0.0
git-server:126.v0d945d8d2b_39
github:1.40.0
github-api:1.321-468.v6a_9f5f2d5a_7e
github-branch-source:1797.v86fdb_4d57d43
gradle:2.13
greenballs:1.15.1
gson-api:2.11.0-41.v019fcf6125dc
handlebars:3.0.8
handy-uri-templates-2-api:2.1.8-30.v7e777411b_148
htmlpublisher:1.36
http_request:1.19
instance-identity:201.vd2a_b_5a_468a_a_6
ionicons-api:74.v93d5eb_813d5f
jackson2-api:2.17.0-379.v02de8ec9f64c
jakarta-activation-api:2.1.3-1
jakarta-mail-api:2.1.3-1
javadoc:280.v050b_5c849f69
javax-activation-api:1.2.0-7
javax-mail-api:1.6.2-10
jaxb:2.3.9-1
jdk-tool:80.v8a_dee33ed6f0
jersey2-api:2.44-151.v6df377fff741
jira:3.13
jjwt-api:0.11.5-112.ve82dfb_224b_a_d
jnr-posix-api:3.1.19-2
jobConfigHistory:1268.v75ce751da_911
joda-time-api:2.13.0-85.vb_64d1c2921f1
jquery:1.12.4-1
jquery-detached:1.2.1
jquery3-api:3.7.1-2
jsch:0.2.16-86.v42e010d9484b_
json-api:20240303-41.v94e11e6de726
json-path-api:2.9.0-58.v62e3e85b_a_655
junit:1302.va_b_878c32eb_b_5
ldap:725.v3cb_b_711b_1a_ef
locale:519.v4e20f313cfa_f
lockable-resources:1315.v4ea_8e5159ec8
mailer:488.v0c9639c1a_eb_3
mapdb-api:1.0.9-40.v58107308b_7a_7
matrix-auth:3.2.2
matrix-project:838.v4d7b_7b_f9b_d4b_
maven-plugin:3.23
mercurial:1260.vdfb_723cdcc81
metrics:4.2.21-451.vd51df8df52ec
mina-sshd-api-common:2.13.2-125.v200281b_61d59
mina-sshd-api-core:2.13.2-125.v200281b_61d59
momentjs:1.1.1
monitoring:1.99.0
msbuild:1.35
mstest:1.0.5
node-iterator-api:55.v3b_77d4032326
nodejs:1.6.2
nodelabelparameter:1.12.0
nunit:485.ve8a_85357320d
okhttp-api:4.11.0-172.vda_da_1feeb_c6e
pam-auth:1.11
parameterized-scheduler:277.v61a_4b_a_49a_c5c
performance:962.v95a_4913d332e
perfpublisher:8.09
pipeline-build-step:540.vb_e8849e1a_b_d8
pipeline-github-lib:61.v629f2cc41d83
pipeline-graph-analysis:216.vfd8b_ece330ca_
pipeline-groovy-lib:730.ve57b_34648c63
pipeline-input-step:495.ve9c153f6067b_
pipeline-milestone-step:119.vdfdc43fc3b_9a_
pipeline-model-api:2.2214.vb_b_34b_2ea_9b_83
pipeline-model-definition:2.2214.vb_b_34b_2ea_9b_83
pipeline-model-extensions:2.2214.vb_b_34b_2ea_9b_83
pipeline-rest-api:2.34
pipeline-stage-step:312.v8cd10304c27a_
pipeline-stage-tags-metadata:2.2214.vb_b_34b_2ea_9b_83
pipeline-stage-view:2.34
pipeline-utility-steps:2.17.0
plain-credentials:183.va_de8f1dd5a_2b_
plugin-util-api:5.1.0
popper-api:1.16.1-3
popper2-api:2.11.6-5
powershell:2.1
pubsub-light:1.18
resource-disposer:0.24
role-strategy:743.v142ea_b_d5f1d3
scm-api:696.v778d637b_a_762
script-security:1362.v67dc1f0e1b_b_3
shelve-project-plugin:3.2
simple-queue:1.4.4
simple-theme-plugin:196.v96d9592f4efa_
slack:741.v00f9591c586d
snakeyaml-api:2.3-123.v13484c65210a_
sse-gateway:1.27
ssh-agent:376.v8933585c69d3
ssh-credentials:343.v884f71d78167
ssh-slaves:2.973.v0fa_8c0dea_f9f
sshd:3.330.vc866a_8389b_58
structs:338.v848422169819
subversion:1275.va_7b_014f3fc2c
test-stability:2.4
thinBackup:2.1.1
timestamper:1.27
token-macro:400.v35420b_922dcb_
trilead-api:2.147.vb_73cc728a_32e
uno-choice:2.8.3
variant:60.v7290fc0eb_b_cd
windows-slaves:1.8.1
workflow-aggregator:600.vb_57cdd26fdd7
workflow-api:1336.vee415d95c521
workflow-basic-steps:1058.vcb_fc1e3a_21a_9
workflow-cps:3969.vdc9d3a_efcc6a_
workflow-cps-global-lib:612.v55f2f80781ef
workflow-durable-task-step:1371.vb_7cec8f3b_95e
workflow-job:1436.vfa_244484591f
workflow-multibranch:795.ve0cb_1f45ca_9a_
workflow-scm-step:427.v4ca_6512e7df1
workflow-step-api:678.v3ee58b_469476
workflow-support:926.v9f4f9b_b_98c19
ws-cleanup:0.46
xunit:3.1.5

What Operating System are you using (both controller, and any agents involved in the problem)?

Both on Windows Server 2022

Reproduction steps

  1. Setup jenkins controller with junit plugin nad global env vars:
  1. Setup jenkins agent on windows server 2022 with:
  • ~50GB free of disk space
  • latest .Net 8 sdk
  • git
  • trx2junit on agent:
    dotnet tool install -g trx2junit
    
  • label windows-agent
  1. Run pipeline on agent:
def runTestSuite(String testSuiteName) {
    script {
        try {
            command = "dotnet test"
            command += " --configuration Release"
            command += " --no-build"
            command += " --blame-hang"
            command += " --blame-hang-dump-type ${env.BLAME_HANG_DUMP_TYPE}"
            command += " --blame-hang-timeout ${env.TEST_BLAME_HANG_TIMEOUT}"
            command += " --logger \"trx;verbosity=detailed\""
            command += " --logger \"console;verbosity=detailed\""
            command += " --results-directory \"./\""
	    
            bat command
        }
        catch (Exception e) {
            echo e.toString()
        }
        finally {
            bat 'trx2junit .\\*.trx'
            try {
                junit testResults: '*.xml', testDataPublishers: [[$class: 'StabilityTestDataPublisher']]
            }
            catch(Exception e){
                echo e.toString()
                currentBuild.result = "FAILURE"
            }
        }
    }
}

pipeline {
    agent {
        label 'windows-agent'
    }

    options {
        timeout(time: 4, unit: 'HOURS')
        timestamps()
    }

    environment {
        repoUrl = 'https://github.com/ravendb/ravendb.git'
        sourceCodeDir = 's'
    }

    stages {
        stage ('Clone') {
            steps {
                dir (env.sourceCodeDir) {
                    script {
                        bat "git clone ${env.repoUrl} ."
                    }
                }
            }
        }

        stage ('Tests') {
            steps {
                script {
                    testsFailed = false
                }
                
                dir (env.sourceCodeDir) {
                    script {
                        bat "git clean -f -x -d"
                        bat "dotnet restore"
                        bat "dotnet build -c Release --no-restore"
                        
                        dir("test\\SlowTests") {
                            runTestSuite("SlowTests")
                        }
                    }
                }
            }
        }
    }
}

Expected Results

Pipeline goes through and reports tests.

Actual Results

Pipeline is stuck on Recording Test Results:

16:23:20  [Pipeline] bat
16:23:20  
16:23:20  c:\Jenkins\workspace\PR_Tests\s\test\SlowTests>trx2junit .\*.trx 
16:23:20  trx2junit (c) gfoidl -- v2.0.4
16:23:20  https://github.com/gfoidl/trx2junit
16:23:20  
16:23:20  Converting 1 trx file(s) to JUnit-xml...
16:23:20  Converting '.\test-user_test-host_2024-10-02_05_58_22.trx' to '.\test-user_test-host_2024-10-02_05_58_22.xml'
16:23:23  done in 2.3058151 seconds. bye.
16:23:23  [Pipeline] junit
16:23:23  Recording test results
18:38:14  Cancelling nested steps due to timeout

Anything else?

Attaching problematic xml file.
Just a side note - disk space is reclaimed after dotnet test run is finished.

test-user_test-host_2024-10-02_05_58_22.zip

Are you interested in contributing a fix?

No response

@timja
Copy link
Member

timja commented Oct 4, 2024

Strange the test file is tiny and it works fine for me with no delay (on Mac).

Are you able to get a few thread dumps while this is ongoing? Both on the controller and the agent its running on please?
https://www.jenkins.io/doc/book/troubleshooting/obtaining-a-thread-dump/

@hiddenshadow21
Copy link
Author

The issue just happened again on 3 windows agents, attaching thread dumps from controller and agents.

jenkins-thread-dumps.zip

@mdealer
Copy link
Contributor

mdealer commented Oct 24, 2024

I got curious, so on my side I tried this

def c = { target ->
    node('WINDOWS') {
        def r = junit(keepTestNames: true, testResults: "test-user_test-host_2024-10-02_05_58_22.xml", testDataPublishers: [attachments(), [$class: 'StabilityTestDataPublisher']])
        echo "${r.totalCount} ${r.failCount}"
    }
}
def p1c = { c('p1') }
def p2c = { c('p2') }
def m = [ 'p1': p1c, 'p2': p2c ]
parallel m

And got this in one of the builds

java.util.ConcurrentModificationException
	at java.base/java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1486)
	at java.base/java.util.TreeMap$ValueIterator.next(TreeMap.java:1531)
	at java.base/java.util.AbstractCollection.addAll(AbstractCollection.java:335)
	at java.base/java.util.HashSet.<init>(HashSet.java:121)
	at hudson.tasks.test.TestObject.uniquifyName(TestObject.java:395)
	at hudson.tasks.junit.PackageResult.getSafeName(PackageResult.java:87)
	at hudson.tasks.test.TestObject.getId(TestObject.java:87)
	at hudson.tasks.test.TestObject.getId(TestObject.java:91)
	at hudson.tasks.test.TestObject.getId(TestObject.java:91)
	at de.esailors.jenkins.teststability.StabilityTestData.getTestAction(StabilityTestData.java:61)
	at hudson.tasks.junit.TestResultAction.getActions(TestResultAction.java:378)
	at hudson.tasks.test.TestObject.getTestActions(TestObject.java:248)
	at hudson.tasks.test.TestObject.getTestAction(TestObject.java:261)
	at de.esailors.jenkins.teststability.StabilityTestDataPublisher.getPreviousHistory(StabilityTestDataPublisher.java:133)
	at de.esailors.jenkins.teststability.StabilityTestDataPublisher.contributeTestData(StabilityTestDataPublisher.java:82)
	at hudson.tasks.junit.JUnitResultArchiver.parseAndSummarize(JUnitResultArchiver.java:325)
	at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:62)
	at hudson.tasks.junit.pipeline.JUnitResultsStepExecution.run(JUnitResultsStepExecution.java:27)
	at org.jenkinsci.plugins.workflow.steps.SynchronousNonBlockingStepExecution.lambda$start$0(SynchronousNonBlockingStepExecution.java:47)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)

I checked the code and the existing operations around accessing the history while populating it are unsafe but also very slow the way StabilityTestDataPublisher is doing it. I then ran even more parallel parallel builds and they do appear stuck eventually, although they do finish after a long time.

Try adding a lock('result-recording-lock') { timeout(15) { ... } } around the junit step. The bottleneck is around the Map UNIQUIFIED_NAMES, which is synchronized anyway, being my initial assumption after a quick glance-over.

@hiddenshadow21
Copy link
Author

Thanks for the analysis. I will try with lock and timeout there. Hope it will resolve this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants