fix: stage status in case of retries #150

dharanikesav · 2025-02-20T09:32:46Z

When a retry block is used inside a stage and the step fails on the first attempt but passes on a subsequent attempt, both the stage result and the build result are expected to be "Success". The actual build result is "Success", but the actual stage result is "Failure". A screenshot of this bug is attached below (see the second stage in build 36900 and the third stage in build 36899).

Fix implemented:
While debugging, I noticed that this failure status is returned from the WarningAction condition. I have therefore added a check on the run result, so that this condition does not return a failure when the run result is success.
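
As a rough illustration of the added check, here is a small self-contained Java model. It is a sketch under assumptions, not the plugin's actual code: `WarningGuardSketch`, `Result`, and `stageStatus` are names made up for this example.

```java
/** Hypothetical model of the extra run-result check; not the plugin's real code. */
final class WarningGuardSketch {
    /** Simplified result levels, declared best to worst. */
    enum Result { SUCCESS, UNSTABLE, FAILURE }

    /**
     * Computes a stage status from the overall run result and the warning
     * found inside the stage (null when no step recorded a warning).
     * With checkRunResult set, the warning only counts when the run itself
     * did not end in SUCCESS, which is the behaviour this change adds.
     */
    static Result stageStatus(Result runResult, Result stageWarning, boolean checkRunResult) {
        if (stageWarning == null) {
            return Result.SUCCESS;
        }
        if (checkRunResult && runResult == Result.SUCCESS) {
            return Result.SUCCESS; // warning ignored because the run passed
        }
        return stageWarning;
    }
}
```

In this model, `stageStatus(Result.SUCCESS, Result.FAILURE, false)` gives `FAILURE` (the reported behaviour), and the same call with `true` gives `SUCCESS`. An `UNSTABLE` warning in a run that ended in `SUCCESS` is dropped in the same way.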

Testing done

I tested this change locally with the following test cases:

Successful Stage without retry step
Failed Stage without retry step
Successful Stage with retry step
Failed Stage with retry step

Below is a screenshot of the Stage View from the local test run.

dharanikesav · 2025-02-20T10:10:04Z

Hi @jglick ,

Please review this PR

jglick · 2025-03-26T16:12:35Z

@olamy maybe. AFAIK this API is only used by Stage View, which is mostly unmaintained.

felipecrs · 2025-05-14T01:53:07Z

@stuartrowe, @timja, do you think Pipeline Graph View could use this fix too?

stuartrowe · 2025-05-14T02:12:07Z

Maybe I'm reading the fix wrong, but it seems like it's ignoring the WarningAction if the run was a success overall. We do not want that change.

felipecrs · 2025-05-14T02:48:42Z

I'm very confused too.

@dharanikesav I cannot reproduce your problem:

chrome_9T1jANDxfd.mp4

pipeline {
    agent any

    stages {
        stage('Pass 1') {
            steps {
                sh 'echo pass 1'
            }
        }
        stage('Retry pass 3rd attempt') {
            steps {
                retry(3) {
                    sh '''
                        ATTEMPT_FILE=attempt.txt
                        if [ -f $ATTEMPT_FILE ]; then
                            ATTEMPT=$(cat $ATTEMPT_FILE)
                        else
                            ATTEMPT=0
                        fi

                        ATTEMPT=$((ATTEMPT + 1))
                        echo $ATTEMPT > $ATTEMPT_FILE

                        if [ $ATTEMPT -lt 3 ]; then
                            echo "Failing attempt $ATTEMPT"
                            exit 1
                        else
                            echo "Passing attempt $ATTEMPT"
                        fi
                    '''
                }
            }
        }
        stage('Retry pass 1st attempt') {
            steps {
                retry(1) {
                    sh 'echo retry pass'
                }
            }
        }
        stage('Pass final') {
            steps {
                sh 'echo pass final'
            }
        }
    }
}

As you can see in the video, the stage is being correctly marked as passing.

Unless I'm misunderstanding your issue. If so, please clarify; a Jenkinsfile example would be nice.

timja

needs at least a pipeline reproducing / demonstrating the issue and ideally a test

jglick · 2025-05-14T14:23:14Z

#164 (comment) FTR

dharanikesav · 2025-05-14T14:50:34Z

Hi @felipecrs @timja ,

Here is the pipeline script:

stage('test1') {
    def b = 1
    retry(3) {
        try {
            b = b + 1
            if (b < 3) {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
            } else {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
            }
        } catch (e) {
            sleep time: 1, unit: 'SECONDS'
            throw e
        }
    }
}
stage('test2') {
    // Catch error and set the build result to SUCCESS even if a failure happens
    catchError(buildResult: 'SUCCESS', stageResult: 'SUCCESS') {
        def a = 1
        retry(3) {
            try {
                a = a + 1
                if (a < 3) {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
                } else {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
                }
                // Forcefully set the result to SUCCESS after every retry
                currentBuild.result = 'SUCCESS'
                echo "Stage result: ${currentBuild.result}"
                echo "Build result: ${currentBuild.result}"
            } catch (e) {
                // Catch errors and force SUCCESS result
                currentBuild.result = 'SUCCESS' // Ensure build result is marked as SUCCESS
                sleep time: 1, unit: 'SECONDS'
                throw e // Re-throw the error for retry logic to work
            }
        }
    }
}
if (currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction) != null) {
    List causes = currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction).getCauses()
    for (CauseOfInterruption cause : causes) {
        if (cause.getShortDescription().toLowerCase().contains('aborted by')) {
            currentBuild.result = "ABORTED"
            break
        }
    }
}

The downstream job has one parameter and an "Execute shell" build step with the following script:

if [ "$pass" = "true" ]; then
    echo "Successful run..."
else
    echo "Invalid option"
    exit 1
fi

Here is an image of the downstream job's parameters.

Below is a screenshot of the main pipeline. Build 93 is successful, but both stages in the pipeline are marked as failed:

felipecrs · 2025-05-14T15:53:46Z

@dharanikesav I think I see what your issue is. The build step will mark your stage as failed, and it's not possible to improve the stage result once it has been set to failed.

To fix this issue, you need to set propagate: false on the build step, capture the result of the build, and check it yourself, like:

def buildObj = build job: 'test_printing_ci', propagate: false

if (buildObj.getResult() == "SUCCESS") {
  echo "build passed"
}

Please confirm whether that's indeed the case, and if yes, you can close this PR.

dharanikesav · 2025-05-15T11:14:43Z

Hi @felipecrs ,

When we do not propagate failures and use a custom script to throw an exception in case of failures, this works.

But, in theory, the plugin should display the correct status of each stage. Instead of a workaround, would it be possible to incorporate a fix at the plugin level, so that the status is shown correctly?

Thanks,
Dharani

timja · 2025-05-15T12:26:11Z

> Maybe I'm reading the fix wrong, but it seems like it's ignoring the WarningAction if the run was a success overall. We do not want that change.

That's exactly what it appears to be doing. I expect you still want to have warning actions shown when a run has passed?

Is this just a problem in Stage View, in that it doesn't differentiate between warning and failed?

Alternatively, is it an issue where it's looking for the worst result and not the last result?

> @stuartrowe, @timja, do you think Pipeline Graph View could use this fix too?

Not sure. I don't think there are any known issues around retries though; likely this is Stage View only.

felipecrs · 2025-05-15T12:38:28Z

I don't think there's any issue here.

I demonstrated that retries work as expected in my example above.

The quirk is that the build step changes the stage result directly when propagate: true, and once a stage is marked as failed, it can no longer be marked as passing.

This is by design and IIRC is documented.

felipecrs · 2025-05-15T12:39:45Z

Using propagate: false is not a workaround, it's the proper way to handle this.

dharanikesav · 2025-05-15T13:01:03Z

Hi @felipecrs ,

I understand that propagate: false can be used, but it still requires handling downstream build failures and warnings through custom scripting. It feels more like a workaround than a clean solution. If propagate: false is used without handling downstream build failures, the main pipeline is marked as SUCCESS even if there is a real failure in the downstream build.

What I'm really aiming for is this: if the same stage/job is retried and succeeds on the second attempt, ideally, the stage should be marked as SUCCESS. Currently, it remains marked as FAILURE. Would it make sense to revisit this behavior and see if there's room for improvement?

Thanks,
Dharani

felipecrs · 2025-05-15T13:05:28Z

@dharanikesav, I am just a Jenkins user like you and not in any way a Jenkins representative, so I cannot speak on behalf of Jenkins.

Have you checked whether the same issue happens when using the retry step?

felipecrs · 2025-05-15T13:10:20Z

I think you'd have better luck discussing this issue somewhere else.

In the Graph Analysis itself I don't think there's anything to be done. The stage is being marked as failed by the build step, and thus it should be shown as failed in the UI.

timja · 2025-05-15T15:06:19Z

Graph view is affected as well, I would've expected this pipeline to have a successful second stage:

stage('test1') {
    def b = 1
    retry(3) {
        try {
            b = b + 1
            if (b < 3) {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
            } else {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
            }
        } catch (e) {
            throw e
        }
    }
}
stage('test2') {
    // Catch error and set the build result to SUCCESS even if a failure happens
    catchError(buildResult: 'SUCCESS', stageResult: 'SUCCESS') {
        def a = 1
        retry(3) {
            try {
                a = a + 1
                if (a < 3) {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
                } else {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
                }
                echo "Build result: ${currentBuild.result}"
            } catch (e) {
                // Catch errors and force SUCCESS result
                currentBuild.result = 'SUCCESS' // Ensure build result is marked as SUCCESS
                throw e // Re-throw the error for retry logic to work
            }
        }
        error "Reset build result"

    }
    echo "Build result: ${currentBuild.result}"
}
if (currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction) != null) {
    List causes = currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction).getCauses()
    for (CauseOfInterruption cause : causes) {
        if (cause.getShortDescription().toLowerCase().contains('aborted by')) {
            currentBuild.result = "ABORTED"
            break
        }
    }
}

I'm not confident this is the right fix though

timja · 2025-05-15T15:06:52Z

I'm not sure whether setting a stage result multiple times works, though.

felipecrs · 2025-05-15T15:23:50Z

@timja, see:

https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#catcherror-catch-error-and-set-build-result-to-failure

I believe the very same thing applies to the stage result as well.

That's why you can't let the build step change the stage or build result by itself. And that's why you should use build propagate: false.
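
The rule stated in the linked documentation for the build result, that it can only get worse, can be pictured with a tiny Java model. This is a hypothetical sketch for illustration only, not Jenkins code: `StickyResultSketch` and its members are invented names.

```java
/** Hypothetical model of a result that can only get worse; not Jenkins code. */
final class StickyResultSketch {
    /** Simplified result levels, declared best to worst. */
    enum Result { SUCCESS, UNSTABLE, FAILURE }

    private Result current = Result.SUCCESS;

    /** Applies the new value only if it is worse than the current one. */
    void set(Result candidate) {
        if (candidate.compareTo(current) > 0) {
            current = candidate;
        }
    }

    /** Returns the worst value seen so far. */
    Result get() {
        return current;
    }
}
```

Calling `set(Result.FAILURE)` and then `set(Result.SUCCESS)` leaves `get()` at `FAILURE`, which mirrors why assigning `currentBuild.result = 'SUCCESS'` has no effect once the build result is already worse.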

felipecrs · 2025-05-15T15:27:07Z

The cleanest "fix" I can think of is adding a retry option to the build step itself. Like:

build retry: 3

timja · 2025-05-15T15:27:09Z

Makes sense, I thought that was the case for builds, wasn't sure on stage.

I don't think it's great, but fixing it here isn't the right change.

felipecrs · 2025-05-15T15:28:23Z

> Makes sense, I thought that was the case for builds, wasn't sure on stage.

To be fair it isn't documented under stageResult indeed.

felipecrs · 2025-05-15T15:32:17Z

Maybe this is the cleanest solution for now:

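stage('test1') {
    int attempt = 0
    int maxAttempts = 3
    retry(maxAttempts) {
        attempt++
        // Propagate only on the final attempt, so earlier failures do not mark the stage
        def downstream = build job: 'test_printing_ci', propagate: attempt == maxAttempts
        // With propagate: false the step returns normally, so fail explicitly to trigger the next retry
        if (downstream.result != 'SUCCESS') {
            error "Attempt ${attempt} ended with ${downstream.result}"
        }
    }
}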

stuartrowe · 2025-05-15T17:28:30Z

> Graph view is affected as well, I would've expected this pipeline to have a successful second stage:

This is a result of the behaviour of StatusAndTiming#findWorstWarningBetween finding the WarningAction on the build step node when computing the status for the test2 stage.

One option is to disable the warning action lookup by setting DISABLE_WARNING_ACTION_LOOKUP=true globally.

Alternatively you can add a WarningAction to the stage node itself to take advantage of this recent change I made:
jenkinsci/pipeline-graph-view-plugin#617. Although the code for doing that is a bit convoluted and I'm currently using an internal plugin to handle it for my own pipelines.
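
To picture the lookup described above, here is a small Java model. It is only a sketch under assumptions: `StageStatusSketch`, `worstWarning`, and `lastAttempt` are hypothetical names, and the real `StatusAndTiming` code works on flow nodes, not on a plain array of results.

```java
/** Hypothetical model of stage-status computation; not the plugin's actual code. */
final class StageStatusSketch {
    /** Simplified result levels, declared best to worst so compareTo can rank them. */
    enum Result { SUCCESS, UNSTABLE, FAILURE }

    /**
     * Worst-of scan over the warnings left by each attempt inside a stage.
     * A null entry means that attempt attached no warning to its node.
     */
    static Result worstWarning(Result... warnings) {
        Result worst = Result.SUCCESS;
        for (Result w : warnings) {
            if (w != null && w.compareTo(worst) > 0) {
                worst = w;
            }
        }
        return worst;
    }

    /** Alternative policy for comparison: only the final attempt decides the status. */
    static Result lastAttempt(Result... warnings) {
        if (warnings.length == 0) {
            return Result.SUCCESS;
        }
        Result last = warnings[warnings.length - 1];
        return last == null ? Result.SUCCESS : last;
    }
}
```

For a retry whose first attempt failed and whose second attempt passed, `worstWarning(Result.FAILURE, null)` gives `FAILURE`, while `lastAttempt(Result.FAILURE, null)` gives `SUCCESS`.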

fix: stage status in case of retries

e7e69ca

dharanikesav requested a review from a team as a code owner February 20, 2025 09:32

dharanikesav added 2 commits March 12, 2025 06:13

Merge branch 'master' into master

3c19096

Merge branch 'master' into master

d68972b

timja requested changes May 14, 2025

timja closed this May 15, 2025
