@dharanikesav

@dharanikesav dharanikesav commented Feb 20, 2025

When a retry block is used inside a stage and the step fails on the first attempt but passes on a subsequent attempt, the expected stage result and the expected build result are both "Success". The actual build result is "Success", but the actual stage result is "Failure". A screenshot of this bug is attached below (see the second stage in build 36900 and the third stage in build 36899).

image

Fix implemented:
While debugging, I noticed that this failure status is returned from the WarningAction condition. I have therefore added another condition that checks the run result, so that the WarningAction condition does not return failure when the run result is success.

Testing done

I have tested this change locally with the following test cases:

  • Successful Stage without retry step
  • Failed Stage without retry step
  • Successful Stage with retry step
  • Failed Stage with retry step

Below is a screenshot of the stage view from the local test execution:
image

@dharanikesav dharanikesav requested a review from a team as a code owner February 20, 2025 09:32
@dharanikesav
Author

Hi @jglick ,

Please review this PR

@jglick
Member

jglick commented Mar 26, 2025

@olamy maybe. AFAIK this API is only used by Stage View, which is mostly unmaintained.

@felipecrs

@stuartrowe, @timja, do you think Pipeline Graph View could use this fix too?

@stuartrowe
Contributor

Maybe I'm reading the fix wrong, but it seems like it's ignoring the WarningAction if the run was a success overall. We do not want that change.

@felipecrs

I'm very confused too.

@dharanikesav I cannot reproduce your problem:

chrome_9T1jANDxfd.mp4
pipeline {
    agent any

    stages {
        stage('Pass 1') {
            steps {
                sh 'echo pass 1'
            }
        }
        stage('Retry pass 3rd attempt') {
            steps {
                retry(3) {
                    sh '''
                        ATTEMPT_FILE=attempt.txt
                        if [ -f $ATTEMPT_FILE ]; then
                            ATTEMPT=$(cat $ATTEMPT_FILE)
                        else
                            ATTEMPT=0
                        fi

                        ATTEMPT=$((ATTEMPT + 1))
                        echo $ATTEMPT > $ATTEMPT_FILE

                        if [ $ATTEMPT -lt 3 ]; then
                            echo "Failing attempt $ATTEMPT"
                            exit 1
                        else
                            echo "Passing attempt $ATTEMPT"
                        fi
                    '''
                }
            }
        }
        stage('Retry pass 1st attempt') {
            steps {
                retry(1) {
                    sh 'echo retry pass'
                }
            }
        }
        stage('Pass final') {
            steps {
                sh 'echo pass final'
            }
        }
    }
}

As you can see in the video, the stage is being correctly marked as passing.

Unless I'm missing your issue too. Please clarify if so, providing a Jenkinsfile example would be nice.

Member

@timja timja left a comment

needs at least a pipeline reproducing / demonstrating the issue and ideally a test

@jglick
Member

jglick commented May 14, 2025

#164 (comment) FTR

@dharanikesav
Author

dharanikesav commented May 14, 2025

Hi @felipecrs @timja ,

Here is the pipeline script:

stage('test1') {
    def b = 1
    retry(3) {
        try {
            b = b + 1
            if (b < 3) {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
            } else {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
            }
        } catch (e) {
            sleep time: 1, unit: 'SECONDS'
            throw e
        }
    }
}
stage('test2') {
    // Catch error and set the build result to SUCCESS even if a failure happens
    catchError(buildResult: 'SUCCESS', stageResult: 'SUCCESS') {
        def a = 1
        retry(3) {
            try {
                a = a + 1
                if (a < 3) {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
                } else {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
                }
                // Forcefully set the result to SUCCESS after every retry
                currentBuild.result = 'SUCCESS'
                echo "Stage result: ${currentBuild.result}"
                echo "Build result: ${currentBuild.result}"
            } catch (e) {
                // Catch errors and force SUCCESS result
                currentBuild.result = 'SUCCESS' // Ensure build result is marked as SUCCESS
                sleep time: 1, unit: 'SECONDS'
                throw e // Re-throw the error for retry logic to work
            }
        }
    }
}
if (currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction) != null) {
    List causes = currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction).getCauses()
    for (CauseOfInterruption cause : causes) {
        if (cause.getShortDescription().toLowerCase().contains('aborted by')) {
            currentBuild.result = "ABORTED"
            break
        }
    }
}

The downstream job has one parameter and a shell build step with the following script:

if [ "$pass" = "true" ]; then
    echo "Successful run..."
else
    echo "Invalid option"
    exit 1
fi

Here is a screenshot of the downstream job's parameters:
image

Below is a screenshot of the main pipeline. Build 93 is successful, but both stages in the pipeline are marked as failed:
image

@felipecrs

felipecrs commented May 14, 2025

@dharanikesav I think I see what your issue is. The build step marks your stage as failed, and it is not possible to improve the stage result once it has been set to failed.

To fix this, call build with propagate: false, capture the returned build object, and check its result yourself, like this:

def buildObj = build propagate: false

if (buildObj.getResult() == "SUCCESS") {
  echo "build passed"
}
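
A slightly fuller sketch of the same idea, written against the test1 stage and the test_printing_ci job from the pipeline above. It assumes the downstream job still takes the pass parameter; the attempt counter and the error message are illustrative only. Because propagate: false makes the build step return normally even when the downstream build fails, the explicit error call is what makes retry try again:

```groovy
stage('test1') {
    int attempt = 0
    retry(3) {
        attempt++
        // Same shape as the original example: fail the first attempt, pass afterwards.
        def pass = (attempt < 2) ? 'false' : 'true'

        // propagate: false stops the build step from failing on a bad downstream result.
        def downstream = build job: 'test_printing_ci',
            parameters: [string(name: 'pass', value: pass)],
            quietPeriod: 1,
            propagate: false

        // Check the downstream result ourselves; throwing here triggers the next retry.
        if (downstream.result != 'SUCCESS') {
            error "Downstream build #${downstream.number} finished with ${downstream.result} (attempt ${attempt})"
        }
    }
}
```

If every attempt fails, the last error call propagates out of retry and the stage fails as usual, so real downstream failures are not hidden.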

Please confirm whether that's indeed the case, and if yes, you can close this PR.

@dharanikesav
Author

Hi @felipecrs ,

When we do not propagate failures and instead use a custom script to throw an exception on failure, this works.

In principle, though, the plugin should display the correct status of each stage. Rather than relying on a workaround, would it be possible to fix this at the plugin level so that the status is shown correctly?

Thanks,
Dharani

@timja
Member

timja commented May 15, 2025

> Maybe I'm reading the fix wrong, but it seems like it's ignoring the WarningAction if the run was a success overall. We do not want that change.

That's exactly what it appears to be doing. I expect you still want to have warning actions shown when a run has passed?

Is this just a problem in stage view that doesn't differentiate between warning and failed?


Alternatively, is it an issue where it's looking for the worst result and not the last result?


> @stuartrowe, @timja, do you think Pipeline Graph View could use this fix too?

Not sure. I don't think there are any known issues around retries though; likely this is Stage View only.

@felipecrs

felipecrs commented May 15, 2025

I don't think there's any issue here.

I demonstrated that retries work as expected in my example above.

The quirk is that the build step changes the stage result directly when propagate: true, and once a stage is marked as failed, it can no longer be marked as passing.

This is by design and IIRC is documented.

@felipecrs

Using propagate: false is not a workaround, it's the proper way to handle this.

@dharanikesav
Author

dharanikesav commented May 15, 2025

Hi @felipecrs ,

I understand that propagate: false can be used, but it still requires handling downstream build failures and warnings through custom scripting. It feels more like a workaround than a clean solution. If propagate: false is used without handling downstream build failures, the main pipeline is marked as SUCCESS even if there is a real failure in the downstream build.

What I'm really aiming for is this: if the same stage/job is retried and succeeds on the second attempt, ideally, the stage should be marked as SUCCESS. Currently, it remains marked as FAILURE. Would it make sense to revisit this behavior and see if there's room for improvement?

Thanks,
Dharani

@felipecrs

@dharanikesav, I am just a Jenkins user like you and not in any way a Jenkins representative, so I cannot speak on behalf of Jenkins.

Have you checked whether the same issue happens when using the retry step?

@felipecrs

I think you'd have better luck discussing this issue somewhere else.

In the Graph Analysis itself I don't think there's anything to be done. The stage is being marked as failed by the build step, and thus it should be shown as failed in the UI.

@timja
Member

timja commented May 15, 2025

Graph view is affected as well, I would've expected this pipeline to have a successful second stage:

stage('test1') {
    def b = 1
    retry(3) {
        try {
            b = b + 1
            if (b < 3) {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
            } else {
                build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
            }
        } catch (e) {
            throw e
        }
    }
}
stage('test2') {
    // Catch error and set the build result to SUCCESS even if a failure happens
    catchError(buildResult: 'SUCCESS', stageResult: 'SUCCESS') {
        def a = 1
        retry(3) {
            try {
                a = a + 1
                if (a < 3) {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'false')], quietPeriod: 1
                } else {
                    build job: 'test_printing_ci', parameters: [string(name: 'pass', value: 'true')], quietPeriod: 1
                }
                echo "Build result: ${currentBuild.result}"
            } catch (e) {
                // Catch errors and force SUCCESS result
                currentBuild.result = 'SUCCESS' // Ensure build result is marked as SUCCESS
                throw e // Re-throw the error for retry logic to work
            }
        }
        error "Reset build result"

    }
    echo "Build result: ${currentBuild.result}"
}
if (currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction) != null) {
    List causes = currentBuild.rawBuild.getAction(jenkins.model.InterruptedBuildAction).getCauses()
    for (CauseOfInterruption cause : causes) {
        if (cause.getShortDescription().toLowerCase().contains('aborted by')) {
            currentBuild.result = "ABORTED"
            break
        }
    }
}
image

I'm not confident this is the right fix though

@timja
Member

timja commented May 15, 2025

I'm not sure whether setting a stage result multiple times works, though.

@felipecrs

felipecrs commented May 15, 2025

@timja, see:

image

https://www.jenkins.io/doc/pipeline/steps/workflow-basic-steps/#catcherror-catch-error-and-set-build-result-to-failure

I believe the very same thing applies to the stage result as well.

That's why you can't let the build step change the stage or build result by itself, and why you should use build propagate: false.

@felipecrs

felipecrs commented May 15, 2025

The cleanest "fix" I can think of is adding a retry option to the build step itself, like:

build retry: 3

@timja
Member

timja commented May 15, 2025

Makes sense. I thought that was the case for builds, but wasn't sure about stages.

I don't think it's great, but fixing it here isn't the right change.

@timja timja closed this May 15, 2025
@felipecrs

> Makes sense, I thought that was the case for builds, wasn't sure on stage.

To be fair, it indeed isn't documented under stageResult.

@felipecrs

felipecrs commented May 15, 2025

Maybe this is the cleanest solution for now:

{
  int attempt = 0
  int maxAttempts = 3
  retry(maxAttempts) {
    attempt++
    build propagate: attempt == maxAttempts
  }
}

@stuartrowe
Contributor

> Graph view is affected as well, I would've expected this pipeline to have a successful second stage:

This is a result of the behaviour of StatusAndTiming#findWorstWarningBetween finding the WarningAction on the build step node when computing the status for the test2 stage.

One option is to disable the warning action lookup by setting DISABLE_WARNING_ACTION_LOOKUP=true globally.

Alternatively, you can add a WarningAction to the stage node itself to take advantage of this recent change I made: jenkinsci/pipeline-graph-view-plugin#617. The code for doing that is a bit convoluted, though, and I'm currently using an internal plugin to handle it for my own pipelines.
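
To make the "worst result versus last result" distinction concrete, here is a toy model in plain Groovy. It is not the plugin's code: the Status enum and the worstOf and lastOf helpers are hypothetical names used only for illustration, and the list stands in for the per-attempt outcomes recorded inside one stage.

```groovy
// Toy model only; Status, worstOf and lastOf are hypothetical, not plugin API.
enum Status { SUCCESS, UNSTABLE, FAILURE }

// Worst-result policy: any failed attempt inside the stage marks the whole stage.
Status worstOf(List<Status> nodeStatuses) {
    // Enums compare by declaration order, so max() picks the most severe status.
    return nodeStatuses.max() ?: Status.SUCCESS
}

// Last-result policy: only the final attempt decides the stage status.
Status lastOf(List<Status> nodeStatuses) {
    return nodeStatuses ? nodeStatuses.last() : Status.SUCCESS
}

// A retried stage: the first build attempt failed, the second passed.
def retriedStage = [Status.FAILURE, Status.SUCCESS]
println "worst: ${worstOf(retriedStage)}"
println "last:  ${lastOf(retriedStage)}"
```

Under the worst-result policy the retried stage comes out as FAILURE, which matches the behaviour reported in this thread; under the last-result policy it would come out as SUCCESS.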
