Skip to content

Conversation

@ckairen
Copy link
Contributor

@ckairen ckairen commented Mar 16, 2023

closes #5361

@ckairen ckairen added Central-EngSys This issue is owned by the Engineering System team. Stress This issue is related to stress testing, part of our reliability pillar. labels Mar 16, 2023
@ckairen ckairen self-assigned this Mar 16, 2023
@ckairen ckairen marked this pull request as ready for review March 16, 2023 22:08
@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
java - template - tests
js - template
net - template
net - template - tests
python - template
python - template - tests
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

# Renders chart templates locally without deployment
[Parameter(Mandatory=$False)][switch]$Template,

[Parameter(Mandatory=$False)][switch]$RerunFailedJobs,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know all the wording in the original issue says "Rerun" but I'm thinking "Retry" is a better choice here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also it would be retrying failed pods, not jobs. Should we call it something like RetryFailedTests?

$pods = kubectl get pods -n $pkg.namespace -o json | ConvertFrom-Json

# Get all jobs within this helm release
$helmResources = helm status -n $pkg.Namespace $pkg.ReleaseName --show-resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment here with an example of the output would be nice.

foreach ($helmResource in $helmResources) {
if ($discoveredJob -and $helmResource -match "==>") {break}
if ($discoveredJob) {
$jobs += ($helmResource -split '\s+')[0] | where{($_ -ne "NAME") -and ($_)}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$jobs += ($helmResource -split '\s+')[0] | where{($_ -ne "NAME") -and ($_)}
$jobs += ($helmResource -split '\s+')[0] | Where-Object {($_ -ne "NAME") -and ($_)}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the advantage of doing so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"where" seems to only work on arrays & will return empty array instead of null when no matches were found

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We favor using full cmdlet names to reduce confusion (see explanation). For example, where as it is used here is an alias for Where-Object and is specifically different from .Where() which I think you are describing?

# Included packages: https://github.com/microsoft/CBL-Mariner/blob/1.0/SPECS/core-packages/core-packages.spec

ADD ./poll.sh /poll.sh
RUN tdnf -y install wget
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh yay thank you

}

if (!$genValRerun.scenarios.length) {
Write-Host "There are no failed jobs to rerun."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here re: naming

$genVal = Get-Content $genValFile -Raw | ConvertFrom-Yaml -Ordered
$releaseName = $pkg.ReleaseName
if ($RerunFailedJobs) {
$pods = kubectl get pods -n $pkg.namespace -o json | ConvertFrom-Json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this into its own helper function? That will make the outer function smaller and it will be easier to tell which variables are set/updated from this scope that are important for the remaining function code.

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
java - template - tests
js - template
net - template
net - template - tests
python - template
python - template - tests
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

$genVal = Get-Content $genValFile -Raw | ConvertFrom-Yaml -Ordered
$releaseName = $pkg.ReleaseName
if ($RetryFailedTests) {
runRetryTests $pkg ([ref]$releaseName) ([ref]$genVal)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just return new values instead of using refs here, to reduce future bug possibilities and make the code clearer.

Suggested change
runRetryTests $pkg ([ref]$releaseName) ([ref]$genVal)
$releaseName, $genVal = runRetryTests $pkg $releaseName $genVal

Also the naming here is still confusing. runRetryTests doesn't actually run anything, it just generates the new $genVal value. $genVal is also still a non-obvious name for this variable, maybe something more along these lines:

$genValFile -> $generatedHelmValuesFilePath
$genVal -> $generatedHelmValues

$pods = kubectl get pods -n $pkg.namespace -o json | ConvertFrom-Json

# Get all jobs within this helm release
$helmResources = helm status -n $pkg.Namespace $pkg.ReleaseName --show-resources
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$helmResources = helm status -n $pkg.Namespace $pkg.ReleaseName --show-resources
$helmResources = RunOrExitOnFailure helm status -n $pkg.Namespace $pkg.ReleaseName --show-resources

Comment on lines 411 to 419
foreach ($helmResource in $helmResources) {
if ($discoveredJob -and $helmResource -match "==>") {break}
if ($discoveredJob) {
$jobs += ($helmResource -split '\s+')[0] | Where-Object {($_ -ne "NAME") -and ($_)}
}
if ($helmResource -match "==> v1/Job") {
$discoveredJob = $True
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another suggestion for clearer variable names:

Suggested change
foreach ($helmResource in $helmResources) {
if ($discoveredJob -and $helmResource -match "==>") {break}
if ($discoveredJob) {
$jobs += ($helmResource -split '\s+')[0] | Where-Object {($_ -ne "NAME") -and ($_)}
}
if ($helmResource -match "==> v1/Job") {
$discoveredJob = $True
}
}
foreach ($line in $helmStatusOutput) {
if ($discoveredJob -and $line -match "==>") {break}
if ($discoveredJob) {
$jobs += ($line -split '\s+')[0] | Where-Object {($_ -ne "NAME") -and ($_)}
}
if ($line -match "==> v1/Job") {
$discoveredJob = $True
}
}

$revision = $jobRevision
}

$podPhase = kubectl describe jobs -n $pkg.Namespace $job | Select-String "0 Failed"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$podPhase = kubectl describe jobs -n $pkg.Namespace $job | Select-String "0 Failed"
$jobOutput = RunOrExitOnFailure kubectl describe jobs -n $pkg.Namespace $job
$podPhase = $jobOutput | Select-String "0 Failed"

}

function runRetryTests ($pkg, [ref]$releaseName, [ref]$genVal) {
$pods = kubectl get pods -n $pkg.namespace -o json | ConvertFrom-Json
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
$pods = kubectl get pods -n $pkg.namespace -o json | ConvertFrom-Json
$podOutput = RunOrExitOnFailure kubectl get pods -n $pkg.namespace -o json
$pods = $podOutput | ConvertFrom-Json

@benbp
Copy link
Member

benbp commented Mar 23, 2023

Just some exit handling and naming comments otherwise this is looking good. I think because the variables are untyped and getting used across so much code we need to be extra careful that the variable names describe their underlying values accurately.

@@ -0,0 +1,16 @@

function determineRetryTests ([ref] $val, [ref]$val2) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh no, accident, it's my testing ps script

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
java - template - tests
js - template
net - template
net - template - tests
python - template
python - template - tests
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@azure-sdk
Copy link
Collaborator

The following pipelines have been queued for testing:
java - template
java - template - tests
js - template
net - template
net - template - tests
python - template
python - template - tests
You can sign off on the approval gate to test the release stage of each pipeline.
See eng/common workflow

@ghost
Copy link

ghost commented Mar 28, 2023

Hello @azure-sdk!

Because this pull request has the auto-merge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit f6e7ccd into main Mar 28, 2023
@ghost ghost deleted the albertcheng/stress-rerun branch March 28, 2023 16:41
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Central-EngSys This issue is owned by the Engineering System team. Stress This issue is related to stress testing, part of our reliability pillar.

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

[stress] Allow for re-running only failed stress tests

3 participants