Skip to content

Conversation

@sanposhiho
Copy link
Member

What this PR does / why we need it:

Create orphan commands cleaner. It periodically find orphan commands and mark them with TIMEOUT status

Which issue(s) this PR fixes:

Fixes #1397

Does this PR introduce a user-facing change?:

NONE

@pipecd-bot
Copy link
Collaborator

GO_LINTER

The following files are not gofmt-ed. By commenting /golinter fmt, the formatted one will be appended to this pull request automatically.

pkg/app/ops/orphancommandcleaner/orphancommandcleaner.go
--- pkg/app/ops/orphancommandcleaner/orphancommandcleaner.go.orig
+++ pkg/app/ops/orphancommandcleaner/orphancommandcleaner.go
@@ -4,9 +4,10 @@
 	"context"
 	"time"
 
+	"go.uber.org/zap"
+
 	"github.com/pipe-cd/pipe/pkg/datastore"
 	"github.com/pipe-cd/pipe/pkg/model"
-	"go.uber.org/zap"
 )
 
 var (

@sanposhiho
Copy link
Member Author

/golinter fmt

@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 33.96%. This pull request does not change code coverage.

@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 33.96%. This pull request does not change code coverage.

@sanposhiho sanposhiho marked this pull request as ready for review January 14, 2021 14:56
Comment on lines 14 to 15
commandTimeOut = 7 * 24 * time.Hour
interval = 24 * time.Hour
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What values should be set?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commandTimeOut = 24 * time.Hour
interval = 6 * time.Hour

@sanposhiho
Copy link
Member Author

Is there a better place to create orphan commands cleaner instead of ops?


// Starting orphan commands cleaner
cleaner := orphancommandcleaner.NewOrphanCommandCleaner(ds, t.Logger)
cleaner.Run(ctx)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be wrapped by errgroup to terminate ops when cleaner exit because of an error.

group.Go(func() error {
	return cleaner.Run(ctx)
})

Copy link
Member Author

@sanposhiho sanposhiho Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, there was no need to stop ops because errors that occur in cleaner is probably caused by a network error when communicating with datastore.

Of course, it is necessary to output the error log to show that an error has occurred. However, I don't think it is necessary to stop ops because updateOrphanCommandsStatus will include the update that should have been done with the previous process if the previous process failed.

Or how about stopping ops if it fails several times in a row?

What do you think about it? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice comment. I got your points.
Agree with you that the cleaner may have some errors while executing its jobs.

But that is the problem of cleaner, not this run function.
This run should not care about what each component is doing, it just cares that they are fine or not. If any of them had a problem, ops should be stopped to notify that there was a problem from one of its components.
In other words, run should see all of its components (cleaner, http-server, insightcollector...) in the same way.

About the network problem or something that can cause the error while cleaner executing the job,
cleaner can ignore them, just log the error message, and does not return.
By that way, the ops will not be stopped by those errors and we can keep run to be nice too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the network problem or something that can cause the error while cleaner executing the job,
cleaner can ignore them, just log the error message, and does not return.
By that way, the ops will not be stopped by those errors and we can keep run to be nice too.

OK, I see 👍

) *OrphanCommandCleaner {
return &OrphanCommandCleaner{
commandstore: datastore.NewCommandStore(ds),
logger: logger.Named("orphan-command-finder"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orphan-command-cleaner

if err := o.updateOrphanCommandsStatus(ctx); err != nil {
o.logger.Error("failed to update orphan commands", zap.Error(err))
}
time.Sleep(interval)
Copy link
Member

@nghialv nghialv Jan 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use time.Ticker instead of sleep.

}
}

func (o *OrphanCommandCleaner) Run(ctx context.Context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: func(c * because it is a cleaner.

}

func (o *OrphanCommandCleaner) updateOrphanCommandsStatus(ctx context.Context) error {
timeoutedTime := time.Now().Add(-commandTimeOut).Unix()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: timedout or just timeout

}

for _, c := range commands {
o.commandstore.UpdateCommand(ctx, c.Id, func(c *model.Command) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a log to show that the command was successfully updated or not?

@nghialv
Copy link
Member

nghialv commented Jan 15, 2021

Is there a better place to create orphan commands cleaner instead of ops?

I think the current place is fine.

@nghialv
Copy link
Member

nghialv commented Jan 15, 2021

@sanposhiho Hi, Thank you for your great effort.
I have just left some comments. Please take a look.

@pipecd-bot pipecd-bot added size/L and removed size/M labels Jan 15, 2021
@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 33.96%. This pull request does not change code coverage.


t := time.NewTicker(interval)
for {
select {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the context.Done to exit when ops was terminated.

for {
    select {
          case <- ctx.Done():
                 return nil
         case <- t.C:
             ...
}
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks

cmd.Status = model.CommandStatus_COMMAND_TIMEOUT
return nil
}); err != nil {
c.logger.Error("failed to update orphan commands", zap.Error(err))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: commands - > command
And can you add the command ID and Type to the log message?

c.logger.Error("failed to update orphan commands", zap.Error(err))
consecutiveFailuresCount++
if consecutiveFailuresCount == maxConsecutiveFailuresCount {
return err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I explained here, we can ignore this error.
Don't have to return. And we don't need the error log too because it is already logged inside the updateOrphanCommandsStatus function.

Btw, It would be nice if you can add a log to show the elapsed time of this execution.

@sanposhiho
Copy link
Member Author

@nghialv
Thanks for your reviews, fixed them :D

@pipecd-bot
Copy link
Collaborator

COVERAGE

Code coverage for golang is 33.96%. This pull request does not change code coverage.

@nghialv
Copy link
Member

nghialv commented Jan 15, 2021

@sanposhiho Thank you very much for your great work! ❤️
/approve

@pipecd-bot
Copy link
Collaborator

APPROVE

This pull request is APPROVED by nghialv.

Approvers can cancel the approval by writing /approve cancel in a comment. Any additional commits also will change this pull request to be not-approved.

@nghialv nghialv changed the title Create orphan commands cleaner Add ops the ability to clean orphan commands Jan 15, 2021
@pipecd-bot pipecd-bot merged commit 2f95230 into master Jan 15, 2021
@pipecd-bot pipecd-bot deleted the orphan-commands-timeout branch January 15, 2021 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Mark orphan commands with TIMEOUT status

4 participants