Keep looking for master process #617

Merged
merged 15 commits into from
Mar 27, 2024

@spencerugbo (Contributor) commented Mar 25, 2024

Proposed changes

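Continuously searches for the NGINX master process during registration, retrying with backoff until one is found instead of checking only once.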

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING document
  • I have run make install-tools and have attached any dependency changes to this pull request
  • If applicable, I have added tests that prove my fix is effective or that my feature works
  • If applicable, I have checked that any relevant tests pass after adding my changes
  • If applicable, I have updated any relevant documentation (README.md)
  • If applicable, I have tested my cross-platform changes on Ubuntu 22, Red Hat 8, SUSE 15 and FreeBSD 13


netlify bot commented Mar 25, 2024

Deploy Preview for agent-public-docs canceled.

🔨 Latest commit: f72e5ca
🔍 Latest deploy log: https://app.netlify.com/sites/agent-public-docs/deploys/6603fb9a69bc7a0008d1c4f1

@codecov-commenter commented Mar 26, 2024

Codecov Report

Attention: Patch coverage is 65.71429%, with 12 lines in your changes missing coverage. Please review.

Project coverage is 66.02%. Comparing base (71ae09c) to head (f72e5ca).
Report is 32 commits behind head on main.

| File | Patch % | Lines |
|---|---|---|
| ...hub.com/nginx/agent/v2/src/plugins/registration.go | 65.71% | 10 missing and 2 partials ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main     #617      +/-   ##
==========================================
- Coverage   66.13%   66.02%   -0.12%
==========================================
  Files         118      118
  Lines       13461    13504      +43
==========================================
+ Hits         8903     8916      +13
- Misses       3958     3987      +29
- Partials      600      601       +1
```


@@ -29,6 +29,7 @@ const (
```go
	dataplaneSoftwareDetailsMaxWaitTime = time.Duration(5 * time.Second)
	// Time between attempts to gather DataplaneSoftwareDetails
	softwareDetailsOperationInterval = time.Duration(1 * time.Second)
	nginxStartMaxWaitTime = 0
```
Contributor commented:

Wouldn't this essentially turn off backoff altogether if MaxElapsedTime is zero? If that is intentional, I would suggest adding a comment explaining why it is turned off.

Contributor (PR author) replied:

When the max wait time is set to 0, backoff keeps retrying indefinitely until the master process is found. I'll add a comment now.

@@ -93,13 +94,16 @@ func (r *OneTimeRegistration) Process(msg *core.Message) {
```go
defer r.dataplaneSoftwareDetailsMutex.Unlock()
r.dataplaneSoftwareDetails[data.GetPluginName()] = data.GetDataplaneSoftwareDetails()
}
case msg.Exact(core.NginxDetailProcUpdate):
r.processes = msg.Data().([]*core.Process)
```
@olli-holmala (Contributor) commented Mar 26, 2024

I would add a type check here to ensure that the message payload is of the correct type (see the previous case).

If the payload were to contain something other than []*core.Process, the unchecked type assertion would panic, which is not great. Even though, based on the code as it stands today, it should never hold anything other than a []*core.Process, you never know how the code base will change in the future. As a rule of thumb, it's good practice to check the type so the code stays safe.
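A sketch of the suggested type check, using Go's comma-ok type assertion so an unexpected payload is logged and ignored instead of causing a panic. `Process`, `registration` and `handleProcUpdate` are hypothetical simplifications of `core.Process` and `OneTimeRegistration`, not the real types.

```go
package main

import (
	"fmt"
	"log"
)

// Process is a hypothetical, trimmed-down stand-in for core.Process.
type Process struct {
	Pid      int32
	IsMaster bool
}

// registration is a hypothetical stand-in for OneTimeRegistration.
type registration struct {
	processes []*Process
}

// handleProcUpdate stores the payload only if it really is a []*Process.
// The comma-ok assertion reports a mismatch instead of panicking.
func (r *registration) handleProcUpdate(payload interface{}) {
	procs, ok := payload.([]*Process)
	if !ok {
		log.Printf("unexpected payload type %T for process update", payload)
		return
	}
	r.processes = procs
}

func main() {
	r := &registration{}
	r.handleProcUpdate("not a process list") // logged and ignored, no panic
	r.handleProcUpdate([]*Process{{Pid: 1, IsMaster: true}})
	fmt.Println(len(r.processes)) // 1
}
```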

```go
}
} else {
log.Tracef("NGINX non-master process: %d", proc.Pid)
return nil
```
@olli-holmala (Contributor) commented Mar 26, 2024

This return causes the whole function to exit on the first non-master process it sees, so any master process later in r.processes is never examined. In practice the function only finds the master process if it comes first in r.processes. I would remove this return.

```go
// Reading nginx config during registration to populate nginx fields like access/error logs, etc.
_, err := r.binary.ReadConfig(nginxDetails.GetConfPath(), nginxDetails.NginxId, r.env.GetSystemUUID())
if err != nil {
log.Warnf("Unable to read config for NGINX instance %s, %v", nginxDetails.NginxId, err)
```
Contributor commented:

I know that this line isn't originally your code, but I'm wondering if we should actually be returning the error here instead of merely logging it 🤔

@oliveromahony, @dhurley, @Dean-Coakley, @aphralG thoughts?

Collaborator replied:

I believe the reason we don't return an error here is that we don't want to stop registration just because we can't read the config. An invalid config should not stop the agent from performing other tasks, like reporting the status of NGINX, if that makes sense.

```go
context.Background(), backoffSetting, findMaster,
)
if err != nil {
log.Warn(err.Error())
```
@olli-holmala (Contributor) commented Mar 26, 2024

This was previously:

```go
log.Info("No master process found")
```

But my instinct is that this should be an error log, as I'd think we can't complete registration if we haven't found a master process.

Others could also weigh in on this, as I wouldn't consider myself an expert on registration.

Collaborator replied:

Yes, I believe that previously we would continue with registration even if we didn't find a master process, but now that we wait until we find one, I think it should be an error log message:

```go
log.Errorf("Unable to find NGINX master processes, %v", err)
```

```go
Multiplier: backoff.BACKOFF_MULTIPLIER,
}

findMaster := func() error {
```
Collaborator commented:

Suggested change:

```diff
- findMaster := func() error {
+ findNginxMasterProcesses := func() error {
```

@spencerugbo merged commit 43a80a0 into main on Mar 27, 2024 (29 checks passed).
@spencerugbo deleted the Fix-Agent-Launch-Systemd branch on March 27, 2024, 13:38.
Labels: chore (Pull requests for routine tasks), dependencies

4 participants