Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to support debugging haproxy reload not loading correct config #123

Merged
merged 2 commits into from
Mar 20, 2018

Conversation

westse
Copy link

@westse westse commented Mar 16, 2018

  • Under high load situations (hundreds of services, certs, and ingress rules), we have seen infrequent but consistent cases of haproxy not reflecting the on-disk config file. One theory is a race condition on new haproxy processes pointing to an ever-changing config file. This commit gives each haproxy process its own config file that does not change. Not only should this remove the risk of loading a partially written file, it will also help debug the issue because we see a snapshot of the config used for each haproxy process.
  • Config files are left on disk up to a max threshold, configurable via environment variable, to allow debugging issues.
  • Not supported under with reloadStrategy=multibinder since multibinder-haproxy-wrapper currently only supports receiving a USR2 signal to reload a fixed config file

- Under high load situations (hundreds of services, certs, and ingress rules), we have seen infrequent but consistent cases of haproxy not reflecting the on-disk config file. On theory is a race condition on new haproxy processes pointing to an ever-changing config file. This commit gives each haproxy process its own config file that does not change. Not only should this remove the risk of loading a partially written file, it will also help debug the issue because we see a snapshot of the config used for each haproxy process.
- Config files are left on disk up to a max threshold, configurable via environment variable, to allow debugging issues.
- Not supported under with reloadStrategy=multibinder since multibinder-haproxy-wrapper currently only supports receiving a USR2 signal to reload a fixed config file
}
return err
}

// RewriteConfigFiles safely replaces configuration files with new contents after validation
func rewriteConfigFiles(data []byte, reloadStrategy, configFile string) error {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that you are rewriting the multibinder and non-multibinder usages here, which concerns me mostly due to fear of change than anything else. Curious, did you consider the following flow instead and why would it be a bad approach?

func rewriteConfigFiles(data []byte, reloadStrategy, configFile string) error {
  // Determine temp file path using a timestamp
  // Write rendered template file there
  // if reload strategy == multibinder then do multibinder logic for moving it into place
  // else do non-multibinder work with config file
  // copy temp file to location for later investigation if required
  // rename file to the /etc/haproxy/haproxy.cfg
}

Seems to me that this approach would minimize the amount of code you would need to touch and still give you a rendered template somewhere for investigation? Also unsure about where in that flow a cleanup step would be most efficient.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm following your pseudo code, wouldn't the last step prevent each haproxy process from pointing to a unique timestamped config file? That would defeat the primary changed intended by this pull request (for the non-multibinder usage).

I agree with the sentiment to touch only the code that needs to change. I initially tried this but once I understood what it did I felt some bits of logic were confusing enough that it would be improved with what you see here. For example, I found it suprising that the non-multibinder case used the post-rename config file, while the multibinder case did not. I think this change makes the behavior in each usage more clear, but I may be wrong and am definitely open.

Thanks for taking a look.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the difference in approach now. My hope was that all reload strategies would be capable of at least storing copies of their config at reload time (by copying the file off before putting it into place), but understand that is simply adding files that provide limited value. Look forward to verifying this code when the new version gets built and tagged!

Copy link
Owner

@jcmoraisjr jcmoraisjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did some new tests and can confirm that everything related to reload works synchronously - from ingress events, config generation and also HAProxy reload, which only fork the process to the background after parses the config file.

There is only one issue here afaics: multibinder uses signal to restart the process which works asynchronously. Perhaps your use case was using multibinder? This could be patched eg at reload.sh.

Anyway I'm fine with this PR because one could want to have old config files for eg debugging his environment.

Following some changing suggestions, have a look if you agree.

haproxy.configDir = "/etc/haproxy"
haproxy.configFilePrefix = "haproxy"
haproxy.configFileSuffix = ".cfg"
haproxy.template = newTemplate("haproxy.tmpl", "/etc/haproxy/template/haproxy.tmpl")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use newTemplate as the else part of the if multibinder statement.

envVar := os.Getenv("MAX_OLD_CONFIG_FILES")
if maxOldConfigFiles, err := strconv.Atoi(envVar); err == nil {
haproxy.maxOldConfigFiles = maxOldConfigFiles
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a command-line argument instead, declare at ConfigureFlags func just below reloadStrategy.

I'd also like to see a zero option, which means no old backup config should be used. I have some manual testing scenarios where I need a known filename, perhaps others could have similar issues.

@@ -166,56 +181,87 @@ func (haproxy *HAProxyController) OnUpdate(cfg ingress.Configuration) error {
return nil
}

out, err := haproxy.reloadHaproxy()
reloadCmd := haproxy.reloadHaproxy(configFile)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reloadHaproxy func could be removed and it's single line placed here.

func (haproxy *HAProxyController) rewriteConfigFiles(data []byte) (string, error) {
// Include timestamp in config file name to aid troubleshooting. When using a single, ever-changing config file it
// was difficult to know what config was loaded by any given haproxy process
timestamp := time.Now().Format("-060102-150405.0000")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about four digits to the year and three to the milisseconds?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcmoraisjr Do you mean a formate like -0102-150405.000, where year and nanoseconds are lost but month, day, hour, second, millisecond are retained? I think that will be fine since there is low risk of overwriting based on either of those.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean following this convention: -20060102-150405.000.

}
}
err = os.Rename(tmpf, configFile)

haproxy.removeOldConfigFiles(haproxy.configFileSuffix)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeOldCOnfigFiles func has already access to this argument.

if err != nil {
glog.Warningln("Error updating config file")
return err
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error isn't being used, so log a warningf("...: %v", err) here and return without an argument

@westse
Copy link
Author

westse commented Mar 19, 2018

@jcmoraisjr I've incorporated each item you mentioned, except the timestamp format (see my question). So let me know how it looks. Thanks.

@westse westse force-pushed the per-process-config branch 2 times, most recently from 2e24dd3 to 9e43c6a Compare March 19, 2018 23:17
@westse
Copy link
Author

westse commented Mar 19, 2018

Ok, I've fixed the timestamp format now as well.

@jcmoraisjr
Copy link
Owner

Just a few more issues:

  • please use zero old files as default
  • check for negative numbers, converting them to zero
  • you should use the pointer itself on flags.Int() otherwise the value won't be updated - see reloadStrategy usage
  • when zero is used the idea is to leave the timestamp empty - it's actually filling the timestamp
  • removeOld shouldn't be used if maxOldFiles is zero - it's removing the actual config file.

@westse
Copy link
Author

westse commented Mar 20, 2018

Oops, I forgot to implement the "zero means no timestamp" part. Good catches - the code has been updated. Also feel free to tweak as you see fit.

Copy link
Owner

@jcmoraisjr jcmoraisjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more issues and I think we're done

`Maximum old haproxy timestamped config files to allow before being cleaned up. A value <= 0 indicates a single non-timestamped config file will be used`)
if *haproxy.maxOldConfigFiles < 0 {
*haproxy.maxOldConfigFiles = 0
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just the initialization, the args wasn't parsed yet, so this if should be at OverrideFlags which happens after the parse.

Anyway afaics you are using >0 or <=0 everywhere so it's pointless to check it's value. I think it's safe to just remove it.

matchesFound = matchesFound + 1
if matchesFound > haproxy.maxOldConfigFiles {
if matchesFound > *haproxy.maxOldConfigFiles {
filePath := haproxy.configDir + "/" + f.Name()
glog.Infof("Removing old config file (%v). maxOldConfigFiles=%v", filePath, haproxy.maxOldConfigFiles)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a * on maxOldConfFiles or you'd output the memory address.

@westse
Copy link
Author

westse commented Mar 20, 2018

thanks, updated

@jcmoraisjr
Copy link
Owner

Lgtm, thanks for the updates!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants