-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes to support debugging haproxy reload not loading correct config #123
Conversation
westse
commented
Mar 16, 2018
- Under high load situations (hundreds of services, certs, and ingress rules), we have seen infrequent but consistent cases of haproxy not reflecting the on-disk config file. One theory is a race condition on new haproxy processes pointing to an ever-changing config file. This commit gives each haproxy process its own config file that does not change. Not only should this remove the risk of loading a partially written file, it will also help debug the issue because we see a snapshot of the config used for each haproxy process.
- Config files are left on disk up to a max threshold, configurable via environment variable, to allow debugging issues.
- Not supported under with reloadStrategy=multibinder since multibinder-haproxy-wrapper currently only supports receiving a USR2 signal to reload a fixed config file
- Under high load situations (hundreds of services, certs, and ingress rules), we have seen infrequent but consistent cases of haproxy not reflecting the on-disk config file. On theory is a race condition on new haproxy processes pointing to an ever-changing config file. This commit gives each haproxy process its own config file that does not change. Not only should this remove the risk of loading a partially written file, it will also help debug the issue because we see a snapshot of the config used for each haproxy process. - Config files are left on disk up to a max threshold, configurable via environment variable, to allow debugging issues. - Not supported under with reloadStrategy=multibinder since multibinder-haproxy-wrapper currently only supports receiving a USR2 signal to reload a fixed config file
} | ||
return err | ||
} | ||
|
||
// RewriteConfigFiles safely replaces configuration files with new contents after validation | ||
func rewriteConfigFiles(data []byte, reloadStrategy, configFile string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed that you are rewriting the multibinder
and non-multibinder
usages here, which concerns me mostly due to fear of change than anything else. Curious, did you consider the following flow instead and why would it be a bad approach?
func rewriteConfigFiles(data []byte, reloadStrategy, configFile string) error {
// Determine temp file path using a timestamp
// Write rendered template file there
// if reload strategy == multibinder then do multibinder logic for moving it into place
// else do non-multibinder work with config file
// copy temp file to location for later investigation if required
// rename file to the /etc/haproxy/haproxy.cfg
}
Seems to me that this approach would minimize the amount of code you would need to touch and still give you a rendered template somewhere for investigation? Also unsure about where in that flow a cleanup step would be most efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I'm following your pseudo code, wouldn't the last step prevent each haproxy process from pointing to a unique timestamped config file? That would defeat the primary changed intended by this pull request (for the non-multibinder usage).
I agree with the sentiment to touch only the code that needs to change. I initially tried this but once I understood what it did I felt some bits of logic were confusing enough that it would be improved with what you see here. For example, I found it suprising that the non-multibinder case used the post-rename config file, while the multibinder case did not. I think this change makes the behavior in each usage more clear, but I may be wrong and am definitely open.
Thanks for taking a look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the difference in approach now. My hope was that all reload strategies would be capable of at least storing copies of their config at reload time (by copying the file off before putting it into place), but understand that is simply adding files that provide limited value. Look forward to verifying this code when the new version gets built and tagged!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did some new tests and can confirm that everything related to reload works synchronously - from ingress events, config generation and also HAProxy reload, which only fork the process to the background after parses the config file.
There is only one issue here afaics: multibinder uses signal to restart the process which works asynchronously. Perhaps your use case was using multibinder? This could be patched eg at reload.sh.
Anyway I'm fine with this PR because one could want to have old config files for eg debugging his environment.
Following some changing suggestions, have a look if you agree.
pkg/controller/controller.go
Outdated
haproxy.configDir = "/etc/haproxy" | ||
haproxy.configFilePrefix = "haproxy" | ||
haproxy.configFileSuffix = ".cfg" | ||
haproxy.template = newTemplate("haproxy.tmpl", "/etc/haproxy/template/haproxy.tmpl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use newTemplate as the else part of the if multibinder statement.
pkg/controller/controller.go
Outdated
envVar := os.Getenv("MAX_OLD_CONFIG_FILES") | ||
if maxOldConfigFiles, err := strconv.Atoi(envVar); err == nil { | ||
haproxy.maxOldConfigFiles = maxOldConfigFiles | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use a command-line argument instead, declare at ConfigureFlags func just below reloadStrategy.
I'd also like to see a zero option, which means no old backup config should be used. I have some manual testing scenarios where I need a known filename, perhaps others could have similar issues.
pkg/controller/controller.go
Outdated
@@ -166,56 +181,87 @@ func (haproxy *HAProxyController) OnUpdate(cfg ingress.Configuration) error { | |||
return nil | |||
} | |||
|
|||
out, err := haproxy.reloadHaproxy() | |||
reloadCmd := haproxy.reloadHaproxy(configFile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reloadHaproxy func could be removed and it's single line placed here.
pkg/controller/controller.go
Outdated
func (haproxy *HAProxyController) rewriteConfigFiles(data []byte) (string, error) { | ||
// Include timestamp in config file name to aid troubleshooting. When using a single, ever-changing config file it | ||
// was difficult to know what config was loaded by any given haproxy process | ||
timestamp := time.Now().Format("-060102-150405.0000") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about four digits to the year and three to the milisseconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcmoraisjr Do you mean a formate like -0102-150405.000
, where year and nanoseconds are lost but month, day, hour, second, millisecond are retained? I think that will be fine since there is low risk of overwriting based on either of those.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean following this convention: -20060102-150405.000
.
pkg/controller/controller.go
Outdated
} | ||
} | ||
err = os.Rename(tmpf, configFile) | ||
|
||
haproxy.removeOldConfigFiles(haproxy.configFileSuffix) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removeOldCOnfigFiles func has already access to this argument.
if err != nil { | ||
glog.Warningln("Error updating config file") | ||
return err | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
error isn't being used, so log a warningf("...: %v", err) here and return without an argument
@jcmoraisjr I've incorporated each item you mentioned, except the timestamp format (see my question). So let me know how it looks. Thanks. |
2e24dd3
to
9e43c6a
Compare
Ok, I've fixed the timestamp format now as well. |
Just a few more issues:
|
9e43c6a
to
a9f5f1b
Compare
Oops, I forgot to implement the "zero means no timestamp" part. Good catches - the code has been updated. Also feel free to tweak as you see fit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two more issues and I think we're done
pkg/controller/controller.go
Outdated
`Maximum old haproxy timestamped config files to allow before being cleaned up. A value <= 0 indicates a single non-timestamped config file will be used`) | ||
if *haproxy.maxOldConfigFiles < 0 { | ||
*haproxy.maxOldConfigFiles = 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just the initialization, the args wasn't parsed yet, so this if should be at OverrideFlags which happens after the parse.
Anyway afaics you are using >0 or <=0 everywhere so it's pointless to check it's value. I think it's safe to just remove it.
pkg/controller/controller.go
Outdated
matchesFound = matchesFound + 1 | ||
if matchesFound > haproxy.maxOldConfigFiles { | ||
if matchesFound > *haproxy.maxOldConfigFiles { | ||
filePath := haproxy.configDir + "/" + f.Name() | ||
glog.Infof("Removing old config file (%v). maxOldConfigFiles=%v", filePath, haproxy.maxOldConfigFiles) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing a *
on maxOldConfFiles or you'd output the memory address.
a9f5f1b
to
c9f232d
Compare
thanks, updated |
Lgtm, thanks for the updates! |