Bug 1765044: Adds proxy support to ingress operator#334

Merged
openshift-merge-robot merged 6 commits into openshift:master from danehans:proxy_v2
Dec 13, 2019

Conversation

@danehans
Contributor

@danehans danehans commented Dec 4, 2019

I would like to get feedback on the implementation before proceeding with providers other than aws.

Update: The implementation is provider agnostic.

@openshift-ci-robot added labels on Dec 4, 2019: do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), approved (indicates a PR has been approved by an approver from all required OWNERS files), size/L (denotes a PR that changes 100-499 lines, ignoring generated files).

m.route53.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
m.elb.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
m.tags.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
Contributor

Should this be made thread-safe?

errChan <- m.fileWatcher.Start(stop)
}()

// Wait for the watcher to exit or an explicit stop.
Member

@wking wking Dec 4, 2019

This is a bit strange. Why launch two goroutines and then have a dummy waiter here? Can we just make m.fileWatcher.Start blocking, or is that not something we control?

Contributor Author

@wking I used the ingress operator's start() as a reference. The first goroutine is periodically ensuring the ca bundle used by dns clients is up-to-date while the second goroutine starts the file watcher.

// ca bundle of FileWatcher, returning false and the current ca
// bundle if the two are not equal.
func (m *Provider) caBundlesEqual() (bool, []byte, error) {
watchedCAs, err := ioutil.ReadFile(m.fileWatcher.GetFile())
Member

This is expensive. Can we just compare Stat mtimes, like I floated in golang/go#35887? I tested this locally, bumping the ConfigMap content and seeing the mtime change (although you need Stat to walk the symlink to see the real mtime in the linked file). On the other hand, reading the certs once a minute shouldn't be that bad, even if the reads are expensive ;).

Member

And I guess more fundamentally, if we're going to read the whole file off the disk, it seems like we might as well update the CAs without bothering to check if there were any changes. It's not obvious to me that a bytewise comparison in memory is all that much cheaper than just parsing out the certs into Go CAs.

Contributor

Another way might be to forgo fs notifications entirely and wrap access to the underlying file with a struct that implements a TTL for refresh. So like, calling CertFile.Get() would return cached data until its internal TTL (e.g. 30s, 1min) expires, at which point it will lock/refresh/re-cache/reset TTL. Such an approach wouldn't require any goroutines at all.

Contributor

Performance considerations seem pretty inconsequential in this context: we're talking about reading a secret mount, which I believe is memory mapped? We could unconditionally re-read the contents every 15 seconds or so without any diffing at all, and I'm not sure it would have any consequential impact on the platform.

}

// CertsFromPEM parses pemCerts into a list of certificates.
func CertsFromPEM(pemCerts []byte) ([]*x509.Certificate, error) {
Member

It's really unfortunate that we need to reproduce this code instead of being able to use the stdlib's CertPool.AppendCertsFromPEM. I guess AWS doesn't take a CertPool for wherever this is being consumed? Can we bump Go's DefaultTransport TLSClientConfig RootCAs instead of talking to the AWS libraries directly?

Member

Oh, actually, you can drop CertsFromPEM and use AppendCertsFromPEM instead in ensureDNSClientsTLSTransport to construct a transport. Then pass that transport in instead of your current

	m.route53.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
	m.elb.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
	m.tags.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)

which makes three identical transports.
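A sketch of the suggestion above: build one transport from a `CertPool` populated with the standard library's `AppendCertsFromPEM`, then share it. The function name `newTLSTransport`, the use of `http.ProxyFromEnvironment`, and the `selfSignedCAPEM` demo helper are assumptions for illustration, and how the AWS clients would accept the transport is left out.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"net/http"
	"time"
)

// newTLSTransport builds a single transport that trusts the CAs in pemCerts.
// Hypothetical helper; the proxy setting is an assumption.
func newTLSTransport(pemCerts []byte) (*http.Transport, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemCerts) {
		return nil, errors.New("no certificates could be parsed from the bundle")
	}
	return &http.Transport{
		Proxy:           http.ProxyFromEnvironment,
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}, nil
}

// selfSignedCAPEM generates a throwaway CA certificate for the demo only.
func selfSignedCAPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example-ca"},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		KeyUsage:              x509.KeyUsageCertSign,
		BasicConstraintsValid: true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	bundle, err := selfSignedCAPEM()
	if err != nil {
		panic(err)
	}
	tr, err := newTLSTransport(bundle)
	if err != nil {
		panic(err)
	}
	// One transport shared by several clients instead of three identical ones.
	clients := []*http.Client{{Transport: tr}, {Transport: tr}, {Transport: tr}}
	fmt.Println("clients sharing one transport:", len(clients))
}
```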

Contributor

Annoying, right? The cloud provider libs are idiosyncratic... Azure won't use the stdlib transport the way you would want either, so we have to wire up everything the way each lib demands.


m.route53.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
m.elb.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
m.tags.Client.Config.HTTPClient.Transport = certsutil.MakeTLSTransport(certs)
Contributor

You cannot mutate the clients.

// ResourceGroupsTaggingAPI methods are safe to use concurrently. It is not safe to
// modify mutate any of the struct's properties though.

// ELB methods are safe to use concurrently. It is not safe to
// modify mutate any of the struct's properties though.

// Route53 methods are safe to use concurrently. It is not safe to
// modify mutate any of the struct's properties though.


// New returns a new FileWatcher watching the given file.
func New(file string) (*FileWatcher, error) {
var err error
Contributor

Why declare err here?

if isRemove(event) {
if err := fw.watcher.Add(event.Name); err != nil {
log.Error(err, "error re-watching file %s", fw.file)
}
Contributor

Return here? Or do we want to try to read the file after receiving a remove event?

Contributor Author

This is what the watcher sees when the trusted ca bundle configmap is updated (i.e. a new ca cert is added):
2019-12-09T19:55:31.185Z INFO operator.filewatcher watcher/filewatcher.go:113 watched file change {"event": "\"/etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem\": REMOVE"}

if err := m.ensureDNSClientsTLSTransport(); err != nil {
log.Error(err, "failed to ensure dns client tls transport")
}
}, 1*time.Minute, stop)
Contributor

Is this a second, polling-based watcher built on top of the fsnotify-based watcher? This is very confusing.

Contributor Author

@Miciah the fsnotify-based watcher is continuously watching fileName and updating currentData. startWatcher() starts the fsnotify-based watcher and starts a goroutine that periodically ensures dns clients are using the latest ca bundle by comparing the fsnotify-based watcher's currentData with the data from reading /etc/pki/ca-trust/extracted/pem/tls-ca-bundle.pem. Let me know if you have any suggestions to improve. Maybe have handleEvent() update the dns clients?

Contributor

What I find confusing is that you essentially have two watchers: a polling-based watcher and an fsnotify-based watcher, where the former is built atop the latter, and half the watcher logic is in the aws provider package and half is in the watcher package. Is the polling-based watcher logic going to be repeated in every provider implementation?

I would instead amend the fsnotify-based watcher to watch two channels: the fsnotify events channel and a ticker channel. Dan did something similar in openshift/router@4a4f5ab (that commit has some extra logic to distinguish between reloads triggered by fsnotify events and reloads triggered by ticks, but otherwise it does essentially what I am suggesting). Although others may disagree, I find the single combined fsnotify- and ticker-based watcher easier to understand and reason about.

Contributor Author

@Miciah commit 4c6eb35 updated the PR and my testing against a live cluster produced expected results. However, the commit did not use a bool chan to update elb, tagging and route53 sessions. Commit 97eb21d implements the chan approach that you suggest. I'm waiting for my cluster to finish installing to test. I'll start testing this commit tomorrow morning PT. Thanks for the guidance!

@Miciah Miciah mentioned this pull request Dec 4, 2019
@danehans changed the title from "WIP: Adds proxy support to ingress operator" to "Bug 1765044: Adds proxy support to ingress operator" on Dec 9, 2019
@openshift-ci-robot removed the do-not-merge/work-in-progress label (indicates that a PR should not merge because it is a work in progress) on Dec 9, 2019.
@openshift-ci-robot
Contributor

@danehans: An error was encountered adding this pull request to the external tracker bugs for bug 1765044 on the Bugzilla server at https://bugzilla.redhat.com:

JSONRPC error 100500: Insecure dependency in parameter 3 of DBI::db=HASH(0x5620f985c858)->do method call while running with -T switch at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 327.
at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 327.
Bugzilla::Extension::ExternalBugs::Bug::update_ext_info('Bugzilla::Extension::ExternalBugs::Bug=HASH(0x5620f9e12270)', 1) called at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 121
Bugzilla::Extension::ExternalBugs::Bug::create('Bugzilla::Extension::ExternalBugs::Bug', 'HASH(0x5620f98993e8)') called at /var/www/html/bugzilla/extensions/ExternalBugs/Extension.pm line 858
Bugzilla::Extension::ExternalBugs::bug_start_of_update('Bugzilla::Extension::ExternalBugs=HASH(0x5620f96cc990)', 'HASH(0x5620fb43fa40)') called at /var/www/html/bugzilla/Bugzilla/Hook.pm line 21
Bugzilla::Hook::process('bug_start_of_update', 'HASH(0x5620fb43fa40)') called at /var/www/html/bugzilla/Bugzilla/Bug.pm line 1170
Bugzilla::Bug::update('Bugzilla::Bug=HASH(0x5620f989d470)') called at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/WebService.pm line 80
Bugzilla::Extension::ExternalBugs::WebService::add_external_bug('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x5620f9dbe320)') called at (eval 2787) line 1
eval ' $procedure->{code}->($self, @params)
;' called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 220
JSON::RPC::Legacy::Server::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x5620f98b20a8)') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 295
Bugzilla::WebService::Server::JSONRPC::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x5620f98b20a8)') called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 126
JSON::RPC::Legacy::Server::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 70
Bugzilla::WebService::Server::JSONRPC::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/jsonrpc.cgi line 31
ModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::var_www_html_bugzilla_jsonrpc_2ecgi::handler('Apache2::RequestRec=SCALAR(0x5620fa7bd988)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207
eval {...} called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207
ModPerl::RegistryCooker::run('Bugzilla::ModPerl::ResponseHandler=HASH(0x562103cad798)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 173
ModPerl::RegistryCooker::default_handler('Bugzilla::ModPerl::ResponseHandler=HASH(0x562103cad798)') called at /usr/lib64/perl5/vendor_perl/ModPerl/Registry.pm line 32
ModPerl::Registry::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x5620fa7bd988)') called at /var/www/html/bugzilla/mod_perl.pl line 139
Bugzilla::ModPerl::ResponseHandler::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x5620fa7bd988)') called at (eval 2787) line 0
eval {...} called at (eval 2787) line 0

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

Details

In response to this:

Bug 1765044: Adds proxy support to ingress operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danehans
Contributor Author

danehans commented Dec 9, 2019

/bugzilla refresh

@openshift-ci-robot
Contributor

@danehans: An error was encountered adding this pull request to the external tracker bugs for bug 1765044 on the Bugzilla server at https://bugzilla.redhat.com:

JSONRPC error 100500: Insecure dependency in parameter 3 of DBI::db=HASH(0x5620fa21e4c0)->do method call while running with -T switch at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 327.
at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 327.
Bugzilla::Extension::ExternalBugs::Bug::update_ext_info('Bugzilla::Extension::ExternalBugs::Bug=HASH(0x5620f9c8d090)', 1) called at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/Bug.pm line 121
Bugzilla::Extension::ExternalBugs::Bug::create('Bugzilla::Extension::ExternalBugs::Bug', 'HASH(0x5620fa2b49d0)') called at /var/www/html/bugzilla/extensions/ExternalBugs/Extension.pm line 858
Bugzilla::Extension::ExternalBugs::bug_start_of_update('Bugzilla::Extension::ExternalBugs=HASH(0x5620f9cfd8a8)', 'HASH(0x5620fa230e90)') called at /var/www/html/bugzilla/Bugzilla/Hook.pm line 21
Bugzilla::Hook::process('bug_start_of_update', 'HASH(0x5620fa230e90)') called at /var/www/html/bugzilla/Bugzilla/Bug.pm line 1170
Bugzilla::Bug::update('Bugzilla::Bug=HASH(0x5620f9885488)') called at /loader/0x5620f51258a8/Bugzilla/Extension/ExternalBugs/WebService.pm line 80
Bugzilla::Extension::ExternalBugs::WebService::add_external_bug('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x5620fa2acec0)') called at (eval 2448) line 1
eval ' $procedure->{code}->($self, @params)
;' called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 220
JSON::RPC::Legacy::Server::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x5620fa054568)') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 295
Bugzilla::WebService::Server::JSONRPC::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x5620fa054568)') called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 126
JSON::RPC::Legacy::Server::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 70
Bugzilla::WebService::Server::JSONRPC::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/jsonrpc.cgi line 31
ModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::var_www_html_bugzilla_jsonrpc_2ecgi::handler('Apache2::RequestRec=SCALAR(0x5620f9ba75d0)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207
eval {...} called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207
ModPerl::RegistryCooker::run('Bugzilla::ModPerl::ResponseHandler=HASH(0x5620f9df2ca8)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 173
ModPerl::RegistryCooker::default_handler('Bugzilla::ModPerl::ResponseHandler=HASH(0x5620f9df2ca8)') called at /usr/lib64/perl5/vendor_perl/ModPerl/Registry.pm line 32
ModPerl::Registry::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x5620f9ba75d0)') called at /var/www/html/bugzilla/mod_perl.pl line 139
Bugzilla::ModPerl::ResponseHandler::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x5620f9ba75d0)') called at (eval 2448) line 0
eval {...} called at (eval 2448) line 0

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cliles

cliles commented Dec 9, 2019

/bugzilla refresh

@openshift-ci-robot
Contributor

@cliles: An error was encountered adding this pull request to the external tracker bugs for bug 1765044 on the Bugzilla server at https://bugzilla.redhat.com:

JSONRPC error 100500: Insecure dependency in parameter 3 of DBI::db=HASH(0x55b6fac4c7b0)->do method call while running with -T switch at /loader/0x55b6f4cee870/Bugzilla/Extension/ExternalBugs/Bug.pm line 327.
at /loader/0x55b6f4cee870/Bugzilla/Extension/ExternalBugs/Bug.pm line 327.
Bugzilla::Extension::ExternalBugs::Bug::update_ext_info('Bugzilla::Extension::ExternalBugs::Bug=HASH(0x55b6fb034288)', 1) called at /loader/0x55b6f4cee870/Bugzilla/Extension/ExternalBugs/Bug.pm line 121
Bugzilla::Extension::ExternalBugs::Bug::create('Bugzilla::Extension::ExternalBugs::Bug', 'HASH(0x55b6fa7d97e8)') called at /var/www/html/bugzilla/extensions/ExternalBugs/Extension.pm line 858
Bugzilla::Extension::ExternalBugs::bug_start_of_update('Bugzilla::Extension::ExternalBugs=HASH(0x55b6fa701818)', 'HASH(0x55b6fab335e0)') called at /var/www/html/bugzilla/Bugzilla/Hook.pm line 21
Bugzilla::Hook::process('bug_start_of_update', 'HASH(0x55b6fab335e0)') called at /var/www/html/bugzilla/Bugzilla/Bug.pm line 1170
Bugzilla::Bug::update('Bugzilla::Bug=HASH(0x55b6fb02ebe8)') called at /loader/0x55b6f4cee870/Bugzilla/Extension/ExternalBugs/WebService.pm line 80
Bugzilla::Extension::ExternalBugs::WebService::add_external_bug('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55b6fabad710)') called at (eval 2358) line 1
eval ' $procedure->{code}->($self, @params)
;' called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 220
JSON::RPC::Legacy::Server::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55b6fa7ed5d0)') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 295
Bugzilla::WebService::Server::JSONRPC::_handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...', 'HASH(0x55b6fa7ed5d0)') called at /usr/share/perl5/vendor_perl/JSON/RPC/Legacy/Server.pm line 126
JSON::RPC::Legacy::Server::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/Bugzilla/WebService/Server/JSONRPC.pm line 70
Bugzilla::WebService::Server::JSONRPC::handle('Bugzilla::WebService::Server::JSONRPC::Bugzilla::Extension::E...') called at /var/www/html/bugzilla/jsonrpc.cgi line 31
ModPerl::ROOT::Bugzilla::ModPerl::ResponseHandler::var_www_html_bugzilla_jsonrpc_2ecgi::handler('Apache2::RequestRec=SCALAR(0x55b6facbfcf0)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207
eval {...} called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 207
ModPerl::RegistryCooker::run('Bugzilla::ModPerl::ResponseHandler=HASH(0x55b6fb47b7e0)') called at /usr/lib64/perl5/vendor_perl/ModPerl/RegistryCooker.pm line 173
ModPerl::RegistryCooker::default_handler('Bugzilla::ModPerl::ResponseHandler=HASH(0x55b6fb47b7e0)') called at /usr/lib64/perl5/vendor_perl/ModPerl/Registry.pm line 32
ModPerl::Registry::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x55b6facbfcf0)') called at /var/www/html/bugzilla/mod_perl.pl line 139
Bugzilla::ModPerl::ResponseHandler::handler('Bugzilla::ModPerl::ResponseHandler', 'Apache2::RequestRec=SCALAR(0x55b6facbfcf0)') called at (eval 2358) line 0
eval {...} called at (eval 2358) line 0

Please contact an administrator to resolve this issue, then request a bug refresh with /bugzilla refresh.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

}
if len(region) == 0 {
return nil, fmt.Errorf("region is required")
}
Contributor

I'm not sure it is safe to move this validation from the session setup logic here to createDNSProvider. Here, we validate the region from the session configuration, which may be reading from ~/.aws/config or may be using the region from the cluster config, but createDNSProvider only has the cluster config, so it is not validating the same thing. @ironcladlou, do I have that right? It should only matter for local development though, so maybe it isn't worth the trouble to check the session config.

Contributor Author

@Miciah see #334 (comment); the check is bubbled up to the aws provider case in createDNSProvider().

case configv1.AWSPlatformType:
if len(platformStatus.AWS.Region) == 0 {
return nil, fmt.Errorf("region is required")
}
Contributor

I wonder if this is backwards-compatible with existing storage (openshift/api@dedfb47)...

See https://github.com/openshift/cluster-ingress-operator/blob/master/cmd/ingress-operator/start.go#L194.

Is this new check necessary?

Contributor

Never mind, I misread. By the time you get here, the platformStatus has been created by getPlatformStatus and should be normalized/valid no matter the source. I think your validation here is correct.

Contributor Author

@ironcladlou it's not a new check. I bubbled this check up from the dns/aws pkg.

m.elb = elb.New(sess, aws.NewConfig().WithRegion(m.config.Region))
m.route53 = route53.New(sess)
m.tags = resourcegroupstaggingapi.New(sess, aws.NewConfig().WithRegion("us-east-1"))
m.lock.Unlock()
Contributor

Is this actually thread-safe absent locks near the readers throughout the rest of the code?

Contributor Author

The FileWatcher struct uses a mutex to lock/unlock any mutations to the watched file data. The handleEvent() method uses a chan to indicate when the watcher updates the current data of the watched file. The ensureSessionTransport() updates the aws dns sessions when a value is received on the chan. StartWatcher() runs ensureSessionTransport() and filewatcher Start(), which blocks until the stop channel is closed and listens for fsnotify events and errors on chans. Let me know if you have any suggestions to improve thread safety.

Contributor

The problem is we have code such as the following:

	outerError := m.tags.GetResourcesPages(&resourcegroupstaggingapi.GetResourcesInput{
		ResourceTypeFilters: []*string{aws.String("route53:hostedzone")},
		TagFilters:          tagFilters,
	}, f)

To be thread safe, we now need to take a lock before using m.tags:

	m.lock.Lock()
	outerError := m.tags.GetResourcesPages(&resourcegroupstaggingapi.GetResourcesInput{
		ResourceTypeFilters: []*string{aws.String("route53:hostedzone")},
		TagFilters:          tagFilters,
	}, f)
	m.lock.Unlock()

And ditto for m.route53 and m.elb.
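A sketch of one way to make both sides of the swap safe, under stated assumptions. It is a variation on the locking shown above rather than the same technique: readers take the client pointer under a read lock and then call it, and the writer replaces the pointer with a newly built client instead of mutating the existing one. The `apiClient` and `provider` types are hypothetical stand-ins, not the AWS SDK clients.

```go
package main

import (
	"fmt"
	"sync"
)

// apiClient is a hypothetical stand-in for an SDK client.
type apiClient struct{ caBundleVersion int }

// Call pretends to issue an API request with this client's configuration.
func (c *apiClient) Call() string {
	return fmt.Sprintf("call with CA bundle v%d", c.caBundleVersion)
}

// provider holds a client that may be replaced while requests are in flight.
type provider struct {
	mu     sync.RWMutex
	client *apiClient
}

// current returns the client pointer under a read lock.
func (p *provider) current() *apiClient {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.client
}

// replace swaps in a newly built client rather than mutating the old one,
// so callers still holding the old pointer are unaffected.
func (p *provider) replace(c *apiClient) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.client = c
}

func main() {
	p := &provider{client: &apiClient{caBundleVersion: 1}}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = p.current().Call() // concurrent readers
		}()
	}
	p.replace(&apiClient{caBundleVersion: 2}) // concurrent writer
	wg.Wait()

	fmt.Println(p.current().Call())
}
```

Compared with holding the mutex for the whole API call, this keeps slow network requests from blocking the swap, but every read of the field still has to go through the accessor.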

Contributor Author

The lock/unlock occurs here for that tagging api call, here for the route53 api call, and here for elb api calls.

I'm getting ready to push a commit that will cause the ingress-operator pod to gracefully restart when the ca bundle changes to simplify the implementation and remove the need to worry about runaway goroutines.

@ironcladlou
Contributor

@danehans @Miciah @frobware

The operator already supports graceful shutdown.

What if the operator were to shut down when the file changes just like when it receives a TERM? The operator would be immediately restarted and refresh its state with the latest cert contents.

The shutdown approach doesn't require a sidecar container and would work for all current and future providers without the need for new state management code (assuming they're wiring up TLS when constructing their initial clients, which they all should be anyway).

@danehans
Contributor Author

@ironcladlou is it common for customers to monitor operators? If so, do you think it would trigger an alarm and cause cluster admins to wonder why the operator restarted?

@ironcladlou
Contributor

/bugzilla refresh

@openshift-ci-robot
Contributor

@ironcladlou: This pull request references Bugzilla bug 1765044, which is invalid:

  • expected the bug to target the "4.4.0" release, but it targets "4.3.0" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci-robot added the bugzilla/invalid-bug label (indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting) on Dec 12, 2019.
@ironcladlou
Contributor

/bugzilla refresh

@openshift-ci-robot added the bugzilla/valid-bug label (indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting) and removed the bugzilla/invalid-bug label (indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting) on Dec 12, 2019.
@openshift-ci-robot
Contributor

@ironcladlou: This pull request references Bugzilla bug 1765044, which is valid. The bug has been updated to refer to the pull request using the external bug tracker.

Details

In response to this:

/bugzilla refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danehans force-pushed the proxy_v2 branch 3 times, most recently from c9cbdc4 to d122bec, on December 12, 2019 22:25
Contributor

@Miciah Miciah left a comment

A couple minor stylistic comments. Nothing important enough to worry about if CI passes.

@Miciah
Contributor

Miciah commented Dec 12, 2019

/lgtm

@openshift-ci-robot added the lgtm label (indicates that a PR is ready to be merged) on Dec 12, 2019.
@openshift-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: danehans, Miciah

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-bot
Contributor

/retest

Please review the full test history for this PR and help us cut down flakes.

@openshift-merge-robot openshift-merge-robot merged commit eb30248 into openshift:master Dec 13, 2019
@openshift-ci-robot
Contributor

@danehans: All pull requests linked via external trackers have merged. Bugzilla bug 1765044 has been moved to the MODIFIED state.

Details

In response to this:

Bug 1765044: Adds proxy support to ingress operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ironcladlou
Contributor

@danehans thanks for all the work on this one! The approach you and @Miciah landed on is very easy to follow!

@ironcladlou
Contributor

/cherrypick release-4.3

@openshift-cherrypick-robot

@ironcladlou: new pull request created: #340

Details

In response to this:

/cherrypick release-4.3

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@danehans
Contributor Author

@kalexand-rh if we have any docs related to manually managing ingress dns records for proxy environments, the limitation can now be removed.

@openshift-ci-robot
Contributor

@danehans: Bugzilla bug 1765044 is in an unrecognized state (VERIFIED) and will not be moved to the MODIFIED state.

Details

In response to this:

Bug 1765044: Adds proxy support to ingress operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
lgtm: Indicates that a PR is ready to be merged.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.

9 participants