-
Notifications
You must be signed in to change notification settings - Fork 833
Bugfix: Enable TLS server certificate validation by default #3030
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: Enable TLS server certificate validation by default #3030
Conversation
98407f4
to
03e917e
Compare
f997530
to
7796cc9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Solid work, thanks! You found the issue, fixed, tested, code changes are clean. Good attitude and good job 👏
I left a couple of nits and questions. Then we can 🚀
integration/e2e/service.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to reviewers:
To change the label in the container context, you can add either of two suffixes :z or :Z to the volume mount. These suffixes tell Docker to relabel file objects on the shared volumes. The z option tells Docker that two containers share the volume content. As a result, Docker labels the content with a shared content label. Shared volume labels allow all containers to read/write content. The Z option tells Docker to label the content with a private unshared label. Only the current container can use a private volume.
pkg/util/tls/tls.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think it's worth to also check for && !cfg.InsecureSkipVerify && cfg.ServerName == ""
or we could just assume that if no cert file has been configured then TLS is disabled?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that check is there for the case when someone what's to connect via TLS, but is not using client certificate auth.
I think it is still valid to skipVerify all together or make use of the ServerName flag to require a certain hostname on the server ceritficate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that check is there for the case when someone what's to connect via TLS, but is not using client certificate auth.
Is this case supported? Have you ever tested it? Also, following the case you describe, looks there's no way to use it unless disabling skip verify or setting the server name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be clearer to me to have special flag to enable TLS. I'm not sure if just setting InsecureSkipVerify
should enable TLS. WDYT?
For example, it should be enough to just enable TLS via this flag, without setting anything else: in that case default set of CA certificates is used, with no client authentication. I think that should be a valid option, but right now that's not possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a private conversation with Christian about this. I agree with you Peter, but I believe it's out of the scope of this PR honestly. This PR was supposed a bug fix and it's turning out to be a feature / change as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're all on the same page, but with different takes on what to do. Let's try to come up with action items.
Some facts:
- The version in
master
requires certs to enable TLS - This PR was initially meant to be a bugfix
- This PR is moving forward improving TLS, allowing to use TLS for "1) Client verifies server" only use case, but it's a bit buggy because you can't enable use case (1) without either disabling verification or setting server name
The question we have to answer is:
- Do we want this PR to be a bugfix or a mix of bugfix + additional feature?
My general suggestion is to keep separation of concerns so, if it's to me, I would:
- Keep this PR as bug fix (no logic change except for the skip verification)
- Open a new PR with the TLS improvements
If you prefer to do both in this PR I'm OK as well, but I believe we should fix the config issue. I guess we should add an explicit option to enable TLS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason this became an "enhancement" as well, was that the integration tests only work if you either skip-verify
or provide a server-name
. Otherwise there will be a validation error, as we are connecting to unpredictable hosts IP addresses, which we get from consul (I assume).
If we want to keep it as minimal as possible we probably should set skip-verify to true in the integration tests and then iterate on the whole TLS implementation.
My feeling is we need to treat GRPC and HTTP client configuration slightly different:
If you enable TLS with HTTP you are required to use https:// in the URL explicitly. For GRPC we default to tls-insecure-skip-verify being true, through this
Line 61 in a4aad5d
return []grpc.DialOption{grpc.WithInsecure()}, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep it as minimal as possible we probably should set skip-verify to true in the integration tests and then iterate on the whole TLS implementation.
Setting skip-verify
in integration tests sounds reasonable to me.
If you enable TLS with HTTP you are required to use https:// in the URL explicitly.
Right. It's not so much about enabling rather than configuring TLS in this case.
For GRPC we default to tls-insecure-skip-verify being true, through this
Whether TLS is used for gRPC or not should be specified for each gRPC config, and default to false (and when using grpc.WithInsecure()
). It looks like it should not even be in tls.ClientConfig
config then. 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To move further with this PR, I believe Marco's suggestion is to do
if cfg.CertPath == "" && cfg.KeyPath == "" && cfg.CAPath == "" && !cfg.InsecureSkipVerify && cfg.ServerName == "" { | |
if cfg.CertPath == "" && cfg.KeyPath == "" && cfg.CAPath == "" { |
i.e. to enable TLS only if certificates are configured. That is compatible with behavior on master, although not as clear as explicit flag, but we can do the flag in other PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will split out the bugfix from the enhancement of allowing a ServerName specified.
Without having the ability to configure a expected ServerName, we will not have a feasible way of running running cortex without skipping verify (as explained in my comment before), so it should remain a priority sorting that out. We will need to think a bit more how to expose that flag for GRPC clients. Another big focus needs to be testing TLS. The current coverage is quite minimal and will even be more limited by skipping verify.
7796cc9
to
726d490
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good PR, thanks for fixing hardcoded skip verify – that's clearly bad.
pkg/util/tls/tls.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be clearer to me to have special flag to enable TLS. I'm not sure if just setting InsecureSkipVerify
should enable TLS. WDYT?
For example, it should be enough to just enable TLS via this flag, without setting anything else: in that case default set of CA certificates is used, with no client authentication. I think that should be a valid option, but right now that's not possible.
fd88adb
to
0fbdab3
Compare
pkg/util/tls/tls.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f.StringVar(&cfg.CAPath, prefix+".tls-ca-path", "", "Patht to the CA certificates file to validate server certificate against. If not set, the host's root CA certificates are used.") | |
f.StringVar(&cfg.CAPath, prefix+".tls-ca-path", "", "Path to the CA certificates file to validate server certificate against. If not set, the host's root CA certificates are used.") |
pkg/util/tls/tls.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to keep it as minimal as possible we probably should set skip-verify to true in the integration tests and then iterate on the whole TLS implementation.
Setting skip-verify
in integration tests sounds reasonable to me.
If you enable TLS with HTTP you are required to use https:// in the URL explicitly.
Right. It's not so much about enabling rather than configuring TLS in this case.
For GRPC we default to tls-insecure-skip-verify being true, through this
Whether TLS is used for gRPC or not should be specified for each gRPC config, and default to false (and when using grpc.WithInsecure()
). It looks like it should not even be in tls.ClientConfig
config then. 🤔
pkg/util/tls/tls.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To move further with this PR, I believe Marco's suggestion is to do
if cfg.CertPath == "" && cfg.KeyPath == "" && cfg.CAPath == "" && !cfg.InsecureSkipVerify && cfg.ServerName == "" { | |
if cfg.CertPath == "" && cfg.KeyPath == "" && cfg.CAPath == "" { |
i.e. to enable TLS only if certificates are configured. That is compatible with behavior on master, although not as clear as explicit flag, but we can do the flag in other PR.
0fbdab3
to
550daae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @simonswine for your hard work here and to patiently address my feedback. I know the discussion went longer then initially planned, but I think we're super close to merge it. I left one last comment.
f9bacbf
to
0582424
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
Would you mind opening a GitHub issue with your suggestions to improve TLS support, not necessarily requiring certs? I think the enhancements you did propose here were very good, but we need to add a config option to explicitly enable TLS.
Yes will take a look at this tomorrow |
Thank you both for the reviews 👍 |
Fixes configuration for TLS server validation, TLS skip verify was enabled for all TLS configurations and prevented validation of server certificates. Signed-off-by: Christian Simon <[email protected]>
b63faaa
to
2c5f360
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
This removes the bash script for CA and certificate generation and replaces it with some golang library. This allows more control over the certificates generated. The tls-server-name flags are now used to ensure we are talking to a server with an appropriate certificate. Signed-off-by: Christian Simon <[email protected]>
The 'z' option tells Docker that the volume content will be shared between containers. Otherwise SELinux prevents file access between the containers. Signed-off-by: Christian Simon <[email protected]>
Signed-off-by: Christian Simon <[email protected]>
Signed-off-by: Christian Simon <[email protected]>
2c5f360
to
3070bf9
Compare
…oject/cortex#3030) * Ensure TLS server certificate validation is enabled Fixes configuration for TLS server validation, TLS skip verify was enabled for all TLS configurations and prevented validation of server certificates. Signed-off-by: Christian Simon <[email protected]> * Improve integration tests This removes the bash script for CA and certificate generation and replaces it with some golang library. This allows more control over the certificates generated. The tls-server-name flags are now used to ensure we are talking to a server with an appropriate certificate. Signed-off-by: Christian Simon <[email protected]> * Mount volumes during integration tests with `z` The 'z' option tells Docker that the volume content will be shared between containers. Otherwise SELinux prevents file access between the containers. Signed-off-by: Christian Simon <[email protected]> * Provide a better error if cert or key is missing Signed-off-by: Christian Simon <[email protected]> * Add comments where x509 material is sourced from Signed-off-by: Christian Simon <[email protected]>
What this PR does:
Fixes configuration for TLS server validation, TLS skip verify was enabled for all TLS configurations and prevented validation of server certificates. This change adds a new flag for skipping validation which defaults to
false
.Which issue(s) this PR fixes:
Fixes #3029
Checklist
CHANGELOG.md
updated - the order of entries should be[CHANGE]
,[FEATURE]
,[ENHANCEMENT]
,[BUGFIX]