-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add client certificate authentication #11421
Add client certificate authentication #11421
Conversation
84a9a41
to
26128de
Compare
26128de
to
e050358
Compare
Nice, tests now by and large work. On fedora-29 there is a (non-fatal) SELinux violation:
|
e050358
to
495831e
Compare
495831e
to
add0d8d
Compare
add0d8d
to
29a990e
Compare
I quickly talked about the PAM stuff with Stef. I was previously on the fence between entirely skipping the This is now ready for review. @stefwalter , I'd appreciate if you have some time to take a look. |
@stefwalter : Some background (also explained in JIRA): This is mostly for demoing, and unblocking development of other parts of the entire experience (like how to handle sudo authentication via smartcards, configuring the UI differently, etc.). We won't mention that in the release notes, and this deliberately isn't documented in the guide or manpages. This will happen once we have a secure way to do this (see jira task COCKPIT-369). |
29a990e
to
e62ebd1
Compare
@stefwalter : I reworked this to package the new binary into a new rpm, which only gets built for CI for now. For interested people it's easy enough to enable in the .spec by changing the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments. No major bugs found. This has several expected security holes. But we need to mark them clearly.
What would the the impact of just keeping this on a feature branch be?
So, nothing too serious really. With the "tech preview" package here it won't actually get shipped in releases, so landing this PR only has a minimal actual impact. But I have no strong opinion either way. By far the most useful purpose of that PR is to make sure it works everywhere. |
e62ebd1
to
c19ef52
Compare
c19ef52
to
1630c12
Compare
Initialize the idle timer fd to the expected "unset" value of -1, instead of 0, as the rest of the code expects. Let all unit tests except the explicit idle test run with a zero idle timeout, so that it doesn't inadvertently stop the server when the tests take longer than a second (as they do under valgrind).
See https://bugzilla.redhat.com/show_bug.cgi?id=1770159 This affects the FreeIPA setup on our "services" image, and in general, FreeIPA that runs from Fedora 30/31. CentOS/RHEL instances are not affected.
201515d
to
a941a59
Compare
I now did the final squashing and rebasing. @mvollmer, can you please have a look at this? Both Lis and I stared at this for too long now to be able to still see bits that are unclear, I'm afraid. |
In particular, I still have two outstanding review comments further above to simplify the |
* that. It might be useful for debugging, though. | ||
*/ | ||
|
||
// warnx ("Not running in a template cgroup, unable to parse systemd unit instance.\n\n/proc/self/cgroups content follows:\n%s\n", buf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that for unit testing? Because in production, both the PAM module and cockpit-ws surely run in a cgroup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed this during testing, yes, but in production there will be some cockpit-ws instances which are not running in a @[clientcert] cgroup. The static http one comes to mind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some basic review. Hope it helps.
<ulink url="https://en.wikipedia.org/wiki/Active_Directory">Active Directory</ulink>, | ||
which can associate certificates to users. | ||
</para> | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most people who are using this will be using it against Active Directory and its certificates. Could we add documentation for that here? Perhaps as a follow up pull request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle, yes. We have zero experience with and access to AD in the team right now, but eventually this should be done and documented.
TasksMax=100 | ||
# add new restriction | ||
CPUQuota=30% | ||
</programlisting> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this compound and/or relate to and/or conflict with the authentication metering settings represented by the cockpit.conf MaxStartups
setting?
Is there an opportunity for these two settings to be consolidated into one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, MaxStartups
is only considered by cockpit-ws cockpitauth.c. It means something different, though -- MaxStartups
essentially means "restrict login attempts", while this limits the number of parallel login sessions (which should be much higher than 10 surely). But it seems worth thinking about this more deeply, thanks for pointing out!
int result = PAM_IGNORE; | ||
int r; | ||
const char *pam_user = NULL; | ||
char cert_pem[MAX_PEER_CERT_SIZE]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we comfortable with a stack allocation here? It seems risky from a security perspective.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I defer to your judgement for that, I don't have enough experience with this to decide which is better. I'll change it to mallocx()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no qualms about security here, but I actually dislike the hardcoded arbitrary limit.
if (https_instance_has_certificate_file (NULL, 0) != -1) | ||
{ | ||
g_debug ("TLS connection has peer certificate, using tls-cert auth type"); | ||
type = g_strdup ("tls-cert"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we version this string? Or come up with a completely new one once lis does some TLS handshake foo splitting (we can dream)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now all the bits that use it are in the same package (cockpit-ws.rpm), and this doesn't work in cockpit/ws container. So we retain the option to change how this works (and I'm sure we'll change it one or three more times). If you feel better with a "tls-cert1" we can do this, but this shines through to cockpit.conf authentication modes, doesn't it? I'd hate to break existing config files some day.
* returns "123abc" instance name (static string) | ||
*/ | ||
static const char * | ||
get_ws_https_instance (void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the cgroup for this information is counter to the typical flow for Cockpit. The typical flow is:
- cockpit-ws: relay credentials from untrusted source
- cockpit-session: Receive credentials from untrusted source
- cockpit-session, PAM: Use credentials to authenticate
- cockpit-session: Use authentication to start session
To follow along this would be:
- cockpit-ws: Relay certificate information received in "authorize" tls-cert message
- cockpit-session: Parse out that tls-cert and pass to PAM module as a auth token
- pam_cockpit_cert.so: Check that auth token is actually the certificate used with the connection, check it matches the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talked about this on IRC. Making this conform to the typical flow is just extra needless work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this at length on IRC, but summary: this "typical" information flow doesn't work for this setup. The whole point of splitting out TLS termination and isolation ws instances for different certificates is that we assume ws can be hacked, and thus can't give us trustworthy information. Passing something like a sealed memfd would be okayish (although ws can try to pass it on to a parallel ws instance, at which point it can impersonate another user session again). Thus c-session/the PAM module have to check the cgroup (which really means "which certificate does this ws instance handle?") anyway, and at that point there is no further information that ws could give to PAM that it doesn't already possess.
Lis had an interesting idea with tls creating nonces for each connection, sending these to ws as the first thing after accept()
, and then passing on these nonces through the "tls-cert" auth protocol. That would again be more in line with the "typical flow".
--> | ||
|
||
<refentryinfo> | ||
<title>pam_cockpit_cert</title> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole binary seems like an implementation detail. Are we sure we want to document it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe move this to doc/ folder as a markdown file instead if it's not "API".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it appears in /etc/pam.d/, I felt that some admins would like to know what it is and how to configure it. So in that sense, that's the only API that we have with this PR right now.
I pushed a fixup with resolving the first batch of @stefwalter 's review comments: the trivial ones, docs, assertions, and minor code cleanup. |
I now added a fixup that splits the PAM stack. This is trading always running a no-op in pam_cert_auth against having to deal with another /etc/ config file. In particular, if we ever consider splitting this out into a separate package (which I'm currently not a fan of), we'll have to migrate this conffile, which is always a bit painful. @stefwalter, @allisonkarlitskaya : any opinions? Note: I'm happy to add an extra check to the PAM module that ClientCertAuthentication is enabled in cockpit.conf. It feels a bit weird, but I suppose it can't hurt. |
e94bb6e
to
0a2e14c
Compare
After discussions with @stefwalter and @allisonkarlitskaya I took out the "separate PAM stack" commit again, as it would commit us more to the current architecture and cause possible config file troubles in the future, without giving us a big benefit right now. I also squashed in the fixups. |
err (EXIT_FAILURE, "Failed to unlink just-created certificate file %s", fingerprint.str); | ||
} | ||
|
||
pthread_mutex_unlock (&certfile_mutex); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! This is indeed very gross. We need to fix this.
I think I'll just have the early failure (there's only one case) return immediately (which since we defer the PEM generation, means no allocation has been done at the time). I'll also rename the label to make this more obvious.
Bring back the export of client certificates for current connections into the runtime directory. This got dropped in commit 7086711. Export them to `$RUNTIME_DIRECTORY` under their peer certificate fingerprint name. That can be derived from the systemd unit (cgroup) in which cockpit-ws and cockpit-session run. Use flock() to implement a reference count on each file. Each exported certificate file will only last as long as there is an active connection with that client certificate. The authentication stack can thus rely on certificates belonging to a current Cockpit session -- at least the websocket connection will be open all the time. If cockpit-tls itself quits, systemd will clean out the RuntimeDirectory= for us, to avoid getting stale data. Drop the restricted RuntimeDirectory permissions, so that they use the default 0755. This allows the unprivileged cockpit-ws to check if there is a client certificate, and thus if it should request authentication with that. Add an unit test that opens 20 parallel connections, to ensure that the certificate refcounting works and nothing deadlocks: - The first variant of the test only waits until after the TLS handshake. It can happen that one of the connections did not yet complete the certificate exporting part before the test starts closing fds. In this case, when we go to check to verify that the certificate file "still" exists, it might simply not yet have been created. Thus wait for it to appear. - The "alternate" variant uses a new client certificate to connect to an "alternate mode" of the socket activation server which, instead of running cockpit-ws, writes a "hello" message and pauses. We use reception of this "hello" message as an indicator that the service has been fully initialised.
Add a new pam_cockpit_cert PAM module for TLS client certificate based authentication. This uses sssd's API to map a TLS certificate to a user [1]. Use the sd-bus API for making the D-Bus calls, instead of gdbus (glib/gio has too many side effects, which should be avoided in a setuid root program) or libdbus (which would be a new dependency). Introduce a new `tls-cert` authorization type into cockpit-ws/cockpit-session that gets enabled with the new `ClientCertAuthentication` option in cockpit.conf. If this is enabled, and cockpit-tls exports a current TLS client certificate, pam_cockpit_cert maps the certificate to a user name. Unfortunately the Chrome Devtools Protocol does not currently offer client certificate import and selecting one for a page that requests a certificate. Test this with curl instead, and describe in the comments how to test this interactively. This requires some more SELinux policy modifications. Apply them locally in cockpit.spec until they land in Fedora/RHEL. [1] https://www.freeipa.org/page/V4/User_Certificates Fixes cockpit-project#8429 Related: https://bugzilla.redhat.com/show_bug.cgi?id=1678465 Jira: COCKPIT-368 Closes cockpit-project#11421
de15eae
to
75f342e
Compare
The fedora/firefox failure is most likely fixed by PR #13221 |
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1678465
Jira: COCKPIT-368
we need to validate the certor leave that to sssd (that)Possible blockers:
Use a separate cockpit-cert PAM identifier and config file (that then includes cockpit): added as commit-- let's not do that, it would commit us more to the current architecture, which may change a lot stillcheck errors or assert success ofpthread_mutex_{,un}lock()
Non-blockers:
Follow-ups: