Add client certificate authentication #11421

Merged
merged 4 commits into from
Dec 2, 2019

Conversation

martinpitt
Member

@martinpitt martinpitt commented Mar 18, 2019

Related: https://bugzilla.redhat.com/show_bug.cgi?id=1678465
Jira: COCKPIT-368

Possible blockers:

  • Use a separate cockpit-cert PAM identifier and config file (that then includes cockpit): added as commit -- let's not do that, it would commit us more to the current architecture, which may change a lot still
  • check errors or assert success of pthread_mutex_{,un}lock()
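
A minimal sketch of the second item, assuming that aborting is an acceptable reaction to a failed lock. The helper names `lock_or_die` and `unlock_or_die` and the counter are hypothetical and do not come from this PR; only `pthread_mutex_lock()` and `pthread_mutex_unlock()` are real API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: lock the mutex and abort loudly on failure.
 * pthread_mutex_lock() returns 0 or an errno-style error number. */
static void
lock_or_die (pthread_mutex_t *mutex)
{
  int r;

  assert (mutex != NULL);
  r = pthread_mutex_lock (mutex);
  if (r != 0)
    {
      fprintf (stderr, "pthread_mutex_lock failed: %s\n", strerror (r));
      abort ();
    }
}

/* Hypothetical helper: unlock the mutex and abort loudly on failure. */
static void
unlock_or_die (pthread_mutex_t *mutex)
{
  int r;

  assert (mutex != NULL);
  r = pthread_mutex_unlock (mutex);
  if (r != 0)
    {
      fprintf (stderr, "pthread_mutex_unlock failed: %s\n", strerror (r));
      abort ();
    }
}

/* Illustrative critical section guarded by the checked helpers. */
static pthread_mutex_t counter_mutex = PTHREAD_MUTEX_INITIALIZER;
static int counter;

static int
increment_counter (void)
{
  int value;

  lock_or_die (&counter_mutex);
  value = ++counter;
  unlock_or_die (&counter_mutex);
  return value;
}
```

With a default (non-error-checking) mutex these calls rarely fail, so the check mostly guards against programming errors such as using an uninitialized mutex.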

Non-blockers:

Follow-ups:

  • Consider improving failure mode when exceeding https instance slice limits
  • Generate a nonce for every connection and give that to ws as authentication token. With that we can replace the cgroup lookup/mapping and tighten the RuntimeDir to 0700 again. Jira web: Fix leave handler of networking page. #545
  • Replace stack arrays with malloc, for better stack overflow protection -- does that still make sense with today's tool chains?
  • Limit cert size in connection.c, for DoS protection? (PAM module already limits it to 100kB)

@martinpitt martinpitt added blocked Don't land until something else happens first (see task list) and removed blocked Don't land until something else happens first (see task list) labels Mar 18, 2019
@martinpitt
Member Author

Nice, tests now by and large work. On fedora-29 there is a (non-fatal) SELinux violation:

audit: type=1400 audit(1553006049.704:621): avc:  denied  { map } for  pid=8208 comm="pool" path="/etc/pki/ca-trust/source/README" dev="dm-0" ino=8538086 scontext=system_u:system_r:cockpit_ws_t:s0 tcontext=system_u:object_r:cert_t:s0 tclass=file permissive=0
audit: type=1400 audit(1553006049.710:622): avc:  denied  { map } for  pid=8208 comm="pool" path="/etc/pki/ca-trust/source/ipa.p11-kit" dev="dm-0" ino=9401424 scontext=system_u:system_r:cockpit_ws_t:s0 tcontext=system_u:object_r:cert_t:s0 tclass=file permissive=0

@martinpitt
Member Author

I quickly talked about the PAM stuff with Stef. I was previously on the fence between entirely skipping the auth PAM stage with TLS certificates and making it conditional, so that it skips substack password-auth when using TLS termination. But the latter only makes things more brittle, and the auth stage should not really do anything other than password validation (in particular, postlogin also runs as part of session, as it should). So I dropped that TODO item and just updated the comment.

This is now ready for review. @stefwalter , I'd appreciate if you have some time to take a look.

@martinpitt martinpitt requested a review from stefwalter March 20, 2019 11:01
@martinpitt martinpitt changed the title WIP: ws: Client certificate authentication proof of concept ws: Introduce (yet unsafe) client certificate authentication Mar 20, 2019
@martinpitt
Member Author

@stefwalter : Some background (also explained in JIRA): This is mostly for demoing, and unblocking development of other parts of the entire experience (like how to handle sudo authentication via smartcards, configuring the UI differently, etc.). We won't mention that in the release notes, and this deliberately isn't documented in the guide or manpages. This will happen once we have a secure way to do this (see jira task COCKPIT-369).

@martinpitt
Member Author

@stefwalter : I reworked this to package the new binary into a new rpm, which only gets built for CI for now. For interested people it's easy enough to enable in the .spec by changing the build_tech_preview macro.

Contributor

@stefwalter stefwalter left a comment

Some comments. No major bugs found. This has several expected security holes. But we need to mark them clearly.

What would the impact of just keeping this on a feature branch be?

@martinpitt
Member Author

What would the impact of just keeping this on a feature branch be?

  • we don't automatically track the functionality of the lower levels (sssd-dbus, IPA CLI, etc.) during image refreshes; in particular, the development of the SELinux policy updates
  • this PR would need to be rebased every once in a while, to make sure it doesn't bitrot too much

So, nothing too serious really. With the "tech preview" package here it won't actually get shipped in releases, so landing this PR only has a minimal actual impact.

But I have no strong opinion either way. By far the most useful purpose of that PR is to make sure it works everywhere.

@martinpitt martinpitt added the blocked Don't land until something else happens first (see task list) label Mar 22, 2019
Initialize the idle timer fd to the expected "unset" value of -1,
instead of 0, as the rest of the code expects.

Let all unit tests except the explicit idle test run with a zero idle
timeout, so that it doesn't inadvertently stop the server when the tests
take longer than a second (as they do under valgrind).
See https://bugzilla.redhat.com/show_bug.cgi?id=1770159

This affects the FreeIPA setup on our "services" image, and in general,
FreeIPA that runs from Fedora 30/31. CentOS/RHEL instances are not
affected.
@martinpitt martinpitt changed the title WIP: ws: Add client certificate authentication Add client certificate authentication Nov 28, 2019
@martinpitt martinpitt requested a review from mvollmer November 28, 2019 10:48
@martinpitt
Member Author

I now did the final squashing and rebasing. @mvollmer, can you please have a look at this? Both Lis and I stared at this for too long now to be able to still see bits that are unclear, I'm afraid.

@martinpitt
Member Author

In particular, I still have two outstanding review comments further above to simplify the Fingerprint struct. IMHO this is overkill, but it's also not important enough to spend a lot of debate time over it.

* that. It might be useful for debugging, though.
*/

// warnx ("Not running in a template cgroup, unable to parse systemd unit instance.\n\n/proc/self/cgroups content follows:\n%s\n", buf);
Member Author

Is that for unit testing? Because in production, both the PAM module and cockpit-ws surely run in a cgroup.

Member

I noticed this during testing, yes, but in production there will be some cockpit-ws instances which are not running in a @[clientcert] cgroup. The static http one comes to mind.

stefwalter
stefwalter previously approved these changes Nov 28, 2019
Contributor

@stefwalter stefwalter left a comment

Some basic review. Hope it helps.

<ulink url="https://en.wikipedia.org/wiki/Active_Directory">Active Directory</ulink>,
which can associate certificates to users.
</para>

Contributor

Most people who are using this will be using it against Active Directory and its certificates. Could we add documentation for that here? Perhaps as a follow up pull request?

Member Author

In principle, yes. We have zero experience with and access to AD in the team right now, but eventually this should be done and documented.

TasksMax=100
# add new restriction
CPUQuota=30%
</programlisting>
Contributor

How does this compound and/or relate to and/or conflict with the authentication metering settings represented by the cockpit.conf MaxStartups setting?

Is there an opportunity for these two settings to be consolidated into one?

Member Author

Right now, MaxStartups is only considered by cockpit-ws cockpitauth.c. It means something different, though: MaxStartups essentially means "restrict login attempts", while this limits the number of parallel login sessions (which should surely be much higher than 10). But it seems worth thinking about this more deeply, thanks for pointing that out!

int result = PAM_IGNORE;
int r;
const char *pam_user = NULL;
char cert_pem[MAX_PEER_CERT_SIZE];
Contributor

Are we comfortable with a stack allocation here? It seems risky from a security perspective.

Member Author

I defer to your judgement for that, I don't have enough experience with this to decide which is better. I'll change it to mallocx().

Member

I have no qualms about security here, but I actually dislike the hardcoded arbitrary limit.
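
As a rough illustration of the two concerns in this thread (stack versus heap, and the hardcoded limit), here is a sketch that reads a certificate file into a heap buffer, with the size limit passed in by the caller. The function name `read_cert_limited` is hypothetical and this is not the code from the PR. It uses plain `malloc()` rather than the `mallocx()` mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read the whole file at `path` into a NUL-terminated
 * heap buffer. Returns NULL if the file cannot be read or is larger than
 * `limit` bytes. The caller owns the result and must free() it. */
static char *
read_cert_limited (const char *path, size_t limit)
{
  FILE *f;
  char *buf;
  size_t len;

  assert (path != NULL);

  f = fopen (path, "r");
  if (f == NULL)
    return NULL;

  /* One extra byte to detect oversized input, one for the terminator */
  buf = malloc (limit + 2);
  if (buf == NULL)
    {
      fclose (f);
      return NULL;
    }

  len = fread (buf, 1, limit + 1, f);
  if (ferror (f) || len > limit)
    {
      fclose (f);
      free (buf);
      return NULL;
    }

  fclose (f);
  buf[len] = '\0';
  return buf;
}
```

A caller would pass the limit it considers reasonable, for example `read_cert_limited (path, 100 * 1024)`, so the bound lives in one place instead of in an array declaration.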

if (https_instance_has_certificate_file (NULL, 0) != -1)
{
g_debug ("TLS connection has peer certificate, using tls-cert auth type");
type = g_strdup ("tls-cert");
Contributor

Should we version this string? Or come up with a completely new one once lis does some TLS handshake foo splitting (we can dream)?

Member Author

For now all the bits that use it are in the same package (cockpit-ws.rpm), and this doesn't work in cockpit/ws container. So we retain the option to change how this works (and I'm sure we'll change it one or three more times). If you feel better with a "tls-cert1" we can do this, but this shines through to cockpit.conf authentication modes, doesn't it? I'd hate to break existing config files some day.

* returns "123abc" instance name (static string)
*/
static const char *
get_ws_https_instance (void)
Contributor

Using the cgroup for this information is counter to the typical flow for Cockpit. The typical flow is:

  1. cockpit-ws: relay credentials from untrusted source
  2. cockpit-session: Receive credentials from untrusted source
  3. cockpit-session, PAM: Use credentials to authenticate
  4. cockpit-session: Use authentication to start session

To follow along this would be:

  1. cockpit-ws: Relay certificate information received in "authorize" tls-cert message
  2. cockpit-session: Parse out that tls-cert and pass to PAM module as an auth token
  3. pam_cockpit_cert.so: Check that auth token is actually the certificate used with the connection, check it matches the user.

Contributor

Talked about this on IRC. Making this conform to the typical flow is just extra needless work.

Member Author

We discussed this at length on IRC; in summary, this "typical" information flow doesn't work for this setup. The whole point of splitting out TLS termination and isolating ws instances for different certificates is that we assume ws can be hacked, and thus can't give us trustworthy information. Passing something like a sealed memfd would be okayish (although ws can try to pass it on to a parallel ws instance, at which point it can impersonate another user session again). Thus c-session/the PAM module have to check the cgroup (which really means "which certificate does this ws instance handle?") anyway, and at that point there is no further information that ws could give to PAM that it doesn't already possess.

Lis had an interesting idea with tls creating nonces for each connection, sending these to ws as the first thing after accept(), and then passing on these nonces through the "tls-cert" auth protocol. That would again be more in line with the "typical flow".
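
To make the cgroup check discussed in this thread more concrete, here is a sketch of deriving the instance name from the text of `/proc/self/cgroup`. The unit name pattern `cockpit-wsinstance-https@<instance>.service` and the function name `parse_https_instance` are assumptions for illustration. The actual `get_ws_https_instance()` in this PR returns a static string and may parse differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed unit name pattern for this sketch: <prefix><instance><suffix> */
#define SKETCH_UNIT_PREFIX "cockpit-wsinstance-https@"
#define SKETCH_UNIT_SUFFIX ".service"

/* Hypothetical helper: find the templated unit in `cgroup_text` and copy its
 * instance name into `out`. Returns 0 on success, or -1 if the text does not
 * name such a unit or the instance name does not fit into `out`. */
static int
parse_https_instance (const char *cgroup_text, char *out, size_t out_size)
{
  const char *start, *end;
  size_t len;

  assert (cgroup_text != NULL && out != NULL);

  start = strstr (cgroup_text, SKETCH_UNIT_PREFIX);
  if (start == NULL)
    return -1;
  start += strlen (SKETCH_UNIT_PREFIX);

  end = strstr (start, SKETCH_UNIT_SUFFIX);
  if (end == NULL)
    return -1;

  len = (size_t) (end - start);

  /* Reject empty or oversized names, and matches that span several
   * lines or path components */
  if (len == 0 || len >= out_size ||
      memchr (start, '\n', len) != NULL || memchr (start, '/', len) != NULL)
    return -1;

  memcpy (out, start, len);
  out[len] = '\0';
  return 0;
}
```

For a cgroup path ending in `cockpit-wsinstance-https@123abc.service` this yields `123abc`. A ws instance outside such a template unit (like the static http one mentioned earlier) gets `-1`.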

-->

<refentryinfo>
<title>pam_cockpit_cert</title>
Contributor

This whole binary seems like an implementation detail. Are we sure we want to document it?

Contributor

@stefwalter stefwalter Nov 28, 2019

Maybe move this to doc/ folder as a markdown file instead if it's not "API".

Member Author

As it appears in /etc/pam.d/, I felt that some admins would like to know what it is and how to configure it. So in that sense, that's the only API that we have with this PR right now.

@martinpitt
Member Author

I pushed a fixup resolving the first batch of @stefwalter 's review comments: the trivial ones, docs, assertions, and minor code cleanup.

@martinpitt
Member Author

I now added a fixup that splits the PAM stack. This is trading always running a no-op in pam_cert_auth against having to deal with another /etc/ config file. In particular, if we ever consider splitting this out into a separate package (which I'm currently not a fan of), we'll have to migrate this conffile, which is always a bit painful. @stefwalter, @allisonkarlitskaya : any opinions?

Note: I'm happy to add an extra check to the PAM module that ClientCertAuthentication is enabled in cockpit.conf. It feels a bit weird, but I suppose it can't hurt.

@martinpitt
Member Author

After discussions with @stefwalter and @allisonkarlitskaya I took out the "separate PAM stack" commit again, as it would commit us more to the current architecture and cause possible config file troubles in the future, without giving us a big benefit right now. I also squashed in the fixups.

err (EXIT_FAILURE, "Failed to unlink just-created certificate file %s", fingerprint.str);
}

pthread_mutex_unlock (&certfile_mutex);
Member

Good catch! This is indeed very gross. We need to fix this.

I think I'll just have the early failure (there's only one case) return immediately (which since we defer the PEM generation, means no allocation has been done at the time). I'll also rename the label to make this more obvious.

allisonkarlitskaya and others added 2 commits November 29, 2019 14:28
Bring back the export of client certificates for current connections
into the runtime directory. This got dropped in commit 7086711.

Export them to `$RUNTIME_DIRECTORY` under their peer certificate
fingerprint name. That can be derived from the systemd unit (cgroup) in
which cockpit-ws and cockpit-session run. Use flock() to implement a
reference count on each file.
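
One way such a flock()-based reference count can work is sketched here: every connection holds a shared lock on the file, and whoever manages to upgrade to an exclusive lock when leaving is the last user and removes it. This is an assumption about the general technique, not a copy of the cockpit-tls code. The names `cert_file_ref` and `cert_file_unref` are hypothetical, and a real implementation also has to serialize creation against removal (for example with a mutex).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: take a reference on `path` by opening it (creating
 * it if necessary) and holding a shared lock. Returns the fd, or -1. */
static int
cert_file_ref (const char *path)
{
  int fd;

  assert (path != NULL);

  fd = open (path, O_CREAT | O_RDWR, 0644);
  if (fd < 0)
    return -1;

  if (flock (fd, LOCK_SH) < 0)
    {
      close (fd);
      return -1;
    }
  return fd;
}

/* Hypothetical helper: drop the reference held through `fd`.
 * Returns 1 if this was the last reference and the file got removed,
 * 0 if other references remain, -1 on error. */
static int
cert_file_unref (const char *path, int fd)
{
  int result = 0;

  if (flock (fd, LOCK_EX | LOCK_NB) == 0)
    {
      /* Nobody else holds a lock: we were the last reference */
      result = (unlink (path) == 0) ? 1 : -1;
    }
  else if (errno != EWOULDBLOCK)
    {
      result = -1;
    }

  close (fd);   /* closing the descriptor also releases our lock */
  return result;
}
```

Because flock() locks belong to the open file description, a lock disappears automatically when its holder exits or closes the descriptor, so a crashed connection handler cannot leave a stale reference behind.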

Each exported certificate file will only last as long as there is an
active connection with that client certificate. The authentication stack
can thus rely on certificates belonging to a current Cockpit session --
at least the websocket connection will be open all the time.

If cockpit-tls itself quits, systemd will clean out the
RuntimeDirectory= for us, to avoid getting stale data.

Drop the restricted RuntimeDirectory permissions, so that they use the
default 0755. This allows the unprivileged cockpit-ws to check if there
is a client certificate, and thus if it should request authentication
with that.

Add a unit test that opens 20 parallel connections, to ensure that the
certificate refcounting works and nothing deadlocks:

 - The first variant of the test only waits until after the TLS
   handshake. It can happen that one of the connections did not yet
   complete the certificate exporting part before the test starts
   closing fds.  In this case, when we go to check to verify that the
   certificate file "still" exists, it might simply not yet have been
   created. Thus wait for it to appear.

 - The "alternate" variant uses a new client certificate to connect to
   an "alternate mode" of the socket activation server which, instead of
   running cockpit-ws, writes a "hello" message and pauses.  We use
   reception of this "hello" message as an indicator that the service
   has been fully initialised.
Add a new pam_cockpit_cert PAM module for TLS client certificate based
authentication. This uses sssd's API to map a TLS certificate to a
user [1]. Use the sd-bus API for making the D-Bus calls, instead of
gdbus (glib/gio has too many side effects, which should be avoided in a
setuid root program) or libdbus (which would be a new dependency).

Introduce a new `tls-cert` authorization type into
cockpit-ws/cockpit-session that gets enabled with the new
`ClientCertAuthentication` option in cockpit.conf. If this is enabled,
and cockpit-tls exports a current TLS client certificate,
pam_cockpit_cert maps the certificate to a user name.

Unfortunately the Chrome Devtools Protocol does not currently offer
client certificate import and selecting one for a page that requests a
certificate. Test this with curl instead, and describe in the comments
how to test this interactively.

This requires some more SELinux policy modifications. Apply them locally
in cockpit.spec until they land in Fedora/RHEL.

[1] https://www.freeipa.org/page/V4/User_Certificates

Fixes cockpit-project#8429
Related: https://bugzilla.redhat.com/show_bug.cgi?id=1678465
Jira: COCKPIT-368
Closes cockpit-project#11421
@martinpitt
Member Author

The fedora/firefox failure is most likely fixed by PR #13221

@martinpitt martinpitt merged commit 60c5fb2 into cockpit-project:master Dec 2, 2019
@martinpitt martinpitt deleted the client-cert-auth branch December 2, 2019 10:12