Skip to content

[LG-501] Only write to phone configuration table#2478

Merged
jgsmith-usds merged 2 commits intomasterfrom
jgs/lg-501-only-write-to-phone-configurations-table
Sep 7, 2018
Merged

[LG-501] Only write to phone configuration table#2478
jgsmith-usds merged 2 commits intomasterfrom
jgs/lg-501-only-write-to-phone-configurations-table

Conversation

@jgsmith-usds
Copy link
Copy Markdown
Contributor

Why:
Before we can support multiple phones, we have to stop writing
to the user.phone and related columns.

How:
Go through the code and make sure we never write/update the
user.phone and user.phone_confirmed_at columns. For now,
we leave some of the user.otp_delivery_preference updates
as they are since they look like they are made without a phone
number.

Hi! Before submitting your PR for review, and/or before merging it, please
go through the checklists below. These represent the more critical elements
of our code quality guidelines. The rest of the list can be found in
CONTRIBUTING.md

Controllers

  • When adding a new controller that requires the user to be fully
    authenticated, make sure to add before_action :confirm_two_factor_authenticated
    as the first callback.

Database

  • Unsafe migrations are implemented over several PRs and over several
    deploys to avoid production errors. The strong_migrations gem
    will warn you about unsafe migrations and has great step-by-step instructions
    for various scenarios.

  • Indexes were added if necessary. This article provides a good overview
    of indexes in Rails.

  • Verified that the changes don't affect other apps (such as the dashboard)

  • When relevant, a rake task is created to populate the necessary DB columns
    in the various environments right before deploying, taking into account the users
    who might not have interacted with this column yet (such as users who have not
    set a password yet)

  • Migrations against existing tables have been tested against a copy of the
    production database. See LG-228 Make migrations safer and more resilient #2127 for an example when a migration caused deployment
    issues. In that case, all the migration did was add a new column and an index to
    the Users table, which might seem innocuous.

Encryption

  • The changes are compatible with data that was encrypted with the old code.

Routes

  • GET requests are not vulnerable to CSRF attacks (i.e. they don't change
    state or result in destructive behavior).

Session

  • When adding user data to the session, use the user_session helper
    instead of the session helper so the data does not persist beyond the user's
    session.

Testing

  • Tests added for this feature/bug
  • Prefer feature/integration specs over controller specs
  • When adding code that reads data, write tests for nil values, empty strings,
    and invalid inputs.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmhooper This is what I finally came to. As far as I can tell, the test that was failing (spec/features/users/sign_in_spec.rb line 244) was testing that the email attribute couldn't be decrypted and didn't have anything to do with this line. When I ran the test against the code in master, the user didn't have any profile and this method never got called with this test.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be due to factory_bot stuff. I ran into a similar situation in the follow-on PR and found a fix. I'll test it later on this PR and roll it in here if it lets me get rid of this particular change.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I can totally believe it is something weird with factory bot.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm looking at this and can't seem to figure out how it works at all without this conditional there.

@jgsmith-usds jgsmith-usds force-pushed the jgs/lg-501-only-write-to-phone-configurations-table branch from a11e287 to 4ec2c09 Compare September 4, 2018 15:54
@jgsmith-usds jgsmith-usds force-pushed the jgs/lg-501-only-write-to-phone-configurations-table branch 5 times, most recently from 74e3e5f to 18bd83f Compare September 5, 2018 17:34
**Why**:
Before we can support multiple phones, we have to stop writing
to the `user.phone` and related columns.

**How**:
Go through the code and make sure we never write/update the
`user.phone` and `user.phone_confirmed_at` columns. For now,
we leave some of the `user.otp_delivery_preference` updates
as they are since they look like they are made without a phone
number.
@jgsmith-usds jgsmith-usds force-pushed the jgs/lg-501-only-write-to-phone-configurations-table branch from 18bd83f to a0e1cb9 Compare September 5, 2018 20:45
email_fingerprint = create_fingerprint(email)
find_by(email_fingerprint: email_fingerprint)
resource = find_by(email_fingerprint: email_fingerprint)
resource if resource&.email == email
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like an odd change. Why is it necessary, and how is this related to phone configurations?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a result of the changes to the UpdateUser service. With this change, we're not decrypting the phone number there (good) but that means we aren't raising an attribute encryption error if there's an issue decrypting attributes. So, a user can sign in without needing any attributes decrypted and a seemingly unrelated 500 pops up further down the line.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we aren't raising an error when trying to encrypt the phone when updating the user? Why would we need to decrypt attributes when setting them?

It seems strange to me that we are looking for a match twice. In what case would there be a match for the email fingerprint but resource.email would not equal email? This new code is 3.62x slower than the old code, and this is a heavily used method in the app.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Digging into this, I can confirm this code is not needed. This is in fact masking a bug in our SessionsController. I will open a new PR and explain everything there.

user.create_phone_configuration(
phone: mfa_phone || phone,
confirmed_at: user.phone_confirmed_at,
confirmed_at: user.Time.zone.now,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean Time.zone.now without user. before it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh. Good catch.

@jgsmith-usds jgsmith-usds merged commit 8fecf4a into master Sep 7, 2018
@jgsmith-usds jgsmith-usds deleted the jgs/lg-501-only-write-to-phone-configurations-table branch September 7, 2018 15:01
@monfresh monfresh mentioned this pull request Sep 7, 2018
12 tasks
monfresh added a commit that referenced this pull request Sep 7, 2018
**Why**: These changes were made as part of #2478 in order to make the
test in `spec/features/users/sign_in_spec.rb:244` pass, but they
masked a bug in `Users::SessionsController`. Moreover, they made the
heavily used `find_with_email` method almost 4 times slower. The goal of
this particular test is to make sure that the app fails loudly if we
mess up the list of keys during key rotation. Preferably, we want to
fail as early as possible.

Here is the flow that explains why this test was passing before the
changes in #2478:

1. User signs in with email and password
2. `Users::SessionsController#create` is called
3. `track_authentication_attempt(auth_params[:email])` is called
4. `user_signed_in_and_not_locked_out?(user)` is called
5. `return false unless current_user` is called
6. Devise calls `current_user`, then Warden calls `UpdateUser` in
`config/initializers/session_limitable.rb:9`, which raises
`Encryption::EncryptionError` because it is trying to decrypt the
`phone` encryptable attribute.

With the changes in #2478, `UpdateUser` no longer tries to decrypt the
phone, which is a good thing. The flow then continues as follows:

7. `User#need_two_factor_authentication?` is called
8. `two_factor_enabled?` is called and returns true right after it sees
that `phone_configuration&.mfa_enabled?` is true.

Note that so far, no encrypted attributes have been accessed.

9. After a few more calls, `cache_active_profile` is reached and
`cacher.save(auth_params[:password], profile)` is called
10. Within `Pii::Cacher#save`, `stale_attributes?` is called.
11. `user.phone_configuration&.stale_encrypted_phone?` is called, which
which raises `Encryption::EncryptionError`.

However, `Users::SessionsController#cache_active_profile` rescues
this error, and then `profile.deactivate(:encryption_error)` is called,
which raises an error because `profile` is `nil`.

The reason why we never saw this bug in `SessionsController` is because
we haven't had problems rotating keys, and because so far, up until the
changes in #2478, `Encryption::EncryptionError` was raised before
`cache_active_profile` was reached. For example, before we introduced
the PhoneConfiguration table, the error was raised early via
`two_factor_enabled?`, which accessed the `phone` encryptable attribute.

Similarly, if you change the spec to use a user
`:with_authentication_app` instead of `:signed_up`, the test will pass
because `two_factor_enabled?` will call `totp_enabled?`, which will
try to decrypt the `otp_secret_key`.

To make the test pass with an MFA-enabled user, we can wrap the 2 lines
(in `cache_active_profile`) after the rescue in an `if profile` block,
which we should do anyways. This will then allow the user to sign in,
and will then raise the `Encryption::EncryptionError` on the 2FA page
when it tries to decrypt the phone number.

Ideally, we want to fail as early as possible, but with the current
design of `cache_active_profile`, that's not possible because it is
coupling actions that only apply to verified users with actions
that apply to all users when keys are rotated.

In a follow-up PR, I will attempt to extract the key rotation so that
decryption attempts fail early and are not rescued.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants