Skip to content

Avoid database query associated with active session polling#7966

Merged
aduth merged 13 commits intomainfrom
aduth-try-avoid-poll-db-query
Jun 9, 2023
Merged

Avoid database query associated with active session polling#7966
aduth merged 13 commits intomainfrom
aduth-try-avoid-poll-db-query

Conversation

@aduth
Copy link
Contributor

@aduth aduth commented Mar 10, 2023

🛠 Summary of changes

This introduces a uses the new Api::SessionsController to manage active session polling behavior.

Proposed benefits:

  • Avoids database query resulting from testing against current_user, instead using internal Warden session values
  • Move JSON API behaviors out of existing SessionsController to own dedicated controller with conventionally named methods
  • Reduce number of redundant values tracked in session (remove session_expires_at, pinged_at)
  • Move flash behavior for "Sign In" form timeout out of ApplicationController and consolidate with session timeout behavior
  • Simplify JavaScript for session ping behavior
  • Eliminate redundancy in ping response structure (return single expiration time rather than both expiration time as well as remaining seconds)
  • Avoid logging "Session Ping Error" NewRelic page actions
    • It's expected that those page actions are a result of ping responses being returned as HTML, expected to be resolved as part of these changes

📜 Testing Plan

  • Check to ensure there are no regressions in the behavior of session timeout, both when signed in and when "partially" signed in (after entering email and password but before MFA).
    • "Keep me signed in" dismisses modal and extends session
    • "Sign me out" logs out
    • Timer counts down to 0 and then redirects

It's easier to test by setting a very low session timeout in local config/application.yml:

session_timeout_in_minutes: 1

@aduth aduth changed the title Avoid database query associated with heartbeat poll Avoid database query associated with active session polling Mar 10, 2023
Copy link
Contributor

@mitchellhenke mitchellhenke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question about backwards compatibility, but this is great!

@aduth aduth force-pushed the aduth-try-avoid-poll-db-query branch 2 times, most recently from 76677e8 to 5e529a7 Compare March 20, 2023 14:44
@aduth aduth marked this pull request as ready for review March 20, 2023 18:34
@aduth
Copy link
Contributor Author

aduth commented Mar 21, 2023

Maybe for a separate pull request, but: I'm wondering if we're polling more aggressively than we should need to. At most, it seems like could ping only just prior to showing the modal (to confirm that the session is in-fact soon to expire) and just as the session is expiring† (to avoid signing out in case the session was extended in another tab). Polling while the modal is active could be nice to automatically dismiss the modal if the session was extended in another tab, but that seems like a nice-to-have and an edge-case, at the cost of at least 3-4 extra network requests per session timing out.

† In the case of expiring, we also send two separate requests in quick succession, one to check if the session is active, and then another to trigger the sign-out. Maybe we could do this at the same time? On the other hand, I'm not sure I like the idea of destroying the session in a GET request (i.e. part of the active check), but that is what we do today 😅 🙈

@aduth aduth marked this pull request as draft March 22, 2023 14:39
@aduth aduth force-pushed the aduth-try-avoid-poll-db-query branch 3 times, most recently from b590fe3 to f051b01 Compare March 28, 2023 13:21
aduth added a commit that referenced this pull request Mar 28, 2023
The CSRF won't be updated until after the request finishes, even if the modal will be disappeared immediately
aduth added a commit that referenced this pull request Mar 29, 2023
…8084)

* Remove redundant "remaining" value from timeout API

changelog: Internal, Session Timeout, Remove redundant timeout value from session timeout API

* Revert session API remaining removal

Temporary for first deploy, see https://github.com/18F/identity-idp/pull/8084/files#r1150918192

* Optimistically hide modal on session continue

1. Fixes specs
2. Better user experience, don't have to wait for request to finish before modal is closed, thus making app feel more responsive
3. The previous result handling with `success` would cause a second polling interval to begin

* Bring over rest of spec changes from #7966

The CSRF won't be updated until after the request finishes, even if the modal will be disappeared immediately
@aduth aduth force-pushed the aduth-try-avoid-poll-db-query branch from f051b01 to 567001a Compare March 30, 2023 14:10
@aduth aduth force-pushed the aduth-try-avoid-poll-db-query branch from 567001a to 13e95da Compare April 25, 2023 16:19
@aduth
Copy link
Contributor Author

aduth commented Apr 25, 2023

I rebased to resolve conflicts and was working to try to get this over the line. I encountered an issue that will need to be investigated further. There seems to be some base validation that's redirecting requests to DELETE /api/internal/sessions when the session is no longer live, and I was surprised to find that the user's Warden session is reset if they visit the homepage before finishing sign-in.

Steps to reproduce:

  1. Set config values very low like session_timeout_in_minutes: 2, session_check_delay: 5 session_check_frequency: 30
  2. Go to http://localhost:3000
  3. Sign in
  4. When prompted for MFA, go to http://localhost:3000 in a separate tab
  5. Wait for next polling request

Expected: Either (a) the initial session is still live, or (b) if it's not live, the user is redirected
Actual: The session is not live, but the user is not redirected as expected. Instead, the DELETE /api/internal/sessions is being internally redirected to the MFA page

@aduth
Copy link
Contributor Author

aduth commented May 26, 2023

I was surprised to find that the user's Warden session is reset if they visit the homepage before finishing sign-in.

This does seem to happen on main as well, for what it's worth. Still seems excessive to terminate the pending sign-in if they visit the homepage in a separate tab 🤷

@aduth
Copy link
Contributor Author

aduth commented May 26, 2023

I encountered an issue that will need to be investigated further. There seems to be some base validation that's redirecting requests to DELETE /api/internal/sessions when the session is no longer live

What's happening here is we're trying to destroy the session from the JavaScript as a DELETE request after the session is already ended, or at least after the CSRF token we include in the request is no longer valid, so it's the ApplicationController#invalid_auth_token kicking in to cause the redirect.

Initially I considered overriding the CSRF exception handling in the controller, but there's probably no way to retain the request_id in doing that. So now I'm thinking maybe it's best to move away from having a destroy action at all in this API, and instead have the timed-out redirect URL known up-front, like what we had previously.

The only challenge with this would be to find an equivalent behavior to log analytics.session_timed_out. Maybe SessionsController could handle that on the receiving end? i.e. GET /?timeout=session

@aduth
Copy link
Contributor Author

aduth commented May 26, 2023

but there's probably no way to retain the request_id in doing that

I guess the request_id is lost at this point anyways if the session has expired, so not really a big deal to lose this.

Edit: It may be lost as far as not being available in the session, but it should still be valid, right?

@aduth aduth force-pushed the aduth-try-avoid-poll-db-query branch from 13e95da to 776f1cf Compare May 26, 2023 18:49
@aduth aduth marked this pull request as ready for review May 26, 2023 19:49
@aduth
Copy link
Contributor Author

aduth commented May 26, 2023

Marking this ready for review again. I tried to simplify the diff to limit the impact. Separately, I'd like to iteratively work to slice up session-timeout-ping.ts toward some of our newer code conventions that'll always hopefully make it more testable.

Copy link
Contributor

@zachmargolis zachmargolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the specs call this with string 'form' instead, should we do that for consistency? (I know it gets converted to a string when we parse anyways)

Suggested change
timeout: :form,
timeout: 'form',

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need the html_safe on there? It looks like the only use of this method is for a data- attribute which I expect would be "smarter" about HTML escaping already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll check this on Monday, this might be one of the things we could remove altogether as a post-deploy cleanup, once we move everything off the current session API endpoints.

I should probably get that follow-on pull request prepped anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To this and the prior comment, I think I might leave this alone for now. It's still needed, but ties into some bigger problems with the auto-refresh that I'd like to sort out (related Slack discussion). At the very least, it might make sense to just have the JavaScript be responsible for navigating to ?timeout=form, rather than passing a full URL path from the server.

@aduth
Copy link
Contributor Author

aduth commented May 30, 2023

Post-deploy clean-up pull request at #8507

@aduth aduth force-pushed the aduth-try-avoid-poll-db-query branch from 6369488 to 9b3f3d3 Compare June 9, 2023 13:42
@aduth aduth merged commit c1d9a4c into main Jun 9, 2023
@aduth aduth deleted the aduth-try-avoid-poll-db-query branch June 9, 2023 16:05
@jmhooper jmhooper mentioned this pull request Jun 13, 2023
@aduth
Copy link
Contributor Author

aduth commented Jun 14, 2023

I've been monitoring this in production, and it seems like it has a small improvement over the legacy sessions endpoint in response time (-6% average, -15% median), but notably I observed in the breakdown that a database load is still happening for the user, which... was one of the main motivators for this work in the first place. I'll plan to find some time to troubleshoot why that's happening.

It's getting into diminishing returns of value, since the endpoint is already quite fast, and the database load doesn't contribute greatly to the overall response time. It used to be a very highly trafficked endpoint (and still is overall), but we might find more value in other optimizations like preventing some of the intermediate polling in the first place if it's not necessary.

@aduth
Copy link
Contributor Author

aduth commented Jun 30, 2023

I observed in the breakdown that a database load is still happening for the user, which... was one of the main motivators for this work in the first place. I'll plan to find some time to troubleshoot why that's happening.

Follow-up: I traced this back to ApplicationController#append_info_to_payload, where we include the user's UUID in request logging. I don't know that there'll be any way around this assuming we want to have that detail.

Curious if we could skip a full user load, I dug into what Devise/Warden store in the session, and it looks like it's the user ID (session['warden.user.user.key'].first.first). If we really wanted to avoid a user load, we could consider switching from UUID to ID in the logs (or maybe vice-versa, try to find a way to force the Warden session to track users by UUID instead of ID), but I dunno that we'd have the appetite for that.

@aduth
Copy link
Contributor Author

aduth commented Jul 3, 2023

If we really wanted to avoid a user load, we could consider switching from UUID to ID in the logs (or maybe vice-versa, try to find a way to force the Warden session to track users by UUID instead of ID)

For posterity: A bit of extra discussion on these points in Slack at https://gsa-tts.slack.com/archives/C0NGESUN5/p1688146380110749

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants