Skip to content

LG-7836 Automatically refresh USPS auth token when necessary#7808

Merged
eileen-nava merged 15 commits intomainfrom
em/7836-refresh-usps-auth-token
Feb 22, 2023
Merged

LG-7836 Automatically refresh USPS auth token when necessary#7808
eileen-nava merged 15 commits intomainfrom
em/7836-refresh-usps-auth-token

Conversation

@eileen-nava
Copy link
Contributor

@eileen-nava eileen-nava commented Feb 10, 2023

🎫 Ticket

Automatically refresh USPS auth token when necessary.

🛠 Summary of changes

  • Check the USPS auth token before every request to the USPS API. If the token is expired, refresh it.
  • The above change necessitated a lot of spec set-up changes.

📜 Testing Plan

Provide a checklist of steps to confirm the changes.

  • Deploy feature branch to Joy sandbox
  • Open a Rails console in the Joy sandbox. I used this command: aws-vault exec sandbox-power -- bin/ssm-instance -d rails-c asg-joy-idp
  • Ensure that the usps mock fallback is off. IdentityConfig.store.usps_mock_fallback should equal false.
  • Create an account, fail the attempt to verify online and get sent to in-person verification.
  • On the PO Search page, search for an address. Then, check on how long the USPS token has to expire. Rails.cache.redis.ttl(:usps_ippaas_api_auth_token)
  • Once Rails.cache.redis.ttl(:usps_ippaas_api_auth_token) shows that the token is expired, proceed through the IPP flow.

👀 Screenshots

I didn't include screenshots because this isn't a user-facing change.

Comment on lines +156 to +157
if Rails.cache.redis.ttl(AUTH_TOKEN_CACHE_KEY) != -2 && Rails.cache.redis.ttl(AUTH_TOKEN_CACHE_KEY) <= 0.5 # rubocop:disable Layout/LineLength
retrieve_token!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pull the ttl once? Just in case there's some weird condition that changes between calls?

Also can we move the 0.5 magic number into a constant?

Suggested change
if Rails.cache.redis.ttl(AUTH_TOKEN_CACHE_KEY) != -2 && Rails.cache.redis.ttl(AUTH_TOKEN_CACHE_KEY) <= 0.5 # rubocop:disable Layout/LineLength
retrieve_token!
key_ttl = Rails.cache.redis.ttl(AUTH_TOKEN_CACHE_KEY)
if key_ttl != -2 && key_ttl <= 0.5
retrieve_token!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. 👍🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what do you think about bumping the 0.5 seconds to maybe 5 seconds?

Copy link
Contributor Author

@eileen-nava eileen-nava Feb 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NavaTim Why 5 seconds? I'm fine with increasing the buffer, though I am curious about why you chose that number.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm specifically thinking here about an amount of time approximating the upper end of what it'd take to serve an average request (maybe 75th percentile or higher) - including backend calls to DB, cache, and a potentially slow USPS API. Part of the goal is to prevent requests from failing midway through because the token wasn't updated early enough. We would also need to add a little buffer for the token update process itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thank you. 🙏🏻

Comment on lines +741 to +743
allow(Rails).to receive(:cache).and_return(
ActiveSupport::Cache::RedisCacheStore.new(url: IdentityConfig.store.redis_throttle_url),
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we repeat this bit a lot in these specs. Would it make sense to either move it up to a top level context (have it once) or make a helper method like use_redis_cache?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll look into this. 👍🏻

stub_request(
:post,
%r{/ivs-ippaas-api/IPPRest/resources/rest/requestEnrollmentCode},
).to_raise(Faraday::ForbiddenError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question, would Faraday still return the same error if we had something like to_return(status: 403)

Suggested change
).to_raise(Faraday::ForbiddenError)
).to_return(status: 403, body: '')

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's slightly different. Here's what I saw, using binding.pry, for the raise option.

 #<WebMock::RequestStub:0x000000011abda978
 @request_pattern=
  #<WebMock::RequestPattern:0x000000011abda928
   @body_pattern=nil,
   @headers_pattern=nil,
   @method_pattern=#<WebMock::MethodPattern:0x000000011abda8d8 @pattern=:post>,
   @uri_pattern=
    #<WebMock::URIRegexpPattern:0x000000011abda888
     @pattern=/\/ivs-ippaas-api\/IPPRest\/resources\/rest\/requestEnrollmentCode/,
     @query_params=nil>,
   @with_block=nil>,
 @responses_sequences=
  [#<WebMock::ResponsesSequence:0x000000011abda5b8
    @current_position=0,
    @responses=
     [#<WebMock::Response:0x000000011abda798
       @body=nil,
       @exception=#<Faraday::ForbiddenError #<Faraday::ForbiddenError: Exception from WebMock>>,
       @headers=nil,
       @should_timeout=nil,
       @status=nil>],
    @times_to_repeat=1>]>

Here's what I see with the return option.

#<WebMock::RequestStub:0x000000010e80a840
 @request_pattern=
  #<WebMock::RequestPattern:0x000000010e80a7f0
   @body_pattern=nil,
   @headers_pattern=nil,
   @method_pattern=#<WebMock::MethodPattern:0x000000010e80a7a0 @pattern=:post>,
   @uri_pattern=
    #<WebMock::URIRegexpPattern:0x000000010e80a750
     @pattern=/\/ivs-ippaas-api\/IPPRest\/resources\/rest\/requestEnrollmentCode/,
     @query_params=nil>,
   @with_block=nil>,
 @responses_sequences=
  [#<WebMock::ResponsesSequence:0x000000010e80a4d0
    @current_position=0,
    @responses=
     [#<WebMock::Response:0x000000010e80a660
       @body="",
       @exception=nil,
       @headers=nil,
       @should_timeout=nil,
       @status=[403, ""]>],
    @times_to_repeat=1>]>

Changing the stub does not change test behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for having the mock server return a response, and letting the adapter translate + throw that, but since it doesn't change the tests, its prob fine

@eileen-nava
Copy link
Contributor Author

@NavaTim I couldn't leave an inline comment because I didn't change this area of the code, but I wanted to address your feedback on this portion of proofer.rb in the retrieve_token! method

      Rails.cache.write(AUTH_TOKEN_CACHE_KEY, token, expires_at: expires_at)
      # If using a redis cache we have to manually set the expires_at. This is because we aren't
      # using a dedicated Redis cache and instead are just using our existing Redis server with
      # mixed usage patterns. Without this cache entries don't expire.
      # More at https://api.rubyonrails.org/classes/ActiveSupport/Cache/RedisCacheStore.html
      Rails.cache.try(:redis)&.expireat(AUTH_TOKEN_CACHE_KEY, expires_at.to_i)

I remember you suggested consolidating this into one call, rather than two calls to redis, because you were concerned that if one call passes and one call fails, we'd have an entry without expireat set on it. I tried to consolidate this into one call using the method below:
Rails.cache.try(:redis)&.setex(AUTH_TOKEN_CACHE_KEY, expires_at.to_i, token)

but when I did this the ttl for the key then equaled -1, which indicates that the redis value doesn’t have an associated expiration time.
As a result, I went back to using two commands. I think (as indicated by the comments in the code) because we’re using a redis cache we might have to do this in two commands, since we have to manually set the expireat. I’m interested to hear your thoughts on this. If there's a way to consolidate the two commands into one and I'm missing it, please let me know.

@eileen-nava eileen-nava marked this pull request as ready for review February 16, 2023 21:14
@eileen-nava eileen-nava requested review from a team and racingspider February 16, 2023 21:14
@NavaTim
Copy link
Contributor

NavaTim commented Feb 16, 2023

@NavaTim I couldn't leave an inline comment because I didn't change this area of the code, but I wanted to address your feedback on this portion of proofer.rb in the retrieve_token! method

Read more...
      Rails.cache.write(AUTH_TOKEN_CACHE_KEY, token, expires_at: expires_at)
      # If using a redis cache we have to manually set the expires_at. This is because we aren't
      # using a dedicated Redis cache and instead are just using our existing Redis server with
      # mixed usage patterns. Without this cache entries don't expire.
      # More at https://api.rubyonrails.org/classes/ActiveSupport/Cache/RedisCacheStore.html
      Rails.cache.try(:redis)&.expireat(AUTH_TOKEN_CACHE_KEY, expires_at.to_i)

I remember you suggested consolidating this into one call, rather than two calls to redis, because you were concerned that if one call passes and one call fails, we'd have an entry without expireat set on it. I tried to consolidate this into one call using the method below: Rails.cache.try(:redis)&.setex(AUTH_TOKEN_CACHE_KEY, expires_at.to_i, token)

but when I did this the ttl for the key then equaled -1, which indicates that the redis value doesn’t have an associated expiration time. As a result, I went back to using two commands. I think (as indicated by the comments in the code) because we’re using a redis cache we might have to do this in two commands, since we have to manually set the expireat. I’m interested to hear your thoughts on this. If there's a way to consolidate the two commands into one and I'm missing it, please let me know.

The important part here is ensuring that the token properly expires and gets renewed regardless of how many separate Redis calls we use. If we store the expiry as part of the value itself (in addition to, or instead of, the expiry on the entry) then we can address the scenario: What if the second call fails?

Furthermore if we directly use the expiry value (unlike the 403 fallback strategy), then we can anticipate that the token may expire by the time we use it and therefore avert the 403 altogether.

To be clear - with this approach we would pass in a hash/object instead of token to the original write method call. When we retrieve the value back from Redis, we would check the expiry value contained in that same object before attempting to use the token. This process doesn't require any use of Redis's built-in expiry (though it can be supplemented with that expiry).

@NavaTim
Copy link
Contributor

NavaTim commented Feb 16, 2023

@eileen-nava I took a closer look at the logic, and considering that the result of a failed second call would be repeating the token fetch up to a few times (since Redis would return -1 for the TTL)... it's probably fine for that to happen & your implementation is probably ok as-is (excepting the comments from Zach's review & my comment about fetching the token at 5 seconds instead of 0.5 seconds).

Comment on lines +426 to +428

expect(cache).to receive(:read).with(UspsInPersonProofing::Proofer::AUTH_TOKEN_CACHE_KEY)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the test does verify that the appropriate method calls happen, it seems like without and_call_original that this may not be returning a token. Additionally, it would seem that the test is not actually checking that the cached token is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I can't add and_call_original here because it is only available on a partial double, and cache is a pure test double.

Copy link
Contributor

@NavaTim NavaTim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test does appear to adequately verify that the cache was updated, which seems like the major part of this story. That said, I recommend adding the validation that the token is being used appropriately.

Copy link
Contributor

@zachmargolis zachmargolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

stub_request(
:post,
%r{/ivs-ippaas-api/IPPRest/resources/rest/requestEnrollmentCode},
).to_raise(Faraday::ForbiddenError)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a slight preference for having the mock server return a response, and letting the adapter translate + throw that, but since it doesn't change the tests, its prob fine

@eileen-nava eileen-nava merged commit 7735e9f into main Feb 22, 2023
@eileen-nava eileen-nava deleted the em/7836-refresh-usps-auth-token branch February 22, 2023 21:17
@solipet solipet mentioned this pull request Feb 23, 2023
solipet added a commit that referenced this pull request Feb 23, 2023
* LG-8796: Add recaptcha disclosure form at bottom of phone setup screen for new and existing accounts (#7827)

* changelog: Upcoming Features, Authentication, Add recaptcha disclosure form at bottom of phone setup screen

* styling fixes and dont hardcode login.gov

* hide and show other movile service

* add captcha submit button to add new phone

* add to hide grecaptcha-badge

* use recaptcha disclosure shared

* remove empty space

* lint recaptcha badge

* Skip devices database query when identifier is null (#7856)

changelog: Internal, Performance, Skip devices database query when identifier is null

* Removed analytics check from password visibility toggle spec. (#7860)

The test was flaky and the flag is unused, per Andrew Duthie.

[skip changelog]

* Removed use of Faker email creation in user factory primary email (#7861)

* Removed use of Faker email creation in user factory primary email

We suspect that using Faker email, rather than explicitly setting
sequential emails, was occasionally creating collisions and leading to
flaky test failures.

[skip changelog]

* Add 2023 SAML certs (#7862)

(symlink 2022 ones to have non-blank files there)

changelog: Internal, Certificates, 2023 SAML Certificate rotation

* LG-8215 Update Event: IDV Reproof using Address verify by Mail (#7859)

changelog: Internal, Attempts API, Update in tracking event

* LG-8604 - Optimize s3 by streaming data w/ feature flag (#7843)

* WIP - optimize s3 by streaming data

changelog: Internal, Attempts API, Scalability Optimization

* shifted a line - rubocop apparently wasn't picky about this

* added feature flag.

updated spec to reflect nested conditionals

* cleaned up some comments.

* broke out s3 response code into its own method.

* cleaned up code based on feedback

* updated default buffer size and made some clarifying changes based on feedback

changelog: Internal, Attempts Api, Optimization: S3 Streaming

* cleaned up more test descriptions.

changelog: Internal, Attempts Api, Feature: S3 Streaming

* Add logging ahead of removing Analytics logic (#7864)

* Add Rails.logger.info logging near proposed call sites

changelog: Internal, Logging, Add logging ahead of streamlining analytics events

* Remove unused Authentication Presenter data  (#7868)

* Remove unused Authentication Presenter data

changelog: Internal, Maintenance, Remove unused methods and variables

* remove unused pivcac email data

* remove more unused methods

* Disable notify for good_job (#7870)

changelog: Internal, Background Jobs, Update GoodJob and disable NOTIFY

* LG-8423 log cancel analytics from barcode page (#7838)

* analytics logged when user cancels from barcode pg

* update event and add spec for for event in sessions controller

* event and spec added for cancellations controller

* changelog: Internal, Analytics Events, add cancel from barcode page analytics event

* fix lint issue w/ spec

* change step to barcode to make unique

* use cancellation go back w/ extra attributes when user comes from barcode pg

* remove new event and update session controller

* update spec

* change enrollment key

* update cancel spec

* check for enrollment and merge extra attributes and update tests

* use pending enrollment and remove sp as it is redundant

* update spec to create pending enrollment

* LG-8901: New Hybrid Handoff (upload step) with feature flag (#7849)

* Adding initial fields and new icons
* Deleting useless hybrid handoff mobile tests
* Pulling in send_link step functionality to upload_step
* Adding feature flag for new combined hybrid handoff step
* Adding additional checking for 50/50 conditions

-- What
As @matthinz pointed out, the current code does not account well for
the following 50/50 states:
1. User is shown the new combined upload view, but submits the phone
number to an instance with the feature flag off;
2. User is shown the original upload view, but clicks the phone button
submitting to an instance where the feature flag is now on

We handle this case by adding a request param _only_ to the form
rendered by the new combined view. That view is only shown to users
who have the feature flag enabled. The upload step will now check for
that param -- called :combined -- and can determine regardless of the
current state of the feature flag, that the user actually has
submitted a phone number to the upload step itself, and that the send
link step should be skipped.

Co-authored-by: Sonia Connolly <sonia.connolly@gsa.gov>
Co-authored-by: Andrew Duthie <andrew.duthie@gsa.gov>

changelog: User-Facing Improvements, Hybrid Handoff Screen, Combining
upload and send link steps into one screen

* Unpack a dynamic send to make code easier to trace (#7869)

* Remove now-unused constant

changelog: Internal, Refactor, Small update to remove dynamic send in 2FA mixin

* LG-8875 / LG-8877: Try to avoid race condition in document-capture-welcome (#7842)

* "Convert" document-capture-welcome to typescript

(It was already valid TS)

* Rework camera check to block form submission

Don't let the form submit until the promise we use to check for the presence / nonpresence of a camera has resolved.

* Disable submit buttons while we wait

* TEMP: Allow injecting delay into hasCamera()

Set HAS_CAMERA_DELAY_MS in localStorage to configure.

* Update app/javascript/packs/document-capture-welcome.ts

Co-authored-by: Andrew Duthie <andrew.duthie@gsa.gov>

* Use SpinnerButtonComponent for welcome/agreement

* Simplify cameraCheckPromise usage

Eliminate additional local variable and just race the promises

* Wrap agreement spinner button in <div>

Adding margin-top-4 to button does not get applied to the spinner dots

* changelog: User-Facing Improvements, Identity verification, help prevent mobile users from ending up in hybrid handoff flow

* Log event from frontend for mobile/camera checks

Ideally, we want to see if this code actually fixes anything.

(Relates to LG-8877)

Co-authored-by: Eric Gade <eric.gade@gsa.gov>

* Refactor perf measuring slightly

Pull out to a function that just augments the promise with perf data

* Add spin_on_click to SpinnerButtonComponent

Allow setting spin-on-click as needed.

* Toggle spinner on form submission

Just in case there is a form validation issue, don't start spinning until we're sure we can actually make the submission.

* await trackEvent that could occur during form submission

* Add idv_mobile_device_and_camera_check to AnalyticsEvents

Document the event and make it so it's not prefixed with "Frontend:"

* Revert "TEMP: Allow injecting delay into hasCamera()"

This reverts commit 96cbc3e.

* Don't use performance.measure()

Complicated API, don't need it for what we're doing.

---------

Co-authored-by: Andrew Duthie <andrew.duthie@gsa.gov>
Co-authored-by: Eric Gade <eric.gade@gsa.gov>

* LG-7836 Automatically refresh USPS auth token when necessary (#7808)

* implement retry behavior with repetitive code

* remove begin/rescue blocks and get failing tests

* add retry options to faraday. tests still fail.

* refactor implementation & spec

* changelog:Bug Fixes, In-person proofing, refresh USPS auth token when request fails

* attempt broader retry behavior. tests still fail.

* attempt to refresh token before pinging USPS api. more tests fail

* got first test to pass, need to get other expired token tests passing

* get all expired token tests to pass

* DRY up proofer test setup

* add buffer to expires_at conditional

* fix broken tests by adding redis cache setup

* fix broken enrollment helper tests by adding caching setup

* respond to feedback

* remove no-op

* Fixing missed grid gap sizing (follow up to LG-8901) (#7873)

* Fixing missed grid gap sizing

h/t @aduth

changelog: User-Facing Improvements, Hybrid Handoff Screen, Combining
upload and send link steps into one screen

* LG-9025 Go to new SsnController from hybrid flow when feature flag is set (#7875)

From the DocumentCaptureStep, we only want to redirect to the new SsnController if we are in the DocAuthFlow. This step is reused from the Hybrid Flow, and we don't want to go to the SsnController on mobile. In hybrid flow, we don't return to the DocumentCaptureStep, so redirect to SsnController from LinkSentStep when the feature flag is true as well.

Duplicate the hybrid flow test with doc_auth_ssn_controller_enabled flag set to true and make sure it passes. Note that on my machine even on main without the feature flag I have to set a binding.pry and delay the mobile browser part of the test to get it to pass.

changelog: Internal, refactor, fix hybrid flow with new SsnController (feature flagged)

Co-authored-by: eric-gade <eric.gade@gsa.gov>

---------

Co-authored-by: Malick Diarra <malick.diarra@gsa.gov>
Co-authored-by: Mitchell Henke <mitchell.henke@gsa.gov>
Co-authored-by: John Maxwell <john.maxwell@gsa.gov>
Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>
Co-authored-by: Osman Latif <109746710+olatifflexion@users.noreply.github.com>
Co-authored-by: Rwolfe-Nava <87499456+Rwolfe-Nava@users.noreply.github.com>
Co-authored-by: Shannon A <20867088+svalexander@users.noreply.github.com>
Co-authored-by: Eric Gade <105373963+eric-gade@users.noreply.github.com>
Co-authored-by: Matt Hinz <matt.hinz@gsa.gov>
Co-authored-by: Andrew Duthie <andrew.duthie@gsa.gov>
Co-authored-by: Eric Gade <eric.gade@gsa.gov>
Co-authored-by: eileen-nava <80347702+eileen-nava@users.noreply.github.com>
Co-authored-by: Sonia Connolly <sonia.connolly@gsa.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants