Skip to content

Handle StandardError without response in GetUspsProofingResultsJob#7049

Merged
tomas-nava merged 1 commit intomainfrom
tomas/handle-usps-proofing-job-error-without-response
Sep 29, 2022
Merged

Handle StandardError without response in GetUspsProofingResultsJob#7049
tomas-nava merged 1 commit intomainfrom
tomas/handle-usps-proofing-job-error-without-response

Conversation

@tomas-nava
Copy link
Contributor

🛠 Summary of changes

Fixes a bug where certain errors encountered while checking for enrollment status updates were not being handled. See this discussion in Slack.

changelog: Bug Fixes, In-person proofing, Improve job error handling
@tomas-nava tomas-nava requested a review from NavaTim September 29, 2022 06:36
rescue Faraday::BadRequestError => err
handle_bad_request_error(err, enrollment)
rescue StandardError => err
NewRelic::Agent.notice_error(err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These errors are unexpected, so I'm explicitly passing them to NewRelic so we can investigate them.

Copy link
Contributor

@aduth aduth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely clear on the scenarios we're accounting for, but based on my understanding of the exception, I think this should address it as a fix for today's release.

end

def handle_standard_error(err, enrollment)
response_attributes = if err.respond_to?(:response)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand when a StandardError would ever respond to the response method? Seems to me that may have been the issue in the first place? I also don't see any test cases which exercise this.

Would an alternative be to exclude **response_attributes from the analytics logged in this method, and keep the addition of the NewRelic agent call?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite my prior comment, based on the errors being logged in NewRelic, it would seem that the issue isn't so much whether error responds to response, but that response is explicitly nil when response_analytics_attributes tries to pick values from the response†. I think this would be accounted for in the revisions proposed here (checking response.present?), but then it's not clear what scenarios we're expecting or not expecting error.response to exist at all.


† The reason being that the exceptions are occurring inside response_analytics_attributes (undefined method '[]' for nil:NilClass), whereas if error didn't respond to the response, I would expect the exception to occur in this method as undefined method 'response'.

Copy link
Contributor

@aduth aduth Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so digging a bit more, I now understand that the errors we're handling here are other Faraday exceptions besides the Faraday::BadRequestError we're currently handling, such as Faraday::ServerError (related test case), which extend StandardError and thus are caught in this path and respond to response.

I think that helps clarify what's actually happening, though (separately) we may want to revisit how we're dealing with Faraday-specific errors, if we're handling some but not all explicitly.

For example, every Faraday error extends Faraday::Error, so maybe we should be rescuing the base class as a catch-all "network error" handling.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah that would be nice for observability too. I expect we'll see occasional Faraday errors and it won't be out of the ordinary. Non-Faraday standard errors should probably be looked at each time though

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket to address this thread: https://cm-jira.usa.gov/browse/LG-7706

Copy link
Contributor

@sheldon-b sheldon-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, this should fix the issue as I understand it

I think we'll want a follow-up ticket to tidy things up and add test cases for timeout errors, server errors, etc. which I expect are the sorts of errors causing the bug


def handle_standard_error(err, enrollment)
response_attributes = if err.respond_to?(:response)
response_analytics_attributes(err.response)
Copy link
Contributor

@aduth aduth Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Non-blocking but something to consider fixing up:)

The properties that we care about in response_analytics_attributes would come from the body property of the response (err.response[:body]), not the response directly, though in the case of the error, I assume we won't have those properties anyways, so ultimately it makes no difference.

I'd suggest we either audit the responses we're passing to response_analytics_attributes to make sure we're always sending the response body, or (as in a prior comment) avoid calling response_analytics_attributes altogether in error paths where we're not expecting useful response properties to log.

Copy link
Contributor

@aduth aduth Sep 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, if we may be expecting responseMessage in the server error response (as implied by some existing test cases), we might actually want to fix this up. (Edit: Still non-blocking as in it won't cause any issues, but (a) it'll be inaccurate with response_present: true despite an empty body and (b) we won't be logging the responseMessage.)

Suggested change
response_analytics_attributes(err.response)
response_analytics_attributes(err.response&.[](:body))
diff --git a/app/jobs/get_usps_proofing_results_job.rb b/app/jobs/get_usps_proofing_results_job.rb
index 36807bab8..dc18dd742 100644
--- a/app/jobs/get_usps_proofing_results_job.rb
+++ b/app/jobs/get_usps_proofing_results_job.rb
@@ -150,7 +150,7 @@ class GetUspsProofingResultsJob < ApplicationJob
 
   def handle_standard_error(err, enrollment)
     response_attributes = if err.respond_to?(:response)
-                            response_analytics_attributes(err.response)
+                            response_analytics_attributes(err.response&.[](:body))
                           else
                             response_analytics_attributes(nil)
                           end
diff --git a/app/services/usps_in_person_proofing/mock/fixtures.rb b/app/services/usps_in_person_proofing/mock/fixtures.rb
index 515ee8945..39cc0b5b7 100644
--- a/app/services/usps_in_person_proofing/mock/fixtures.rb
+++ b/app/services/usps_in_person_proofing/mock/fixtures.rb
@@ -25,8 +25,8 @@ module UspsInPersonProofing
         load_response_fixture('request_enroll_failed_response.json')
       end
 
-      def self.request_enroll_internal_failure_response
-        load_response_fixture('request_enroll_internal_failure_response.json')
+      def self.request_internal_failure_response
+        load_response_fixture('request_internal_failure_response.json')
       end
 
       def self.request_enroll_invalid_response
diff --git a/app/services/usps_in_person_proofing/mock/proofer.rb b/app/services/usps_in_person_proofing/mock/proofer.rb
index 8005697a1..16822cb60 100644
--- a/app/services/usps_in_person_proofing/mock/proofer.rb
+++ b/app/services/usps_in_person_proofing/mock/proofer.rb
@@ -13,7 +13,7 @@ module UspsInPersonProofing
           raise Faraday::BadRequestError.new('Bad request error', response)
         when 'usps server error'
           # usps 500 response
-          body = JSON.parse(Fixtures.request_enroll_internal_failure_response)
+          body = JSON.parse(Fixtures.request_internal_failure_response)
           response = { body: body, status: 500 }
           raise Faraday::ServerError.new('Internal server error', response)
         when 'usps invalid response'
diff --git a/app/services/usps_in_person_proofing/mock/responses/request_enroll_internal_failure_response.json b/app/services/usps_in_person_proofing/mock/responses/request_internal_failure_response.json
similarity index 100%
rename from app/services/usps_in_person_proofing/mock/responses/request_enroll_internal_failure_response.json
rename to app/services/usps_in_person_proofing/mock/responses/request_internal_failure_response.json
diff --git a/spec/jobs/get_usps_proofing_results_job_spec.rb b/spec/jobs/get_usps_proofing_results_job_spec.rb
index 6f5425beb..015f9d00e 100644
--- a/spec/jobs/get_usps_proofing_results_job_spec.rb
+++ b/spec/jobs/get_usps_proofing_results_job_spec.rb
@@ -56,9 +56,7 @@ RSpec.shared_examples 'enrollment with a status update' do |passed:, status:, re
   end
 end
 
-RSpec.shared_examples 'enrollment encountering an exception' do |exception_class: nil,
-                                                                exception_message: nil,
-                                                                reason: 'Request exception'|
+RSpec.shared_examples 'enrollment encountering an exception' do |**event_attributes|
   it 'logs an error message and leaves the enrollment and profile pending' do
     job.perform(Time.zone.now)
     pending_enrollment.reload
@@ -67,11 +65,9 @@ RSpec.shared_examples 'enrollment encountering an exception' do |exception_class
     expect(pending_enrollment.profile.active).to eq(false)
     expect(job_analytics).to have_logged_event(
       'GetUspsProofingResultsJob: Exception raised',
-      reason: reason,
       enrollment_id: pending_enrollment.id,
       enrollment_code: pending_enrollment.enrollment_code,
-      exception_class: exception_class,
-      exception_message: exception_message,
+      **event_attributes,
     )
   end
 
@@ -516,13 +512,14 @@ RSpec.describe GetUspsProofingResultsJob do
 
       context 'when USPS returns a 5xx status code' do
         before(:each) do
-          stub_request_proofing_results_with_responses({ status: 500 })
+          stub_request_proofing_results_internal_failure
         end
 
         it_behaves_like(
           'enrollment encountering an exception',
           exception_class: 'Faraday::ServerError',
           exception_message: 'the server responded with status 500',
+          response_message: 'An internal error occurred processing the request',
         )
       end
 
diff --git a/spec/support/usps_ipp_helper.rb b/spec/support/usps_ipp_helper.rb
index 22286669e..ea606e577 100644
--- a/spec/support/usps_ipp_helper.rb
+++ b/spec/support/usps_ipp_helper.rb
@@ -25,8 +25,7 @@ module UspsIppHelper
 
   def stub_request_enroll_internal_failure_response
     stub_request(:post, %r{/ivs-ippaas-api/IPPRest/resources/rest/optInIPPApplicant}).to_return(
-      status: 500, body: UspsInPersonProofing::Mock::Fixtures.
-        request_enroll_internal_failure_response
+      status: 500, body: UspsInPersonProofing::Mock::Fixtures.request_internal_failure_response,
     )
   end
 
@@ -42,6 +41,12 @@ module UspsIppHelper
     )
   end
 
+  def stub_request_proofing_results_internal_failure
+    stub_request(:post, %r{/ivs-ippaas-api/IPPRest/resources/rest/getProofingResults}).to_return(
+      status: 500, body: UspsInPersonProofing::Mock::Fixtures.request_internal_failure_response,
+    )
+  end
+
   def request_expired_proofing_results_args
     {
       status: 400, body: UspsInPersonProofing::Mock::Fixtures.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think there's some confusion stemming from our Faraday middlewares which, for any <400 status code, will automatically parse the response body and return it as a hash. For any >=400 status code Faraday will raise an error and we'll have to explicitly access the body buried in the response (eg err.response[:body])

For example this is the response for a 2xx response status code:

{"status"=>"In-person passed",
 "proofingPostOffice"=>"WILKES BARRE",
 "proofingCity"=>"WILKES BARRE",
 "proofingState"=>"PA",
 "enrollmentCode"=>"2090002197504352",
 "primaryIdType"=>"Not supported",
 "transactionStartDateTime"=>"12/17/2020 033855",
 "transactionEndDateTime"=>"12/17/2020 034055",
 "fraudSuspected"=>false,
 "proofingConfirmationNumber"=>"350040248346701",
 "ippAssuranceLevel"=>"1.5"}

This is the err.response for a 4xx status code:

{:status=>400,
 :headers=>{},
 :body=>{"responseMessage"=>"More than 30 days have passed since opt-in to IPP"},
 :request=>
  {:method=>:post,
   :url_path=>"/ivs-ippaas-api/IPPRest/resources/rest/getProofingResults",
   :params=>nil,
   :headers=>{"Content-Type"=>"application/json; charset=utf-8", "User-Agent"=>"Faraday v1.8.0", "Authorization"=>"Bearer <redacted>", "RequestID"=>"d63e795a-e163-464b-9cd5-6b55e716ac7b"},
   :body=>"{\"sponsorID\":0,\"uniqueID\":\"<redacted>\",\"enrollmentCode\":\"<redacted>\"}"}}

I also noticed that there's a third type of behavior to account for. If we get a successful response (eg status code 2xx) but the JSON is invalid then Faraday will raise a Faraday::ParsingError. We can look at err.response to see the response object but its body will be nil. If we wanted to access the body (as unparsed text) in that case we could look at err.response.env.response_body. I think we should explicitly handle the Faraday::ParsingError as a new error case and handle it appropriately

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to create a new ticket to address auditing the different error cases and updating the code to make sure that it behaves as expected for the different types of responses we'll get back from Faraday for various error conditions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ticket to address this thread: https://cm-jira.usa.gov/browse/LG-7706

@eileen-nava eileen-nava self-requested a review September 29, 2022 13:45
Copy link
Contributor

@eileen-nava eileen-nava left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Thank you, Tomas, for fixing this!

@tomas-nava tomas-nava merged commit 062ac12 into main Sep 29, 2022
@tomas-nava tomas-nava deleted the tomas/handle-usps-proofing-job-error-without-response branch September 29, 2022 15:09
@solipet solipet mentioned this pull request Sep 29, 2022
jskinne3 pushed a commit that referenced this pull request Oct 12, 2022
changelog: Bug Fixes, In-person proofing, Improve job error handling
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants