Handle StandardError without response in GetUspsProofingResultsJob#7049
Conversation
changelog: Bug Fixes, In-person proofing, Improve job error handling
| rescue Faraday::BadRequestError => err | ||
| handle_bad_request_error(err, enrollment) | ||
| rescue StandardError => err | ||
| NewRelic::Agent.notice_error(err) |
There was a problem hiding this comment.
These errors are unexpected, so I'm explicitly passing them to NewRelic so we can investigate them.
| end | ||
|
|
||
| def handle_standard_error(err, enrollment) | ||
| response_attributes = if err.respond_to?(:response) |
There was a problem hiding this comment.
I'm trying to understand when a StandardError would ever respond to the response method? Seems to me that may have been the issue in the first place? I also don't see any test cases which exercise this.
Would an alternative be to exclude **response_attributes from the analytics logged in this method, and keep the addition of the NewRelic agent call?
There was a problem hiding this comment.
Despite my prior comment, based on the errors being logged in NewRelic, it would seem that the issue isn't so much whether error responds to response, but that response is explicitly nil when response_analytics_attributes tries to pick values from the response†. I think this would be accounted for in the revisions proposed here (checking response.present?), but then it's not clear what scenarios we're expecting or not expecting error.response to exist at all.
† The reason being that the exceptions are occurring inside response_analytics_attributes (undefined method '[]' for nil:NilClass), whereas if error didn't respond to the response, I would expect the exception to occur in this method as undefined method 'response'.
There was a problem hiding this comment.
Ok so digging a bit more, I now understand that the errors we're handling here are other Faraday exceptions besides the Faraday::BadRequestError we're currently handling, such as Faraday::ServerError (related test case), which extend StandardError and thus are caught in this path and respond to response.
I think that helps clarify what's actually happening, though (separately) we may want to revisit how we're dealing with Faraday-specific errors, if we're handling some but not all explicitly.
For example, every Faraday error extends Faraday::Error, so maybe we should be rescuing the base class as a catch-all "network error" handling.
There was a problem hiding this comment.
Yeah that would be nice for observability too. I expect we'll see occasional Faraday errors and it won't be out of the ordinary. Non-Faraday standard errors should probably be looked at each time though
There was a problem hiding this comment.
Ticket to address this thread: https://cm-jira.usa.gov/browse/LG-7706
sheldon-b
left a comment
There was a problem hiding this comment.
Looks good, this should fix the issue as I understand it
I think we'll want a follow-up ticket to tidy things up and add test cases for timeout errors, server errors, etc. which I expect are the sorts of errors causing the bug
|
|
||
| def handle_standard_error(err, enrollment) | ||
| response_attributes = if err.respond_to?(:response) | ||
| response_analytics_attributes(err.response) |
There was a problem hiding this comment.
(Non-blocking but something to consider fixing up:)
The properties that we care about in response_analytics_attributes would come from the body property of the response (err.response[:body]), not the response directly, though in the case of the error, I assume we won't have those properties anyways, so ultimately it makes no difference.
I'd suggest we either audit the responses we're passing to response_analytics_attributes to make sure we're always sending the response body, or (as in a prior comment) avoid calling response_analytics_attributes altogether in error paths where we're not expecting useful response properties to log.
There was a problem hiding this comment.
Actually, if we may be expecting responseMessage in the server error response (as implied by some existing test cases), we might actually want to fix this up. (Edit: Still non-blocking as in it won't cause any issues, but (a) it'll be inaccurate with response_present: true despite an empty body and (b) we won't be logging the responseMessage.)
| response_analytics_attributes(err.response) | |
| response_analytics_attributes(err.response&.[](:body)) |
diff --git a/app/jobs/get_usps_proofing_results_job.rb b/app/jobs/get_usps_proofing_results_job.rb
index 36807bab8..dc18dd742 100644
--- a/app/jobs/get_usps_proofing_results_job.rb
+++ b/app/jobs/get_usps_proofing_results_job.rb
@@ -150,7 +150,7 @@ class GetUspsProofingResultsJob < ApplicationJob
def handle_standard_error(err, enrollment)
response_attributes = if err.respond_to?(:response)
- response_analytics_attributes(err.response)
+ response_analytics_attributes(err.response&.[](:body))
else
response_analytics_attributes(nil)
end
diff --git a/app/services/usps_in_person_proofing/mock/fixtures.rb b/app/services/usps_in_person_proofing/mock/fixtures.rb
index 515ee8945..39cc0b5b7 100644
--- a/app/services/usps_in_person_proofing/mock/fixtures.rb
+++ b/app/services/usps_in_person_proofing/mock/fixtures.rb
@@ -25,8 +25,8 @@ module UspsInPersonProofing
load_response_fixture('request_enroll_failed_response.json')
end
- def self.request_enroll_internal_failure_response
- load_response_fixture('request_enroll_internal_failure_response.json')
+ def self.request_internal_failure_response
+ load_response_fixture('request_internal_failure_response.json')
end
def self.request_enroll_invalid_response
diff --git a/app/services/usps_in_person_proofing/mock/proofer.rb b/app/services/usps_in_person_proofing/mock/proofer.rb
index 8005697a1..16822cb60 100644
--- a/app/services/usps_in_person_proofing/mock/proofer.rb
+++ b/app/services/usps_in_person_proofing/mock/proofer.rb
@@ -13,7 +13,7 @@ module UspsInPersonProofing
raise Faraday::BadRequestError.new('Bad request error', response)
when 'usps server error'
# usps 500 response
- body = JSON.parse(Fixtures.request_enroll_internal_failure_response)
+ body = JSON.parse(Fixtures.request_internal_failure_response)
response = { body: body, status: 500 }
raise Faraday::ServerError.new('Internal server error', response)
when 'usps invalid response'
diff --git a/app/services/usps_in_person_proofing/mock/responses/request_enroll_internal_failure_response.json b/app/services/usps_in_person_proofing/mock/responses/request_internal_failure_response.json
similarity index 100%
rename from app/services/usps_in_person_proofing/mock/responses/request_enroll_internal_failure_response.json
rename to app/services/usps_in_person_proofing/mock/responses/request_internal_failure_response.json
diff --git a/spec/jobs/get_usps_proofing_results_job_spec.rb b/spec/jobs/get_usps_proofing_results_job_spec.rb
index 6f5425beb..015f9d00e 100644
--- a/spec/jobs/get_usps_proofing_results_job_spec.rb
+++ b/spec/jobs/get_usps_proofing_results_job_spec.rb
@@ -56,9 +56,7 @@ RSpec.shared_examples 'enrollment with a status update' do |passed:, status:, re
end
end
-RSpec.shared_examples 'enrollment encountering an exception' do |exception_class: nil,
- exception_message: nil,
- reason: 'Request exception'|
+RSpec.shared_examples 'enrollment encountering an exception' do |**event_attributes|
it 'logs an error message and leaves the enrollment and profile pending' do
job.perform(Time.zone.now)
pending_enrollment.reload
@@ -67,11 +65,9 @@ RSpec.shared_examples 'enrollment encountering an exception' do |exception_class
expect(pending_enrollment.profile.active).to eq(false)
expect(job_analytics).to have_logged_event(
'GetUspsProofingResultsJob: Exception raised',
- reason: reason,
enrollment_id: pending_enrollment.id,
enrollment_code: pending_enrollment.enrollment_code,
- exception_class: exception_class,
- exception_message: exception_message,
+ **event_attributes,
)
end
@@ -516,13 +512,14 @@ RSpec.describe GetUspsProofingResultsJob do
context 'when USPS returns a 5xx status code' do
before(:each) do
- stub_request_proofing_results_with_responses({ status: 500 })
+ stub_request_proofing_results_internal_failure
end
it_behaves_like(
'enrollment encountering an exception',
exception_class: 'Faraday::ServerError',
exception_message: 'the server responded with status 500',
+ response_message: 'An internal error occurred processing the request',
)
end
diff --git a/spec/support/usps_ipp_helper.rb b/spec/support/usps_ipp_helper.rb
index 22286669e..ea606e577 100644
--- a/spec/support/usps_ipp_helper.rb
+++ b/spec/support/usps_ipp_helper.rb
@@ -25,8 +25,7 @@ module UspsIppHelper
def stub_request_enroll_internal_failure_response
stub_request(:post, %r{/ivs-ippaas-api/IPPRest/resources/rest/optInIPPApplicant}).to_return(
- status: 500, body: UspsInPersonProofing::Mock::Fixtures.
- request_enroll_internal_failure_response
+ status: 500, body: UspsInPersonProofing::Mock::Fixtures.request_internal_failure_response,
)
end
@@ -42,6 +41,12 @@ module UspsIppHelper
)
end
+ def stub_request_proofing_results_internal_failure
+ stub_request(:post, %r{/ivs-ippaas-api/IPPRest/resources/rest/getProofingResults}).to_return(
+ status: 500, body: UspsInPersonProofing::Mock::Fixtures.request_internal_failure_response,
+ )
+ end
+
def request_expired_proofing_results_args
{
status: 400, body: UspsInPersonProofing::Mock::Fixtures.There was a problem hiding this comment.
Yeah I think there's some confusion stemming from our Faraday middlewares which, for any <400 status code, will automatically parse the response body and return it as a hash. For any >=400 status code Faraday will raise an error and we'll have to explicitly access the body buried in the response (eg err.response[:body])
For example this is the response for a 2xx response status code:
{"status"=>"In-person passed",
"proofingPostOffice"=>"WILKES BARRE",
"proofingCity"=>"WILKES BARRE",
"proofingState"=>"PA",
"enrollmentCode"=>"2090002197504352",
"primaryIdType"=>"Not supported",
"transactionStartDateTime"=>"12/17/2020 033855",
"transactionEndDateTime"=>"12/17/2020 034055",
"fraudSuspected"=>false,
"proofingConfirmationNumber"=>"350040248346701",
"ippAssuranceLevel"=>"1.5"}
This is the err.response for a 4xx status code:
{:status=>400,
:headers=>{},
:body=>{"responseMessage"=>"More than 30 days have passed since opt-in to IPP"},
:request=>
{:method=>:post,
:url_path=>"/ivs-ippaas-api/IPPRest/resources/rest/getProofingResults",
:params=>nil,
:headers=>{"Content-Type"=>"application/json; charset=utf-8", "User-Agent"=>"Faraday v1.8.0", "Authorization"=>"Bearer <redacted>", "RequestID"=>"d63e795a-e163-464b-9cd5-6b55e716ac7b"},
:body=>"{\"sponsorID\":0,\"uniqueID\":\"<redacted>\",\"enrollmentCode\":\"<redacted>\"}"}}
I also noticed that there's a third type of behavior to account for. If we get a successful response (eg status code 2xx) but the JSON is invalid then Faraday will raise a Faraday::ParsingError. We can look at err.response to see the response object but its body will be nil. If we wanted to access the body (as unparsed text) in that case we could look at err.response.env.response_body. I think we should explicitly handle the Faraday::ParsingError as a new error case and handle it appropriately
There was a problem hiding this comment.
I'm going to create a new ticket to address auditing the different error cases and updating the code to make sure that it behaves as expected for the different types of responses we'll get back from Faraday for various error conditions
There was a problem hiding this comment.
Ticket to address this thread: https://cm-jira.usa.gov/browse/LG-7706
eileen-nava
left a comment
There was a problem hiding this comment.
This looks good to me. Thank you, Tomas, for fixing this!
changelog: Bug Fixes, In-person proofing, Improve job error handling
🛠 Summary of changes
Fixes a bug where certain errors encountered while checking for enrollment status updates were not being handled. See this discussion in Slack.