Skip to content

LG-8440: Ingest in-person enrollment status updates from SQS#8403

Merged
NavaTim merged 29 commits intomainfrom
tbradley/lg-8440-in-person-enrollment-sqs-status
May 25, 2023
Merged

LG-8440: Ingest in-person enrollment status updates from SQS#8403
NavaTim merged 29 commits intomainfrom
tbradley/lg-8440-in-person-enrollment-sqs-status

Conversation

@NavaTim
Copy link
Contributor

@NavaTim NavaTim commented May 16, 2023

🎫 Ticket

LG-8440

🛠 Summary of changes

Create InPerson::EnrollmentsReadyForStatusCheckJob job to ingest in-person enrollment status updates from USPS.

The following pseudocode approximates the job's behavior, minus error handling.

Report job start to analytics
Loop: Pull SQS messages into received batch
  Initialize empty deletion batch
  Loop: For each message in received batch
    Unwrap SQS payload
    Unwrap SNS payload
    Unwrap SES payload
    Parse raw email body
    Extract enrollment code
    Update enrollment record in DB
    Add SQS message to deletion batch
  End Loop
  Delete messages in deletion batch
End Loop
Report job completion to analytics

This adds the aws-sdk-sqs which was not previously present; this is the official AWS package. The mail gem is (and previously was) a direct dependency, so it has been added to the Gemfile; this does not constitute a substantial change, but reduces the risk of regression.

📜 Testing Plan

  • Enable and configure feature with valid SQS queue accessible from the IDP worker
  • Create in-person enrollment record
  • Send payload to SQS queue containing email payload corresponding to the enrollment record
  • Wait for or trigger cron job
  • Check that the ready_for_status_check flag is set on the enrollment record
Logs from local testing

The following logs cover a local test where 7 emails were sent. The batch size was adjusted to 5 in order to ensure the job fetched multiple times.

Expected success and deletion:

  • 1 email was the exact expected format
  • 1 email had whitespace around the enrollment code
  • 1 email contained text without an enrollment code

Expected failure and deletion:

  • 1 email contained a code that did not match an existing enrollment record
  • 3 blank emails

Another error occurred prior to the test where I was able to confirm that the job will log the completion (as expected) even when an error is thrown.

==> ./log/workers.log <==
{"duration_ms":0.019999980926513672,"timestamp":"2023-05-19T22:32:06.176Z","name":"perform_start.active_job","job_class":"InPerson::EnrollmentsReadyForStatusCheckJob","trace_id":null,"queue_name":"GoodJob(low)","job_id":"b1462a37-47aa-4cea-8c30-10abec1e6e2f","enqueued_at":null,"queued_duration_ms":null}
Performing InPerson::EnrollmentsReadyForStatusCheckJob (Job ID: b1462a37-47aa-4cea-8c30-10abec1e6e2f) from GoodJob(low) enqueued at 

==> ./log/events.log <==
{"name":"InPersonEnrollmentsReadyForStatusCheckJob: Job started","properties":{"event_properties":{},"new_event":true,"path":null,"session_duration":null,"user_id":"anonymous-uuid","locale":"en"},"time":"2023-05-19T22:32:06.187Z","id":"80d9ce3a-1abe-4ffa-902a-04d4a2b035dc","visitor_id":"5b81ee2c-e4c4-4404-a5ca-be6a90392293","visit_id":"c8f723d8-09c4-46c9-bacb-ed7c756c6280"}

==> ./log/production.log <==
Aws::SQS::Client 200 1.512384 0 receive_message [ ]

==> ./log/events.log <==
{"name":"InPersonEnrollmentsReadyForStatusCheckJob: Ingestion error","properties":{"event_properties":{"exception_class":"RuntimeError","exception_message":"InPerson::EnrollmentsReadyForStatusCheck::EnrollmentPipeline: Failure occurred when attempting to get email body","sqs_message_id":"5c5e05d9-00f7-435c-b7e2-5f2eb81cc068","sns_message_id":"edccd4fc-1871-56b8-a38d-881de00640d2","ses_aws_message_id":"g8a5ai89mi3om5vbhgclqprpt04rq16m0d0d7r81","ses_mail_timestamp":"2023-05-19T22:20:38.534Z","ses_mail_source":"timothy.bradley@gsa.gov","ses_rfc_origination_date":"2023-05-19T15:20:26-07:00","ses_rfc_message_id":"\u003cCANYMRCHnW5rLsWDetv1RjRc3oMnKxsOZMHM2tSk1H4sv0ALgaA@mail.gmail.com\u003e"},"new_event":true,"path":null,"session_duration":null,"user_id":"anonymous-uuid","locale":"en"},"time":"2023-05-19T22:32:07.809Z","id":"558511d2-74a3-4e0b-b66b-df607c4a43dd","visitor_id":"b4da149d-3a99-4d53-a463-f660c3b2c99a","visit_id":"04153042-24c2-475d-961a-499f7b9e0668"}

==> ./log/development.log <==
  InPersonEnrollment Pluck (0.8ms)  SELECT "in_person_enrollments"."id", "in_person_enrollments"."user_id", "in_person_enrollments"."ready_for_status_check" FROM "in_person_enrollments" WHERE "in_person_enrollments"."enrollment_code" = $1 AND "in_person_enrollments"."status" = $2 ORDER BY "in_person_enrollments"."created_at" DESC LIMIT $3  [["enrollment_code", "2048702198809999"], ["status", 1], ["LIMIT", 1]]

==> ./log/events.log <==
{"name":"InPersonEnrollmentsReadyForStatusCheckJob: Ingestion error","properties":{"event_properties":{"exception_class":"RuntimeError","exception_message":"InPerson::EnrollmentsReadyForStatusCheck::EnrollmentPipeline: Received code for enrollment that does not exist in the database","sqs_message_id":"3e09d8b6-1bce-4870-a8a2-c468498c28a4","sns_message_id":"3fc8db28-4e38-5ae8-98c0-87c9b6c03a85","ses_aws_message_id":"1conen8qaha1cm76pq9qigv52nq41u99gkf8rco1","ses_mail_timestamp":"2023-05-19T22:22:47.518Z","ses_mail_source":"timothy.bradley@gsa.gov","ses_rfc_origination_date":"2023-05-19T15:22:35-07:00","ses_rfc_message_id":"\u003cCANYMRCERPgo78+AwJFbKL2DwmR-2a76-pMeYG8phycuwL12KEw@mail.gmail.com\u003e","enrollment_code":"2048702198809999"},"new_event":false,"path":null,"session_duration":null,"user_id":"anonymous-uuid","locale":"en"},"time":"2023-05-19T22:32:07.856Z","id":"83939068-53a9-468c-90fb-d355d17306c2","visitor_id":"b4da149d-3a99-4d53-a463-f660c3b2c99a","visit_id":"04153042-24c2-475d-961a-499f7b9e0668"}

==> ./log/production.log <==
Aws::SQS::Client 200 0.16097 0 delete_message_batch [ ]
Aws::SQS::Client 200 0.398405 0 receive_message [ ]

==> ./log/development.log <==
  InPersonEnrollment Pluck (0.7ms)  SELECT "in_person_enrollments"."id", "in_person_enrollments"."user_id", "in_person_enrollments"."ready_for_status_check" FROM "in_person_enrollments" WHERE "in_person_enrollments"."enrollment_code" = $1 AND "in_person_enrollments"."status" = $2 ORDER BY "in_person_enrollments"."created_at" DESC LIMIT $3  [["enrollment_code", "2048702198801234"], ["status", 1], ["LIMIT", 1]]
  InPersonEnrollment Load (0.4ms)  SELECT "in_person_enrollments"."id", "in_person_enrollments"."user_id", "in_person_enrollments"."profile_id", "in_person_enrollments"."enrollment_code", "in_person_enrollments"."status_check_attempted_at", "in_person_enrollments"."status_updated_at", "in_person_enrollments"."status", "in_person_enrollments"."created_at", "in_person_enrollments"."updated_at", "in_person_enrollments"."current_address_matches_id", "in_person_enrollments"."selected_location_details", "in_person_enrollments"."unique_id", "in_person_enrollments"."enrollment_established_at", "in_person_enrollments"."issuer", "in_person_enrollments"."follow_up_survey_sent", "in_person_enrollments"."early_reminder_sent", "in_person_enrollments"."late_reminder_sent", "in_person_enrollments"."deadline_passed_sent", "in_person_enrollments"."proofed_at", "in_person_enrollments"."capture_secondary_id_enabled", "in_person_enrollments"."status_check_completed_at", "in_person_enrollments"."ready_for_status_check" FROM "in_person_enrollments" WHERE "in_person_enrollments"."id" = $1 LIMIT $2  [["id", 51], ["LIMIT", 1]]
  TRANSACTION (0.2ms)  BEGIN
  Profile Load (1.0ms)  SELECT "profiles"."id", "profiles"."user_id", "profiles"."active", "profiles"."verified_at", "profiles"."activated_at", "profiles"."created_at", "profiles"."updated_at", "profiles"."encrypted_pii", "profiles"."ssn_signature", "profiles"."encrypted_pii_recovery", "profiles"."deactivation_reason", "profiles"."proofing_components", "profiles"."name_zip_birth_year_signature", "profiles"."reproof_at", "profiles"."initiating_service_provider_issuer", "profiles"."fraud_review_pending_at", "profiles"."fraud_rejection_at", "profiles"."gpo_verification_pending_at" FROM "profiles" WHERE "profiles"."id" = $1 LIMIT $2  [["id", 1], ["LIMIT", 1]]
  User Load (0.5ms)  SELECT "users"."id", "users"."reset_password_token", "users"."reset_password_sent_at", "users"."remember_created_at", "users"."created_at", "users"."updated_at", "users"."confirmed_at", "users"."second_factor_attempts_count", "users"."uuid", "users"."second_factor_locked_at", "users"."phone_confirmed_at", "users"."direct_otp", "users"."direct_otp_sent_at", "users"."unique_session_id", "users"."otp_delivery_preference", "users"."encrypted_password_digest", "users"."encrypted_recovery_code_digest", "users"."remember_device_revoked_at", "users"."email_language", "users"."accepted_terms_at", "users"."encrypted_recovery_code_digest_generated_at" FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
  User Load (0.6ms)  SELECT "users"."id", "users"."reset_password_token", "users"."reset_password_sent_at", "users"."remember_created_at", "users"."created_at", "users"."updated_at", "users"."confirmed_at", "users"."second_factor_attempts_count", "users"."uuid", "users"."second_factor_locked_at", "users"."phone_confirmed_at", "users"."direct_otp", "users"."direct_otp_sent_at", "users"."unique_session_id", "users"."otp_delivery_preference", "users"."encrypted_password_digest", "users"."encrypted_recovery_code_digest", "users"."remember_device_revoked_at", "users"."email_language", "users"."accepted_terms_at", "users"."encrypted_recovery_code_digest_generated_at" FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 2], ["LIMIT", 1]]
  InPersonEnrollment Update (0.6ms)  UPDATE "in_person_enrollments" SET "updated_at" = $1, "ready_for_status_check" = $2 WHERE "in_person_enrollments"."id" = $3  [["updated_at", "2023-05-19 22:32:08.481168"], ["ready_for_status_check", true], ["id", 51]]
  TRANSACTION (2.3ms)  COMMIT
  InPersonEnrollment Pluck (0.6ms)  SELECT "in_person_enrollments"."id", "in_person_enrollments"."user_id", "in_person_enrollments"."ready_for_status_check" FROM "in_person_enrollments" WHERE "in_person_enrollments"."enrollment_code" = $1 AND "in_person_enrollments"."status" = $2 ORDER BY "in_person_enrollments"."created_at" DESC LIMIT $3  [["enrollment_code", "2048702198804353"], ["status", 1], ["LIMIT", 1]]
  InPersonEnrollment Load (0.7ms)  SELECT "in_person_enrollments"."id", "in_person_enrollments"."user_id", "in_person_enrollments"."profile_id", "in_person_enrollments"."enrollment_code", "in_person_enrollments"."status_check_attempted_at", "in_person_enrollments"."status_updated_at", "in_person_enrollments"."status", "in_person_enrollments"."created_at", "in_person_enrollments"."updated_at", "in_person_enrollments"."current_address_matches_id", "in_person_enrollments"."selected_location_details", "in_person_enrollments"."unique_id", "in_person_enrollments"."enrollment_established_at", "in_person_enrollments"."issuer", "in_person_enrollments"."follow_up_survey_sent", "in_person_enrollments"."early_reminder_sent", "in_person_enrollments"."late_reminder_sent", "in_person_enrollments"."deadline_passed_sent", "in_person_enrollments"."proofed_at", "in_person_enrollments"."capture_secondary_id_enabled", "in_person_enrollments"."status_check_completed_at", "in_person_enrollments"."ready_for_status_check" FROM "in_person_enrollments" WHERE "in_person_enrollments"."id" = $1 LIMIT $2  [["id", 44], ["LIMIT", 1]]
  TRANSACTION (0.2ms)  BEGIN
  Profile Load (0.7ms)  SELECT "profiles"."id", "profiles"."user_id", "profiles"."active", "profiles"."verified_at", "profiles"."activated_at", "profiles"."created_at", "profiles"."updated_at", "profiles"."encrypted_pii", "profiles"."ssn_signature", "profiles"."encrypted_pii_recovery", "profiles"."deactivation_reason", "profiles"."proofing_components", "profiles"."name_zip_birth_year_signature", "profiles"."reproof_at", "profiles"."initiating_service_provider_issuer", "profiles"."fraud_review_pending_at", "profiles"."fraud_rejection_at", "profiles"."gpo_verification_pending_at" FROM "profiles" WHERE "profiles"."id" = $1 LIMIT $2  [["id", 8], ["LIMIT", 1]]
  User Load (0.3ms)  SELECT "users"."id", "users"."reset_password_token", "users"."reset_password_sent_at", "users"."remember_created_at", "users"."created_at", "users"."updated_at", "users"."confirmed_at", "users"."second_factor_attempts_count", "users"."uuid", "users"."second_factor_locked_at", "users"."phone_confirmed_at", "users"."direct_otp", "users"."direct_otp_sent_at", "users"."unique_session_id", "users"."otp_delivery_preference", "users"."encrypted_password_digest", "users"."encrypted_recovery_code_digest", "users"."remember_device_revoked_at", "users"."email_language", "users"."accepted_terms_at", "users"."encrypted_recovery_code_digest_generated_at" FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  User Load (0.3ms)  SELECT "users"."id", "users"."reset_password_token", "users"."reset_password_sent_at", "users"."remember_created_at", "users"."created_at", "users"."updated_at", "users"."confirmed_at", "users"."second_factor_attempts_count", "users"."uuid", "users"."second_factor_locked_at", "users"."phone_confirmed_at", "users"."direct_otp", "users"."direct_otp_sent_at", "users"."unique_session_id", "users"."otp_delivery_preference", "users"."encrypted_password_digest", "users"."encrypted_recovery_code_digest", "users"."remember_device_revoked_at", "users"."email_language", "users"."accepted_terms_at", "users"."encrypted_recovery_code_digest_generated_at" FROM "users" WHERE "users"."id" = $1 LIMIT $2  [["id", 6], ["LIMIT", 1]]
  InPersonEnrollment Update (0.4ms)  UPDATE "in_person_enrollments" SET "updated_at" = $1, "ready_for_status_check" = $2 WHERE "in_person_enrollments"."id" = $3  [["updated_at", "2023-05-19 22:32:08.498820"], ["ready_for_status_check", true], ["id", 44]]
  TRANSACTION (0.4ms)  COMMIT

==> ./log/events.log <==
{"name":"InPersonEnrollmentsReadyForStatusCheckJob: Ingestion error","properties":{"event_properties":{"exception_class":"RuntimeError","exception_message":"InPerson::EnrollmentsReadyForStatusCheck::EnrollmentPipeline: Failure occurred when attempting to get email body","sqs_message_id":"9e08c28e-6a9d-4ed9-85bd-28860f886bcd","sns_message_id":"673c75f0-d6f0-58bd-a425-29112fbb5353","ses_aws_message_id":"vb9vbqtihrm3e1jqaajgvmbvpcl9s9vhse2hca01","ses_mail_timestamp":"2023-05-19T22:24:10.787Z","ses_mail_source":"timothy.bradley@gsa.gov","ses_rfc_origination_date":"2023-05-19T15:23:58-07:00","ses_rfc_message_id":"\u003cCANYMRCHcOBOtLN0nURQL0PTB0g7zOQZVr5c=nJySRisYmbAeJw@mail.gmail.com\u003e"},"new_event":false,"path":null,"session_duration":null,"user_id":"anonymous-uuid","locale":"en"},"time":"2023-05-19T22:32:08.503Z","id":"61a0f8b6-c5f1-4216-b2f4-aeea199d631d","visitor_id":"b4da149d-3a99-4d53-a463-f660c3b2c99a","visit_id":"04153042-24c2-475d-961a-499f7b9e0668"}

==> ./log/production.log <==
Aws::SQS::Client 200 0.156043 0 delete_message_batch [ ]
Aws::SQS::Client 200 0.243185 0 receive_message [ ]

==> ./log/events.log <==
{"name":"InPersonEnrollmentsReadyForStatusCheckJob: Ingestion error","properties":{"event_properties":{"exception_class":"RuntimeError","exception_message":"InPerson::EnrollmentsReadyForStatusCheck::EnrollmentPipeline: Failure occurred when attempting to get email body","sqs_message_id":"643969ae-7915-4ecf-920c-29c9d991f0a1","sns_message_id":"57847377-fbbd-5433-bc71-2f9603f4f0ff","ses_aws_message_id":"okm73d4p9c7ko4dg1b9ahrikr4s8odf7sgd8jdo1","ses_mail_timestamp":"2023-05-19T22:19:53.147Z","ses_mail_source":"timothy.bradley@gsa.gov","ses_rfc_origination_date":"2023-05-19T15:19:40-07:00","ses_rfc_message_id":"\u003cCANYMRCH4=9X-FkB-76VFnQ=0Vb5hMomhYU-9KGM31cg1k3hwDQ@mail.gmail.com\u003e"},"new_event":false,"path":null,"session_duration":null,"user_id":"anonymous-uuid","locale":"en"},"time":"2023-05-19T22:32:08.906Z","id":"07f5c35c-42fe-4940-bef4-65ca9ad2d9cc","visitor_id":"b4da149d-3a99-4d53-a463-f660c3b2c99a","visit_id":"04153042-24c2-475d-961a-499f7b9e0668"}

==> ./log/production.log <==
Aws::SQS::Client 200 0.152797 0 delete_message_batch [ ]
Aws::SQS::Client 200 0.20169 0 receive_message [ ]

==> ./log/events.log <==
{"name":"InPersonEnrollmentsReadyForStatusCheckJob: Ingestion error","properties":{"event_properties":{"exception_class":"RuntimeError","exception_message":"InPerson::EnrollmentsReadyForStatusCheck::EnrollmentPipeline: Failed to extract enrollment code using regex, check email body format and regex","sqs_message_id":"08153bde-f8cf-40d4-b32f-4db8f66e159e","sns_message_id":"495227d2-cab5-5d63-b2ad-7b333279b0eb","ses_aws_message_id":"qgrmdo4teol3jiqbkl9ajn3kaep0dfrfdiuis401","ses_mail_timestamp":"2023-05-19T22:22:04.180Z","ses_mail_source":"timothy.bradley@gsa.gov","ses_rfc_origination_date":"2023-05-19T15:21:52-07:00","ses_rfc_message_id":"\u003cCANYMRCEo_Lv6QN_zd7KzEj4Na=EvcSxjmciJsBQaPB8YjVGtSQ@mail.gmail.com\u003e"},"new_event":false,"path":null,"session_duration":null,"user_id":"anonymous-uuid","locale":"en"},"time":"2023-05-19T22:32:09.265Z","id":"8dbdb93d-aa9e-4264-9527-f089f2304bd4","visitor_id":"b4da149d-3a99-4d53-a463-f660c3b2c99a","visit_id":"04153042-24c2-475d-961a-499f7b9e0668"}

==> ./log/production.log <==
Aws::SQS::Client 200 0.145369 0 delete_message_batch [ ]
Aws::SQS::Client 200 20.191978 0 receive_message [ ]

==> ./log/events.log <==
{"name":"InPersonEnrollmentsReadyForStatusCheckJob: Job completed","properties":{"event_properties":{"fetched_items":7,"processed_items":7,"deleted_items":7,"valid_items":2,"invalid_items":5,"incomplete_items":0,"deletion_failed_items":0},"new_event":true,"path":null,"session_duration":null,"user_id":"anonymous-uuid","locale":"en"},"time":"2023-05-19T22:32:29.604Z","id":"4e662e4f-02b9-48a5-8f84-4b8f2723b5d5","visitor_id":"37bd0788-7732-4a02-81c4-dbaae075398d","visit_id":"10a81cc7-9997-4e8c-afc9-c5b58c0eeaa8"}

==> ./log/workers.log <==
{"duration_ms":23429.647000074387,"timestamp":"2023-05-19T22:32:29.605Z","name":"perform.active_job","job_class":"InPerson::EnrollmentsReadyForStatusCheckJob","trace_id":null,"queue_name":"GoodJob(low)","job_id":"b1462a37-47aa-4cea-8c30-10abec1e6e2f","enqueued_at":null}
Performed InPerson::EnrollmentsReadyForStatusCheckJob (Job ID: b1462a37-47aa-4cea-8c30-10abec1e6e2f) from GoodJob(low) in 23430.02ms

@NavaTim NavaTim requested review from a team and racingspider May 16, 2023 18:40
Copy link
Contributor

@zachmargolis zachmargolis May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "uses" an abbreviation, or is it like the verb like "this uses that"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uses is a verb here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok! I feel like that's not a common naming convention we have in this repo? Is there a particular reason to start that now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The use of this convention is pretty isolated, so I don't think it's a big concern. If you want to have the job refactored to use delegation instead of mixins though then I'll rename accordingly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm reading this right, it looks like we are only including this mixin in one place. Do we have plans for future SQS topics? (my understanding is no, we do not) so I think that it would make sense to just inline this for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This mixin is used by the job (included by both the job directly and the batch processor), but I'm intentionally separating this out to reduce the complexity of working on this section of the code. I did consider renaming this to SqsBatchWrapper.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using delegation instead of inheritance via mixin? To have an SQS client/consumer class that we can instantiate instead of mixing in? delegation is much easier to follow than included methods from mixins when tracing code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was considering that exact refactor before posting this PR. If you think it's worth making the change, then I'll do it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes definitely, let's go for it

Copy link
Contributor Author

@NavaTim NavaTim May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zachmargolis I refactored to use DI instead of mixins. There's some lack of clarity about where to put the factories, so for now they live in the job and have their own tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rescue inside of an ensure is very surprising to me. Could we move the body of this ensure into its own method? that way it can have its own rescue and stuff

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rescue here is very intentional, but I think it's reasonable to move this into another method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved into the new private method process_deletions.

Comment on lines +128 to +138
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe unhandled errors in background jobs already go to NewRelic, do we need to notify directly like this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do need additional direct error logging here in order to better capture the context of the error; this is most likely to occur after we have captured the enrollment code. However that part could be isolated to CloudWatch if you think that's particularly important.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we filter on status, later we update ready_for_status_check, are they the same thing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, they currently track different things. status tracks the overall progress of the enrollment creation, checking, and success/failure. ready_for_status_check only covers whether we received a message saying that the enrollment is ready to have its status checked (which is not tracked by status).

The check with USPS may result in a status update if USPS indicates that the enrollment is not present, has expired, or that the user has visited the post office already.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More context here - we are currently checking enrollments that have a pending status. When this and related stories are complete, we will be able to deprioritize (but still eventually check) enrollments that have ready_for_status_check remain as false.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not getting why the append is happening outside of the above if statement? Based on the comment it feels like the sqs_message should only be appended if process_message returns true?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If process_message returns false, then the enrollment cannot be processed (e.g. invalid JSON, wrong fields, etc.). Keeping it in the queue would unnecessarily slow down the ingestion of other email notifications.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to Jack’s question-how does a message meet the condition described in the comment on line 38?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@svalexander The message will meet the condition if an uncaught error occurs while processing the message. One example would be if the database becomes unavailable before we can update the enrollment status.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this inclusion necessary, as BatchProcessor already includes this module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking no, but I consider it important since this class directly uses the poll method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for including these comments, very useful!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these items that encountered errors when we attempted to process them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're deleting items in the following cases:

  • Processing succeeds
  • The item is malformed (e.g. invalid JSON or has wrong fields)
  • The item does not correspond to a pending enrollment record

If we don't delete the last two, then workers will continue attempting (and failing) to process them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I like factory object. Matt showed me once we have a a lot of analytics events piling together.

Copy link
Contributor

@svalexander svalexander May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering how we decided on the time limit for this and the next line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's normal with SQS to set wait_time_seconds high in order to reduce the number of calls and increase the likelihood that we'll receive items from the SQS on each call. Behind the scenes, SQS is also a distributed queue, which means order is only an approximate and not a strict guarantee (unless we use FIFO) & AWS could potentially be fetching the items from multiple machines.

I'm leaving visibility_timeout_seconds at the default, but it's a parameter that may be worth adjusting if the processing time is extended at some point. This controls when items will become visible to other workers (i.e. essentially for the case of an unreported processing failure).

@NavaTim NavaTim force-pushed the tbradley/lg-8440-in-person-enrollment-sqs-status branch from 23cb016 to 35d6b26 Compare May 18, 2023 21:54
Comment on lines 65 to 67
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tap returns the original value, not the block, I think we want a different approach here

1.tap { |x| x + 1 }
=> 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, updated to use then instead. I also updated the test so that it will catch this issue.

@NavaTim NavaTim marked this pull request as ready for review May 19, 2023 22:54
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't the deleted_items and processed_items values in the splatted analytics_hash? Why explicitly list them here? To be explicit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default analytics_stats hash has all of the items as zero. The overrides here are primarily intended to represent changes in the items. I have set some of the zero redundantly because it matters for understanding the individual test's behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to grab this from the config so if it changes there the test will be aware?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a unit test; testing the specific configuration is out-of-scope. This is meant to test that the module functions correctly as a unit.

Copy link
Contributor

@JackRyan1989 JackRyan1989 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGMT!

Minor non-blocking question: We set a limit of ten messages to be pulled from SQS. The batch processor will handle any length of messages returned from SQS, but I'm wondering if it's worth thinking about setting a limit?

@NavaTim
Copy link
Contributor Author

NavaTim commented May 23, 2023

LGMT!

Minor non-blocking question: We set a limit of ten messages to be pulled from SQS. The batch processor will handle any length of messages returned from SQS, but I'm wondering if it's worth thinking about setting a limit?

@JackRyan1989 I think that the control on this type of behavior should be on the job runtime, if possible, rather than the number of items. This may be a good place to implement or check for existing monitoring on this type of problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My vote is to remove this direct NewRelic notification, we tend to react to errors in NR as if they are unhandled 500s so seeing errors like this that were caught might make something seem more urgent than it is

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of error signals that USPS is sending us garbage data or that AWS has become severely misconfigured. At the current scale it may not be consequential, but it could be extremely consequential at the scales we were preparing for earlier this year.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IOW - if we are getting this at all, then it is very likely to be a problem that needs addressed soon. It doesn't mean we have to stop processing the other queue items, though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine keeping the NR notification, but we should reconsider if it turns out not to be useful

Comment on lines 10 to 11
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just style, I really dislike unless, especially with a compound statement, I think its always clearer as an if

Suggested change
return true unless IdentityConfig.store.in_person_proofing_enabled &&
IdentityConfig.store.in_person_enrollments_ready_job_enabled
if !IdentityConfig.store.in_person_proofing_enabled ||
!IdentityConfig.store.in_person_enrollments_ready_job_enabled
return true
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree; unless provides a larger visual indication of the conditional behavior than the oft-missed exclamation mark.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the conditions could be pulled into well-named methods, ie:

def in_person_proofing_not_enabled
  !IdentityConfig.store.in_person_proofing_enabled
end

then it could look like this, which imo is a little easier to parse/reason about

if in_person_proofing_not_enabled || in_person_enrollements_ready_job_not_enabled
  return true
end

Copy link
Contributor

@tomas-nava tomas-nava May 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank? will return true if a variable is nil or false

      return true if IdentityConfig.store.in_person_proofing_enabled.blank? ||
                     IdentityConfig.store.in_person_enrollments_ready_job_enabled.blank?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomas-nava That sounds like a reasonable compromise.

@Sgtpluck I think locality is preferable here.

Comment on lines +48 to +52
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like using dependency injection, but I think that it seems redundant to pass in the class_name explicitly like this, could the BatchProcessor class have a default implementation that passes its own self.class.name in to the error reporter? Since analytics here is basically just a default/empty instance with no user, I feel like we don't need to worry about passing in the same one necessarily

Ditto for below with EnrollmentPipeline

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default implementations for DI have been a colossal headache for me in the past, so I really prefer to avoid default implementations.

Comment on lines +57 to +60
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since sqs_message is doubling for a plain struct, could we just use the real value type instead?

Suggested change
before(:each) do
allow(sqs_message).to receive(:message_id).
and_return(sqs_message_id)
end
let(:sqs_message) { Aws::SQS::Types::Message.new(message_id: sqs_message_id) }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This builds into the test some assumptions about how the struct is created, so I would prefer to avoid this change.

Copy link
Contributor

@mitchellhenke mitchellhenke May 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the worry that the class constructor changes and breaks the test, or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mitchellhenke Yes, roughly speaking. i.e. constructor or factory doing some kind of interpretation on the data passed in rather than setting it to exactly match the original parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an additional concern about coupling and misrepresenting parts of the payload that we don't actually use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, I think we could simplify the tests by using the actual value type here for sqs_message

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above:

This builds into the test some assumptions about how the struct is created, so I would prefer to avoid this change.

@NavaTim NavaTim force-pushed the tbradley/lg-8440-in-person-enrollment-sqs-status branch from 75255e1 to 853013c Compare May 24, 2023 18:21
NavaTim and others added 4 commits May 24, 2023 11:53
…ent_pipeline_spec.rb

Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>
…h_wrapper.rb

Co-authored-by: Zach Margolis <zachmargolis@users.noreply.github.com>
@NavaTim NavaTim merged commit 3f1177a into main May 25, 2023
@NavaTim NavaTim deleted the tbradley/lg-8440-in-person-enrollment-sqs-status branch May 25, 2023 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants