Queue Monitoring - Schedulers Refactor #4076

Closed
3 tasks done
andrewsignori-aot opened this issue Dec 11, 2024 · 1 comment


andrewsignori-aot commented Dec 11, 2024

User Story
As a product team, we need to monitor the queues and associated SFTP folders to diagnose errors in processing the files, and confirm archiving behavior for failed files.
This is a continuation of the scheduler refactors started in #3666.

Acceptance Criteria

  • Refactor the schedulers so that a job is moved to the failed state if an unexpected error happens. The schedulers below were already adjusted and can be used as a reference.
    • cas-supplier-integration.scheduler.ts
    • cra-process-integration.scheduler.ts
    • cra-response-integration.scheduler.ts
  • E2E tests will need to be adjusted to call the new `processQueue` method.
  • E2E log assertions will need to be adjusted.
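
A minimal sketch of the pattern the criteria describe, assuming a base class that wraps each scheduler's work so an unexpected error is logged and rethrown. `SketchJob`, `SketchBaseQueue`, and `FailingScheduler` are hypothetical names, not the project's actual types; the failure message is borrowed from the E2E snippet quoted later in this thread.

```typescript
// Hypothetical, minimal job shape used only for this sketch.
interface SketchJob {
  logs: string[];
}

// Message borrowed from the E2E assertion quoted in this issue.
const GENERIC_FAILURE =
  "One or more errors were reported during the process, please see logs for details.";

abstract class SketchBaseQueue {
  // Scheduler-specific work implemented by each scheduler.
  protected abstract process(job: SketchJob): Promise<void>;

  // Single entry point for the queue and for E2E tests.
  async processQueue(job: SketchJob): Promise<void> {
    job.logs.push("Start processing.");
    try {
      await this.process(job);
    } catch (error: unknown) {
      const message = error instanceof Error ? error.message : String(error);
      job.logs.push(`Unexpected error: ${message}`);
      // Rethrowing is what allows a queue library to mark the job as failed.
      throw new Error(GENERIC_FAILURE);
    } finally {
      job.logs.push("End processing.");
    }
  }
}

// Example scheduler whose work always fails.
class FailingScheduler extends SketchBaseQueue {
  protected async process(): Promise<void> {
    throw new Error("Invalid file footer.");
  }
}
```

With this shape, an E2E test only needs to call `processQueue` and assert on the rejection and the collected logs, which is the adjustment the last two criteria ask for.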
@andrewsignori-aot changed the title from "Copy of Queue Monitoring" to "Queue Monitoring - Schedulers Refactor" Dec 11, 2024
github-merge-queue bot pushed a commit that referenced this issue Dec 16, 2024
)

- Refactored SFAS-related schedulers, SFAS to SIMS and vice-versa.
  - Adjusted E2E tests.
- Replaced `SFASProcessingResult` with the `processSummary`. This
refactor is not expected to be done for every scheduler.
github-merge-queue bot pushed a commit that referenced this issue Dec 18, 2024
…an balance) (#4118)

- Refactored SIN validation-related schedulers and student loan balance
import.
- Adjusted E2E tests.
github-merge-queue bot pushed a commit that referenced this issue Dec 19, 2024
…Receipts) (#4135)

- Refactored schedulers `disbursement-receipts-file-integration` and
`federal-restrictions-integration`.
- Adjusted existing E2Es.
- As agreed in previous PRs:
  - Moved the audit user to the service instead of the processor.
  - Removed the friendly start/end logs from the processor, since the
    `BaseQueue` already logs similar start/end messages.

### E2E `containLogMessages` changes
To allow the check below, the method `containLogMessages` was changed to
check whether a log entry "contains" a value instead of whether it
"endsWith" it. `endsWith` was used to get a more precise check while
skipping the initial log information (e.g. context and date). Using
"contains" still gives enough assertion precision and allows inspecting
errors that are currently serialized to JSON.
```ts
// Act/Assert
await expect(processor.processQueue(mockedJob.job)).rejects.toThrow(
  "One or more errors were reported during the process, please see logs for details.",
);
expect(
  mockedJob.containLogMessages([
    `Error downloading file ${expectedFileName}.`,
    "Invalid file footer.",
  ]),
).toBe(true);
```
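
The difference between the two matching strategies can be sketched as follows. This is an illustration only: `logsContain` and `logsEndWith` are hypothetical standalone helpers, not the project's `containLogMessages`, and the log line format is an assumption.

```typescript
// True when every expected message appears somewhere in at least one log line.
function logsContain(loggedLines: string[], expected: string[]): boolean {
  return expected.every((message) =>
    loggedLines.some((line) => line.includes(message)),
  );
}

// Previous strategy, shown for comparison: the message must close the line.
function logsEndWith(loggedLines: string[], expected: string[]): boolean {
  return expected.every((message) =>
    loggedLines.some((line) => line.endsWith(message)),
  );
}

// Assumed log format: date and context first, then a plain or JSON message.
const lines = [
  "2024-12-19 [scheduler] Error downloading file sample.txt.",
  '2024-12-19 [scheduler] {"message":"Invalid file footer.","stack":"..."}',
];

const expectedMessages = [
  "Error downloading file sample.txt.",
  "Invalid file footer.",
];
const matchedByContains = logsContain(lines, expectedMessages);
const matchedByEndsWith = logsEndWith(lines, expectedMessages);
```

In this sketch the second message sits inside a serialized JSON payload, so only the "contains" strategy matches it.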
github-merge-queue bot pushed a commit that referenced this issue Dec 24, 2024
…dback) (#4159)

Refactored the below schedulers.
- ecert-full-time-feedback
- ecert-full-time-process
- ecert-part-time-feedback
- ecert-part-time-process

andrewsignori-aot commented Dec 27, 2024

@CarlyCotton, while executing the refactor I noticed that we need a way to generate alerts for jobs that finish successfully but with warnings.
The approach taken right now is:

  • The job moves to a failed state if an exception is thrown at any point and a retry could allow the job to recover.
  • If an "error" or other situation needs attention but running the job again will not fix it, the job saves it to the logs as a warning.

The recommendation right now is to have the jobs count every execution that finished with at least one warning. Sysdig will then collect this counter, and we can generate an alert from it. Does that make sense?

Warning examples:

  • A federal restriction file is imported and contains an unknown code. SIMS will create a new restriction to allow its association with the student, and the Ministry should then verify it and configure it properly.
  • An ECE file is imported with inconsistent records, such as a confirmation for an application that is not in the correct status.
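
The failed-versus-warning distinction described above can be sketched roughly as below. `SketchProcessSummary`, `JobOutcome`, and `runJob` are hypothetical names used for illustration, and the sketch is synchronous for brevity.

```typescript
type JobOutcome = "completed" | "completed-with-warnings" | "failed";

// Hypothetical summary that collects warnings raised while a job runs.
class SketchProcessSummary {
  private readonly warnings: string[] = [];

  warn(message: string): void {
    this.warnings.push(message);
  }

  hasWarnings(): boolean {
    return this.warnings.length > 0;
  }
}

// Runs the work and classifies the result:
// - a thrown exception means "failed" (a retry may recover);
// - recorded warnings mean the job finished but needs attention.
function runJob(work: (summary: SketchProcessSummary) => void): JobOutcome {
  const summary = new SketchProcessSummary();
  try {
    work(summary);
  } catch {
    return "failed";
  }
  return summary.hasWarnings() ? "completed-with-warnings" : "completed";
}

// Usage: an unknown restriction code is recorded as a warning, not thrown.
const outcome = runJob((summary) => {
  summary.warn("Unknown federal restriction code found in the file.");
});
```

Only the `completed-with-warnings` outcome would feed the proposed counter; failed jobs are already visible through the failed state.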

github-merge-queue bot pushed a commit that referenced this issue Dec 27, 2024
…nter (#4186)

- Collected a metric for jobs finalizing with some warnings to allow the
creation of a Sysdig alert based on the counter.
- Injected the service as a property to avoid passing the service to
every single class inheriting from the `BaseQueue`.

### New metric sample from metrics payload
```
queue_event_total_count {
  queueName="student-application-notifications",
  queueEvent="job-finalized-with-warnings",
  queueType="scheduler",
  app="queue-consumers"
} 2
```
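
As a rough illustration of how such a counter could be kept, the sketch below uses an in-memory map keyed by label values and renders it in the Prometheus text exposition style. `QueueEventCounter` and its methods are hypothetical; the project's actual metrics service and client library are not shown in this thread.

```typescript
// Labels mirror the ones in the metric sample above.
type QueueEventLabels = {
  queueName: string;
  queueEvent: string;
  queueType: string;
  app: string;
};

class QueueEventCounter {
  private readonly counts = new Map<string, number>();

  // Adds one to the counter for this exact label combination.
  increment(labels: QueueEventLabels): void {
    const key = QueueEventCounter.renderLabels(labels);
    this.counts.set(key, (this.counts.get(key) ?? 0) + 1);
  }

  // One line per label combination: name{labels} value.
  toText(): string {
    return Array.from(this.counts.entries())
      .map(([labels, count]) => `queue_event_total_count{${labels}} ${count}`)
      .join("\n");
  }

  // Fixed label order keeps the key stable regardless of property order.
  private static renderLabels(l: QueueEventLabels): string {
    return (
      `queueName="${l.queueName}",queueEvent="${l.queueEvent}",` +
      `queueType="${l.queueType}",app="${l.app}"`
    );
  }
}

// Usage: two executions of the same scheduler finished with warnings.
const counter = new QueueEventCounter();
const warningLabels: QueueEventLabels = {
  queueName: "student-application-notifications",
  queueEvent: "job-finalized-with-warnings",
  queueType: "scheduler",
  app: "queue-consumers",
};
counter.increment(warningLabels);
counter.increment(warningLabels);
```

An alert could then be defined on increases of the series where `queueEvent` is `job-finalized-with-warnings`.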


![image](https://github.com/user-attachments/assets/2d7e7c13-1b40-47a9-b2a5-2f1cf7f349af)

_Note:_ this change was a quick way to resolve the concern raised and
explained to the business in this
[comment](#4076 (comment)).
@AnnaPBashkatova AnnaPBashkatova added this to the 2.2 Full-Time "Asset" milestone Jan 7, 2025