Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make requests to the CHA more resilient #718

Merged
merged 5 commits into from
Feb 9, 2024

Conversation

Cruikshanks
Copy link
Member

@Cruikshanks Cruikshanks commented Feb 8, 2024

https://eaflood.atlassian.net/browse/WATER-4365

As part of looking at re-implementing the SROC annual billing engine in this project, we found a solution that allows us to process bills in parallel (will be added in later commits!) However, the more we do the more strain this puts on the Charging Module API to the point we start seeing timeouts and ECONNRESET errors.

With just a few tweaks on our side, we could increase our tolerance of these errors. There is still a point where we are pushing it too hard. But the more throughput we get, the faster we can complete the bill run process.

So, this change does the following.

  • Add charging module specific request timeout We don't just want to increase the timeout for all web requests. Some impact the experience of the end-user and they shouldn't be made to watch a page spinning for longer than necessary. So, we had a specific one for the Charging Module
  • Retry ECONNRESET errors. We learnt this can happen when an API is under pressure. It doesn't mean the Charging Module can't handle the next attempt so it is safe to try these again.

Also a bit of housekeeping. Whilst using the timeout scenarios as the template for updating the network issue ones, we realised that the timeout tests were not checking that a log of the error is recorded after all retries fail.

So, we've gone back and added them in.

https://eaflood.atlassian.net/browse/WATER-4365

As part of looking at re-implementing the SROC annual billing engine in this project we found a solution that allows us to process bills in parallel (will be added in later commits!) However, the more we do the more strain this puts on the [Charging Module API](https://github.com/DEFRA/sroc-charging-module-api) to the point we start seeing timeouts and `ECONNRESET` errors.

We found with just a few tweaks on our side we could increase our tolerance of these errors. There is still a point where we are pushing it to hard. But the more throughput we can get, the faster we can complete the bill run process.

So, this change does the following.

- **double the default timeout we set for web requests just for the charging module.** We don't just double the timeout for all web requests. Some impact the experience of the end user and they shouldn't be made to watch a page spinning for longer than necessary
- **retry `ECONNRESET` errors.** We learnt this can happen when an API is under pressure. It doesn't mean the Charging Module can't handle the next attempt so it is safe to try these again.
@Cruikshanks Cruikshanks added the housekeeping Refactoring, tidying up or other work which supports the project label Feb 8, 2024
@Cruikshanks Cruikshanks self-assigned this Feb 8, 2024
Was originally going to just double the existing timeout value. But on reflection felt we would have more control and flexibility if we kept them separate.
Having clocked that we hadn't set the `{ retry: shortBackoffLimitRetryOptions }` in our new retry tests we added them and saw run times improve. But then we realised this made needing to override the Lab test timeout redundant so we remove that here.

Also, whilst using the timeout scenarios as the template for updating the network issue ones, we realised that the timeout tests were not checking that a log of the error is record after all retries fail.

So, we've gone back and added them in.
@Cruikshanks Cruikshanks marked this pull request as ready for review February 8, 2024 23:18
@Cruikshanks Cruikshanks merged commit d007d0b into main Feb 9, 2024
6 checks passed
@Cruikshanks Cruikshanks deleted the make-cha-requests-more-resilient branch February 9, 2024 10:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping Refactoring, tidying up or other work which supports the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants