
Remove timeout on MessageOutboxDeliveryService #437

Merged: 5 commits into main on Jul 18, 2024

Conversation

@Enkidu93 (Collaborator) commented on Jul 17, 2024:

Fixes #422

I've gone ahead and added the recommended index on pretranslations, which should speed up queries there significantly. Ultimately, though, it wasn't the queries on pretranslations that were taxing Mongo and causing it to send alerts.

Rather, it has to do with a bug in the MessageOutboxDeliveryService's subscription. A 10-second timeout is used when waiting for a new message to appear; however, when the service loops back to wait again, it looks for changes from the same time onward, because the timestamp it uses as a placeholder only updates when there has been a change (i.e., not when the wait times out). For now, I've simply removed the timeout, which fits with usage elsewhere. The service now waits until there has been a change without rerunning the changeStream command; a sketch of the corrected loop is below. In recent versions of Mongo, it's also possible to resume change streams, so if we have situations in which we want a timeout but will also re-wait for changes, it may be better to use the resumeToken strategy to resume rather than restart.
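For reference, here is a minimal sketch of the corrected subscription against the MongoDB C# driver. The names (`OutboxMessage`, `SubscribeAsync`, the `Console.WriteLine` stand-in for delivery) are placeholders for illustration, not the actual Machine code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Driver;

// Placeholder document type; the real Machine outbox message model differs.
public class OutboxMessage
{
    public string Id { get; set; } = "";
}

public static class OutboxSubscriptionSketch
{
    public static async Task SubscribeAsync(
        IMongoCollection<OutboxMessage> outboxMessages, CancellationToken stoppingToken)
    {
        // No client-side timeout: MoveNextAsync blocks until the server returns
        // a batch, so the loop never re-issues the changeStream command from a
        // stale start time.
        using IChangeStreamCursor<ChangeStreamDocument<OutboxMessage>> cursor =
            await outboxMessages.WatchAsync(
                new ChangeStreamOptions { FullDocument = ChangeStreamFullDocumentOption.UpdateLookup },
                stoppingToken);

        while (await cursor.MoveNextAsync(stoppingToken))
        {
            foreach (ChangeStreamDocument<OutboxMessage> change in cursor.Current)
                Console.WriteLine($"delivering outbox message {change.FullDocument?.Id}");
        }
    }
}
```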



@codecov-commenter commented on Jul 17, 2024:

Codecov Report

Attention: Patch coverage is 88.23529% with 2 lines in your changes missing coverage. Please review.

Project coverage is 61.76%. Comparing base (05f074e) to head (9bf9b09).

Files                                                   Patch %   Lines
....Shared/Configuration/IMachineBuilderExtensions.cs     0.00%   1 Missing ⚠️
...ne.Shared/Services/MessageOutboxDeliveryService.cs     0.00%   1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #437      +/-   ##
==========================================
+ Coverage   61.73%   61.76%   +0.03%     
==========================================
  Files         232      232              
  Lines       11825    11827       +2     
  Branches     1510     1510              
==========================================
+ Hits         7300     7305       +5     
+ Misses       3998     3995       -3     
  Partials      527      527              

☔ View full report in Codecov by Sentry.

@johnml1135 (Collaborator) commented:

@Enkidu93, do you have a reference to any documentation that uses the pattern that you are recommending?

@johnml1135 (Collaborator) commented:

That is, do you have documentation that this solution will work, or did you test it out empirically?

@johnml1135 (Collaborator) commented:

Also, can you change the .NET code to correct the pretranslation indexing as well?

@Enkidu93 (Collaborator, Author) left a comment:

Here's a section where the documentation discusses techniques for resuming/starting change streams: https://www.mongodb.com/docs/manual/changeStreams/#resumeafter-for-change-streams

In the same section, you can see information about the method we're currently using, which is to start from a particular time. If you look at the logs on Atlas, you can see that the change stream command runs from the same start time every time until an insertion has happened, and the command reruns because of the timeout. I've verified this locally inasmuch as I can; it's just not as convenient to see into things on the local instance as it is via Atlas. A sketch of the resume-based alternative is below.
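For completeness, here's a hedged sketch of what the resumeAfter approach could look like with the C# driver, reusing the placeholder names from the sketch in the PR description:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class ResumeTokenSketch
{
    public static async Task SubscribeWithResumeAsync(
        IMongoCollection<OutboxMessage> outboxMessages, CancellationToken stoppingToken)
    {
        BsonDocument? resumeToken = null;
        while (!stoppingToken.IsCancellationRequested)
        {
            // Resume from the cached token instead of restarting from a fixed
            // operation time; null on the first pass means "start from now".
            var options = new ChangeStreamOptions { ResumeAfter = resumeToken };
            using IChangeStreamCursor<ChangeStreamDocument<OutboxMessage>> cursor =
                await outboxMessages.WatchAsync(options, stoppingToken);

            while (await cursor.MoveNextAsync(stoppingToken))
            {
                foreach (ChangeStreamDocument<OutboxMessage> change in cursor.Current)
                    Console.WriteLine($"delivering outbox message {change.FullDocument?.Id}");

                // GetResumeToken() advances even on empty batches, so reopening
                // the stream picks up where it left off instead of replaying.
                resumeToken = cursor.GetResumeToken();
            }
        }
    }
}
```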

Note: Since the outbox code is only on QA for the time being, the issue is only visible there. I'm concerned about pushing current QA code to production given the strain this puts on Mongo, but at the same time, this issue should actually be less problematic the more activity there is in the outbox.

As for updating the .NET code with respect to indexing, I'm not sure what you mean. I created the index in the db; what do I need to do in the code?

Reviewable status: 0 of 1 files reviewed, all discussions resolved (waiting on @ddaspit and @johnml1135)

@johnml1135 (Collaborator) commented:

Removing the timeout looks like the right choice. I didn't understand the rationale behind the timeout (it was only used in Machine for the Locks, to keep the whole software from grinding to a halt).

As for indexing, I think we would want a compound index. That way we can sort and filter by (at least) the EngineId and CorpusId, if not also the Scripture reference. A sketch of how to implement this in C# follows below.

I don't know the best way to add it. Would we be able to add a new parameter of type CreateIndexModel<T> to AddRepository that can be applied to the collection as it is created?
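Something along these lines, perhaps; this is only a sketch, and `Pretranslation` with `EngineId`/`CorpusId` properties is a stand-in for whatever document type the repository actually stores:

```csharp
using System.Threading.Tasks;
using MongoDB.Driver;

// Hypothetical document type; the real Machine model may differ.
public class Pretranslation
{
    public string EngineId { get; set; } = "";
    public string CorpusId { get; set; } = "";
}

public static class PretranslationIndexSketch
{
    public static Task CreateIndexAsync(IMongoCollection<Pretranslation> collection)
    {
        // Compound index over (EngineId, CorpusId); by MongoDB's prefix rule
        // it also serves queries that filter on EngineId alone.
        var indexModel = new CreateIndexModel<Pretranslation>(
            Builders<Pretranslation>.IndexKeys
                .Ascending(p => p.EngineId)
                .Ascending(p => p.CorpusId));
        return collection.Indexes.CreateOneAsync(indexModel);
    }
}
```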

@johnml1135 (Collaborator) left a review:

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 (Collaborator) commented:

Check that: CreateIndexModels are already being created for the pretranslations, but they might not be ideal. I pushed some code; what do you think?

@Enkidu93 (Collaborator, Author) left a comment:

OK!

Yes, I already created a compound index in Atlas on pretranslations over engineRef and modelRevision (the index recommended given the queries from SF), although it looks like a compound index that also includes the textId could help, so I'll go ahead and create that as well. But I see what you're saying: you want me to verify programmatically on startup that those indices exist in the Mongo instance. I can do that; a sketch of the startup check is below. For Atlas, it probably isn't as necessary, but it won't hurt.
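A rough sketch of what that startup verification could look like (collection and field names are assumed from this discussion, not taken from the PR code). Creating an index that already exists with the same key spec is a no-op, so running this on every boot is safe against a local Mongo and Atlas alike:

```csharp
using System.Threading.Tasks;
using MongoDB.Bson;
using MongoDB.Driver;

public static class StartupIndexSketch
{
    public static Task EnsurePretranslationIndexesAsync(IMongoDatabase database)
    {
        IMongoCollection<BsonDocument> pretranslations =
            database.GetCollection<BsonDocument>("pretranslations");

        // The (engineRef, modelRevision) prefix of this compound index also
        // covers the two-field queries, so one index serves both query shapes.
        return pretranslations.Indexes.CreateOneAsync(
            new CreateIndexModel<BsonDocument>(
                Builders<BsonDocument>.IndexKeys
                    .Ascending("engineRef")
                    .Ascending("modelRevision")
                    .Ascending("textId")));
    }
}
```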

Reviewable status: 1 of 4 files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 (Collaborator) commented:

Can you review the code that I pushed and see if it resolves the issues? I also checked for other places in the code that look like they should have a compound index but don't.

Commit: …ful compound indices on pretranslations (and remove redundant index on engineRef).
@Enkidu93 (Collaborator, Author) commented:

I think this is the arrangement we want (although I'd like to spend some more time looking over all the indices at some point and giving it more thought). We don't want to remove the simple indices when adding compound ones unless the simple index is a prefix of the compound index (the prefix rule is illustrated in the sketch below). At the same time, we want to avoid unnecessary indices, because they take up memory and make writes more expensive. I went ahead and looked at how frequently the indices are being used and removed the ones that have never been used. I also removed redundant simple indices and added the suggested compound index on pretranslations.
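To illustrate the prefix rule (field names as above; explanatory only, not the PR's code):

```csharp
using MongoDB.Bson;
using MongoDB.Driver;

// { engineRef: 1 } is a prefix of { engineRef: 1, modelRevision: 1 }, so
// queries filtering on engineRef alone can already use the compound index;
// a standalone engineRef index adds only memory and write overhead.
IndexKeysDefinition<BsonDocument> compoundKeys =
    Builders<BsonDocument>.IndexKeys
        .Ascending("engineRef")      // prefix: serves engineRef-only queries
        .Ascending("modelRevision"); // full key: serves both fields together
```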

@johnml1135 (Collaborator) commented:

Please test on docker-compose with Atlas.

@Enkidu93 (Collaborator, Author) left a review:

:lgtm:

Reviewable status: 1 of 5 files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 (Collaborator) commented:

:lgtm:

@johnml1135 (Collaborator) left a review:

Reviewed 3 of 3 files at r3, 1 of 1 files at r4, all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @ddaspit)

@johnml1135 merged commit af90cbc into main on Jul 18, 2024. 2 checks passed.
@johnml1135 deleted the fix_inefficient_mongo_usage branch on July 18, 2024 at 23:36.
Merging this pull request may close: Better database indexing for pretranslations.