-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make events with missing parents go through eventcache #318
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Anastasios Papagiannis <[email protected]>
tpapagian
force-pushed
the
pr/apapag/parent_refcnt
branch
from
August 12, 2022 10:56
c32c93a
to
4b6e97a
Compare
When we bounce through the event cache its possible we never found the parent process. In this case we also get a ref cnt imbalance because the exit event will dec the ref but the exec does not. To fix this update the Retry logic to also search for the parent and if its missing populate it plus do refInc. And similarly on exit check for parent and if its not handled yet similarly go to the cache and wait, but this time do a refDec. Now its still possible the dec happens before the inc in this scenario, exec -> cache ------------------> parentFound IncRef exit -> decRef But, in this case its OK that the ref briefly was incorrect because the GC will wait for at minimum two iterations before removing it to account for this eventually consistent logic. Finally, there is the corner case where the exec is delayed for some extreme amount of time, the GC interval * 3, and then the exit is handled with its decRef. Now we've go an imbalance. But we tried really hard here and the system is not responsive so best we can do is carry on and let the cache reconcile this later. It could cause some entries to get froze in the cache until the LRU evicts them. This should be exceedingly rare and we will add counters for this case which would indicate the GC interval should be extended. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Anastasios Papagiannis <[email protected]>
We currently have Failed Parent lookup counters stuck in a case that is relatively common. Namely, when a child event is handled before its parent so we need to send it to the cache. The real failure is if the process makes it through the caches 3 iterations and still doesn't have a parent. We can track this from event stream. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Anastasios Papagiannis <[email protected]>
The previous commits introduce fixes in the cache regarding refcounts of parents. We also add an event in the cache if this misses its parent. This commit updates all the tests to check for parents and also do the appropriate fixes (previously we didn't provide any parents in most of these tests). This commit also does some cleanup in this code. Signed-off-by: Anastasios Papagiannis <[email protected]>
tpapagian
force-pushed
the
pr/apapag/parent_refcnt
branch
from
August 12, 2022 10:58
4b6e97a
to
7e7b2b4
Compare
tpapagian
changed the title
Make missing parents go through eventcache
Make events with missing parents go through eventcache
Aug 12, 2022
Now eventcache handles the RefDec() calls both for process and parent if the exit event goes into the eventcache. So calling RefDec() before checking if we need to cache the event may results in double calls to RefDec(). Signed-off-by: Anastasios Papagiannis <[email protected]>
tpapagian
force-pushed
the
pr/apapag/parent_refcnt
branch
from
August 12, 2022 12:22
3adc260
to
03bdc1c
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
When we bounce through the event cache its possible we never found the
parent process. In this case we also get a ref cnt imbalance because
the exit event will dec the ref but the exec does not.
To fix this update the Retry logic to also search for the parent
and if its missing populate it plus do refInc. And similarly
on exit check for parent and if its not handled yet similarly
go to the cache and wait, but this time do a refDec.
Now its still possible the dec happens before the inc in this
scenario,
But, in this case its OK that the ref briefly was incorrect because
the GC will wait for at minimum two iterations before removing it
to account for this eventually consistent logic.
Finally, there is the corner case where the exec is delayed for
some extreme amount of time, the GC interval * 3, and then the
exit is handled with its decRef. Now we've go an imbalance. But
we tried really hard here and the system is not responsive so
best we can do is carry on and let the cache reconcile this later.
It could cause some entries to get froze in the cache until the
LRU evicts them. This should be exceedingly rare and we will
add counters for this case which would indicate the GC interval
should be extended.
Signed-off-by: John Fastabend [email protected]
Signed-off-by: Anastasios Papagiannis [email protected]