Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make events with missing parents go through eventcache #318

Merged
merged 5 commits into from
Aug 12, 2022

Conversation

tpapagian
Copy link
Member

@tpapagian tpapagian commented Aug 12, 2022

When we bounce through the event cache its possible we never found the
parent process. In this case we also get a ref cnt imbalance because
the exit event will dec the ref but the exec does not.

To fix this update the Retry logic to also search for the parent
and if its missing populate it plus do refInc. And similarly
on exit check for parent and if its not handled yet similarly
go to the cache and wait, but this time do a refDec.

Now its still possible the dec happens before the inc in this
scenario,

  exec -> cache ------------------> parentFound IncRef
              exit -> decRef

But, in this case its OK that the ref briefly was incorrect because
the GC will wait for at minimum two iterations before removing it
to account for this eventually consistent logic.

Finally, there is the corner case where the exec is delayed for
some extreme amount of time, the GC interval * 3, and then the
exit is handled with its decRef. Now we've go an imbalance. But
we tried really hard here and the system is not responsive so
best we can do is carry on and let the cache reconcile this later.
It could cause some entries to get froze in the cache until the
LRU evicts them. This should be exceedingly rare and we will
add counters for this case which would indicate the GC interval
should be extended.

Signed-off-by: John Fastabend [email protected]
Signed-off-by: Anastasios Papagiannis [email protected]

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Anastasios Papagiannis <[email protected]>
@tpapagian tpapagian force-pushed the pr/apapag/parent_refcnt branch from c32c93a to 4b6e97a Compare August 12, 2022 10:56
jrfastab and others added 3 commits August 12, 2022 10:58
When we bounce through the event cache its possible we never found the
parent process. In this case we also get a ref cnt imbalance because
the exit event will dec the ref but the exec does not.

To fix this update the Retry logic to also search for the parent
and if its missing populate it plus do refInc. And similarly
on exit check for parent and if its not handled yet similarly
go to the cache and wait, but this time do a refDec.

Now its still possible the dec happens before the inc in this
scenario,

  exec -> cache ------------------> parentFound IncRef
                exit -> decRef

But, in this case its OK that the ref briefly was incorrect because
the GC will wait for at minimum two iterations before removing it
to account for this eventually consistent logic.

Finally, there is the corner case where the exec is delayed for
some extreme amount of time, the GC interval * 3, and then the
exit is handled with its decRef. Now we've go an imbalance. But
we tried really hard here and the system is not responsive so
best we can do is carry on and let the cache reconcile this later.
It could cause some entries to get froze in the cache until the
LRU evicts them. This should be exceedingly rare and we will
add counters for this case which would indicate the GC interval
should be extended.

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Anastasios Papagiannis <[email protected]>
We currently have Failed Parent lookup counters stuck in a case that is
relatively common. Namely, when a child event is handled before its parent
so we need to send it to the cache.

The real failure is if the process makes it through the caches 3 iterations
and still doesn't have a parent. We can track this from event stream.

Signed-off-by: John Fastabend <[email protected]>
Signed-off-by: Anastasios Papagiannis <[email protected]>
The previous commits introduce fixes in the cache regarding refcounts
of parents. We also add an event in the cache if this misses its
parent.

This commit updates all the tests to check for parents and also
do the appropriate fixes (previously we didn't provide any parents
in most of these tests).

This commit also does some cleanup in this code.

Signed-off-by: Anastasios Papagiannis <[email protected]>
@tpapagian tpapagian force-pushed the pr/apapag/parent_refcnt branch from 4b6e97a to 7e7b2b4 Compare August 12, 2022 10:58
@tpapagian tpapagian changed the title Make missing parents go through eventcache Make events with missing parents go through eventcache Aug 12, 2022
@tpapagian tpapagian marked this pull request as ready for review August 12, 2022 11:40
@tpapagian tpapagian requested a review from a team as a code owner August 12, 2022 11:40
@tpapagian tpapagian requested a review from kaworu August 12, 2022 11:40
Now eventcache handles the RefDec() calls both for process and parent
if the exit event goes into the eventcache.

So calling RefDec() before checking if we need to cache the event
may results in double calls to RefDec().

Signed-off-by: Anastasios Papagiannis <[email protected]>
@tpapagian tpapagian force-pushed the pr/apapag/parent_refcnt branch from 3adc260 to 03bdc1c Compare August 12, 2022 12:22
@jrfastab jrfastab merged commit db61ded into main Aug 12, 2022
@jrfastab jrfastab deleted the pr/apapag/parent_refcnt branch August 12, 2022 20:33
@tpapagian tpapagian restored the pr/apapag/parent_refcnt branch August 23, 2022 17:09
@tpapagian tpapagian deleted the pr/apapag/parent_refcnt branch August 23, 2022 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants