Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

loader: simplify template cache invalidation #29449

Merged
merged 4 commits into from
Jan 10, 2024
Merged

Conversation

lmb
Copy link
Contributor

@lmb lmb commented Nov 28, 2023

elf: fix ELF.Write for files obtained from NewELF

Trying to call ELF.Write on a file obtained from NewELF currently fails with
EINVAL, since ELF.Write expects ELF.file to be non-nil.

Fix the code so that it works with NewELF as well.

Signed-off-by: Lorenz Bauer <[email protected]>

loader: remove NewObjectCache

There are no external users of the object cache, remove the public 
constructor.

Signed-off-by: Lorenz Bauer <[email protected]>

loader: simplify template cache invalidation

The loader tries to avoid recompiling endpoint programs if possible by 
caching them on disk, based on their configuration. To allow reuse it 
formats the configuration as a header file and hashes that. The hash is then
used as a key for the cache.

It also tries to be resilient to something or someone deleting a cached 
program by creating an inotify watcher on the state directory. This has 
several drawbacks:

* inotify watches are a global resource and can therefore run out
* Eviction of deleted cache objects is asynchronous. The loader may
 therefore intermittently try to use a non-existing cached file.

Rewrite the cache to just fall back to recompilation if opening the cached
object from disk fails. This allows getting rid of all the inotify code and
is functionally equivalent.

Signed-off-by: Lorenz Bauer <[email protected]>

serializer: remove package

There are no users left in the code base, remove the package.

Signed-off-by: Lorenz Bauer <[email protected]>

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Nov 28, 2023
@lmb lmb added sig/loader Impacts the loading of BPF programs into the kernel. release-note/misc This PR makes changes that have no direct user impact. labels Nov 28, 2023
@maintainer-s-little-helper maintainer-s-little-helper bot removed dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. labels Nov 28, 2023
@lmb lmb force-pushed the remove-inotify branch 2 times, most recently from df59c14 to 5933359 Compare November 28, 2023 15:31
@lmb
Copy link
Contributor Author

lmb commented Dec 14, 2023

/test

@lmb
Copy link
Contributor Author

lmb commented Dec 14, 2023

/test

@lmb lmb marked this pull request as ready for review December 15, 2023 09:40
@lmb lmb requested review from a team as code owners December 15, 2023 09:40
@dylandreimerink dylandreimerink removed their request for review December 15, 2023 09:49
@lmb lmb requested a review from ti-mo December 15, 2023 10:31
@lmb
Copy link
Contributor Author

lmb commented Dec 15, 2023

/test

Copy link
Member

@bimmlerd bimmlerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CODEOWNERS lgtm (did not look at the rest)

@danehans
Copy link
Contributor

danehans commented Dec 18, 2023

@lmb the fix references a downstream issue. Do you mind creating and linking an upstream issue so I can better understand the reason for the PR?

@ti-mo
Copy link
Contributor

ti-mo commented Dec 19, 2023

@danehans We've been wanting to shorten the compiler + loader pipeline for a while now (see #25004, for example), but this particular defect came up during internal chaos testing:

The fsnotify documentation states:

When a file is removed a REMOVE event won't be emitted until all file descriptors are closed; it will emit a CHMOD instead […]

The agent doesn't hold open file descriptors to the files in the cache. It also ignores all fsnotify events other than REMOVE. Therefore, if we hold a file descriptor open from another process and then delete the file, the REMOVE event will never be triggered, and the cache won't see the file as deleted. However it will still attempt to read from the file.

Also, as mentioned in the commit messages in this PR, inotify handles are a global resource, and global node configuration can interfere with this. As such, to reduce the amount of moving parts and reduce the potential for failures to occur, we're no longer going to rely on inotify.

Copy link
Contributor

@ti-mo ti-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, thanks! I'd like a code walk of this once you're back, I don't have enough understanding of the surrounding system, so I can't give an in-depth review off hand aside from the odd nit.

pkg/datapath/loader/cache.go Outdated Show resolved Hide resolved
pkg/datapath/loader/cache.go Outdated Show resolved Hide resolved
@lmb
Copy link
Contributor Author

lmb commented Jan 3, 2024

/test

@lmb lmb removed the request for review from a team January 3, 2024 12:40
@lmb lmb enabled auto-merge January 3, 2024 12:41
@lmb
Copy link
Contributor Author

lmb commented Jan 3, 2024

I don't have enough understanding of the surrounding system, so I can't give an in-depth review off hand aside from the odd nit

Good thing we are early in the cycle 😅

@lmb
Copy link
Contributor Author

lmb commented Jan 4, 2024

/test

@lmb lmb requested review from a team as code owners January 9, 2024 15:33
@lmb lmb requested review from nathanjsweet and tklauser January 9, 2024 15:33
@lmb
Copy link
Contributor Author

lmb commented Jan 9, 2024

/ci-ginkgo

@lmb
Copy link
Contributor Author

lmb commented Jan 9, 2024

/ci-ginkgo

@lmb
Copy link
Contributor Author

lmb commented Jan 9, 2024

/ci-ginkgo

Copy link
Member

@nathanjsweet nathanjsweet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lmb
Copy link
Contributor Author

lmb commented Jan 10, 2024

/ci-ginkgo

lmb added 4 commits January 10, 2024 11:14
Trying to call ELF.Write on a file obtained from NewELF currently
fails with EINVAL, since ELF.Write expects ELF.file to be non-nil.

Fix the code so that it works with NewELF as well.

Signed-off-by: Lorenz Bauer <[email protected]>
There are no external users of the object cache, remove the public
constructor.

Signed-off-by: Lorenz Bauer <[email protected]>
The loader tries to avoid recompiling endpoint programs if possible by
caching them on disk, based on their configuration. To allow reuse it
formats the configuration as a header file and hashes that. The hash
is then used as a key for the cache.

It also tries to be resilient to something or someone deleting a cached
program by creating an inotify watcher on the state directory. This has
several drawbacks:

* inotify watches are a global resource and can therefore run out
* Eviction of deleted cache objects is asynchronous. The loader may
  therefore intermittently try to use a non-existing cached file.

Rewrite the cache to just fall back to recompilation if opening the
cached object from disk fails. This allows getting rid of all the
inotify code and is functionally equivalent.

Signed-off-by: Lorenz Bauer <[email protected]>
There are no users left in the code base, remove the package.

Signed-off-by: Lorenz Bauer <[email protected]>
@lmb
Copy link
Contributor Author

lmb commented Jan 10, 2024

/ci-ginkgo

@lmb
Copy link
Contributor Author

lmb commented Jan 10, 2024

/test

@lmb lmb added this pull request to the merge queue Jan 10, 2024
Merged via the queue into cilium:main with commit 264e9d3 Jan 10, 2024
62 checks passed
@lmb lmb deleted the remove-inotify branch January 10, 2024 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note/misc This PR makes changes that have no direct user impact. sig/loader Impacts the loading of BPF programs into the kernel.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants