-
Notifications
You must be signed in to change notification settings - Fork 782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PVF: more filesystem sandboxing #1373
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm no Landlock expert, and I've just left some nits; none of them is a blocker. I still believe it should be burned in on Versi as the change is significant. Overall, it's a great improvement to the PVF security, and I do appreciate that work, that's really cool!
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
PartiallyEnforced => LandlockStatus::PartiallyEnforced, | ||
NotEnforced => LandlockStatus::NotEnforced, | ||
/// Delete all env vars to prevent malicious code from accessing them. | ||
pub fn remove_env_vars(worker_kind: WorkerKind) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be better to do this when spawning the process (e.g. execvpe
allows you to pass a custom envp
), otherwise this info will most likely still be in memory anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I previously tried that here, but when testing on my Mac it inserted some env var. Since we don't care about Macs so I went back to that approach and added an exception for that one env var.
let _ = tokio::fs::remove_file(socket_path).await; | ||
let socket_path = worker_dir::socket(&worker_dir_path); | ||
let stream = UnixStream::connect(&socket_path).await?; | ||
let _ = tokio::fs::remove_file(&socket_path).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the above connect fails, we will never remove the socket? Isn't there some RAII primitive available?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or defer
like in Go. :P (One of the few nice things about that language.) There are some crates for defer
in Rust, but I don't want to introduce a dep or extra code just for this one case.
Anyway, good point, but I think here it's fine because 1. the socket is always created in the worker dir, which we remove on worker death as well as host startup, and has a random name so it can't be reused, and 2. if we fail to connect, the worker dies and we try to re-create it from scratch, with a new worker dir and socket and everything.
#[cfg(target_os = "linux")] | ||
if security_status.can_enable_landlock { | ||
let landlock_status = | ||
security::landlock::enable_for_worker(worker_kind, &worker_dir_path); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we do this above already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean, near the beginning of the function? I want to enable landlock after the socket's been removed, so we don't need an exception for it. Any extra exception would increase surface area for attacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... this is a good point; why not do it before spawning tokio? (:
You can still remove the socket; use normal synchronous std
APIs to create and remove it, then enable landlock, and then you can use tokio's from_std
to convert it to a tokio socket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point. :)
- Fix `libc::syscall` call (use `as_ptr()` instead of references) - Use absolute path for MS_BIND mount
- Moved socket connection to beginning of worker startup. - Seemed to be causing landlock violations that didn't happen before. - Removed the conversion to a tokio socket, to rule this out as a cause (we have to eventually remove the tokio dependency anyway). - Realized we were still removing the socket on the host-side, which wasn't needed (on failed rendezvous the host will just wipe the whole worker dir). Removed this removal, to remove a potential cause. - Thought we needed a landlock exception for the socket, so I expanded `try_restrict` to allow for multiple exceptions (wasn't needed in the end, but left it in because why not) - Saw that Landlock was ignoring exceptions for files that didn't exist. I added a check for this case. It clued me in to the actual problem. - Added a bunch of logging and some tests to try to narrow down this befuddling issue. - The socket was fine. Tthe issue turned out to be that landlock exceptions are based on fd's and not paths. We were creating new files for each new job, so except for the first excepted files, no exceptions were accepted. - Applied an exception to the whole worker dir and called it a day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the PR description outdated with the working dir being on /tmp?
Good catch @eskimor, updated. I'm tempted to whack the big green merge button, but this PR still waiting on a Versi burn-in. |
The CI pipeline was cancelled due to failure one of the required jobs. |
There were a few places where this can be used for a slight improvement to code quality.
edfe5b7
to
279db25
Compare
Didn't observe any issues in Versi. Let's merge. |
bot merge |
@mrcnski |
* master: (24 commits) Init System Parachain storage versions and add migration check jobs to CI (#1344) no-bound derives: Use absolute path for `core` (#1763) migrate alliance, fast-unstake and bags list to use derive-impl (#1636) Tvl pool staking (#1322) improve service error (#1734) frame-support: `RuntimeDebug\Eq\PartialEq` impls for `Imbalance` (#1717) Point documentation links to monorepo (#1741) [NPoS] Fix for Reward Deficit in the pool (#1255) Move import queue from `ChainSync` to `SyncingEngine` (#1736) Enable mocking contracts (#1331) Revert "fix(review-bot): pull secrets from `master` environment" (#1748) Remove kusama and polkadot runtime crates (#1731) Use `Extensions` to register offchain worker custom extensions (#1719) [RPC-Spec-V2] chainHead: use integer for block index and adjust RuntimeVersion JSON format (#1666) fix(review-bot): pull secrets from `master` environment (#1745) Fix `subkey inspect` output text padding (#1744) archive: Implement height, hashByHeight and call (#1582) rpc/client: Propagate `rpc_methods` method to reported methods (#1713) rococo-runtime: `RococoGenesisExt` removed (#1490) PVF: more filesystem sandboxing (#1373) ...
* tsv-disabling-node-side: (24 commits) Init System Parachain storage versions and add migration check jobs to CI (#1344) no-bound derives: Use absolute path for `core` (#1763) migrate alliance, fast-unstake and bags list to use derive-impl (#1636) Tvl pool staking (#1322) improve service error (#1734) frame-support: `RuntimeDebug\Eq\PartialEq` impls for `Imbalance` (#1717) Point documentation links to monorepo (#1741) [NPoS] Fix for Reward Deficit in the pool (#1255) Move import queue from `ChainSync` to `SyncingEngine` (#1736) Enable mocking contracts (#1331) Revert "fix(review-bot): pull secrets from `master` environment" (#1748) Remove kusama and polkadot runtime crates (#1731) Use `Extensions` to register offchain worker custom extensions (#1719) [RPC-Spec-V2] chainHead: use integer for block index and adjust RuntimeVersion JSON format (#1666) fix(review-bot): pull secrets from `master` environment (#1745) Fix `subkey inspect` output text padding (#1744) archive: Implement height, hashByHeight and call (#1582) rpc/client: Propagate `rpc_methods` method to reported methods (#1713) rococo-runtime: `RococoGenesisExt` removed (#1490) PVF: more filesystem sandboxing (#1373) ...
Done in this PR
pivot_root
(a betterchroot
), if possible./
ifpivot_root
is enabled.For even more background, see the issues and PRs in the "Related" section.
Worker dir
Here is how the filesystem structure has changed.
Before:
Now:
Some more notes on the worker dir:
IdleWorker
struct.WorkerHandle
struct,IdleWorker
contains data that should be used when starting a job. I know, confusing. :/Considered but not done
Shared memory
Since we are already doing a big refactor, we discussed at length switching the IPC mechanism to shared memory. At the least, switching out the filesystem IPC (all the worker dir stuff) for transferring artifacts between the host and workers.
We decided that there is not a big enough benefit for what would be an even more drastic change. Without filesystem IPC we could totally remove FS access for workers, but with multiple layers of sandboxing already (including Landlock and now
pivot_root
), the extra security seemed like overkill. Shared memory also comes at a performance cost: we have to write to it and read from it. I'd estimate that to cost 1-2ms, which with execution taking up to 10ms total, would be somewhat of a regression relatively speaking.Shared memory was also considered in the past here with some more downsides listed. We decided to continue with the file-based architecture, albeit significantly hardened.
Follow-ups
Fork instead of spawning thread for better isolation (see #574)
There's still a potential attack where a lingering malicious job can access the worker dir state for subsequent jobs. With the isolation described in #574 this can be fixed if we apply Landlock with no exceptions in the forked process.
Refactor argument passing (raised #1576)
Right now if you want to add some data that is passed from the host startup to a worker, there is a long chain of calls it has to go through. It should be refactored somehow. For example, maybe we could pass around some super-objects to make adding new fields a one-line change. It would make such changes much easier in the future.
On the worker-side, we often pass around the worker kind, pid, and dir together. Perhaps there could be one
WorkerInfo
struct.In general I felt like the PVF code is in dire need of cleaning up. Any other ideas would be highly appreciated.
Remove puppet worker (done in #1449)
Related to the above, it would be awesome to get this issue done:
#583
It would remove some of the code duplication you may notice in this PR.
TODO
Related
Supersedes paritytech/polkadot#7580
Closes #584
Closes #600