Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PVF: more filesystem sandboxing #1373

Merged
merged 22 commits into from
Sep 28, 2023

Conversation

mrcnski
Copy link
Contributor

@mrcnski mrcnski commented Sep 3, 2023

Done in this PR

  • Create a worker directory for each worker.
    • The directory contains files with known names (e.g. "socket", "artifact") so we don't have to pass these paths to the workers.
    • We only pass the worker dir path, once at the start of the worker lifetime.
    • The worker dir is cleared after each job so that malicious jobs can't access unintended data.
    • See "Worker dir" section.
  • The worker dir is sandboxed with pivot_root (a better chroot), if possible.
    • This means the workers shouldn't have access to anything outside their worker dir.
  • Instead of running Landlock once per thread, which is insecure, we run it on the whole process when a worker starts.
    • There is only one exception for the worker dir (mentioned above). This is / if pivot_root is enabled.
    • Jobs can't access any files outside the worker dir and shouldn't have any rights they don't strictly need.
  • We get the statuses of security features once per host startup.
    • This is so we can emit a warning as soon as possible and so workers don't duplicate the warning.
    • In the future we plan for this to be an early error for validators, with an optional flag to bypass the checks.
  • Refactor/simplify some landlock code.
    • We moved landlock enablement from worker-specific code to the common worker setup function.
    • Some compatibility code for Mac turned out to be unneeded and was removed.

For even more background, see the issues and PRs in the "Related" section.

Worker dir

Here is how the filesystem structure has changed.

Before:

+ /<cache_path>/
  - artifact-1
  - artifact-2
  - [...]
+ /tmp/
  - socket-1       (random name) (created by host)
  - socket-2       (random name) (created by host)
  - tmp-artifact-1 (random name) (created by worker) (prepare-only)
  - tmp-artifact-2 (random name) (created by worker) (prepare-only)
  - [...]

Now:

+ /<cache_path>/
  - artifact-1
  - artifact-2
  - [...]
  - worker-dir-1/  (new `/` for worker-1)
    + socket                            (created by host)
    + tmp-artifact                      (created by host) (prepare-only)
    + artifact     (link -> artifact-1) (created by host) (execute-only)
  - worker-dir-2/  (new `/` for worker-2)
    + [...]

Some more notes on the worker dir:

  • It is created when spawning a worker and attached to the IdleWorker struct.
    • As opposed to the WorkerHandle struct, IdleWorker contains data that should be used when starting a job. I know, confusing. :/
  • The host populates the worker dir before a job and clears it right after.
  • Jobs have no access to other artifacts or files from other jobs.
  • The worker dir has some attached RAII to delete it when it's dropped.
  • For execute jobs, we link the artifact into the worker dir.
    • Copying it would be too expensive.
    • Moving it wouldn't work because other workers may need it concomittantly.
    • We use a hard-link instead of a symlink, to avoid possible situations where the workers can follow a symlink to get out of the pivot_root jail.

Considered but not done

Shared memory

Since we are already doing a big refactor, we discussed at length switching the IPC mechanism to shared memory. At the least, switching out the filesystem IPC (all the worker dir stuff) for transferring artifacts between the host and workers.

We decided that there is not a big enough benefit for what would be an even more drastic change. Without filesystem IPC we could totally remove FS access for workers, but with multiple layers of sandboxing already (including Landlock and now pivot_root), the extra security seemed like overkill. Shared memory also comes at a performance cost: we have to write to it and read from it. I'd estimate that to cost 1-2ms, which with execution taking up to 10ms total, would be somewhat of a regression relatively speaking.

Shared memory was also considered in the past here with some more downsides listed. We decided to continue with the file-based architecture, albeit significantly hardened.

Follow-ups

Fork instead of spawning thread for better isolation (see #574)

There's still a potential attack where a lingering malicious job can access the worker dir state for subsequent jobs. With the isolation described in #574 this can be fixed if we apply Landlock with no exceptions in the forked process.

Refactor argument passing (raised #1576)

Right now if you want to add some data that is passed from the host startup to a worker, there is a long chain of calls it has to go through. It should be refactored somehow. For example, maybe we could pass around some super-objects to make adding new fields a one-line change. It would make such changes much easier in the future.

On the worker-side, we often pass around the worker kind, pid, and dir together. Perhaps there could be one WorkerInfo struct.

In general I felt like the PVF code is in dire need of cleaning up. Any other ideas would be highly appreciated.

Remove puppet worker (done in #1449)

Related to the above, it would be awesome to get this issue done:

#583

It would remove some of the code duplication you may notice in this PR.

TODO

  • Versi burn-in

Related

Supersedes paritytech/polkadot#7580
Closes #584
Closes #600

@mrcnski mrcnski added T0-node This PR/Issue is related to the topic “node”. T8-parachains_engineering T8-polkadot This PR/Issue is related to/affects the Polkadot network. labels Sep 3, 2023
@mrcnski mrcnski self-assigned this Sep 3, 2023
Copy link
Contributor

@s0me0ne-unkn0wn s0me0ne-unkn0wn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm no Landlock expert, and I've just left some nits; none of them is a blocker. I still believe it should be burned in on Versi as the change is significant. Overall, it's a great improvement to the PVF security, and I do appreciate that work, that's really cool!

polkadot/node/core/pvf/common/src/worker/security.rs Outdated Show resolved Hide resolved
polkadot/node/core/pvf/common/src/worker/security.rs Outdated Show resolved Hide resolved
polkadot/node/core/pvf/common/src/worker/security.rs Outdated Show resolved Hide resolved
polkadot/node/core/pvf/prepare-worker/src/lib.rs Outdated Show resolved Hide resolved
@mrcnski

This comment was marked as outdated.

@mrcnski mrcnski requested a review from alindima September 10, 2023 13:50
Copy link
Contributor

@alindima alindima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good

polkadot/node/core/pvf/common/src/worker/security.rs Outdated Show resolved Hide resolved
polkadot/node/core/pvf/common/src/execute.rs Outdated Show resolved Hide resolved
polkadot/node/core/pvf/src/host.rs Outdated Show resolved Hide resolved
@mrcnski mrcnski requested a review from alindima September 11, 2023 14:04
Copy link
Contributor

@alindima alindima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

PartiallyEnforced => LandlockStatus::PartiallyEnforced,
NotEnforced => LandlockStatus::NotEnforced,
/// Delete all env vars to prevent malicious code from accessing them.
pub fn remove_env_vars(worker_kind: WorkerKind) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better to do this when spawning the process (e.g. execvpe allows you to pass a custom envp), otherwise this info will most likely still be in memory anyway.

Copy link
Contributor Author

@mrcnski mrcnski Sep 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I previously tried that here, but when testing on my Mac it inserted some env var. Since we don't care about Macs so I went back to that approach and added an exception for that one env var.

let _ = tokio::fs::remove_file(socket_path).await;
let socket_path = worker_dir::socket(&worker_dir_path);
let stream = UnixStream::connect(&socket_path).await?;
let _ = tokio::fs::remove_file(&socket_path).await;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the above connect fails, we will never remove the socket? Isn't there some RAII primitive available?

Copy link
Contributor Author

@mrcnski mrcnski Sep 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or defer like in Go. :P (One of the few nice things about that language.) There are some crates for defer in Rust, but I don't want to introduce a dep or extra code just for this one case.

Anyway, good point, but I think here it's fine because 1. the socket is always created in the worker dir, which we remove on worker death as well as host startup, and has a random name so it can't be reused, and 2. if we fail to connect, the worker dies and we try to re-create it from scratch, with a new worker dir and socket and everything.

#[cfg(target_os = "linux")]
if security_status.can_enable_landlock {
let landlock_status =
security::landlock::enable_for_worker(worker_kind, &worker_dir_path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we do this above already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean, near the beginning of the function? I want to enable landlock after the socket's been removed, so we don't need an exception for it. Any extra exception would increase surface area for attacks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... this is a good point; why not do it before spawning tokio? (:

You can still remove the socket; use normal synchronous std APIs to create and remove it, then enable landlock, and then you can use tokio's from_std to convert it to a tokio socket.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. :)

@mrcnski mrcnski requested a review from eskimor September 14, 2023 09:56
- Fix `libc::syscall` call (use `as_ptr()` instead of references)
- Use absolute path for MS_BIND mount
- Moved socket connection to beginning of worker startup.
- Seemed to be causing landlock violations that didn't happen before.
- Removed the conversion to a tokio socket, to rule this out as a cause (we have
  to eventually remove the tokio dependency anyway).
- Realized we were still removing the socket on the host-side, which wasn't
  needed (on failed rendezvous the host will just wipe the whole worker dir).
  Removed this removal, to remove a potential cause.
- Thought we needed a landlock exception for the socket, so I expanded
  `try_restrict` to allow for multiple exceptions (wasn't needed in the end, but
  left it in because why not)
- Saw that Landlock was ignoring exceptions for files that didn't exist. I added
  a check for this case. It clued me in to the actual problem.
- Added a bunch of logging and some tests to try to narrow down this befuddling
  issue.
- The socket was fine. Tthe issue turned out to be that landlock exceptions are
  based on fd's and not paths. We were creating new files for each new job, so
  except for the first excepted files, no exceptions were accepted.
- Applied an exception to the whole worker dir and called it a day.
@mrcnski mrcnski requested a review from koute September 18, 2023 07:45
Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the PR description outdated with the working dir being on /tmp?

@mrcnski
Copy link
Contributor Author

mrcnski commented Sep 19, 2023

Good catch @eskimor, updated.

I'm tempted to whack the big green merge button, but this PR still waiting on a Versi burn-in.

@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: cargo-clippy
Logs: https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/3787494

There were a few places where this can be used for a slight improvement to code
quality.
@mrcnski mrcnski force-pushed the mrcnski/pvf-worker-sandbox-with-pivot-root branch from edfe5b7 to 279db25 Compare September 25, 2023 12:57
@mrcnski
Copy link
Contributor Author

mrcnski commented Sep 28, 2023

Didn't observe any issues in Versi. Let's merge.

@mrcnski
Copy link
Contributor Author

mrcnski commented Sep 28, 2023

bot merge

@command-bot
Copy link

command-bot bot commented Sep 28, 2023

@mrcnski bot merge and bot rebase are not supported anymore. Please use native Github "Auto-Merge" and "Update Branch" buttons instead.
image

@mrcnski mrcnski merged commit c1eb342 into master Sep 28, 2023
@mrcnski mrcnski deleted the mrcnski/pvf-worker-sandbox-with-pivot-root branch September 28, 2023 16:24
ordian added a commit that referenced this pull request Oct 3, 2023
* master: (24 commits)
  Init System Parachain storage versions and add migration check jobs to CI (#1344)
  no-bound derives: Use absolute path for `core` (#1763)
  migrate alliance, fast-unstake and bags list to use derive-impl (#1636)
  Tvl pool staking (#1322)
  improve service error (#1734)
  frame-support: `RuntimeDebug\Eq\PartialEq` impls for `Imbalance` (#1717)
  Point documentation links to monorepo (#1741)
  [NPoS] Fix for Reward Deficit in the pool (#1255)
  Move import queue from `ChainSync` to `SyncingEngine` (#1736)
  Enable mocking contracts (#1331)
  Revert "fix(review-bot): pull secrets from `master` environment" (#1748)
  Remove kusama and polkadot runtime crates (#1731)
  Use `Extensions` to register offchain worker custom extensions (#1719)
  [RPC-Spec-V2] chainHead: use integer for block index and adjust RuntimeVersion JSON format (#1666)
  fix(review-bot): pull secrets from `master` environment (#1745)
  Fix `subkey inspect` output text padding (#1744)
  archive: Implement height, hashByHeight and call (#1582)
  rpc/client: Propagate `rpc_methods` method to reported methods (#1713)
  rococo-runtime: `RococoGenesisExt` removed (#1490)
  PVF: more filesystem sandboxing (#1373)
  ...
ordian added a commit that referenced this pull request Oct 10, 2023
* tsv-disabling-node-side: (24 commits)
  Init System Parachain storage versions and add migration check jobs to CI (#1344)
  no-bound derives: Use absolute path for `core` (#1763)
  migrate alliance, fast-unstake and bags list to use derive-impl (#1636)
  Tvl pool staking (#1322)
  improve service error (#1734)
  frame-support: `RuntimeDebug\Eq\PartialEq` impls for `Imbalance` (#1717)
  Point documentation links to monorepo (#1741)
  [NPoS] Fix for Reward Deficit in the pool (#1255)
  Move import queue from `ChainSync` to `SyncingEngine` (#1736)
  Enable mocking contracts (#1331)
  Revert "fix(review-bot): pull secrets from `master` environment" (#1748)
  Remove kusama and polkadot runtime crates (#1731)
  Use `Extensions` to register offchain worker custom extensions (#1719)
  [RPC-Spec-V2] chainHead: use integer for block index and adjust RuntimeVersion JSON format (#1666)
  fix(review-bot): pull secrets from `master` environment (#1745)
  Fix `subkey inspect` output text padding (#1744)
  archive: Implement height, hashByHeight and call (#1582)
  rpc/client: Propagate `rpc_methods` method to reported methods (#1713)
  rococo-runtime: `RococoGenesisExt` removed (#1490)
  PVF: more filesystem sandboxing (#1373)
  ...
bkchr pushed a commit that referenced this pull request Apr 10, 2024
bkchr pushed a commit that referenced this pull request Apr 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T0-node This PR/Issue is related to the topic “node”. T8-polkadot This PR/Issue is related to/affects the Polkadot network.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

PVF worker: apply sandboxing per-process PVF: consider spawning a new process per job
6 participants