lib/tar: Pre-filter tar archive in Rust#112
Conversation
e1c16da to
2961a88
Compare
|
|
||
| // Completely discard stuff outside of /usr | ||
| if !(matches!(path.as_str(), "/" | "/usr") || path.starts_with("usr/")) { | ||
| continue; |
There was a problem hiding this comment.
I'd throw a warn!() in here. I'm pretty sure we'll discover a good bunch of nasty things.
There was a problem hiding this comment.
Good point, though since this is a library we should probably allow the caller to collect this data and log/print it however it wants.
Will look at that.
There was a problem hiding this comment.
In the bigger picture I am currently thinking of this as a shortcut - longer term we'll strongly encourage the model in #12 (comment) where we have a required RUN ostree container-finalize-bootable incantation that will clean this stuff up at build time.
Rather than trying to extend the ostree C code to support arbitrary transformations, let's do pre-parsing here in safe Rust. We keep the `/etc` → `/usr/etc` bits, but we also just completely drop everything not in `/usr` now. As noted in the comment, this pre-validation will hopefully also catch any corrupt tarballs that might be exploitable in the C libarchive codebase.
2961a88 to
520fc40
Compare
|
OK, that was a fun deep dive into path libraries in Rust, specifically the components method. We now parse, normalize and filter the input path in a much more robust way, and this updated code now also returns a btreemap from "prefix -> number of filtered paths" which should be sufficient to start. |
| } | ||
| } | ||
|
|
||
| /// A [`std::io::Write`] implementation backed by an asynchronous source. |
|
Hmm, what we should really do here actually is store the metadata about filtered paths in the commit object. However, doing that would require splitting up the write/commit phases, which in turn really wants us to be doing all of this natively in Rust instead of forking off |
|
Any other comments on this, or good to go? |
|
All good from my side 👍. As (very subjective) feedback on your last comment, I don't personally feel the need to record that information persistently. It can be interesting to log and inspect it at runtime, but in the steady state I guess it's unlikely that we'd want to parse/analyze that from the commit object itself. |
Rather than trying to extend the ostree C code to support
arbitrary transformations, let's do pre-parsing here in safe Rust.
We keep the
/etc→/usr/etcbits, but we also just completelydrop everything not in
/usrnow.As noted in the comment, this pre-validation will hopefully also
catch any corrupt tarballs that might be exploitable in the C libarchive
codebase.