Add dyld shared cache parsing #305

Closed

mstange commented May 18, 2021

Fixes #268.

This PR is based on top of #304. For my purposes, #304 on its own is enough to unblock me, because I can keep the dyld cache parsing code outside of object. But it might be worth landing it in object anyway.
So this is more of a "please take the parts you like" kind of PR.

mstange added 2 commits May 18, 2021 18:47
This allows parsing Mach-O images inside dyld shared cache files (gimli-rs#268):
The dyld shared cache contains multiple images at different offsets; all these
images share the same address space for absolute offsets such as symoff. Due to
these absolute offsets, one cannot just parse the images by subsetting the input
slice and parsing at header offset zero.

This patch is a breaking change because it adds a header_offset argument to the
MachHeader methods load_commands and uuid, and MachHeader is part of the public API.

This implements just enough to get the path and header offset of each contained image.
It also adds a function to get an "any" File object for the image, so that the caller
doesn't need to write code twice for 32 and 64 bit images and can instead benefit from
the enum-based dynamic dispatch.
This commit also adds two "examples", for printing the list of images in the cache and
for dumping an object from inside the cache.
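
The point about absolute offsets in the first commit message can be illustrated with a small sketch. This is not the code from the PR or the object crate's API: `ImageView`, `header_bytes`, and `absolute_bytes` are hypothetical names, and the "cache" is just a byte slice. It only shows why an image has to be parsed against the whole cache data together with a header offset, rather than against a sub-slice that starts at the image header.

```rust
/// Hypothetical view of one image inside a larger cache buffer.
/// `data` is the buffer that absolute offsets are resolved against;
/// `header_offset` is where this image's header starts within `data`.
struct ImageView<'a> {
    data: &'a [u8],
    header_offset: usize,
}

impl<'a> ImageView<'a> {
    /// Bytes at the start of this image's header.
    fn header_bytes(&self, len: usize) -> Option<&'a [u8]> {
        let end = self.header_offset.checked_add(len)?;
        self.data.get(self.header_offset..end)
    }

    /// Bytes at an absolute offset (for example a symoff-style value).
    /// The offset is relative to the start of `data`, not to the header.
    fn absolute_bytes(&self, offset: usize, len: usize) -> Option<&'a [u8]> {
        let end = offset.checked_add(len)?;
        self.data.get(offset..end)
    }
}
```

With the full cache as `data` and a non-zero `header_offset`, both lookups resolve. If `data` is instead the sub-slice starting at the image header and `header_offset` is zero, the header is still readable, but absolute offsets now land in the wrong place or out of bounds.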

philipc commented May 19, 2021

I think this is fine to add to this crate. I want to experiment a bit with the API though, so I'll probably edit this and keep you as a coauthor.

mstange commented May 19, 2021

Sounds good!


/// The offset in the dyld cache file where this image starts.
pub fn offset(&self) -> u64 {
    self.image_info.address.get(self.endian) - self.first_mapping_address
Contributor

Is this assuming that all images are in the first mapping?

Contributor Author

I'm not sure - this is based on the code in dsc_iterator.cpp, specifically the function forEachDylibInCache which calls this offset firstRegionAddress. It applies the same offset to all images.

Contributor

I've changed this to avoid the assumption. It's unlikely it will ever matter, but it seems more correct.

Contributor Author

mstange commented May 26, 2021

I read your implementation and I agree it looks more correct. To make sure it doesn't break anything, I looked at a few dyld shared cache files. The "correct" code is equivalent to the simple code if there is only one mapping, or if all the images are in the first mapping, or if all the mappings containing images have the same value for mapping.address - mapping.offset as the first mapping.
It seems like the second option is the case: In the cache files I've checked, all images are in the first mapping. The x86_64 and x86_64h caches on 10.14.6 and on 11.3.1 each have three mappings, and the arm64e cache on 11.3.1 has seven mappings. The mappings have different values for mapping.address - mapping.offset, but that's fine since all images are in the first mapping.
So, in summary, what you have looks great and shouldn't break anything.
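
For reference, the mapping-aware lookup discussed in this thread can be sketched roughly as follows. This is not the implementation that landed in object: `Mapping` is a simplified, hypothetical stand-in for a dyld cache mapping entry, and `address_to_file_offset` is an illustrative name. The idea is to find the mapping that contains the address and apply that mapping's own address and file offset, instead of subtracting the first mapping's address for every image.

```rust
/// Simplified, hypothetical stand-in for one dyld cache mapping entry.
#[derive(Debug, Clone, Copy)]
struct Mapping {
    /// Virtual address at which the mapping starts.
    address: u64,
    /// Size of the mapping in bytes.
    size: u64,
    /// Offset of the mapping's data in the cache file.
    file_offset: u64,
}

/// Translate a virtual address into a file offset using whichever
/// mapping contains the address. Returns `None` if no mapping does.
fn address_to_file_offset(mappings: &[Mapping], address: u64) -> Option<u64> {
    mappings.iter().find_map(|m| {
        // Addresses below the start of this mapping cannot be inside it.
        let delta = address.checked_sub(m.address)?;
        if delta < m.size {
            m.file_offset.checked_add(delta)
        } else {
            None
        }
    })
}
```

When every image lies in the first mapping and that mapping starts at file offset zero, this gives the same result as subtracting the first mapping's address, which matches the observation above about the caches that were checked.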

let slice_containing_path = self
    .data
    .read_bytes_at(path_offset, MAX_PATH_LEN)
    .read_error("Couldn't read path")?;
Contributor

What do you think about adding a ReadRef::read_string_at or something like that?

Contributor Author

Sounds good to me.
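
A helper along the lines suggested here might look like the sketch below. It is only an illustration on a plain byte slice, not a ReadRef method and not the object crate's API; the name `read_string_at` and its signature are assumptions taken from the suggestion above. It reads a NUL-terminated byte string starting at an offset and scans at most `max_len` bytes for the terminator.

```rust
/// Read a NUL-terminated byte string that starts at `offset` in `data`.
/// At most `max_len` bytes are scanned for the terminator. Returns the
/// bytes before the NUL, or `None` if the offset is out of range or no
/// NUL is found within the scanned window.
fn read_string_at(data: &[u8], offset: usize, max_len: usize) -> Option<&[u8]> {
    // Everything from `offset` to the end of the data, if in range.
    let tail = data.get(offset..)?;
    // Limit the scan so a missing terminator cannot walk the whole file.
    let window = &tail[..tail.len().min(max_len)];
    let end = window.iter().position(|&b| b == 0)?;
    Some(&window[..end])
}
```

Compared with reading a fixed `MAX_PATH_LEN` bytes and searching afterwards, this also copes with a path that sits closer than `max_len` bytes to the end of the data.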

Successfully merging this pull request may close these issues.

Plans for macOS Big Sur system libraries in the dyld shared cache?
2 participants