Add dyld shared cache parsing #305
This allows parsing Mach-O images inside dyld shared cache files (gimli-rs#268): The dyld shared cache contains multiple images at different offsets; all these images share the same address space for absolute offsets such as symoff. Due to these absolute offsets, one cannot just parse the images by subsetting the input slice and parsing at header offset zero. This patch is a breaking change because it adds a header_offset argument to the MachHeader methods load_commands and uuid, and MachHeader is part of the public API.
This implements just enough to get the path and header offset of each contained image. It also adds a function to get an "any" File object for the image, so that the caller doesn't need to write code twice for 32 and 64 bit images and can instead benefit from the enum-based dynamic dispatch. This commit also adds two "examples", for printing the list of images in the cache and for dumping an object from inside the cache.
I think this is fine to add to this crate. I want to experiment a bit with the API though, so I'll probably edit this and keep you as a coauthor.
Sounds good!
    /// The offset in the dyld cache file where this image starts.
    pub fn offset(&self) -> u64 {
        self.image_info.address.get(self.endian) - self.first_mapping_address
Is this assuming that all images are in the first mapping?
I'm not sure. This is based on the code in dsc_iterator.cpp, specifically the function forEachDylibInCache, which calls this offset firstRegionAddress. It applies the same offset to all images.
I've changed this to avoid the assumption. It's unlikely it will ever matter, but it seems more correct.
I read your implementation and I agree it looks more correct. To make sure it doesn't break anything, I looked at a few dyld shared cache files. The "correct" code is equivalent to the simple code if there is only one mapping, or if all the images are in the first mapping, or if all the mappings containing images have the same value for mapping.address - mapping.offset as the first mapping.
It seems like the second option is the case: in the cache files I've checked, all images are in the first mapping. The x86_64 and x86_64h caches on 10.14.6 and on 11.3.1 each have three mappings, and the arm64e cache on 11.3.1 has seven mappings. The mappings have different values for mapping.address - mapping.offset, but that's fine since all images are in the first mapping.
So, in summary, what you have looks great and shouldn't break anything.
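The difference between the two approaches discussed in this thread can be sketched as follows. This is only an illustration: the `Mapping` struct and both function names are hypothetical stand-ins, not the types or methods used in the PR, and the "simple" variant mirrors the quoted snippet, which subtracts the first mapping's address and therefore implicitly assumes that mapping starts at file offset 0.

```rust
/// Hypothetical, simplified stand-in for a dyld cache mapping entry.
#[derive(Debug, Clone, Copy)]
struct Mapping {
    /// Virtual address at which the mapping starts.
    address: u64,
    /// Size of the mapping in bytes.
    size: u64,
    /// Offset of the mapping's data within the cache file.
    file_offset: u64,
}

/// Translate an address by finding the mapping that contains it.
/// Returns None if no mapping covers the address.
fn address_to_file_offset(mappings: &[Mapping], address: u64) -> Option<u64> {
    mappings.iter().find_map(|m| {
        // A None here means the address lies below this mapping; try the next one.
        let delta = address.checked_sub(m.address)?;
        if delta < m.size {
            m.file_offset.checked_add(delta)
        } else {
            None
        }
    })
}

/// The simple variant: apply the first mapping's address to every image.
/// Assumes the first mapping begins at file offset 0.
fn simple_offset(mappings: &[Mapping], address: u64) -> Option<u64> {
    let first = mappings.first()?;
    address.checked_sub(first.address)
}
```

For an address inside the first mapping, both functions agree. For an address in a later mapping whose address - file_offset differs from the first mapping's, only the per-mapping lookup yields the right file offset, which matches the equivalence conditions described above.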
    let slice_containing_path = self
        .data
        .read_bytes_at(path_offset, MAX_PATH_LEN)
        .read_error("Couldn't read path")?;
What do you think about adding a ReadRef::read_string_at or something like that?
Sounds good to me.
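As a rough idea of what such a helper might do, here is a free function over a plain byte slice. It is a hypothetical sketch, not the signature that the suggested ReadRef method would have: it reads a NUL-terminated string starting at `offset`, looks at no more than `max_len` bytes, and does not require that many bytes to be available.

```rust
/// Hypothetical helper: return the bytes of a NUL-terminated string that
/// starts at `offset`, without the terminator. Returns None if the offset is
/// out of bounds or no NUL byte is found within `max_len` bytes.
fn read_string_at(data: &[u8], offset: usize, max_len: usize) -> Option<&[u8]> {
    // Everything from `offset` to the end of the data, if in bounds.
    let tail = data.get(offset..)?;
    // Clamp the search window so a short tail is not an error.
    let window = &tail[..tail.len().min(max_len)];
    // Position of the terminating NUL within the window.
    let end = window.iter().position(|&b| b == 0)?;
    Some(&window[..end])
}
```

Clamping the window to the remaining data means a path near the end of the file can still be read even when fewer than `max_len` bytes follow it.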
Fixes #268.
This PR is based on top of #304. For my purposes, #304 on its own is enough to unblock me, because I can have the dyld cache parsing code outside of object. But it might be worth landing into object anyways.
So this is more of a "please take the parts you like" kind of PR.