Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

elf: Define Go Build ID constants #549

Merged
merged 3 commits into from
Jun 2, 2023
Merged

Conversation

tamird
Copy link
Contributor

@tamird tamird commented May 22, 2023

See individual commits.

src/read/elf/file.rs Outdated Show resolved Hide resolved
@philipc
Copy link
Contributor

philipc commented May 23, 2023

What do you need this for? I don't think this change is correct. build_id is used to locate DWARF debug info on linux systems. I don't think that the Go build id is used for this purpose.

@tamird
Copy link
Contributor Author

tamird commented May 23, 2023

Yeah, I wasn't sure whether it was correct to expose this in the same function. I just need to read this build id when looking at Go binaries, and they do use this form xor the GNU form (gccgo produces the GNU form).
Would you prefer a different API?

@philipc
Copy link
Contributor

philipc commented May 24, 2023

What do you need to do with it? I can make a better decision about the API when I know what it is used for.

Currently my preference would be to not add any specific knowledge of these notes. If users require it, then they can use the existing ELF note parser to read it. It is likely that the current API exposed by ElfFile doesn't make it easy or even possible to do this, but we should change that instead of adding another API to the Object trait that can only be implemented by one file format.

@tamird
Copy link
Contributor Author

tamird commented May 24, 2023

What do you need to do with it? I can make a better decision about the API when I know what it is used for.

We want to use this to identify the binary. By default, the value of the note is a content hash of the binary.

Currently my preference would be to not add any specific knowledge of these notes. If users require it, then they can use the existing ELF note parser to read it. It is likely that the current API exposed by ElfFile doesn't make it easy or even possible to do this, but we should change that instead of adding another API to the Object trait that can only be implemented by one file format.

Indeed, the current constellation of APIs doesn't make this pleasant:

  • There's no way to go from File (which abstracts over file format) to elf::ElfFile{32,64}.
  • There's no way to parse an ELF file of unknown bitness (there is only elf::ElfFile{32,64}::parse)
  • There's no way to get an elf::FileHeader from an elf::ElfFile{32,64}

I think you'd have to:

FileKind::parse(..);
elf::FileHeader{32,64}::parse(..);
elf::FileHeader::program_headers().iter()....
elf::FileHeader::section_headers().iter()....

Are you imagining an API that makes this possible starting from File, or just starting from the elf module?

FWIW, the current build_id API is also one that can only be implemented by one file format as I understand it.

@philipc
Copy link
Contributor

philipc commented May 24, 2023

There's no way to go from File (which abstracts over file format) to elf::ElfFile{32,64}.

I think we need to add this.

There's no way to parse an ELF file of unknown bitness (there is only elf::ElfFile{32,64}::parse)

This is what File already does. I don't see a way to add anything better than that.

There's no way to get an elf::FileHeader from an elf::ElfFile{32,64}

There exists raw_header and raw_segments, and we could add more similar methods if you need.

We could also add a method to ElfFile to find notes with a given name (so you could pass ELF_NOTE_GO as a parameter to that).

I think you'd have to:

FileKind::parse(..);
elf::FileHeader{32,64}::parse(..);
elf::FileHeader::program_headers().iter()....
elf::FileHeader::section_headers().iter()....

This should already possible. Do you think this is unsuitable? If so, what could be done to improve it?

Are you imagining an API that makes this possible starting from File, or just starting from the elf module?

I think both should be made possible. Use File if you need to abstract over the different file formats, and use elf if the entirety of what you are doing is specific to ELF.

FWIW, the current build_id API is also one that can only be implemented by one file format as I understand it.

It's special because this crate originally existed only to find and load debug info for gimli, and when it was added the lower level API didn't exist.

@tamird
Copy link
Contributor Author

tamird commented May 24, 2023

There's no way to go from File (which abstracts over file format) to elf::ElfFile{32,64}.

I think we need to add this.

Today File is opaque, but contains a useful enum that's private (FileInternal). What do you think about replace File's definition with that of FileInternal?

There's no way to parse an ELF file of unknown bitness (there is only elf::ElfFile{32,64}::parse)

This is what File already does. I don't see a way to add anything better than that.

There's no way to get an elf::FileHeader from an elf::ElfFile{32,64}

There exists raw_header and raw_segments, and we could add more similar methods if you need.

True, if we could go File -> ElfFile then this would be useful. We could add raw_sections, I imagine.

We could also add a method to ElfFile to find notes with a given name (so you could pass ELF_NOTE_GO as a parameter to that).

That would work, but with the above, I think it wouldn't be strictly necessary.

I think you'd have to:

FileKind::parse(..);
elf::FileHeader{32,64}::parse(..);
elf::FileHeader::program_headers().iter()....
elf::FileHeader::section_headers().iter()....

This should already possible. Do you think this is unsuitable? If so, what could be done to improve it?

Yeah, it is possible today (that's what I was saying). I think the ideas above would make this more ergonomic and wouldn't require parsing the file header twice.

Are you imagining an API that makes this possible starting from File, or just starting from the elf module?

I think both should be made possible. Use File if you need to abstract over the different file formats, and use elf if the entirety of what you are doing is specific to ELF.

In this case what I'm doing is specific to ELF (other formats don't have notes AFAIK) but it would be nice to start with File so that failure can be graceful.

FWIW, the current build_id API is also one that can only be implemented by one file format as I understand it.

It's special because this crate originally existed only to find and load debug info for gimli, and when it was added the lower level API didn't exist.

Perhaps that's a good reason to remove it (once it's possible to ergonomically go from File to elf::ElfFile{32,64}.

I think redefining File as FileInternal and adding raw_sections would get us where we need to be. Removing File::build_id can come later, at such time as taking breaking changes is deemed acceptable.

Let me know if this sounds sensible, and I'll rework this PR.

@philipc
Copy link
Contributor

philipc commented May 25, 2023

Today File is opaque, but contains a useful enum that's private (FileInternal). What do you think about replace File's definition with that of FileInternal?

It's something I've considered, and it's okay, but ideally I would like something better. The shortcoming of this approach is that it means you must be working with File, whereas I would like to be able to write the same code for T: Object, and then have the option of using File, NativeFile, ElfFile or whatever with that code. So we need a method on Object that gets you an enum of the various files, even if you are working with something other than File, and FileInternal isn't quite right for that. I've implemented a draft of this idea in #550.

Yeah, it is possible today (that's what I was saying). I think the ideas above would make this more ergonomic and wouldn't require parsing the file header twice.

Why do you need to parse the file header twice? Do you mean you would still need the parsing that File does? With this approach I was thinking you wouldn't need to use File at all.

Perhaps that's a good reason to remove it (once it's possible to ergonomically go from File to elf::ElfFile{32,64}.

It's not worth breaking existing users for.

@tamird
Copy link
Contributor Author

tamird commented May 25, 2023

Yeah, I think you're right - we wouldn't need to use File at all if we just lived in elf. However, then we have another problem: we don't have an enum for elf::FileHeader, only a trait.

    use object::read::elf::FileHeader as _;
    let object = match object::read::FileKind::parse(mmap.as_ref()).unwrap() {
        object::FileKind::Elf32 => object::elf::FileHeader32::parse(mmap.as_ref()),
        object::FileKind::Elf64 => object::elf::FileHeader64::parse(mmap.as_ref()),
        file_kind => bail!("unsupported file kind {file_kind:?}"),
    }.unwrap();

this code doesn't compile because the match arms have different types. Should this crate provide an enum to unify the 32-bit and 64-bit headers? I tried writing this in my code, but ran into problems:

  • object::read::elf::FileHeader::parse returns a reference, not a value
  • object::read::elf::FileHeader has a Pod bound
  • Pod has a 'static bound
  • therefore my enum must have a lifetime parameter, and thus can't be Pod, and thus can't implement FileHeader. Obviously an inherent impl works, but it's unfortunate that it's not possible to impl the trait.

@tamird
Copy link
Contributor Author

tamird commented May 25, 2023

I sent #551 to remove the 'static bound on Pod, which should allow me to define the enum I wanted above.

@philipc
Copy link
Contributor

philipc commented May 26, 2023

However, then we have another problem: we don't have an enum for elf::FileHeader, only a trait.

    use object::read::elf::FileHeader as _;
    let object = match object::read::FileKind::parse(mmap.as_ref()).unwrap() {
        object::FileKind::Elf32 => object::elf::FileHeader32::parse(mmap.as_ref()),
        object::FileKind::Elf64 => object::elf::FileHeader64::parse(mmap.as_ref()),
        file_kind => bail!("unsupported file kind {file_kind:?}"),
    }.unwrap();

this code doesn't compile because the match arms have different types. Should this crate provide an enum to unify the 32-bit and 64-bit headers?

Instead of writing code like that, can you call a function that is generic over trait FileHeader from each match arm? For example, see

match kind {
object::FileKind::Archive => print_archive(p, data),
object::FileKind::Coff => pe::print_coff(p, data),
object::FileKind::CoffBig => pe::print_coff_big(p, data),
object::FileKind::DyldCache => macho::print_dyld_cache(p, data),
object::FileKind::Elf32 => elf::print_elf32(p, data),
object::FileKind::Elf64 => elf::print_elf64(p, data),
object::FileKind::MachO32 => macho::print_macho32(p, data, 0),
object::FileKind::MachO64 => macho::print_macho64(p, data, 0),
object::FileKind::MachOFat32 => macho::print_macho_fat32(p, data),
object::FileKind::MachOFat64 => macho::print_macho_fat64(p, data),
object::FileKind::Pe32 => pe::print_pe32(p, data),
object::FileKind::Pe64 => pe::print_pe64(p, data),
// TODO
_ => {}
}

pub(super) fn print_elf32(p: &mut Printer<'_>, data: &[u8]) {
if let Some(elf) = FileHeader32::<Endianness>::parse(data).print_err(p) {
writeln!(p.w(), "Format: ELF 32-bit").unwrap();
print_elf(p, elf, data);
}
}
pub(super) fn print_elf64(p: &mut Printer<'_>, data: &[u8]) {
if let Some(elf) = FileHeader64::<Endianness>::parse(data).print_err(p) {
writeln!(p.w(), "Format: ELF 64-bit").unwrap();
print_elf(p, elf, data);
}
}
fn print_elf<Elf: FileHeader<Endian = Endianness>>(p: &mut Printer<'_>, elf: &Elf, data: &[u8]) {

If that is not possible, I agree that it might be nice to have an enum for both 32-bit and 64-bit (this is what read::File is after all), but I don't think that removing the Pod trait from FileHeader is a satisfactory solution. For example, in #551 you had to add many Pod bounds in the readobj example, and think that needing to add those is a big decrease in ergonomics.

Instead of trying to implement FileHeader for your enum, can you add methods to your enum that forward to the individual variants, similar to what read::File does.

@tamird
Copy link
Contributor Author

tamird commented May 26, 2023

Instead of writing code like that, can you call a function that is generic over trait FileHeader from each match arm?

Sort of. The code I ended up writing (which didn't require any patches to this crate) is:

fn notes<'data, Elf, R>(
    elf: &object::read::elf::ElfFile<'data, Elf, R>,
) -> object::read::Result<
    impl Iterator<Item = object::read::Result<object::read::elf::NoteIterator<'data, Elf>>>,
>
where
    Elf: object::read::elf::FileHeader,
    R: object::ReadRef<'data>,
{
    use object::read::elf::{ProgramHeader as _, SectionHeader as _};

    let endian = elf.endian();
    let data = elf.data();
    let header = elf.raw_header();
    let sections = header.sections(endian, data)?;
    let sections = sections
        .iter()
        .filter_map(move |header| header.notes(endian, data).transpose());
    let segments = header.program_headers(endian, data)?;
    let segments = segments
        .iter()
        .filter_map(move |header| header.notes(endian, data).transpose());
    Ok(std::iter::empty().chain(sections).chain(segments))
}

// TODO(https://github.com/gimli-rs/object/pull/549): Simplify this.
fn build_id<'data, Elf, R>(
    elf: &object::read::elf::ElfFile<'data, Elf, R>,
) -> object::read::Result<Option<&'data [u8]>>
where
    Elf: object::read::elf::FileHeader,
    R: object::ReadRef<'data>,
{
    let endian = elf.endian();

    const ELF_NOTE_GNU: &[u8] = b"GNU";
    const ELF_NOTE_GO: &[u8] = b"Go";

    const NT_GO_BUILD_ID: u32 = 4;

    let mut notes = notes(elf)?;
    while let Some(mut notes) = notes.next().transpose()? {
        while let Some(note) = notes.next()? {
            let mut name = note.name();
            while let [rest @ .., 0] = name {
                name = rest;
            }
            match (name, note.n_type(endian)) {
                (ELF_NOTE_GNU, object::elf::NT_GNU_BUILD_ID) | (ELF_NOTE_GO, NT_GO_BUILD_ID) => {
                    return Ok(Some(note.desc()));
                }
                _ => {}
            }
        }
    }
    Ok(None)
}

enum ElfFile<'data, Endian: object::Endian, R: object::ReadRef<'data>> {
    Elf32(object::read::elf::ElfFile<'data, object::elf::FileHeader32<Endian>, R>),
    Elf64(object::read::elf::ElfFile<'data, object::elf::FileHeader64<Endian>, R>),
}

impl<'data, Endian: object::Endian, R: object::ReadRef<'data>> ElfFile<'data, Endian, R> {
    fn endian(&self) -> Endian {
        match self {
            Self::Elf32(elf) => elf.endian(),
            Self::Elf64(elf) => elf.endian(),
        }
    }

    fn section_data_by_name(
        &self,
        section_name: &str,
    ) -> Option<object::Result<std::borrow::Cow<'_, [u8]>>> {
        use object::Object as _;
        match self {
            Self::Elf32(elf) => elf
                .section_by_name(section_name)
                .as_ref()
                .map(object::ObjectSection::uncompressed_data),
            Self::Elf64(elf) => elf
                .section_by_name(section_name)
                .as_ref()
                .map(object::ObjectSection::uncompressed_data),
        }
    }

    fn build_id(&self) -> object::read::Result<Option<&'data [u8]>> {
        match self {
            Self::Elf32(elf) => build_id(elf),
            Self::Elf64(elf) => build_id(elf),
        }
    }
}

If that is not possible, I agree that it might be nice to have an enum for both 32-bit and 64-bit (this is what read::File is after all), but I don't think that removing the Pod trait from FileHeader is a satisfactory solution. For example, in #551 you had to add many Pod bounds in the readobj example, and think that needing to add those is a big decrease in ergonomics.

Instead of trying to implement FileHeader for your enum, can you add methods to your enum that forward to the individual variants, similar to what read::File does.

Yeah, I agree. I think we can perhaps move that discussion to #551. Apart from the last commit in this PR, I think it's an uncontroversial cleanup - do you agree? If yes, I can either:

  • remove the last commit entirely
  • remove the changes to build_id but keep the new constants

Let me know what you think, and hopefully we can get this landed. Thanks for helping me iterate here.

crates/examples/src/objdump.rs Outdated Show resolved Hide resolved
src/read/elf/file.rs Outdated Show resolved Hide resolved
src/read/elf/note.rs Show resolved Hide resolved
@philipc
Copy link
Contributor

philipc commented May 27, 2023

The code I ended up writing (which didn't require any patches to this crate) is:

Is that satisfactory?

Are you doing things with the file in addition to reading the build id? That is more code than I hoped would be needed simply to read a build id, and it could be simplified, but if you are doing other things too then maybe it is needed.

remove the changes to build_id but keep the new constants

I think we should do this. I've added some review comments.

@tamird
Copy link
Contributor Author

tamird commented May 30, 2023

Are you doing things with the file in addition to reading the build id? That is more code than I hoped would be needed simply to read a build id, and it could be simplified, but if you are doing other things too then maybe it is needed.

Yeah, I go on to pass this into gimli to read the debug info.

I think we should do this. I've added some review comments.

Done, thanks for the comments.

src/read/elf/file.rs Outdated Show resolved Hide resolved
src/read/elf/note.rs Outdated Show resolved Hide resolved
src/elf.rs Show resolved Hide resolved
@tamird tamird changed the title Support reading Build ID from Go ELF binaries elf: Define Go Build ID constants May 31, 2023
tamird added 3 commits June 1, 2023 08:56
This allows these items to be used in match expressions.
Add Note::name_bytes which returns the entire content of the name field.
@philipc philipc merged commit ac5e8e2 into gimli-rs:master Jun 2, 2023
@tamird tamird deleted the go-build-id branch June 2, 2023 02:02
mcbegamerxx954 pushed a commit to mcbegamerxx954/object that referenced this pull request Jun 15, 2024
elf: Define Go Build ID constants
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants