pltStubSymbols's treatment of the .plt.got section miscalculates the number of PLT stubs #375

Open
RyanGlScott opened this issue Apr 4, 2024 · 4 comments

RyanGlScott commented Apr 4, 2024

If you compile this simple C program using clang:

int main(void) {
    return 0;
}

You'll see that it has exactly one PLT stub in its .plt.got section:

$ clang test.c -o test
$ objdump -d -j .plt.got test

test:     file format elf64-x86-64


Disassembly of section .plt.got:

0000000000001030 <__cxa_finalize@plt>:
    1030:	ff 25 c2 2f 00 00    	jmp    *0x2fc2(%rip)        # 3ff8 <__cxa_finalize@GLIBC_2.2.5>
    1036:	66 90                	xchg   %ax,%ax
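
The `# 3ff8` annotation that objdump prints can be reproduced by hand: a RIP-relative `jmp` reads its target from the address of the next instruction plus the signed 32-bit displacement. Here is a minimal Python sketch of that arithmetic using the bytes above. The function name `got_slot_of_jmp` is made up for illustration and is not part of Macaw.

```python
import struct

def got_slot_of_jmp(insn_addr, insn_bytes):
    """Return the address read by `jmp *disp32(%rip)` (opcode ff 25).

    Returns None if the bytes do not start with that encoding.
    """
    if len(insn_bytes) < 6 or insn_bytes[:2] != b"\xff\x25":
        return None
    # The displacement is a little-endian signed 32-bit integer.
    (disp,) = struct.unpack_from("<i", insn_bytes, 2)
    # RIP-relative addressing is relative to the end of the 6-byte instruction.
    return insn_addr + 6 + disp

stub = bytes.fromhex("ff25c22f0000")
print(hex(got_slot_of_jmp(0x1030, stub)))  # 0x3ff8
```

So the one stub in this section jumps through the GOT slot at 0x3ff8.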

However, pltStubSymbols claims that it has more PLT stubs than this! Here is what you see if you debug-print the output of pltStubSymbols on this program:

fromList [(0x1040,""),(0x1048,""),(0x1050,""),(0x1058,"__libc_start_main"),(0x1060,"_ITM_deregisterTMCloneTable"),(0x1068,"__gmon_start__"),(0x1070,"_ITM_registerTMCloneTable"),(0x1078,"__cxa_finalize")]
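
A quick sanity check makes the mismatch concrete: none of the reported addresses fall inside .plt.got at all. The sketch below is Python rather than Haskell, and it assumes (going by the disassembly above) that the section occupies the 8 bytes from 0x1030 up to 0x1038. The names `reported` and `outside_section` are illustrative only.

```python
# Assumed extent of .plt.got, read off the objdump output (one 8-byte stub).
PLT_GOT_START = 0x1030
PLT_GOT_END = 0x1038  # exclusive

# Addresses and names copied from the debug-printed pltStubSymbols result.
reported = {
    0x1040: "",
    0x1048: "",
    0x1050: "",
    0x1058: "__libc_start_main",
    0x1060: "_ITM_deregisterTMCloneTable",
    0x1068: "__gmon_start__",
    0x1070: "_ITM_registerTMCloneTable",
    0x1078: "__cxa_finalize",
}

def outside_section(stubs, start, end):
    """Return the sorted stub addresses that are not within [start, end)."""
    return sorted(addr for addr in stubs if not (start <= addr < end))

stray = outside_section(reported, PLT_GOT_START, PLT_GOT_END)
print(len(stray), "of", len(reported), "reported stubs lie outside .plt.got")
```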

What is going on here?

Ultimately, pltStubSymbols consults the .rela.dyn section to figure out what the contents of .plt.got are. In this case, .rela.dyn contains five entries, shown here as the relocations that objdump prints against .got:

$ objdump -dzR -j .got test

test:     file format elf64-x86-64


Disassembly of section .got:

0000000000003fd8 <.got>:
    3fd8:	00 00                	add    %al,(%rax)
			3fd8: R_X86_64_GLOB_DAT	__libc_start_main@GLIBC_2.34
    3fda:	00 00                	add    %al,(%rax)
    3fdc:	00 00                	add    %al,(%rax)
    3fde:	00 00                	add    %al,(%rax)
    3fe0:	00 00                	add    %al,(%rax)
			3fe0: R_X86_64_GLOB_DAT	_ITM_deregisterTMCloneTable@Base
    3fe2:	00 00                	add    %al,(%rax)
    3fe4:	00 00                	add    %al,(%rax)
    3fe6:	00 00                	add    %al,(%rax)
    3fe8:	00 00                	add    %al,(%rax)
			3fe8: R_X86_64_GLOB_DAT	__gmon_start__@Base
    3fea:	00 00                	add    %al,(%rax)
    3fec:	00 00                	add    %al,(%rax)
    3fee:	00 00                	add    %al,(%rax)
    3ff0:	00 00                	add    %al,(%rax)
			3ff0: R_X86_64_GLOB_DAT	_ITM_registerTMCloneTable@Base
    3ff2:	00 00                	add    %al,(%rax)
    3ff4:	00 00                	add    %al,(%rax)
    3ff6:	00 00                	add    %al,(%rax)
    3ff8:	00 00                	add    %al,(%rax)
			3ff8: *unknown*	__cxa_finalize@GLIBC_2.2.5
    3ffa:	00 00                	add    %al,(%rax)
    3ffc:	00 00                	add    %al,(%rax)
    3ffe:	00 00                	add    %al,(%rax)

But only one of them (__cxa_finalize) actually corresponds to a PLT stub, and the presence of the other four throws off the heuristics that pltStubSymbols uses.

I'm not quite sure what to do about this. It would be helpful if there were a convenient mechanism to distinguish __cxa_finalize from the other entries in .got, but I'm not sure what that would be. My first inclination was to filter out any symbols that aren't function symbols, but even that isn't enough, as __libc_start_main is also a function symbol:

$ readelf -W --dyn-syms test

Symbol table '.dynsym' contains 6 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FUNC    GLOBAL DEFAULT  UND __libc_start_main@GLIBC_2.34 (2)
     2: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_deregisterTMCloneTable
     3: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND __gmon_start__
     4: 0000000000000000     0 NOTYPE  WEAK   DEFAULT  UND _ITM_registerTMCloneTable
     5: 0000000000000000     0 FUNC    WEAK   DEFAULT  UND __cxa_finalize@GLIBC_2.2.5 (3)

It's also tempting to think that the combination of FUNC and WEAK would uniquely identify PLT stubs, but that is also not true. If you call a function via a function pointer, e.g.,

void* (*m)(size_t) = &malloc;

Then malloc will also be called via a PLT stub, but its function symbol will be FUNC and GLOBAL.
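
To make the two failed filters concrete, here is a small Python sketch over the .dynsym rows above (version suffixes dropped). The `malloc` row is hypothetical: it stands in for the function-pointer case just described and does not appear in this binary. None of these names come from Macaw.

```python
from collections import namedtuple

DynSym = namedtuple("DynSym", "name type bind")

# Rows 1-5 of the readelf output above.
DYNSYMS = [
    DynSym("__libc_start_main", "FUNC", "GLOBAL"),
    DynSym("_ITM_deregisterTMCloneTable", "NOTYPE", "WEAK"),
    DynSym("__gmon_start__", "NOTYPE", "WEAK"),
    DynSym("_ITM_registerTMCloneTable", "NOTYPE", "WEAK"),
    DynSym("__cxa_finalize", "FUNC", "WEAK"),
]

# Hypothetical extra row for a program that takes the address of malloc.
MALLOC = DynSym("malloc", "FUNC", "GLOBAL")

def func_only(syms):
    """First idea: keep every function symbol."""
    return [s.name for s in syms if s.type == "FUNC"]

def weak_func_only(syms):
    """Second idea: keep only symbols that are both FUNC and WEAK."""
    return [s.name for s in syms if s.type == "FUNC" and s.bind == "WEAK"]

# Too permissive: __libc_start_main has no stub in .plt.got here.
print(func_only(DYNSYMS))  # ['__libc_start_main', '__cxa_finalize']

# Too strict: malloc would have a stub but is FUNC and GLOBAL.
print(weak_func_only(DYNSYMS + [MALLOC]))  # ['__cxa_finalize']
```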

@RyanGlScott RyanGlScott added the bug label Apr 4, 2024

langston-barrett commented Apr 16, 2024

> Ultimately, pltStubSymbols consults the .rela.dyn section to figure out what the contents of the .plt.got are.

Perhaps a silly question: Why doesn't pltStubSymbols consult the .plt.got section itself to figure out what the contents of .plt.got are?

[EDIT]: Perhaps this is just a hard problem, as indicated by this comment in the angr source code

@RyanGlScott

Your EDIT hits the nail on the head: the .plt.got section (like its cousins .plt and .plt.sec) is really just an unorganized collection of instructions, with no function symbols to clearly demarcate the start of each PLT stub. In general, you have to do some reverse engineering to know where each PLT stub begins and ends.

The only reason our heuristics for detecting PLT stubs in the .plt section work as well as they do is that the relocations contained in the .rela.plt section are only related to the .plt section. This isn't always the case for the .rela.dyn section, however. In addition to containing relocations for the PLT stubs in the .plt.got section, it can also contain relocations for things like global variables defined in shared libraries (e.g., _ITM_deregisterTMCloneTable). As such, our heuristics aren't terribly reliable for the .plt.got section.

@sauclovian-g

Aren't PLT stubs highly stylized though? I have never tangled with the x86_64 ones but back in 386 days they had a very definite layout.

also, even failing that it seems like it should be possible to extract all the jump targets (whether they're absolute like in your example or the result of relocations) and join that list (in the relational sense) to the dynamic symbol table.

maybe I'm missing something...

@RyanGlScott

> Aren't PLT stubs highly stylized though?

You are correct. While identifying PLT stub addresses is not as straightforward as I had hoped, the stubs do tend to exhibit predictable layout patterns. As such, it should be possible to identify them by performing some modest analysis of the assembly code. Reading the code which identifies PLT stubs in angr, I worry that it might not exactly be fun to implement this, but it should be doable.

The current approach in Macaw does in fact assume that each PLT stub is always the same size. This allows us to get away with a very simple, heuristics-based implementation of PLT stub detection in Macaw, but the downside is that these heuristics don't work for every binary (see the caveats mentioned here). If we improve our PLT stub analysis, I'd hope that we can cover these cases as well.

> also, even failing that it seems like it should be possible to extract all the jump targets (whether they're absolute like in your example or the result of relocations) and join that list (in the relational sense) to the dynamic symbol table.

Yes. If we improve our analysis, then I suspect that we will need to do something quite like this.
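
For what it's worth, a rough Python sketch of that join, specialized to the one stub shape seen in this issue (`jmp *disp32(%rip)`, opcode `ff 25`), might look like the following. This is not how Macaw or angr does it. The function names are made up, the relocation map is transcribed from the .got dump earlier in the thread, and real stubs in .plt or .plt.sec can take other shapes that this naive scan would miss.

```python
import struct

def scan_jmp_stubs(section_addr, data):
    """Yield (stub_addr, got_slot) for each `jmp *disp32(%rip)` found.

    Naive linear scan: no real disassembly, only the ff 25 pattern.
    """
    i = 0
    while i + 6 <= len(data):
        if data[i:i + 2] == b"\xff\x25":
            (disp,) = struct.unpack_from("<i", data, i + 2)
            # The target slot is relative to the end of the instruction.
            yield section_addr + i, section_addr + i + 6 + disp
            i += 6
        else:
            i += 1

def name_stubs(section_addr, data, relocs):
    """Join the scanned GOT slots against a {got_slot: symbol} map."""
    return {
        stub: relocs[slot]
        for stub, slot in scan_jmp_stubs(section_addr, data)
        if slot in relocs
    }

# The 8 bytes of .plt.got from the objdump output in this issue.
plt_got = bytes.fromhex("ff25c22f00006690")

# Relocation offsets and symbols from the .got dump in this issue.
rela_dyn = {
    0x3fd8: "__libc_start_main",
    0x3fe0: "_ITM_deregisterTMCloneTable",
    0x3fe8: "__gmon_start__",
    0x3ff0: "_ITM_registerTMCloneTable",
    0x3ff8: "__cxa_finalize",
}

for addr, name in name_stubs(0x1030, plt_got, rela_dyn).items():
    print(hex(addr), name)  # 0x1030 __cxa_finalize
```

The appeal of this direction is that the four relocations with no stub pointing at them drop out of the join on their own, so no symbol-type filtering is needed.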
