
Profiling CLE, pyelftools, and pefile #231

Open
ltfish opened this issue Feb 15, 2020 · 7 comments


ltfish commented Feb 15, 2020

Loading binaries is taking longer and longer since recent updates in CLE, pyelftools, and pefile. Profiling them is the first step to make things faster.


rhelmot commented Apr 29, 2020

Here are some preliminary findings:

  • The hottest functions for ELF loading are clemory.load and pyelftools' struct._parse.
  • For the former, I was able to shave a few milliseconds off.
  • The latter is going to be very hard. pyelftools has a very intricate struct-parsing mechanism, and I can't imagine making any changes to it without bringing down a house of cards. The one change I could see improving things is somehow getting pyelftools to use clemory.unpack_word directly for its word unpacking when it is reading from a clemory as a stream, instead of reading the word out as bytes and then unpacking those. I have no idea what percentage of the struct parsing is done over a clemory vs. the binary stream, so take this with a big old grain of salt.
  • How much time is spent on the various aspects of ELF parsing will obviously vary from binary to binary, but on the one I was testing, relocation parsing was the most intensive. Note that this is just parsing the relocations, not performing them, which actually takes relatively little time by comparison. Because of this, another change I made was to disable relocation parsing when relocation performing is disabled. This removes our ability to introspect a binary's relocations without also applying them, but IMO that's an okay tradeoff considering it's a noticeable speed improvement for large binaries.
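To make the unpack_word point above concrete, here is a minimal sketch of the two code paths, using a made-up `FakeClemory` class (the real CLE `Clemory` API differs): pyelftools today reads a word out of the stream as a bytes object and then unpacks it, while a direct `unpack_word`-style call could decode straight out of the backing buffer and skip the intermediate allocation.

```python
import struct

# Illustrative stand-in for CLE's Clemory -- NOT the real API.
class FakeClemory:
    def __init__(self, data, endness="<", word_size=8):
        self._data = data
        self._fmt = endness + {4: "I", 8: "Q"}[word_size]

    def read(self, offset, n):
        # Path 1: what pyelftools effectively does today -- pull raw bytes
        # out of the stream, to be unpacked separately by the caller.
        return self._data[offset:offset + n]

    def unpack_word(self, offset):
        # Path 2: unpack directly from the backing buffer, avoiding the
        # intermediate bytes object entirely.
        return struct.unpack_from(self._fmt, self._data, offset)[0]

mem = FakeClemory(struct.pack("<QQ", 0xdeadbeef, 0xcafe))

# Both paths yield the same word; the second saves one allocation per read.
via_bytes = struct.unpack("<Q", mem.read(0, 8))[0]
via_direct = mem.unpack_word(0)
assert via_bytes == via_direct == 0xdeadbeef
```

The saving per word is tiny, but struct parsing is the hot path, so it multiplies across every field of every header and relocation entry.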

I profiled PE loading back in summer 2017 and found that the same thing applies to pefile as to pyelftools: the hot functions are all struct parsing, and that code is already highly optimized. The big difference between our use of pefile and pyelftools is that we use pefile much more as a monolith, whereas we use pyelftools as a parsing toolkit. It might be possible to remove some unnecessary parsing if we look more carefully into how to use pefile efficiently.


ltfish commented Apr 29, 2020

Are you using load_debug_info=True? If so, are you using the latest pyelftools master? I believe a recent PR added a cache for DIU, which sped up DWARF loading a lot for me.

I was thinking of monkeypatching the struct loading code in pyelftools in CLE using a C-backed implementation. What do you think?


rhelmot commented Apr 29, 2020

All of my tests were with load_debug_info=False. I think your idea could maybe work, but we would need to read the entire file into memory first, and I don't really know how we would keep track of that.


rhelmot commented Apr 29, 2020

Also, which level of abstraction were you thinking of monkeypatching pyelftools at? I can't seem to find a level in between "redo the whole gigantic mess" and "so small I don't think it would help anything".


ltfish commented Apr 29, 2020

I'm thinking of moving elftools/common/construct_utils.py into C.

github-actions commented

This issue has been marked as stale because it has no recent activity. Please comment or add the pinned tag to prevent this issue from being closed.


ltfish commented Nov 16, 2022

One of the timeout binaries that we definitely want to be able to load: asterisk.zip
