Skip to content
This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

Compiled functions are repeated due to use of static inline #2179

Open
pwmarcz opened this issue Feb 21, 2021 · 6 comments
Open

Compiled functions are repeated due to use of static inline #2179

pwmarcz opened this issue Feb 21, 2021 · 6 comments
Assignees

Comments

@pwmarcz
Copy link
Contributor

pwmarcz commented Feb 21, 2021

Our header files define functions as static inline. That means than when a function is not inlined by compiler, is treated as static, i.e. separate for a compilation unit. As a result, if it's used in multiple compilation units, it gets repeated in the binary:

$ nm libsysdb.so | cut -d' ' -f3 | sort | uniq -cd | sort -nr
     54 
     53 get_cur_thread
     50 get_cur_tid
     49 __lock
     45 __unlock
     20 create_lock
     19 qstrgetstr
     17 locked
     17 die_or_inf_loop
     15 qstrsetstr
     15 qstrfree
     10 qstrempty
     10 clear_lock
      8 thread_wakeup
      8 shim_get_tcb
      8 set_handle_fs
      8 destroy_lock
      7 size_align_up
      7 __set_free_mem_area
      7 __ref_inc
      7 __ref_dec
      7 memory_migrated
      7 get_mem_obj_from_mgr_enlarge
      7 get_ipc_msg_size
      7 create_mem_mgr_in_place
      7 create_mem_mgr
      6 init_align_up
      6 free_mem_obj_to_mgr
      5 __shim_tcb_init
      5 set_tls
      5 LINUX_PROT_TO_PAL
      5 is_internal
      5 get_thread_handle_map
      5 debug_setbuf
      4 thread_sleep
      4 shim_tcb_init
      4 set_cur_thread
      4 qstrcopy
      4 pal_get_tcb
      4 dentry_get_path_into_qstr
      4 create_event
      3 shim_sigismember
      3 set_handle_map
      3 set_event
      3 __range_not_ok
      3 pal_context_set_retval
      3 lock_created
      3 is_internal_tid
      3 get_ipc_msg_with_ack_size
      3 enable_locking
      3 create_lock_runtime
      3 access_ok
      2 thread_setwait
      2 spinlock_unlock
      2 spinlock_lock
      2 _spinlock_is_locked
      2 spinlock_is_locked
      2 shim_sigdelset
      2 set_default_tls
      2 pal_context_get_sp
      2 pal_context_get_retval
      2 pal_context_copy
      2 LINUX_OPEN_FLAGS_TO_PAL_OPTIONS
      2 install_new_event
      2 event_handle
      2 dentry_get_name
      2 debug_spinlock_take_ownership
      2 debug_spinlock_giveup_ownership
      2 clear_event
      2 __bytes2hexstr

That's compiled with DEBUG. Without DEBUG the list is much shorter:

     23 lock.isra.0
     12 lock.isra.0.constprop.0
      4 __ref_dec.part.0
      4 die_or_inf_loop
      2 install_new_event

(I'm not sure if the first two are due to this issue of a quirk of GCC compilation).

I haven't checked but (at least with DEBUG) these repeated functions might account for a significant part of the LibOS binary (and some of libpal as well).

These functions should be defined as inline. However, that currently breaks the build during linking. For example, after changing create_event from static inline to inline:

ld: shim_async.o: in function `init_async':
/home/pawel/graphene/LibOS/shim/src/shim_async.c:130: undefined reference to `create_event'
ld: bookkeep/shim_handle.o: in function `rs_handle':
/home/pawel/graphene/LibOS/shim/src/bookkeep/shim_handle.c:760: undefined reference to `create_event'
ld: ipc/shim_ipc_helper.o: in function `init_ipc_helper':
/home/pawel/graphene/LibOS/shim/src/ipc/shim_ipc_helper.c:181: undefined reference to `create_event'
ld: sys/shim_epoll.o: in function `shim_do_epoll_create1':
/home/pawel/graphene/LibOS/shim/src/sys/shim_epoll.c:53: undefined reference to `create_event'

I suspect that the definitions are static inline because the above did not work, and that's probably because we have a custom linker script.

Proposed solution

Change all instances of static inline in header files to inline, and fix the build process so that we can use inline.

More information

Wikipedia, emphasis mine:

Some recommend an entirely different approach, which is to define functions as static inline instead of inline in header files.[2] Then, no unreachable code will be generated. However, this approach has a drawback in the opposite case: Duplicate code will be generated if the function could not be inlined in more than one translation unit. The emitted function code cannot be shared among translation units because it must have different addresses. This is another drawback; taking the address of such a function defined as static inline in a header file will yield different values in different translation units. Therefore, static inline functions should only be used if they are used in only one translation unit, which means that they should only go to the respective .c file, not to a header file.

@mkow
Copy link
Member

mkow commented Feb 22, 2021

Ad. the wiki quote: all of this is just a workaround for us not using Link Time Optimization and having all code in C files.
We should be doing this, but AFAIK our hackery in the linker scripts needs first to be removed (and ideally all our linker scripts altogether), but this is (again: AFAIK) blocked on cleaning up IPC/checkpointing, which uses some linker magic.

CC: @boryspoplawski

@yamahata
Copy link
Contributor

When compiled with DEBUG, -O2 isn't specified. See Scripts/Makefile.configs. So this isn't surprise.
Can you please check DEBUG + -O2?

@dimakuv
Copy link
Contributor

dimakuv commented Feb 22, 2021

Agreed with @mkow, but is there a particular problem with the current situation? Is it just annoying for debugging, or something else is blocked by this?

@pwmarcz
Copy link
Contributor Author

pwmarcz commented Feb 22, 2021

When compiled with DEBUG, -O2 isn't specified. See Scripts/Makefile.configs. So this isn't surprise.
Can you please check DEBUG + -O2?

Roughly similar to just -O2:

      8 __lock.constprop.3
      8 __lock
      5 __lock.constprop.4
      4 die_or_inf_loop
      4 __set_free_mem_area.part.2
      4 __lock.constprop.6
      3 __lock.constprop.5
      2 install_new_event
      2 free_mem_obj_to_mgr.isra.5.part.6
      2 __unlock.part.1.constprop.5
      2 __unlock
      2 __ref_dec.part.3

Agreed with @mkow, but is there a particular problem with the current situation? Is it just annoying for debugging, or something else is blocked by this?

Since it turns out the issue is mostly with DEBUG=1, yes, it's mostly annoying for debugging - the file size is blown up.

As for blocking... well, we were hesitant to enable inlining for a frequently-used function in #2177 because of this (because in debug-mode it would appear in 50+ copies, one for each .c file it's used in).

Anyway, yes, I agree we should do both: fix the issue with inline, and enable LTO. Assuming LTO works well with gcc - I don't have much experience with it.

@boryspoplawski
Copy link
Contributor

(...) our hackery in the linker scripts needs first to be removed (and ideally all our linker scripts altogether), but this is (again: AFAIK) blocked on cleaning up IPC/checkpointing, which uses some linker magic.

Yes, we need to rework checkpointing first (it uses section sorting, don't ask me why it's done this way).

Regarding this issue: tbh I don't see the problem, we will get rid of this once we have LTO. For now I don't mind having 5 functions repeated couple of times (especially considering they are very small).

@pwmarcz
Copy link
Contributor Author

pwmarcz commented Feb 22, 2021

I guess it's not too serious, but having fifty-three copies of get_cur_thread in the binary is painful to look at :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants