|
| 1 | +# Tiny ELF loader |
| 2 | + |
| 3 | +## Introduction |
| 4 | + |
| 5 | +This is a failed submission for IOCCC 2013. |
| 6 | + |
| 7 | +This is a tiny dynamic linker/loader for ELF. It loads programs |
| 8 | +built on Linux, and it runs on Linux, Mac OSX, Cygwin, and possibly |
| 9 | +other OSes, thanks to its Linux/glibc emulation layer. This means you |
| 10 | +can run Linux programs on other OSes. It works only for x86. This |
| 11 | +program is similar to how WINE works. |
| 12 | + |
| 13 | +## Usage |
| 14 | + |
| 15 | +Note that all commands below assume you have every file of my |
| 16 | +submission in the current directory. |
| 17 | + |
| 18 | +### Usage for Linux and Mac OSX |
| 19 | + |
| 20 | + $ make |
| 21 | + $ ./elf bin/hello |
| 22 | + $ ./elf bin/i386-tcc-32 # help is shown |
| 23 | + |
| 24 | +To compile something with TinyCC (http://tinycc.org/), which is based |
| 25 | +on a former IOCCC winning entry (http://www0.us.ioccc.org/2001/bellard.c), |
| 26 | +you need to set up your environment with mkenv.sh. This script downloads |
| 27 | +two Debian packages and extracts them to set up the "linux" directory, which |
| 28 | +contains include files and object files for TinyCC. This script |
| 29 | +requires curl, ar, tar, and perl. |
| 30 | + |
| 31 | + $ ./mkenv.sh |
| 32 | + $ ./elf bin/i386-tcc-32 -E ./hello.c |
| 33 | + $ ./elf bin/i386-tcc-32 ./hello.c -o hello-tcc |
| 34 | + $ ./elf ./hello-tcc |
| 35 | + |
| 36 | +You can compile more complex programs with the TCC loaded by this ELF |
| 37 | +loader. For example, let's compile the source code of TCC itself. |
| 38 | + |
| 39 | + $ curl -L -O http://download.savannah.gnu.org/releases/tinycc/tcc-0.9.26.tar.bz2 # or wget |
| 40 | + $ tar -xvjf tcc-0.9.26.tar.bz2 |
| 41 | + $ cd tcc-0.9.26 |
| 42 | + $ ./configure |
| 43 | + $ cd .. |
| 44 | + $ ./elf bin/i386-tcc-32 -o i386-tcc-32-tcc tcc-0.9.26/tcc.c -DONE_SOURCE -DTCC_TARGET_I386 -DCONFIG_SYSROOT='"linux"' -DCONFIG_TCCDIR='"linux/tcc"' -g -O2 -m32 -lm -ldl |
| 45 | + $ ./elf ./i386-tcc-32-tcc |
| 46 | + |
| 47 | +Of course, you can load the TCC built by the original TCC. |
| 48 | + |
| 49 | + $ ./elf ./i386-tcc-32-tcc -o i386-tcc-32-tcc-tcc tcc-0.9.26/tcc.c -DONE_SOURCE -DTCC_TARGET_I386 -DCONFIG_SYSROOT='"linux"' -DCONFIG_TCCDIR='"linux/tcc"' -g -O2 -m32 -lm -ldl |
| 50 | + $ ./elf ./i386-tcc-32-tcc-tcc # this still works |
| 51 | + |
| 52 | +### Usage for Cygwin |
| 53 | + |
| 54 | +See also the usage for Linux and Mac above. |
| 55 | + |
| 56 | +Unfortunately, Cygwin does not support MAP_FIXED for 4k boundaries so |
| 57 | +we need to use special Linux binaries whose segments are aligned to |
| 58 | +64k boundaries. |
| 59 | + |
| 60 | + (cygwin) $ tar -xvzf for_cygwin.tgz |
| 61 | + (cygwin) $ make |
| 62 | + (cygwin) $ ./elf bin/hello-aligned |
| 63 | + (cygwin) $ ./elf bin/i386-tcc-32-aligned # help is shown |
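The alignment constraint described above can be illustrated with a little arithmetic. This is a minimal sketch, not code from elf.c: the helper names and the example addresses are hypothetical, and it assumes the host only accepts fixed mappings at multiples of a given granularity (64k on Cygwin, as stated above).

```c
#include <assert.h>
#include <stdint.h>

/* Rounds addr down to a multiple of granularity, which must be a
   power of two. */
uint32_t align_down(uint32_t addr, uint32_t granularity) {
    assert(granularity != 0 && (granularity & (granularity - 1)) == 0);
    return addr & ~(granularity - 1);
}

/* Returns 1 if the 4k page that contains vaddr starts at an address the
   host can use for a fixed mapping, i.e. a multiple of granularity.
   Hypothetical helper, for illustration only. */
int page_base_is_mappable(uint32_t vaddr, uint32_t granularity) {
    uint32_t page_base = align_down(vaddr, 4096);
    return page_base % granularity == 0;
}
```

For example, a segment at the hypothetical address 0x08049f0c has the page base 0x08049000, which is a multiple of 4k but not of 64k, so it would need a binary whose segments are placed on 64k boundaries.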
| 64 | + |
| 65 | +You can build Linux binaries with i386-tcc-32-aligned, but you cannot |
| 66 | +run the output because it is not aligned properly. However, you can |
| 67 | +run the output on Linux. |
| 68 | + |
| 69 | + (cygwin) $ ./mkenv.sh |
| 70 | + (cygwin) $ ./elf bin/i386-tcc-32-aligned -E ./hello.c |
| 71 | + (cygwin) $ ./elf bin/i386-tcc-32-aligned ./hello.c -o hello-tcc-win |
| 72 | + (cygwin) $ ./elf ./hello-tcc-win # mmap fails |
| 73 | + (linux) $ ./hello-tcc-win # works |
| 74 | + |
| 75 | +You can reproduce the -aligned binaries by using align.lds. |
| 76 | + |
| 77 | + (linux) $ gcc -m32 hello.c -Wl,-Talign.lds -o hello-aligned |
| 78 | + |
| 79 | +### Chain load |
| 80 | + |
| 81 | +You can load this loader itself. |
| 82 | + |
| 83 | + $ ./elf bin/elf-linux bin/hello |
| 84 | + $ ./elf bin/elf-linux bin/i386-tcc-32 |
| 85 | + |
| 86 | +For Cygwin, please use hello-aligned and i386-tcc-32-aligned instead. |
| 87 | + |
| 88 | +Of course, on Linux and Mac, you still can run programs built by TCC |
| 89 | +chain-loaded by this loader loaded by this loader. |
| 90 | + |
| 91 | + $ ./elf bin/elf-linux bin/i386-tcc-32 ./hello.c -o hello-tcc |
| 92 | + $ ./elf bin/elf-linux ./hello-tcc |
| 93 | + |
| 94 | +You can reproduce elf-linux by |
| 95 | + |
| 96 | + (linux) $ gcc -m32 -g -Wall -W elf.o -rdynamic -ldl -Wl,-Ttext-segment=0x3000000 -Wl,-Talign.lds -o elf-linux |
| 97 | + |
| 98 | +As you see, the start address of elf-linux was adjusted for Linux, and |
| 99 | +the alignment of elf-linux was adjusted for Cygwin. |
| 100 | + |
| 101 | +Note that you cannot load elf-linux twice, because the address layout |
| 102 | +of elf-linux is fixed. |
| 103 | + |
| 104 | + $ ./elf bin/elf-linux bin/elf-linux # fails |
| 105 | + |
| 106 | +### Add Linux-only APIs |
| 107 | + |
| 108 | +This loader cannot run arbitrary Linux binaries on other OSes mainly |
| 109 | +because its Linux emulation layer lacks a lot of functions. However, |
| 110 | +you can easily add such functions. For example, see the following |
| 111 | +C code: |
| 112 | + |
| 113 | + #include <stdio.h> |
| 114 | + #include <string.h> |
| 115 | + int main() { |
| 116 | + char buf[] = "hello"; |
| 117 | + memfrob(buf, 5); |
| 118 | + puts(buf); |
| 119 | + return 0; |
| 120 | + } |
| 121 | + |
| 122 | +This code uses memfrob, which is a glibc-only function, and this will |
| 123 | +not work on Mac or Cygwin. |
| 124 | + |
| 125 | + $ ./elf ./memfrob # linux only |
| 126 | + |
| 127 | +However, by providing the implementation of memfrob in elf.c, you can |
| 128 | +run this program on Mac or Cygwin. Please add the following code at |
| 129 | +the bottom of elf.c: |
| 130 | + |
| 131 | + void* memfrob(void* v, size_t n) { |
| 132 | + char* p = (char*)v; |
| 133 | + while (n--) { |
| 134 | + *p++ ^= 42; |
| 135 | + } |
| 136 | + return v; |
| 137 | + } |
| 138 | + |
| 139 | + $ make |
| 140 | + $ ./elf ./memfrob # now it works everywhere! |
| 141 | + |
| 142 | +## Obfuscation techniques |
| 143 | + |
| 144 | +### ASCII art |
| 145 | + |
| 146 | +The code itself provides some ideas about what the code does. The first |
| 147 | +three letters, 'E', 'L', and 'F', are just some preprocessor |
| 148 | +directives and some data. The face of the elf is the Linux emulation layer. |
| 149 | + |
| 150 | +Then, the next box, which has four cells, shows what ELF objects look |
| 151 | +like. An ELF object always starts with an ELF header. The code around |
| 152 | +the first cell actually parses the ELF header. Notice that the string in |
| 153 | +the cell ("ELF Header") is used as a part of the error message. |
| 154 | + |
| 155 | + $ ./elf hello.c # not ELF Header |
| 156 | + |
| 157 | +Then, multiple program headers follow. You see the following for-loop |
| 158 | +at the top of the second cell. |
| 159 | + |
| 160 | + for(K=E+=13;K<E+E[-2]%65536*8;K+=8){ |
| 161 | + |
| 162 | +This is the loop which handles program headers. Then, the next line |
| 163 | +starts with |
| 164 | + |
| 165 | + if(*K==1) |
| 166 | + |
| 167 | +The code in this if-clause handles PT_LOAD (==1). |
| 168 | + |
| 169 | +At the top of the 4th cell, you will see |
| 170 | + |
| 171 | + if(*K==2) |
| 172 | + |
| 173 | +The code after this handles PT_DYNAMIC (==2). |
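For readers who want to follow the loop above, here is a de-obfuscated sketch of the same walk. It is not taken from elf.c: the type and function names are made up for illustration, and the struct layouts are written out by hand from the ELF32 format so that the sketch also builds without `<elf.h>`. It relies on the same assumption the obfuscated loop makes, namely that the program headers directly follow the 52-byte ELF header. In the obfuscated form, `E+=13` skips those 52 bytes (13 ints), `E[-2]%65536` reads the 16-bit `e_phnum` field on a little-endian machine, and each step of 8 ints is one 32-byte program header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-written ELF32 layouts (field names follow the ELF format). */
typedef struct {
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
} Ehdr32; /* 52 bytes = 13 ints */

typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Phdr32; /* 32 bytes = 8 ints */

enum { SKETCH_PT_LOAD = 1, SKETCH_PT_DYNAMIC = 2 };

/* Counts the program headers of the given type in an in-memory image.
   Assumes the program headers directly follow the ELF header. */
int count_phdrs(const void *image, uint32_t type) {
    assert(image != NULL);
    const Ehdr32 *eh = (const Ehdr32 *)image;
    const Phdr32 *ph = (const Phdr32 *)(eh + 1); /* E += 13 */
    int n = 0;
    for (int i = 0; i < eh->e_phnum; i++) {      /* E[-2] % 65536 */
        if (ph[i].p_type == type) {              /* *K == type */
            n++;
        }
    }
    return n;
}
```

A real loader would map each PT_LOAD segment and parse the PT_DYNAMIC segment inside the loop; counting is used here only to keep the sketch short.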
| 174 | + |
| 175 | +### Compactness |
| 176 | + |
| 177 | +Another notable characteristic of this code is its |
| 178 | +compactness. elf-tiny.c is the compressed version of this program, |
| 179 | +which has no error checks. elf-tiny.c is less than 1000 bytes. |
| 180 | +I'd claim this is the tiniest ELF loader in the world, yet it |
| 181 | +works on multiple OSes: |
| 182 | + |
| 183 | + $ make elf-tiny |
| 184 | + $ ./elf-tiny ./hello |
| 185 | + |
| 186 | +To achieve this extreme compactness, a number of techniques are |
| 187 | +used. One good example is |
| 188 | + |
| 189 | + 1[(I*)O] |
| 190 | + |
| 191 | +at the top of the 4th cell. This <index>[<array>] style is a well-known |
| 192 | +obfuscation technique, but this code uses it because it is two |
| 193 | +characters shorter than |
| 194 | + |
| 195 | + ((I*)O)[1] |
| 196 | + |
| 197 | +or |
| 198 | + |
| 199 | + *((I*)O+1) |
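The three spellings are equivalent because C defines `a[b]` as `*(a+b)`, and addition commutes. The following sketch checks this; the helper name is hypothetical, and it assumes that `I` in the snippets above stands for `int`.

```c
#include <assert.h>

/* Reads element 1 of an int array through a void pointer, using the
   three spellings discussed above, and checks that they agree. */
int second_element(const void *o) {
    int a = 1[(const int *)o];     /* index[array] form */
    int b = ((const int *)o)[1];   /* conventional subscript */
    int c = *((const int *)o + 1); /* explicit pointer arithmetic */
    assert(a == b && b == c);
    return a;
}
```

Calling `second_element` on an array holding 10, 20, 30 returns 20 whichever spelling is used.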
| 200 | + |
| 201 | +Another example is magic numbers like 7417633*159. This is 0x464c457f, |
| 202 | +which is the ELF magic ("\x7fELF") read as a little-endian 32-bit integer. |
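The arithmetic is easy to verify. The sketch below uses hypothetical helper names and is not part of elf.c. Note that in C source the literal has to be split as `"\x7f" "ELF"`, because `E` is itself a hex digit and would otherwise be absorbed into the `\x` escape.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if buf starts with the four ELF magic bytes.
   A byte-wise comparison does not depend on host endianness. */
int has_elf_magic(const unsigned char *buf) {
    return memcmp(buf, "\x7f" "ELF", 4) == 0;
}

/* Assembles the first four bytes of buf as a little-endian value. */
uint32_t le32(const unsigned char *buf) {
    return (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
           (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
}

/* Checks that the obfuscated constant equals the little-endian magic. */
void check_magic_constant(void) {
    assert(7417633u * 159u == 0x464c457fu);
}
```

On a little-endian machine such as x86, loading the first four bytes of an ELF file as one 32-bit integer gives the same value as `le32`, so a single integer comparison is enough.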
| 203 | + |
| 204 | +Finally, the following code snippet is one of my favorites in this |
| 205 | +program: |
| 206 | + |
| 207 | + O=strstr(T,H=*((char**)D[6]+M/256*4)+D[5]),G=O?U[(O-T)/6]:Y(0,H) |
| 208 | + |
| 209 | +This obtains an address of a symbol from its name. Do you see how it |
| 210 | +works? Why is strstr for T necessary? |
| 211 | + |
| 212 | +## Philosophy |
| 213 | + |
| 214 | +This entry focuses on a fairly overlooked tool, the dynamic linking |
| 215 | +loader. I wanted to show how compact the code of dynamic loaders can |
| 216 | +be by implementing the loader in less than 1000 bytes. Another goal of |
| 217 | +this entry was to show how useful the dynamic loaders can be, by |
| 218 | +allowing users to run Linux binaries on another OS. |
| 219 | + |
| 220 | +One more thing this entry would demonstrate is the portability of x86. |