Life of an Executable

Process Loading under Linux
The Linux operating system uses the execve system call or one of its variants to load a process.
First, the kernel allocates its internal structures. Then, it parses the given ELF file and maps its loadable segments into memory. If the ELF file specifies an interpreter (in its PT_INTERP program header), the kernel also loads that program into memory. For executables that use dynamic libraries, the interpreter is the dynamic linker, which is responsible for loading the required dynamic libraries.
The kernel then passes some information to the new process, such as the program’s arguments and environment variables, by copying it onto the process’s stack.
The kernel also maps a virtual dynamic shared object (vDSO), named linux-vdso.so.1, into the process. It speeds up some system calls, such as clock_gettime(), by serving them in user space without entering the kernel, when the underlying hardware allows it. The kernel then passes control to the entry point of the interpreter (usually, the dynamic linker).
Once the dynamic linker has done its job, it transfers control to the entry point specified in the ELF executable (typically, the _start symbol). _start then calls the __libc_start_main() routine, which sets up certain C library mechanisms.
After that, control is passed to the init code, which executes the global constructors listed in the .init_array section (.ctors with older toolchains).
Eventually, the main function is called. When it returns, __libc_start_main() passes its return value to exit(), which in turn runs the fini code; that code executes the global destructors listed in the .fini_array section (.dtors with older toolchains).
The following C code illustrates how to define a constructor and a destructor that run before and after main(), respectively.
#include <stdio.h>

void __attribute__((constructor)) program_init(void)
{
    printf("\ninit\n");
}

void __attribute__((destructor)) program_fini(void)
{
    printf("\nfini\n");
}

int main(void)
{
    printf("\nmain\n");
    return 0;
}
See here and here for more information on what happens on the kernel side.
Dynamic Linking
The goal of dynamic linking is to save space, on disk and in memory, by sharing common (read-only) code between processes. The virtual memory mechanism allows the same physical memory pages holding shared code to be mapped into several processes. Shared code is contained in shared library (or dynamic library) files, whose extension is .so under Linux, .dll under Microsoft Windows and .dylib under macOS. Shared libraries have code and data sections, just like executables.
On Linux, programs use dynamic linking by default, unless they are built with the -static flag. At compile time, the compiler only needs the prototypes (declarations) of the library functions; their addresses are resolved when the program is loaded or run. The static linker records which dynamic libraries to load in the .dynamic section.
Here is an example of the contents of the .dynamic section of the /bin/ls executable (only the NEEDED entries are shown):
$ readelf --dynamic /bin/ls

Dynamic section at offset 0x21a38 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libselinux.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
Dynamic libraries can themselves depend on other dynamic libraries. For the same /bin/ls executable, the complete list, including indirect dependencies, can be obtained with ldd, a script that asks the dynamic linker to report every library it would load:
$ ldd /bin/ls
	linux-vdso.so.1 (0x00007ffc7b91a000)
	libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fbf02c5c000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbf02a00000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007fbf02966000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fbf02cb8000)
Role of the Dynamic Linker
The dynamic linker is responsible for managing shared libraries on behalf of the executable. Its primary job is to resolve, at runtime, the addresses that were left unknown in the executable and its libraries. These are described by relocations: a relocation is an entry stating that the value at a particular address must be “fixed” at runtime, and how to compute it. Before running the code, the dynamic linker goes through the relocations and patches the referenced locations (function calls can be an exception when lazy binding is used, see below). Many different types of relocations exist, and they are ABI-dependent.
Here is an illustration of how a relocation works:
$ cat a.c
extern int x[100];
int *y = x + 57;
$ cat b.c
int x[100];
$ gcc -nostdlib -shared -fpic -s -o b.so b.c
$ gcc -nostdlib -shared -fpic -o a.so a.c ./b.so
$ readelf -r a.so

Relocation section '.rela.dyn' at offset 0x2c8 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000002000  000100000001 R_X86_64_64       0000000000000000 x + e4
Notice that the dynamic library a.so contains a relocation related to the symbol x, which is defined in the other dynamic library b.so. The R_X86_64_64 type means “the symbol's address plus the addend”. When loading a.so in memory, the dynamic linker must therefore write the address of x + 0xe4 at offset 0x2000 (relative to the address where a.so is loaded). Note that an int variable occupies 4 bytes on this platform, and 57 such variables represent 228 bytes, or 0xe4 bytes in hexadecimal.
Global Offset and Procedure Linkage Tables
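A shared library can be loaded at any address in a process's address space. So how does the executable know how to access a global variable stored in it? The answer is a level of indirection called the Global Offset Table (GOT): a table of addresses located at a fixed distance from the code that uses it, so that the code can always find it relative to its own position.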
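Position-independent code does not embed the variable's address; it reads it from a GOT entry instead. When the module is loaded, the dynamic linker processes a relocation for that entry (on x86-64, typically R_X86_64_GLOB_DAT), which writes the variable's actual address into the module's GOT.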
Similarly, how does the executable call a function that is stored in a dynamic library? The program does not call the external routine directly. Instead, it calls a small Procedure Linkage Table (PLT) stub, which jumps to the address stored in the function's GOT entry.
This mechanism enables lazy binding: a function's address is resolved only the first time the function is called. When the dynamic linker loads a module (executable or shared library), it stores an identifier for that module and the address of a resolution function at reserved locations in the module's GOT, and each function's GOT entry initially points back into the PLT. On the first call of the function (❶, left part of the picture), execution falls into the default stub (❷), which loads the identifier (❸) and calls the dynamic linker (❹). The dynamic linker then patches the correct function address into the GOT and executes the function (❺). The next time the same PLT entry is called (❶, right part of the picture), the correct function address is already in the GOT (❷) and is called directly (❸).

In the next episode, I’ll discuss topics related to reverse engineering programs written in native languages. Stay tuned!
Thanks for reading Crumbs of Cybersecurity!