Life of an Executable

Photo by Ryoji Iwata / Unsplash

Process Loading under Linux

The Linux operating system uses the execve system call or one of its variants to load a process.
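As a rough illustration of that call from user space, here is a minimal sketch that forks a child and lets the child replace itself with another program through execve. The helper name spawn_and_wait is hypothetical and only serves this example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs `path` in a child process via execve() and returns its exit status,
 * or -1 if the child could not be created or did not exit normally.
 * (Hypothetical helper, for illustration only.) */
int spawn_and_wait(const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: on success execve() never returns, because the kernel
         * replaces this process image with the new program. */
        execve(path, argv, envp);
        perror("execve");   /* only reached if execve() failed */
        _exit(127);
    }

    /* Parent: wait for the new program to finish. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling spawn_and_wait("/bin/sh", argv, envp) with argv set to { "sh", "-c", "exit 7", NULL } and an empty environment should return the shell's exit status.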

First, the kernel allocates internal structures. Then, it parses the given ELF file. If the ELF file specifies an interpreter (the PT_INTERP program header), the kernel loads that program as well. For executables that use dynamic libraries, this program is the dynamic linker, which is responsible for loading the required dynamic libraries.
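To make the interpreter field more concrete, the sketch below reads the PT_INTERP program header of an ELF file from user space. It is not what the kernel does internally; it assumes a glibc or musl system where link.h provides the ElfW() macro, and the function name read_elf_interpreter is hypothetical.

```c
#include <elf.h>
#include <link.h>   /* ElfW(): picks the Elf32_* or Elf64_* types for this machine */
#include <stdio.h>
#include <string.h>

/* Copies the PT_INTERP path of the ELF file `path` into `out`.
 * Returns 0 on success, 1 if the file has no interpreter (for example a
 * statically linked executable), and -1 on error.
 * Only handles files of the machine's native ELF class. */
int read_elf_interpreter(const char *path, char *out, size_t out_len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int result = -1;
    ElfW(Ehdr) ehdr;
    unsigned char native_class =
        sizeof(ehdr) == sizeof(Elf64_Ehdr) ? ELFCLASS64 : ELFCLASS32;

    if (fread(&ehdr, sizeof ehdr, 1, f) != 1 ||
        memcmp(ehdr.e_ident, ELFMAG, SELFMAG) != 0 ||
        ehdr.e_ident[EI_CLASS] != native_class)
        goto done;

    result = 1;   /* valid ELF file, no interpreter found so far */
    for (unsigned i = 0; i < ehdr.e_phnum; i++) {
        ElfW(Phdr) phdr;
        long where = (long)(ehdr.e_phoff + (ElfW(Off))i * ehdr.e_phentsize);
        if (fseek(f, where, SEEK_SET) != 0 ||
            fread(&phdr, sizeof phdr, 1, f) != 1) {
            result = -1;
            break;
        }
        if (phdr.p_type != PT_INTERP)
            continue;

        /* p_filesz counts the terminating NUL byte of the path. */
        if (phdr.p_filesz == 0 || phdr.p_filesz > out_len ||
            fseek(f, (long)phdr.p_offset, SEEK_SET) != 0 ||
            fread(out, 1, phdr.p_filesz, f) != phdr.p_filesz) {
            result = -1;
            break;
        }
        out[phdr.p_filesz - 1] = '\0';
        result = 0;
        break;
    }
done:
    fclose(f);
    return result;
}
```

On a typical x86-64 distribution, calling it on /bin/ls should yield the same /lib64/ld-linux-x86-64.so.2 path that readelf --program-headers reports as the requested program interpreter.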

The kernel then passes some information to the started process, such as the program's arguments and environment variables, by copying them to the process's stack.

The kernel also adds a virtual dynamic library, named linux-vdso.so.1, to implement faster system calls if the underlying hardware allows it. The kernel then passes control to the entry point of the interpreter (usually, the dynamic linker).
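Besides the arguments and the environment, the kernel also places an auxiliary vector on the stack, and that is where a process can find the address of the vDSO. The following sketch queries a few of its entries with getauxval() from glibc; the function name print_auxv_summary is hypothetical.

```c
#include <elf.h>        /* AT_* constants */
#include <stdio.h>
#include <sys/auxv.h>   /* getauxval() */

/* Prints a few auxiliary-vector entries that the kernel placed on the stack
 * and returns the page size reported through AT_PAGESZ.
 * (Hypothetical helper, for illustration only.) */
unsigned long print_auxv_summary(void)
{
    unsigned long vdso  = getauxval(AT_SYSINFO_EHDR); /* base of the vDSO image, 0 if absent */
    unsigned long entry = getauxval(AT_ENTRY);        /* entry point of the executable */
    unsigned long base  = getauxval(AT_BASE);         /* load address of the interpreter, 0 if none */
    unsigned long page  = getauxval(AT_PAGESZ);       /* system page size in bytes */

    printf("vDSO base:        %#lx\n", vdso);
    printf("program entry:    %#lx\n", entry);
    printf("interpreter base: %#lx\n", base);
    printf("page size:        %lu\n", page);
    return page;
}
```

Running a dynamically linked program with the environment variable LD_SHOW_AUXV=1 makes the glibc dynamic linker dump the whole vector, which is a convenient way to cross-check these values.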

Once the dynamic linker has done its job, it transfers control to the entry point specified in the ELF executable (typically, the _start symbol). The _start code then calls the __libc_start_main() routine, which sets up certain C library mechanisms.

After that, control is passed to the init code, which executes the global constructors stored in the .ctors section (.init_array with current toolchains).

Eventually, the function main is called. When it finishes, __libc_start_main() passes control to the fini code, which executes the global destructors stored in the .dtors section (.fini_array with current toolchains).

The following C code illustrates how to define a constructor and a destructor that will be run before and after the function main(), respectively.

#include <stdio.h>

/* Runs before main(). */
void __attribute__((constructor)) program_init(void)
{
    printf("\ninit\n");
}

/* Runs after main() returns. */
void __attribute__((destructor)) program_fini(void)
{
    printf("\nfini\n");
}

int main(void)
{
    printf("\nmain\n");

    return 0;
}

See here and here for more information on what happens on the kernel side.

Dynamic Linking

The goal of dynamic linking is to save code space by sharing common (read-only) code between processes. The virtual memory mechanism allows shared code to be mapped to the same physical memory pages. Shared code is stored in shared library (or dynamic library) files, whose extension is .so under Linux, .dll under Microsoft Windows and .dylib under macOS. Shared libraries have code and data sections, just like executables.

On Linux, programs use dynamic linking by default, unless they are built with the -static option. At compile time, the compiler only needs to know the prototypes of the library functions. The static linker then records which dynamic libraries to load in the .dynamic section.

Here is an example of the contents of a .dynamic section for the /bin/ls executable:

$ readelf --dynamic /bin/ls

Dynamic section at offset 0x21a38 contains 28 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libselinux.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]

Dynamic libraries can also depend on other dynamic libraries. For the same /bin/ls executable, one can obtain the exhaustive list with ldd, a utility that asks the dynamic linker to print every library it would load:

$ ldd /bin/ls
	linux-vdso.so.1 (0x00007ffc7b91a000)
	libselinux.so.1 => /lib/x86_64-linux-gnu/libselinux.so.1 (0x00007fbf02c5c000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fbf02a00000)
	libpcre2-8.so.0 => /lib/x86_64-linux-gnu/libpcre2-8.so.0 (0x00007fbf02966000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fbf02cb8000)
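A running process can produce a similar list for itself by reading /proc/self/maps. The sketch below is only an approximation of what ldd prints: it assumes /proc is mounted, it recognizes libraries by the ".so" substring in their path, and the function name list_mapped_shared_objects is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Prints each distinct shared object mapped into the calling process and
 * returns how many were found, or -1 if /proc/self/maps cannot be read.
 * (Hypothetical helper, for illustration only.) */
int list_mapped_shared_objects(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    if (maps == NULL)
        return -1;

    char line[4096];
    char previous[4096] = "";
    int count = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        line[strcspn(line, "\n")] = '\0';

        /* File-backed mappings end with an absolute path; special regions
         * such as [vdso] or [stack] have no '/' and are skipped. */
        char *path = strchr(line, '/');
        if (path == NULL || strstr(path, ".so") == NULL)
            continue;

        /* One library has several mappings (code, read-only data, data);
         * report it once by comparing with the last path printed. */
        if (strcmp(path, previous) != 0) {
            printf("%s\n", path);
            strcpy(previous, path);
            count++;
        }
    }

    fclose(maps);
    return count;
}
```

Because the vDSO is not backed by a file, it shows up in /proc/self/maps as [vdso] rather than as linux-vdso.so.1, so this sketch does not count it.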

Role of the Dynamic Linker

The dynamic linker is responsible for managing shared libraries on behalf of the executable. Its primary function is to resolve, at runtime, the addresses that are still missing in the executable. A relocation is a record stating that a particular address needs to be "fixed" at runtime. Before running the code, the dynamic linker needs to go through all relocations and fix the addresses they refer to. Many different types of relocations exist and they are ABI-dependent.

Here is an illustration of how a relocation works:

$ cat a.c
    extern int x[100];
    int *y = x + 57;

$ cat b.c
    int x[100];

$ gcc -nostdlib -shared -fpic -s -o b.so b.c
$ gcc -nostdlib -shared -fpic -o a.so a.c ./b.so
$ readelf -r a.so

Relocation section '.rela.dyn' at offset 0x2c8 contains 1 entry:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000002000  000100000001 R_X86_64_64       0000000000000000 x + e4

Notice that the dynamic library a.so contains a relocation related to the symbol x, which is defined in the other dynamic library b.so. When loading a.so in memory, the dynamic linker must replace the value at offset 0x2000 with the address of x + 0xe4. The addend comes from the pointer arithmetic: an int occupies 4 bytes, so 57 of them represent 228 bytes, or 0xe4 in hexadecimal.
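The arithmetic behind this relocation can be written down in a few lines of C. For the R_X86_64_64 type, the x86-64 ABI defines the stored value as S + A, where S is the address of the symbol and A is the addend. The function names below are hypothetical, and the sketch only mimics the computation, not the dynamic linker's actual code.

```c
#include <stddef.h>
#include <stdint.h>

/* Addend for &array[index]: the byte distance from the start of the array. */
int64_t element_addend(size_t element_size, size_t index)
{
    return (int64_t)(element_size * index);
}

/* R_X86_64_64 stores S + A, where S is the run-time address of the symbol
 * and A is the addend recorded in the relocation entry. */
void apply_r_x86_64_64(uint64_t *where, uint64_t symbol_address, int64_t addend)
{
    *where = symbol_address + (uint64_t)addend;
}
```

With element_addend(4, 57) giving 0xe4, a symbol x loaded at some address S leads to S + 0xe4 being written into the slot that holds y.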

Global Offset and Procedure Linkage Tables

A shared library can be loaded at any address in a process's address space. So how does the executable know how to access a global variable stored in it? A level of indirection is helpful here. It is called the Global Offset Table (GOT).

When a shared library is loaded, the dynamic linker finds a relocation for the symbol and writes the symbol's address into the corresponding GOT entry. The code then reads the address from the GOT instead of hard-coding it.

Similarly, how does the executable know how to access a function that's stored in a dynamic library? The program does not call the external routine directly. Instead, it uses a Procedure Linkage Table (PLT) stub.

This mechanism implements lazy binding. When the dynamic linker loads a shared library, it puts an identifier and a resolution function at specific locations in the GOT. On the first call of the function (❶, left half of the picture), execution falls into the default stub (❷), which loads the identifier (❸) and calls the dynamic linker (❹). Then, the dynamic linker patches the correct function address into the GOT and executes the function (❺). This ensures that the next time the original PLT entry is called (❶, right half of the picture), the correct function address is present (❷) and called (❸).

Dynamic Symbol Resolution Involving Lazy Binding on Linux
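The same idea can be mimicked in plain C with a function pointer that starts out pointing at a resolver stub. This is only a toy model of lazy binding, not how the real PLT and GOT are implemented, and every name in it (got_slot, lazy_add_stub, call_add, and so on) is hypothetical.

```c
typedef int (*binary_fn)(int, int);

/* Stands in for a function that lives in a shared library. */
static int real_add(int a, int b)
{
    return a + b;
}

static int lazy_add_stub(int a, int b);

/* Mock GOT entry: it initially points at the stub, not at the real function. */
static binary_fn got_slot = lazy_add_stub;
static int resolver_calls = 0;

/* Mock resolver: runs on the first call only, patches the slot with the
 * real address, then forwards the call. */
static int lazy_add_stub(int a, int b)
{
    resolver_calls++;
    got_slot = real_add;
    return got_slot(a, b);
}

/* Mock PLT entry: always jumps through the slot. */
int call_add(int a, int b)
{
    return got_slot(a, b);
}

/* Number of times the resolver has run so far. */
int resolutions(void)
{
    return resolver_calls;
}
```

Calling call_add twice goes through the stub only once: the first call patches the slot, and the second call reaches real_add directly, which is why resolutions() stays at 1.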

In the next episode, I’ll discuss topics related to reverse engineering native languages. Stay tuned!
