Structure of an Executable

Executable Formats
[...] executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file that must be interpreted (parsed) by a program to be meaningful. The exact interpretation depends upon the use. "Instructions" is traditionally taken to mean machine code instructions for a physical CPU.
Source: Wikipedia
In a nutshell, an executable is a file that can be parsed by the underlying operating system and contains machine code that can be executed directly by the CPU. Many people use the term executable to describe files containing bytecode that are intended for a software interpreter, such as Java or Python bytecode, or for scripting languages, too.
Examples of well-known executable formats include:
- a.out, COFF, ELF: used on Unix-like operating systems;
- Mach-O: used in the Apple ecosystem by macOS and iOS;
- COM, MZ: used by DOS and old versions of Microsoft Windows;
- PE: used on Microsoft Windows.
ELF Format
ELF stands for Executable and Linkable Format. This file format supports executables, object code, shared libraries, and core dumps.
Its high-level file layout consists of an ELF Header, the Program Header Table, describing segments, the Section Header Table, describing sections, and data referred to by entries from the program header table or the section header table (or both).
Segments contain information needed for the runtime phase of the ELF file. Sections contain information needed for linking and relocation.
The structure of an ELF header can be viewed with the readelf
utility.
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: EXEC (Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x4027e0
Start of program headers: 64 (bytes into file)
Start of section headers: 107352 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 8
Size of section headers: 64 (bytes)
Number of section headers: 29
Section header string table index: 28
[...]
The ELF format defines two different interpretations: the first one is for the linking process and the second one is for the execution process. Sections refer to object code that is waiting to be linked. One or more sections map to a segment in the executable. Examples of sections include .text
(containing machine code), .data
(containing initialized global and static variables), .got
(containing the Global Offset Table), .rodata
(containing immutable data), etc.
Another typical example of a section is a symbol table. It is like a map: it shows where symbols (essentially, human-readable identifiers of functions, or constants) are located in an ELF file.
There are two types of symbol tables: the main symbol table, usually named .symtab
, which is used by the linker at compile time, and the dynamic symbol table, named .dynsym
, which is used by the dynamic linker at run time. Symbols are needed when linking because the linker looks in the symbol tables when resolving symbols.
A relocation is some blank space, like an unspecified memory address, that needs to be filled later (typically, at execution time).
Program headers tell the operating system what information is needed to load the binary into memory and run it. For example, PT_INTERP
is a string pointer to an interpreter for the binary file (typically the dynamic loader), and PT_LOAD
specifies a given segment location in the ELF file, its virtual base address when alive, and the initial permissions of the segment.
PE Format
PE stands for Portable Executable. The PE format is mainly used in the Microsoft Windows ecosystem and UEFI firmwares. It is a modified version of the Unix COFF (Common Object File Format) format.
The PE format is similar in structure to ELF, but there are some differences. For example, both ELF and PE formats organize data into sections such as .text
and .data
, but ELF files also define program segments to control memory loading more explicitly.
Mach-O Format
Mach-O is short for Mach Object. Like ELF and PE, it's a format for executables, object codes, shared libraries and core dumps. It's used in most systems based on the Mach kernel, including macOS and iOS, in the Apple ecosystem.
The Mach-O header is followed by a series of load commands and one or more segments. Each segment contains a certain number of sections, ranging from 0 to 255.
The Mach-O header and load commands support multi-architecture binaries. This means that a single binary file can contain the code for several different hardware architectures. For example, an iOS binary can contain code for the ARMv6, ARMv7, ARMv7s, ARMv8, x86
and x86-64
architectures. This makes it possible to run it on different models of mobile phones and emulators used at development time.
In the next episode, I’ll cover the lifecycle of an executable on Linux. Stay tuned!
Thanks for reading Crumbs of Cybersecurity! Subscribe for free to receive new posts and support my work.