What is an executable really?#

An executable file is a bunch of machine code instructions packed into a structure that your operating system understands. The instructions themselves depend on the CPU, but every OS has its own file format and its own system interfaces, which is why you cannot run programs compiled for Linux on Windows and vice versa. Executable files on Windows and Linux follow the PE (Portable Executable) and ELF (Executable and Linkable Format) formats respectively. We will be looking at ELF files today.

What does an ELF file look like?#

If you already know what the ELF format looks like, skip ahead to “What happens after ./a.out”.

To understand what an ELF file looks like, let’s write a simple program and compile it ourselves.

Let’s write a program called helloworld.rs. It prints “Hello, world” in Rust (what a surprise!).

fn main() {
    println!("Hello, world");
}

Now, let’s compile it.

rustc helloworld.rs

We have an executable now. Let’s examine it using file and readelf, which are standard command-line utilities on most Linux systems.

[endless@fedora]~/Documents/codeF4ult% file helloworld
helloworld: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=4a152b7639ff9d035facc8d0fd7fbaa870cd4aa9, with debug_info, not stripped

[endless@fedora]~/Documents/codeF4ult% readelf helloworld -h
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2s complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x14050
  Start of program headers:          64 (bytes into file)
  Start of section headers:          3891240 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         12
  Size of section headers:           64 (bytes)
  Number of section headers:         43
  Section header string table index: 41

Let’s quickly go through the output. file is a pretty simple command. It tells us that ‘helloworld’ is a 64-bit ELF file compiled for x86-64 (as opposed to, say, ARM64), that it is dynamically linked, and that the dynamic linker/loader it uses is /lib64/ld-linux-x86-64.so.2. The dynamic linker/loader is the program that loads the shared libraries a binary depends on and resolves their symbols at run time.

readelf -h (-h for file header) shows us the ELF file header of ‘helloworld’. We won’t go deep into the structure of an ELF file here, but these are some of the more important fields for now.

  • Magic: the first four bytes of any ELF file (7f 45 4c 46, which is 0x7f followed by “ELF” in ASCII). The OS uses them to recognise the file as an ELF. The rest of the 16 bytes shown describe things like the class (64-bit) and the endianness.
  • Entry point address: This is the virtual memory address of the first instruction to be executed when the program starts.
  • Start of program headers: This is the offset to the start of the program header table, which means that in this case you will find the program header table 64 bytes from the beginning of the file. An offset is a way of referring to the location of something relative to something else (in this case, the start of the file). Each program header describes one segment.
  • Start of section headers: This is the offset to the start of the section header table. Each section header describes one section.
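To make those fields a bit more concrete, here is a minimal sketch that pulls them straight out of the first 64 bytes of a file. It assumes a little-endian ELF64 file like ours. The names ElfHeader and parse_elf_header are made up for this example, and it is nowhere near a complete parser.

```rust
use std::env;
use std::fs;

/// The ELF64 file-header fields discussed above (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ElfHeader {
    entry: u64, // e_entry at 0x18: entry point address
    phoff: u64, // e_phoff at 0x20: start of program headers
    shoff: u64, // e_shoff at 0x28: start of section headers
    phnum: u16, // e_phnum at 0x38: number of program headers
    shnum: u16, // e_shnum at 0x3C: number of section headers
}

fn read_u16(bytes: &[u8], off: usize) -> u16 {
    let mut buf = [0u8; 2];
    buf.copy_from_slice(&bytes[off..off + 2]);
    u16::from_le_bytes(buf)
}

fn read_u64(bytes: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[off..off + 8]);
    u64::from_le_bytes(buf)
}

/// Returns None unless `bytes` starts with a little-endian ELF64 header.
fn parse_elf_header(bytes: &[u8]) -> Option<ElfHeader> {
    let is_elf64_le = bytes.len() >= 64
        && bytes.starts_with(b"\x7fELF") // the magic bytes
        && bytes[4] == 2 // EI_CLASS: 64-bit
        && bytes[5] == 1; // EI_DATA: little endian
    if !is_elf64_le {
        return None;
    }
    Some(ElfHeader {
        entry: read_u64(bytes, 0x18),
        phoff: read_u64(bytes, 0x20),
        shoff: read_u64(bytes, 0x28),
        phnum: read_u16(bytes, 0x38),
        shnum: read_u16(bytes, 0x3C),
    })
}

fn main() {
    // Path defaults to the binary we compiled earlier.
    let path = env::args().nth(1).unwrap_or_else(|| "helloworld".to_string());
    match fs::read(&path) {
        Ok(bytes) => match parse_elf_header(&bytes) {
            Some(header) => println!("{:#x?}", header),
            None => println!("{} is not a little-endian ELF64 file", path),
        },
        Err(e) => println!("could not read {}: {}", path, e),
    }
}
```

Run against helloworld, the values should line up with what readelf -h printed, just in hex.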

What are the Program and Section Header Tables?#

We need to keep in mind that segments and sections are two ways of dividing the same file into pieces. Think of it as having two views of the same file. One view (segments) is for the loader, which loads the file into memory, and the other view (sections) is for the linker, which stitches all the different snippets of code together to form one coherent program (this is a simplification). When we run an ELF file, the file needs to be taken from your disk and put into memory. This is called loading and is done by the kernel. The kernel does it by reading the program headers and mapping the segments they describe into a virtual address space. Virtual addressing is not in scope today, but simply put, it isolates programs from each other so one program can’t access another program’s memory.

Program Headers#

Let’s look at the program headers(segments) of our little program

[endless@fedora]~/Documents/codeF4ult% readelf -l helloworld                                            
                                                                                                          
Elf file type is DYN (Position-Independent Executable file)                                               
Entry point 0x14050
There are 12 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002a0 0x00000000000002a0  R      0x8
  INTERP         0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000013044 0x0000000000013044  R      0x1000
  LOAD           0x0000000000013050 0x0000000000014050 0x0000000000014050
                 0x000000000003ea10 0x000000000003ea10  R E    0x1000
  LOAD           0x0000000000051a60 0x0000000000053a60 0x0000000000053a60
                 0x0000000000002758 0x00000000000035a0  RW     0x1000
  LOAD           0x00000000000541b8 0x00000000000571b8 0x00000000000571b8
                 0x00000000000009b8 0x0000000000000a80  RW     0x1000
  TLS            0x0000000000051a60 0x0000000000053a60 0x0000000000053a60
                 0x0000000000000020 0x0000000000000050  R      0x8
  DYNAMIC        0x00000000000536f0 0x00000000000556f0 0x00000000000556f0
                 0x00000000000001d0 0x00000000000001d0  RW     0x8
  GNU_RELRO      0x0000000000051a60 0x0000000000053a60 0x0000000000053a60
                 0x0000000000002758 0x00000000000035a0  R      0x1
  GNU_EH_FRAME   0x000000000000cbe0 0x000000000000cbe0 0x000000000000cbe0
                 0x00000000000010dc 0x00000000000010dc  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x0
  NOTE           0x00000000000002fc 0x00000000000002fc 0x00000000000002fc
                 0x0000000000000044 0x0000000000000044  R      0x4

I only included the program headers here for now; we’ll look at the section headers later.

In our case there are 12 program headers in the binary, and some types (like LOAD) appear more than once. Let’s go through the types one by one.

  • PHDR: describes where the program header table itself is located. This is useful for dynamic linking.
  • INTERP: contains the path to the dynamic loader, which as we can see above is /lib64/ld-linux-x86-64.so.2.
  • LOAD: These segments are really important because they are the ones that get mapped into memory. Your code and data sections are part of these segments. We have four of them here, each with its own permissions (the Flags column).
  • TLS: describes the initial contents of thread-local storage.
  • DYNAMIC: This is also for dynamic linking and points to the .dynamic section.
  • GNU_RELRO: marks a region that should become read-only once the dynamic linker has finished applying relocations. It is a security feature.
  • GNU_EH_FRAME: points to the stack unwinding information (.eh_frame_hdr). This is what exception handling is built on, such as try and catch in C++. Rust uses the same tables to unwind the stack on a panic, which is why it shows up even though we didn’t write any error handling logic in our code.
  • GNU_STACK: holds no data of its own (its size is 0). Its flags tell the kernel which permissions the stack should get. RW without E means the stack is not executable.
  • NOTE: this entry has some auxiliary information, like the ABI tag and the build ID.

We can use dumpelf to look at each entry in the program header table separately. I only put one entry here because the full output was too long.

/* Program Header #0 0x40 */
{
	.p_type   = 6          , /* [PT_PHDR] */
	.p_offset = 64         , /* (bytes into file) */
	.p_vaddr  = 0x40       , /* (virtual addr at runtime) */
	.p_paddr  = 0x40       , /* (physical addr at runtime) */
	.p_filesz = 672        , /* (bytes in file) */
	.p_memsz  = 672        , /* (bytes in mem at runtime) */
	.p_flags  = 0x4        , /* PF_R */
	.p_align  = 8          , /* (min mem alignment in bytes) */
},
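Each of these entries is just 56 bytes in the file (the “Size of program headers” from readelf -h). As a rough sketch, this is how one could decode an entry by hand, assuming the little-endian ELF64 layout. ProgramHeader, parse_program_header, type_name and flags_string are names I made up for the example.

```rust
/// One ELF64 program header entry, 56 bytes on disk (hypothetical helper type).
#[derive(Debug, PartialEq)]
struct ProgramHeader {
    p_type: u32,
    p_flags: u32,
    p_offset: u64,
    p_vaddr: u64,
    p_paddr: u64,
    p_filesz: u64,
    p_memsz: u64,
    p_align: u64,
}

fn read_u32(bytes: &[u8], off: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&bytes[off..off + 4]);
    u32::from_le_bytes(buf)
}

fn read_u64(bytes: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[off..off + 8]);
    u64::from_le_bytes(buf)
}

/// Decodes one entry; note that in ELF64 p_flags comes right after p_type.
fn parse_program_header(entry: &[u8]) -> Option<ProgramHeader> {
    if entry.len() < 56 {
        return None;
    }
    Some(ProgramHeader {
        p_type: read_u32(entry, 0),
        p_flags: read_u32(entry, 4),
        p_offset: read_u64(entry, 8),
        p_vaddr: read_u64(entry, 16),
        p_paddr: read_u64(entry, 24),
        p_filesz: read_u64(entry, 32),
        p_memsz: read_u64(entry, 40),
        p_align: read_u64(entry, 48),
    })
}

/// Names for the segment types that show up in our readelf output.
fn type_name(p_type: u32) -> &'static str {
    match p_type {
        1 => "LOAD",
        2 => "DYNAMIC",
        3 => "INTERP",
        4 => "NOTE",
        6 => "PHDR",
        7 => "TLS",
        0x6474e550 => "GNU_EH_FRAME",
        0x6474e551 => "GNU_STACK",
        0x6474e552 => "GNU_RELRO",
        _ => "OTHER",
    }
}

/// Renders the flags the way readelf does: PF_R = 4, PF_W = 2, PF_X = 1.
fn flags_string(p_flags: u32) -> String {
    let mut s = String::new();
    s.push(if (p_flags & 4) != 0 { 'R' } else { ' ' });
    s.push(if (p_flags & 2) != 0 { 'W' } else { ' ' });
    s.push(if (p_flags & 1) != 0 { 'E' } else { ' ' });
    s
}

fn main() {
    // Re-create program header #0 from the dumpelf output above as raw bytes.
    let mut raw = Vec::with_capacity(56);
    raw.extend_from_slice(&6u32.to_le_bytes()); // p_type = PT_PHDR
    raw.extend_from_slice(&4u32.to_le_bytes()); // p_flags = PF_R
    for v in [64u64, 0x40, 0x40, 672, 672, 8].iter() {
        raw.extend_from_slice(&v.to_le_bytes());
    }
    let ph = parse_program_header(&raw).expect("entry is 56 bytes");
    println!("{} [{}] {:?}", type_name(ph.p_type), flags_string(ph.p_flags), ph);
}
```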

That is enough about segments for now.

Section Headers#

Section headers are not used for loading the way program headers are. Once the executable is built, section headers are of no use to the OS. Where they are of use is during linking.

If you have ever written any Rust or C code, you must have seen something that looks like use std::io; or #include <stdio.h>. These bring in utilities from the standard library. The code behind them is compiled separately, and a program called the linker links those standard library functions into your executable. Rust and C both used to use a linker called ld, but the newest versions of Rust come by default with the linker that ships with Rust, called rust-lld. ld is also used as a standalone linker to link standalone assembly files. I would tell you how a linker works if I knew, but I don’t, so let’s focus on the headers.

We can view these using objdump, just as we viewed the segments with readelf.

[endless@fedora]~/Documents/codeF4ult% objdump -h helloworld                                            
                                                                                                          
helloworld:     file format elf64-x86-64                                                                  
                                                                                                          
Sections:                                                                                                 
Idx Name          Size      VMA               LMA               File off  Algn                            
  0 .interp       0000001c  00000000000002e0  00000000000002e0  000002e0  2**0                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  1 .note.ABI-tag 00000020  00000000000002fc  00000000000002fc  000002fc  2**2                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  2 .note.gnu.build-id 00000024  000000000000031c  000000000000031c  0000031c  2**2                       
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  3 .dynsym       000006a8  0000000000000340  0000000000000340  00000340  2**3                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  4 .gnu.version  0000008e  00000000000009e8  00000000000009e8  000009e8  2**1                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  5 .gnu.version_r 00000120  0000000000000a78  0000000000000a78  00000a78  2**2                           
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  6 .gnu.hash     0000001c  0000000000000b98  0000000000000b98  00000b98  2**3                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  7 .dynstr       0000043b  0000000000000bb4  0000000000000bb4  00000bb4  2**0                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  8 .rela.dyn     00003ee8  0000000000000ff0  0000000000000ff0  00000ff0  2**3                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
  9 .rela.plt     00000030  0000000000004ed8  0000000000004ed8  00004ed8  2**3                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
 10 .gcc_except_table 00002b94  0000000000004f08  0000000000004f08  00004f08  2**2                        
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
 11 .rodata       00005218  0000000000007aa0  0000000000007aa0  00007aa0  2**4                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
 12 .eh_frame_hdr 000010fc  000000000000ccb8  000000000000ccb8  0000ccb8  2**2                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
 13 .eh_frame     000053f0  000000000000ddb8  000000000000ddb8  0000ddb8  2**3                            
                  CONTENTS, ALLOC, LOAD, READONLY, DATA                                                   
 14 .text         0003eb20  00000000000141b0  00000000000141b0  000131b0  2**4                            
                  CONTENTS, ALLOC, LOAD, READONLY, CODE                                                   
 15 .init         0000001b  0000000000052cd0  0000000000052cd0  00051cd0  2**2                            
                  CONTENTS, ALLOC, LOAD, READONLY, CODE                                                   
 16 .fini         0000000d  0000000000052cec  0000000000052cec  00051cec  2**2                            
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 17 .plt          00000030  0000000000052d00  0000000000052d00  00051d00  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 18 .tdata        00000020  0000000000053d30  0000000000053d30  00051d30  2**3
                  CONTENTS, ALLOC, LOAD, DATA, THREAD_LOCAL
 19 .tbss         00000030  0000000000053d50  0000000000053d50  00051d50  2**3
                  ALLOC, THREAD_LOCAL
 20 .data.rel.ro  00001c70  0000000000053d50  0000000000053d50  00051d50  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 21 .fini_array   00000008  00000000000559c0  00000000000559c0  000539c0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 22 .init_array   00000010  00000000000559c8  00000000000559c8  000539c8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 23 .dynamic      000001d0  00000000000559d8  00000000000559d8  000539d8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 24 .got          000008f0  0000000000055ba8  0000000000055ba8  00053ba8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 25 .got.plt      00000028  0000000000056498  0000000000056498  00054498  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 26 .relro_padding 00000b40  00000000000564c0  00000000000564c0  000544c0  2**0
                  ALLOC
 27 .tm_clone_table 00000000  00000000000574c0  00000000000574c0  000544c0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 28 .data         000009b8  00000000000574c0  00000000000574c0  000544c0  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 29 .bss          000000c8  0000000000057e78  0000000000057e78  00054e78  2**3
                  ALLOC
 30 .comment      000000b9  0000000000000000  0000000000000000  00054e78  2**0
                  CONTENTS, READONLY
 31 .annobin.notes 000000f7  0000000000000000  0000000000000000  00054f31  2**0
                  CONTENTS, READONLY
 32 .gnu.build.attributes 00000144  0000000000000000  0000000000000000  00055028  2**2
                  CONTENTS, READONLY, OCTETS
 33 .debug_abbrev 00000f4f  0000000000000000  0000000000000000  0005516c  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 34 .debug_info   00103687  0000000000000000  0000000000000000  000560bb  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 35 .debug_aranges 000078a0  0000000000000000  0000000000000000  00159742  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 36 .debug_ranges 0006dd20  0000000000000000  0000000000000000  00160fe2  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 37 .debug_str    00164ad8  0000000000000000  0000000000000000  001ced02  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS
 38 .debug_line   0006d6c7  0000000000000000  0000000000000000  003337da  2**0
                  CONTENTS, READONLY, DEBUGGING, OCTETS

This is a lot of information, so we will only look at a few important sections.

Section Name    Description
.text           contains all the executable code.
.data           stores initialized global variables.
.bss            holds uninitialized (zero-initialized) global variables. It takes up no space in the file and is allocated at load time.
.debug_*        contain debugging information.
.plt            the procedure linkage table, used for dynamic linking. It contains entries for calls to functions from shared libraries.

I want to show (and also see for myself) that section headers don’t matter once the linking is done. To confirm that, we will write a small program that removes the section headers and try to run the file afterwards. I will be using Rust to write it, but a Python script would be more than enough.

Note that we are only removing the section header information from the file, not the sections themselves. If you removed the sections themselves, you would lose all the code and everything else. This is what I meant earlier when I said segments and sections are two ways of dividing a file. You need one view while loading and the other while linking.

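use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};

fn main() -> std::io::Result<()> {
    let path = "helloworld"; // path of the test executable

    // Open the file for in-place patching of the ELF64 header.
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    // e_shoff (8 bytes at 0x28): offset of the section header table.
    file.seek(SeekFrom::Start(0x28))?;
    file.write_all(&0u64.to_le_bytes())?;

    // e_shnum (2 bytes at 0x3C): number of section headers.
    file.seek(SeekFrom::Start(0x3C))?;
    file.write_all(&0u16.to_le_bytes())?;

    // e_shstrndx (2 bytes at 0x3E): index of the section name string table.
    file.seek(SeekFrom::Start(0x3E))?;
    file.write_all(&0u16.to_le_bytes())?;

    Ok(())
}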

Note the number of section headers that readelf -h helloworld reports before running this. Mine was 43, as seen above. Compile the above snippet with rustc, run it, and check again.

[endless@fedora]~/Documents/codeF4ult% rustc section_stripper.rs 
[endless@fedora]~/Documents/codeF4ult% ./section_stripper 
[endless@fedora]~/Documents/codeF4ult% readelf -h helloworld 
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2s complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Position-Independent Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x141b0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          0 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         12
  Size of section headers:           64 (bytes)
  Number of section headers:         0
  Section header string table index: 0

Now that there are no section headers, let’s run helloworld and see if it still works!

[endless@fedora]~/Documents/codeF4ult% ./helloworld 
Hello, world

And it does. You can try the same thing with the program headers and see what happens.

Entry Point#

The entry point is the memory address where code execution starts. You might think this would point to the main function, but no, a lot of stuff needs to happen before we reach main. We are done with what a binary is and can move on to how it runs.

That is the end of the ELF file format primer.

What happens after ./a.out#

When you run this command, you get an executable file

gcc hello.c -o a.out

When you run ./a.out in the terminal, the shell forks itself, which means it creates a new child process. The child then calls a function called execve, which makes a syscall with the same name. This function takes the path of the executable, the command line arguments for the program (argv) and the environment variables (envp) as arguments. The argument count is not passed in; the kernel works it out by counting the entries in argv.
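If you want to play the role of the shell yourself, Rust’s std::process::Command wraps this create-a-child-then-exec pattern. This is only a sketch: run is a helper name I made up, and as far as I know the standard library decides internally whether to use fork plus exec or posix_spawn, so treat it as an illustration of the three inputs (path, arguments, environment) rather than a look at the raw syscalls.

```rust
use std::process::Command;

/// Starts `path` in a child process with the given arguments and environment,
/// waits for it, and returns its exit code (None if it was killed by a signal).
fn run(path: &str, args: &[&str], env: &[(&str, &str)]) -> std::io::Result<Option<i32>> {
    let status = Command::new(path) // the path also becomes argv[0]
        .args(args) // argv[1..]
        .env_clear() // start from an empty environment...
        .envs(env.iter().cloned()) // ...and pass only what we were given
        .status()?; // create the child, exec the program, wait for it
    Ok(status.code())
}

fn main() {
    // Assumes /bin/echo exists, which is the case on most Linux systems.
    match run("/bin/echo", &["Hello,", "world"], &[("LANG", "C")]) {
        Ok(code) => println!("child exited with {:?}", code),
        Err(e) => println!("could not start child: {}", e),
    }
}
```

Notice that nowhere do we pass an argument count. Just like with execve, it falls out of the length of the argument list.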

When you make a syscall (system call), you are giving up control to the kernel and asking it to do something for you that you don’t have the permissions to do yourself. It is the same here with the execve syscall. The kernel takes the arguments provided by the C wrapper (of the same name, both are called execve), opens the file and reads the first few hundred bytes (128 on older kernels, 256 on newer ones). It checks for the ELF magic bytes to see whether the file is an ELF or one of the other registered binary formats (binfmt). binfmt is what allows the system to hand other kinds of files, like scripts or Java programs, to the right interpreter.
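The check itself is simple enough to imitate. Below is a toy version that looks at the start of a file and decides between an ELF, a script with a #! line, and “something else”. It is a sketch under my own assumptions: BinaryFormat and detect_format are invented names, and the real kernel walks a list of registered handlers instead of hard-coding two cases.

```rust
use std::env;
use std::fs::File;
use std::io::Read;

/// What the start of the file tells us (hypothetical type for this example).
#[derive(Debug, PartialEq)]
enum BinaryFormat {
    Elf,
    Script(String), // everything after "#!" on the first line
    Unknown,
}

fn detect_format(head: &[u8]) -> BinaryFormat {
    if head.starts_with(b"\x7fELF") {
        return BinaryFormat::Elf;
    }
    if head.starts_with(b"#!") {
        // The interpreter line runs up to the first newline (or the end of the buffer).
        let line_end = head.iter().position(|&b| b == b'\n').unwrap_or(head.len());
        let interp = String::from_utf8_lossy(&head[2..line_end]).trim().to_string();
        return BinaryFormat::Script(interp);
    }
    BinaryFormat::Unknown
}

fn main() {
    let path = env::args().nth(1).unwrap_or_else(|| "helloworld".to_string());
    let mut head = Vec::new();
    // Only read the first 256 bytes; that is all the decision needs.
    let result = File::open(&path).and_then(|f| f.take(256).read_to_end(&mut head));
    match result {
        Ok(_) => println!("{}: {:?}", path, detect_format(&head)),
        Err(e) => println!("could not read {}: {}", path, e),
    }
}
```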

Loading#

After confirming the file is in fact an ELF file, the kernel parses the ELF header and looks at these fields:

  • e_type: it can be ET_EXEC (an executable that must be loaded at a fixed address) or ET_DYN (a shared library or a PIE, which is a Position-Independent Executable).
  • e_phoff: the offset to the program header table.
  • e_phnum: the count of program headers.

Using these fields, the kernel reads the program header table and tears down the calling process. The calling process is the fork that the shell made before execve was called. Tearing down in this case means the kernel:

  • destroys all virtual memory areas
  • resets all signal handlers to their defaults
  • closes all file descriptors which are close-on-exec
  • resets the memory descriptor to blank

This means everything related to the old program in the child process gets destroyed. The process is still alive, but as a blank slate, and the executable will be mapped into this fresh address space.

The kernel iterates through the program headers to find the PT_LOAD segments.

For each PT_LOAD header, the kernel calls the elf_map() function, which uses do_mmap() underneath to map the segment into virtual memory with the specified permissions.
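Memory can only be mapped in whole pages, so the numbers from a program header have to be rounded before they can be used. Here is a sketch of that arithmetic as I understand it, using the four LOAD segments from the readelf -l output earlier. It assumes 4 KiB pages and ignores two things: the random base address that gets added for a PIE, and the extra zeroed memory needed when MemSiz is larger than FileSiz. Mapping and plan_mapping are names invented for the example.

```rust
const PAGE_SIZE: u64 = 0x1000; // 4 KiB pages (an assumption)

const PF_X: u32 = 1;
const PF_W: u32 = 2;
const PF_R: u32 = 4;

/// Page-aligned parameters for mapping one PT_LOAD segment (hypothetical type).
#[derive(Debug, PartialEq)]
struct Mapping {
    addr: u64,        // virtual address, rounded down to a page boundary
    file_offset: u64, // file offset, moved back by the same amount
    len: u64,         // length, rounded up to whole pages
    prot: String,     // permissions in r/w/x form
}

fn plan_mapping(p_offset: u64, p_vaddr: u64, p_filesz: u64, p_flags: u32) -> Mapping {
    // How far into its page the segment starts. A valid ELF keeps p_offset and
    // p_vaddr equal modulo the page size, so the same amount applies to both.
    let page_off = p_vaddr % PAGE_SIZE;
    let unrounded = p_filesz + page_off;
    let len = (unrounded + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;

    let mut prot = String::new();
    prot.push(if (p_flags & PF_R) != 0 { 'r' } else { '-' });
    prot.push(if (p_flags & PF_W) != 0 { 'w' } else { '-' });
    prot.push(if (p_flags & PF_X) != 0 { 'x' } else { '-' });

    Mapping {
        addr: p_vaddr - page_off,
        file_offset: p_offset - page_off,
        len: len,
        prot: prot,
    }
}

fn main() {
    // (offset, vaddr, filesz, flags) of the four LOAD segments shown earlier.
    let loads = [
        (0x0u64, 0x0u64, 0x13044u64, PF_R),
        (0x13050, 0x14050, 0x3ea10, PF_R | PF_X),
        (0x51a60, 0x53a60, 0x2758, PF_R | PF_W),
        (0x541b8, 0x571b8, 0x9b8, PF_R | PF_W),
    ];
    for &(off, vaddr, filesz, flags) in loads.iter() {
        let m = plan_mapping(off, vaddr, filesz, flags);
        println!("{} addr={:#x} off={:#x} len={:#x}", m.prot, m.addr, m.file_offset, m.len);
    }
}
```

For the code segment, for example, the virtual address 0x14050 becomes a mapping that starts at 0x14000, reads from file offset 0x13000 and spans 0x3f000 bytes.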

Now that the executable is in memory, the next step is linking.

What is Linking again?#

Linking in the traditional sense refers to the linker gluing the compiled code from different files and folders together into one coherent blob, but that is not what we are talking about here. Our concern here is with dynamic linking. If every program were statically linked, each binary would be many times larger than it is now and every computer would have a lot of redundant code wasting space. So essential functionality used by most programs is bundled together into what’s called a shared library, like libc. It is a library of code shared by many processes. It exists on the user’s machine (almost every Linux machine), and your Rust or C code links with these shared libraries at runtime and uses the APIs available from libc for essential functions like printf(), malloc() and much more. This is called dynamic linking and it is handled before your code runs.

The kernel looks for the PT_INTERP segment, which contains the path to the interpreter. The interpreter here is the dynamic linker/loader that handles dynamic linking; it has nothing to do with the Python interpreter or anything like that. The kernel then loads this “interpreter” the same way it loaded our executable, by mapping its PT_LOAD segments into the same address space. If the program is statically linked, PT_INTERP won’t exist and this step will be skipped.

Setting up the stack#

The kernel allocates the stack region. The stack is built with its base at the highest address and its top at the lowest address. This is what is meant by “the stack grows downward”. It is designed this way so that the stack and the heap grow towards each other, so there is no need to estimate a size for either. If the stack is smaller, it leaves more space for the heap and vice versa. If both grew in the same direction, we would have to put one above the other and estimate a maximum size for the lower one. Estimating wrong would cause overflows, and estimating too generously would waste space.

  1. The first things to be pushed to the (now) empty stack are the environment strings, each of them null-terminated.
  2. Next are the argument strings, which are what you passed on the command line, like ./a.out --name ndL3ss.
  3. After that, the kernel generates 16 bytes from its CSPRNG (Cryptographically Secure Pseudo-Random Number Generator) and writes them onto the stack below the strings. The AT_RANDOM entry will point to them later.
  4. Now the kernel pads the stack to align it to 16 bytes. The x86-64 ABI requires this alignment. (The ABI, or Application Binary Interface, is the set of conventions that makes sure all programs speak the same language and can run on an OS, in this case Linux.) So, if at this point our stack was 46 bytes long, the padding would be two bytes of zeroes, which would make the size of the stack 48 bytes, a multiple of 16. We do it now because nothing that comes after this breaks the alignment.
  5. Now the kernel pushes the auxiliary vector. This is a bunch of information that the interpreter (like ld-linux.so) needs to do its job, and the kernel sets it up here.
  6. Next the kernel pushes a NULL, which is 8 bytes of zero, then pushes one 8-byte pointer for each environment string, pointing to the strings at the base of the stack from earlier. The pointers are pushed in reverse, so when the stack is popped, envp[0] pops first and the rest follow in order.
  7. Next it does the same thing with the argument vector: one NULL, then pointers to all the arguments in reverse order, so the first one (argv[0]) pops first.
  8. Finally the kernel pushes a single 8-byte integer, which is argc. If you run ./a.out with no arguments, this would be 1.
  9. rsp is set to point at argc, which is now the top of the stack (the lowest address, since the stack grows downward, so it pops first).
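It helps me to see the end result of these steps as a flat list of 8-byte words, read upward from where rsp points. The sketch below builds that list for some made-up pointer values. initial_stack_words, envp_index and align_down are names invented here, the addresses are hypothetical, and the strings, the random bytes and the padding that live above these words are left out.

```rust
/// The 8-byte words a new program finds on its stack, listed from rsp upward.
/// `argv` and `envp` hold the (hypothetical) addresses of the strings.
fn initial_stack_words(argv: &[u64], envp: &[u64], auxv: &[(u64, u64)]) -> Vec<u64> {
    let mut words = Vec::new();
    words.push(argv.len() as u64); // argc, right where rsp points
    words.extend_from_slice(argv); // argv[0], argv[1], ...
    words.push(0); // NULL that ends argv
    words.extend_from_slice(envp); // envp[0], envp[1], ...
    words.push(0); // NULL that ends envp
    for &(key, value) in auxv {
        words.push(key); // auxiliary vector entries are (type, value) pairs
        words.push(value);
    }
    words.push(0); // AT_NULL entry that ends the auxiliary vector
    words.push(0);
    words
}

/// Index of envp[0]: skip argc, the argc argv pointers and the NULL after them.
fn envp_index(words: &[u64]) -> usize {
    words[0] as usize + 2
}

/// Rounds an address down to a multiple of `align` (a power of two).
fn align_down(addr: u64, align: u64) -> u64 {
    addr & !(align - 1)
}

fn main() {
    // ./a.out --name ndL3ss with one environment variable and one auxv entry.
    let argv = [0x7ffd_0000_0f80u64, 0x7ffd_0000_0f88, 0x7ffd_0000_0f8f];
    let envp = [0x7ffd_0000_0fa0u64];
    let auxv = [(25u64, 0x7ffd_0000_0f60u64)]; // 25 is AT_RANDOM
    let words = initial_stack_words(&argv, &envp, &auxv);
    for (i, word) in words.iter().enumerate() {
        println!("rsp+{:<3} {:#x}", i * 8, word);
    }
    println!("envp starts at word {}", envp_index(&words));

    // Step 4: a stack that is 46 bytes deep gets padded down to 48 bytes.
    let base = 0x7ffd_0000_1000u64;
    println!("depth after padding: {}", base - align_down(base - 46, 16));
}
```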

The stack is set up. The kernel now sets rip (the instruction pointer) to the dynamic linker’s entry point and drops to userspace.

The interpreter reads the auxiliary vector from the stack (step 5) to find our program and link its libraries, so functions like printf() and malloc() can run. That is out of scope today, but when it is done, it jumps to _start, which we will talk about next.
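You can peek at the auxiliary vector of a running process yourself, because Linux exposes it as /proc/self/auxv. The sketch below parses it as a list of (type, value) pairs of 8 bytes each, which assumes a 64-bit little-endian system. parse_auxv and lookup are names I made up; the AT_* numbers are the standard ones from the ELF headers.

```rust
use std::fs;

const AT_NULL: u64 = 0; // marks the end of the vector
const AT_PHDR: u64 = 3; // address of the program header table in memory
const AT_ENTRY: u64 = 9; // entry point of the program (_start)
const AT_RANDOM: u64 = 25; // address of the 16 random bytes

fn read_u64(bytes: &[u8]) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[..8]);
    u64::from_le_bytes(buf)
}

/// Splits the raw bytes into (type, value) pairs, stopping at AT_NULL.
fn parse_auxv(bytes: &[u8]) -> Vec<(u64, u64)> {
    let mut entries = Vec::new();
    for pair in bytes.chunks_exact(16) {
        let key = read_u64(&pair[0..8]);
        let value = read_u64(&pair[8..16]);
        if key == AT_NULL {
            break;
        }
        entries.push((key, value));
    }
    entries
}

/// Finds the value of the first entry with the given type.
fn lookup(auxv: &[(u64, u64)], key: u64) -> Option<u64> {
    auxv.iter().find(|entry| entry.0 == key).map(|entry| entry.1)
}

fn main() {
    match fs::read("/proc/self/auxv") {
        Ok(bytes) => {
            let auxv = parse_auxv(&bytes);
            println!("AT_PHDR   = {:x?}", lookup(&auxv, AT_PHDR));
            println!("AT_ENTRY  = {:x?}", lookup(&auxv, AT_ENTRY));
            println!("AT_RANDOM = {:x?}", lookup(&auxv, AT_RANDOM));
        }
        Err(e) => println!("could not read /proc/self/auxv: {}", e),
    }
}
```

AT_ENTRY is how the dynamic linker knows where _start is once it has finished its own work.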

_start#

_start is the actual entry point of your program, which the e_entry field in the ELF header points to. It comes from an object file that the linker (rust-lld or ld, which links the objects generated by rustc and gcc) links into the executable at build time.

What _start does is to take the stack layout left by kernel and turn it into a function call to __libc_start_main.

This is what _start looks like

0000000000014050 <_start>:
   14050:       f3 0f 1e fa             endbr64
   14054:       31 ed                   xor    %ebp,%ebp
   14056:       49 89 d1                mov    %rdx,%r9
   14059:       5e                      pop    %rsi
   1405a:       48 89 e2                mov    %rsp,%rdx
   1405d:       48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
   14061:       50                      push   %rax
   14062:       54                      push   %rsp
   14063:       45 31 c0                xor    %r8d,%r8d
   14066:       31 c9                   xor    %ecx,%ecx
   14068:       48 8d 3d d1 01 00 00    lea    0x1d1(%rip),%rdi        # 14240 <main>
   1406f:       ff 15 4b 18 04 00       call   *0x4184b(%rip)        # 558c0 <__libc_start_main@GLIBC_2.34>
   14075:       f4                      hlt 

Let’s go through this snippet step by step.

xor %ebp,%ebp

This XORs ebp with itself, which is how you set it to zero (writing to ebp also clears the upper half of rbp). rbp is the base pointer, which normally links each stack frame to its caller’s frame. _start is the first code that runs in the program, so it has no caller and nothing to return to. Zeroing the base pointer marks this as the outermost frame, so debuggers and unwinders know where to stop.

mov %rdx,%r9

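This moves the value of rdx into r9. At this point rdx holds a pointer to a cleanup function provided by the dynamic linker, and r9 is the register for the sixth argument to __libc_start_main, which uses it later.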

pop %rsi

rsp at the moment points to argc, which we talked about earlier when setting up the stack. It is popped into rsi, so rsi now holds argc (the second argument to __libc_start_main) and rsp now points to argv[0].

mov %rsp,%rdx

This moves the value of rsp into rdx. Since rsp now points at the argv array, rdx holds argv, the third argument.

and    $0xfffffffffffffff0,%rsp

Performs a bitwise AND between that number and rsp, which clears the lowest four bits and rounds rsp down to a multiple of 16. This is done to re-align the stack to 16 bytes after we popped off argc, which was 8 bytes.

push   %rax

This is junk data, added to keep the alignment after the next step. The next step pushes 8 bytes onto the stack, so we need another 8 to stay on a 16-byte boundary when the call happens. Now if you’re thinking that the pop %rsi and the push %rsp (which is next) should cancel each other’s misalignment, and that there should be no need for the adjustment in the past two instructions, I don’t have a definitive answer. My best guess is that the AND from earlier is there because _start doesn’t assume the kernel handed it an aligned stack, and once the AND has aligned rsp, a single 8-byte push would misalign it again. If you know for sure, let me know.

push   %rsp

Pushes the value of rsp itself onto the stack. As far as I can tell, this is the seventh argument to __libc_start_main (the end of the stack). Only the first six arguments fit in registers, so this one has to be passed on the stack.
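To convince myself about the alignment, it helps to follow rsp through these four instructions with actual numbers. This is just arithmetic on a made-up starting address, not real register access, and trace_rsp is a name invented for the sketch. The one rule it leans on is that the x86-64 ABI wants rsp to be a multiple of 16 at the moment of a call.

```rust
/// Follows rsp through pop, and, push, push and returns its value after each.
fn trace_rsp(rsp_at_entry: u64) -> [u64; 4] {
    let after_pop = rsp_at_entry + 8; // pop %rsi removes argc (8 bytes)
    let after_and = after_pop & !0xf; // and $-16,%rsp rounds down to 16
    let after_push_rax = after_and - 8; // push %rax, 8 bytes of filler
    let after_push_rsp = after_push_rax - 8; // push %rsp, the stack argument
    [after_pop, after_and, after_push_rax, after_push_rsp]
}

fn main() {
    // A hypothetical, 16-byte aligned rsp as the kernel would hand it over.
    let entry = 0x7ffd_0000_0f00u64;
    let names = ["pop %rsi", "and $-16,%rsp", "push %rax", "push %rsp"];
    for (name, rsp) in names.iter().zip(trace_rsp(entry).iter()) {
        println!("{:<14} rsp = {:#x} (rsp % 16 = {})", name, rsp, rsp % 16);
    }
}
```

With an aligned starting rsp, the pop leaves it 8 bytes off, the AND brings it back, and the two pushes together move it by exactly 16, so it is aligned again when the call happens. Without the filler push, the single push %rsp would leave it 8 bytes off. The same holds if you start from an unaligned value, which fits the guess that the AND is there so _start does not have to trust whoever set up the stack.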

xor   %r8d,%r8d
xor   %ecx,%ecx

Zeroes these two registers, which are the fifth and fourth arguments to __libc_start_main.

lea    0x1d1(%rip),%rdi

Loads the address of main into rdi using RIP-relative addressing. This is the first argument to __libc_start_main.

call   *0x4184b(%rip)

This calls __libc_start_main. This function doesn’t normally return. It calls main(), and when that returns, it calls exit().

hlt

This is a safety net in case __libc_start_main does return. hlt is a privileged instruction that halts the CPU, so executing it in userspace triggers a fault and the kernel kills the process.

What does __libc_start_main() do?#

  • It computes the address of envp from argv and argc, because unlike argv, envp isn’t passed in as an argument. The environment pointers sit on the stack right after the NULL that ends argv.
  • It runs constructor functions. This is code that needs to run before main, for example to set up global state that main relies on.
  • It registers cleanup functions so that when main returns, destructor functions run and buffers get flushed.
  • It sets up the threading infrastructure.
  • It calls main with the arguments and environment variables, and when main returns it passes the return value to exit().

Now let us recap. execve takes the arguments from the shell or GUI and loads the executable into memory, replacing the program in the calling process. The kernel then sets up the stack with the environment variables, the arguments and the auxiliary vector, and the dynamic linker links the shared libraries. _start inherits the stack from the kernel and calls __libc_start_main, which does a lot of setup and calls main(). main() does its thing and calls whatever functions it wants to call, each of which creates another stack frame. When any function is done, it returns to its caller, collapsing its stack frame. This flows all the way back to __libc_start_main, which calls exit() when main() returns, and that terminates the program.

References#

  • ELF man pages
  • An article that goes deep into how we get to main.