What Happens When You Run an Executable File in Linux
What is an executable really?#
An executable file is a bunch of machine code instructions packed into a structure that your operating system understands. Every OS has its own executable format and its own interface to the kernel, which is why you cannot directly run programs compiled for Linux on Windows and vice versa. Executable files on Windows and Linux follow the PE (Portable Executable) and ELF (Executable and Linkable Format) formats respectively. We will be looking at ELF files today.
What does an ELF file look like?#
If you already know what the ELF format looks like, skip ahead to the section “What happens after ./a.out”.
To understand what an ELF file looks like, let’s write a simple program and compile it ourselves.
Let’s write a program called helloworld.rs. It prints “Hello, world” in Rust (what a surprise!).
fn main() {
    println!("Hello, world");
}
Now, let’s compile it.
rustc helloworld.rs
We have an executable now. Let’s examine it using file and readelf, command line utilities available on most Linux distributions (readelf ships with GNU binutils).
[endless@fedora]~/Documents/codeF4ult% file helloworld
helloworld: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=4a152b7639ff9d035facc8d0fd7fbaa870cd4aa9, with debug_info, not stripped
[endless@fedora]~/Documents/codeF4ult% readelf helloworld -h
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2s complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x14050
Start of program headers: 64 (bytes into file)
Start of section headers: 3891240 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 12
Size of section headers: 64 (bytes)
Number of section headers: 43
Section header string table index: 41
Let’s quickly go through what the output is.
file is a pretty simple command. It tells us that ‘helloworld’ is a 64-bit ELF file compiled for x86-64 (as opposed to, say, ARM64), that it is dynamically linked, and that the dynamic linker/loader it uses (the tool that resolves shared libraries and finishes loading the program at run time) is /lib64/ld-linux-x86-64.so.2.
readelf -h (-h is short for --file-header) shows us the ELF file header of ‘helloworld’. We won’t go deep into the structure of an ELF file here, but these are some of the more important parts for now.
- Magic: these are the 16 identification bytes at the start of every ELF file. The first four (7f 45 4c 46, i.e. 0x7f followed by “ELF”) are what the OS checks to decide whether the file is an ELF file at all.
- Entry point address: this is the virtual memory address of the first instruction to be executed when the program starts.
- Start of program headers: this is the offset to the start of the program header table, which means in this case you will find the program header table 64 bytes from the beginning of the file. An offset is a way of referring to the location of something relative to something else (in this case, the start of the file). Each program header describes one segment.
- Start of section headers: this is the offset to the start of the section header table. Each section header describes one section.
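To make those fields a bit more concrete, here is a minimal sketch that pulls them out of the raw bytes by hand. It assumes a little-endian ELF64 file; the struct and function names are made up for this post, and the default path is just the helloworld binary from above.

```rust
use std::env;
use std::fs;

/// The handful of ELF64 header fields discussed above.
#[derive(Debug, PartialEq)]
struct ElfHeader {
    entry: u64, // e_entry, at offset 0x18
    phoff: u64, // e_phoff, at offset 0x20
    shoff: u64, // e_shoff, at offset 0x28
    phnum: u16, // e_phnum, at offset 0x38
    shnum: u16, // e_shnum, at offset 0x3C
}

fn le_u64(bytes: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[off..off + 8]);
    u64::from_le_bytes(buf)
}

fn le_u16(bytes: &[u8], off: usize) -> u16 {
    let mut buf = [0u8; 2];
    buf.copy_from_slice(&bytes[off..off + 2]);
    u16::from_le_bytes(buf)
}

/// Returns None unless `bytes` starts with a little-endian ELF64 header.
fn parse_header(bytes: &[u8]) -> Option<ElfHeader> {
    // The ELF64 header is 64 bytes long and starts with 0x7f 'E' 'L' 'F'.
    if bytes.len() < 64 || !bytes.starts_with(b"\x7fELF") {
        return None;
    }
    // e_ident[4] == 2 means ELF64, e_ident[5] == 1 means little endian.
    if bytes[4] != 2 || bytes[5] != 1 {
        return None;
    }
    Some(ElfHeader {
        entry: le_u64(bytes, 0x18),
        phoff: le_u64(bytes, 0x20),
        shoff: le_u64(bytes, 0x28),
        phnum: le_u16(bytes, 0x38),
        shnum: le_u16(bytes, 0x3C),
    })
}

fn main() {
    // Assumed default: the helloworld binary built earlier.
    let path = env::args().nth(1).unwrap_or_else(|| "helloworld".to_string());
    match fs::read(&path) {
        Ok(bytes) => match parse_header(&bytes) {
            Some(header) => println!("{:#x?}", header),
            None => println!("{}: not a little-endian ELF64 file", path),
        },
        Err(err) => eprintln!("could not read {}: {}", path, err),
    }
}
```

Run against the same binary, the values should line up with what readelf -h prints.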
What are the Program and Section Header Tables?#
We need to keep in mind that segments and sections are two ways of dividing the file into pieces. Think of it as having two views of the same file. One view (segments) is for the loader, which loads the file into memory, and the other view (sections) is for the linker, which stitches all the different snippets of code together to form one coherent program (this is a simplification). When we run an ELF file, the file needs to be taken from your hard drive and put into memory. This is called loading and is done by the kernel. The kernel does it by reading the segments and mapping them into a virtual address space. Virtual addressing is not in scope today, but simply put, it is necessary for isolating programs so that one program can’t access another program’s memory.
Program Headers#
Let’s look at the program headers(segments) of our little program
[endless@fedora]~/Documents/codeF4ult% readelf -l helloworld
Elf file type is DYN (Position-Independent Executable file)
Entry point 0x14050
There are 12 program headers, starting at offset 64
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x00000000000002a0 0x00000000000002a0 R 0x8
INTERP 0x00000000000002e0 0x00000000000002e0 0x00000000000002e0
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000013044 0x0000000000013044 R 0x1000
LOAD 0x0000000000013050 0x0000000000014050 0x0000000000014050
0x000000000003ea10 0x000000000003ea10 R E 0x1000
LOAD 0x0000000000051a60 0x0000000000053a60 0x0000000000053a60
0x0000000000002758 0x00000000000035a0 RW 0x1000
LOAD 0x00000000000541b8 0x00000000000571b8 0x00000000000571b8
0x00000000000009b8 0x0000000000000a80 RW 0x1000
TLS 0x0000000000051a60 0x0000000000053a60 0x0000000000053a60
0x0000000000000020 0x0000000000000050 R 0x8
DYNAMIC 0x00000000000536f0 0x00000000000556f0 0x00000000000556f0
0x00000000000001d0 0x00000000000001d0 RW 0x8
GNU_RELRO 0x0000000000051a60 0x0000000000053a60 0x0000000000053a60
0x0000000000002758 0x00000000000035a0 R 0x1
GNU_EH_FRAME 0x000000000000cbe0 0x000000000000cbe0 0x000000000000cbe0
0x00000000000010dc 0x00000000000010dc R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x0
NOTE 0x00000000000002fc 0x00000000000002fc 0x00000000000002fc
0x0000000000000044 0x0000000000000044 R 0x4
I only included the program headers here for now, we’ll look at the section headers later.
In our case there are 12 entries in the program header table, covering 9 different segment types. Let’s go through them one by one.
- PHDR: describes the program header table itself. This is useful for dynamic linking.
- INTERP: this one contains the path to the dynamic loader, which as we can see above is /lib64/ld-linux-x86-64.so.2.
- LOAD: this segment type is really important, it is what gets mapped into memory. Your code and data sections are part of these segments.
- TLS: describes the initial contents of thread-local storage (the .tdata and .tbss sections).
- DYNAMIC: this is also for dynamic linking and points to the .dynamic section.
- GNU_RELRO: marks some regions that should become read-only once the dynamic linker has finished relocating them. It is a security feature.
- GNU_EH_FRAME: points to the unwinding information (.eh_frame_hdr) used to walk the stack when an exception is thrown. In C++ this backs try and catch; Rust uses the same mechanism to unwind on a panic, which is why it shows up even though we didn’t write any exception handling logic in our code.
- GNU_STACK: this holds no data. Its flags tell the kernel which permissions the stack should get; here they are RW, so the stack is not executable.
- NOTE: this entry has some auxiliary information like the ABI version and build ID.
We can use dumpelf to look at each entry in the program header table separately. I put only one entry here because the full output is too long.
/* Program Header #0 0x40 */
{
.p_type = 6 , /* [PT_PHDR] */
.p_offset = 64 , /* (bytes into file) */
.p_vaddr = 0x40 , /* (virtual addr at runtime) */
.p_paddr = 0x40 , /* (physical addr at runtime) */
.p_filesz = 672 , /* (bytes in file) */
.p_memsz = 672 , /* (bytes in mem at runtime) */
.p_flags = 0x4 , /* PF_R */
.p_align = 8 , /* (min mem alignment in bytes) */
},
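If you want to see where those fields live in the file, here is a small sketch that decodes one 56-byte program header entry by hand. The function names are my own, and the sample bytes are simply rebuilt from the PT_PHDR values in the dumpelf output above rather than read from the real file.

```rust
/// One ELF64 program header entry, as laid out on disk (56 bytes).
struct ProgramHeader {
    p_type: u32,   // offset 0
    p_flags: u32,  // offset 4
    p_offset: u64, // offset 8
    p_vaddr: u64,  // offset 16 (p_paddr at offset 24 is skipped here)
    p_filesz: u64, // offset 32
    p_memsz: u64,  // offset 40
    p_align: u64,  // offset 48
}

fn le_u32(b: &[u8], off: usize) -> u32 {
    let mut buf = [0u8; 4];
    buf.copy_from_slice(&b[off..off + 4]);
    u32::from_le_bytes(buf)
}

fn le_u64(b: &[u8], off: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&b[off..off + 8]);
    u64::from_le_bytes(buf)
}

fn parse_phdr(entry: &[u8]) -> Option<ProgramHeader> {
    if entry.len() < 56 {
        return None;
    }
    Some(ProgramHeader {
        p_type: le_u32(entry, 0),
        p_flags: le_u32(entry, 4),
        p_offset: le_u64(entry, 8),
        p_vaddr: le_u64(entry, 16),
        p_filesz: le_u64(entry, 32),
        p_memsz: le_u64(entry, 40),
        p_align: le_u64(entry, 48),
    })
}

/// Names for a few of the standard p_type values.
fn type_name(p_type: u32) -> &'static str {
    match p_type {
        1 => "PT_LOAD",
        2 => "PT_DYNAMIC",
        3 => "PT_INTERP",
        4 => "PT_NOTE",
        6 => "PT_PHDR",
        7 => "PT_TLS",
        _ => "other",
    }
}

/// Renders PF_R (4), PF_W (2) and PF_X (1) as an rwx-style string.
fn flags_string(p_flags: u32) -> String {
    let mut s = String::new();
    s.push(if p_flags & 4 != 0 { 'r' } else { '-' });
    s.push(if p_flags & 2 != 0 { 'w' } else { '-' });
    s.push(if p_flags & 1 != 0 { 'x' } else { '-' });
    s
}

/// Rebuilds the raw bytes of the PT_PHDR entry shown in the dumpelf output.
fn sample_entry() -> Vec<u8> {
    let mut e = Vec::with_capacity(56);
    e.extend_from_slice(&6u32.to_le_bytes()); // p_type
    e.extend_from_slice(&4u32.to_le_bytes()); // p_flags
    e.extend_from_slice(&64u64.to_le_bytes()); // p_offset
    e.extend_from_slice(&0x40u64.to_le_bytes()); // p_vaddr
    e.extend_from_slice(&0x40u64.to_le_bytes()); // p_paddr
    e.extend_from_slice(&672u64.to_le_bytes()); // p_filesz
    e.extend_from_slice(&672u64.to_le_bytes()); // p_memsz
    e.extend_from_slice(&8u64.to_le_bytes()); // p_align
    e
}

fn main() {
    let ph = parse_phdr(&sample_entry()).expect("entry is 56 bytes");
    println!(
        "{} {} offset={:#x} vaddr={:#x} filesz={} memsz={} align={}",
        type_name(ph.p_type),
        flags_string(ph.p_flags),
        ph.p_offset,
        ph.p_vaddr,
        ph.p_filesz,
        ph.p_memsz,
        ph.p_align
    );
}
```

To decode the real table you would read e_phnum entries of e_phentsize bytes each, starting at e_phoff in the file.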
That is enough about segments for now.
Section Headers#
Section headers are not needed to load the program into memory the way segments are. Once the executable is built, sections are of no use to the OS. Where they are of use is during linking.
If you have ever written any Rust or C code, you must have seen something that looks like this.
use std::io;
#include <stdio.h>
These are utilities from the standard library. When you import them using syntax like that, you are ultimately asking a program called the linker to link those standard library functions into the executable. Rust and C toolchains have traditionally both used a linker called ld, but recent versions of Rust default to rust-lld, an LLD-based linker shipped with Rust, on some targets such as x86-64 Linux. ld can also be used on its own to link object files produced from standalone assembly files.
I would tell you how a linker works if I knew but I don’t so let’s focus on the headers.
We can view the section headers using objdump, much like we viewed the segments with readelf.
[endless@fedora]~/Documents/codeF4ult% objdump -h helloworld
helloworld: file format elf64-x86-64
Sections:
Idx Name Size VMA LMA File off Algn
0 .interp 0000001c 00000000000002e0 00000000000002e0 000002e0 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
1 .note.ABI-tag 00000020 00000000000002fc 00000000000002fc 000002fc 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
2 .note.gnu.build-id 00000024 000000000000031c 000000000000031c 0000031c 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
3 .dynsym 000006a8 0000000000000340 0000000000000340 00000340 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
4 .gnu.version 0000008e 00000000000009e8 00000000000009e8 000009e8 2**1
CONTENTS, ALLOC, LOAD, READONLY, DATA
5 .gnu.version_r 00000120 0000000000000a78 0000000000000a78 00000a78 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
6 .gnu.hash 0000001c 0000000000000b98 0000000000000b98 00000b98 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
7 .dynstr 0000043b 0000000000000bb4 0000000000000bb4 00000bb4 2**0
CONTENTS, ALLOC, LOAD, READONLY, DATA
8 .rela.dyn 00003ee8 0000000000000ff0 0000000000000ff0 00000ff0 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
9 .rela.plt 00000030 0000000000004ed8 0000000000004ed8 00004ed8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
10 .gcc_except_table 00002b94 0000000000004f08 0000000000004f08 00004f08 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
11 .rodata 00005218 0000000000007aa0 0000000000007aa0 00007aa0 2**4
CONTENTS, ALLOC, LOAD, READONLY, DATA
12 .eh_frame_hdr 000010fc 000000000000ccb8 000000000000ccb8 0000ccb8 2**2
CONTENTS, ALLOC, LOAD, READONLY, DATA
13 .eh_frame 000053f0 000000000000ddb8 000000000000ddb8 0000ddb8 2**3
CONTENTS, ALLOC, LOAD, READONLY, DATA
14 .text 0003eb20 00000000000141b0 00000000000141b0 000131b0 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
15 .init 0000001b 0000000000052cd0 0000000000052cd0 00051cd0 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
16 .fini 0000000d 0000000000052cec 0000000000052cec 00051cec 2**2
CONTENTS, ALLOC, LOAD, READONLY, CODE
17 .plt 00000030 0000000000052d00 0000000000052d00 00051d00 2**4
CONTENTS, ALLOC, LOAD, READONLY, CODE
18 .tdata 00000020 0000000000053d30 0000000000053d30 00051d30 2**3
CONTENTS, ALLOC, LOAD, DATA, THREAD_LOCAL
19 .tbss 00000030 0000000000053d50 0000000000053d50 00051d50 2**3
ALLOC, THREAD_LOCAL
20 .data.rel.ro 00001c70 0000000000053d50 0000000000053d50 00051d50 2**3
CONTENTS, ALLOC, LOAD, DATA
21 .fini_array 00000008 00000000000559c0 00000000000559c0 000539c0 2**3
CONTENTS, ALLOC, LOAD, DATA
22 .init_array 00000010 00000000000559c8 00000000000559c8 000539c8 2**3
CONTENTS, ALLOC, LOAD, DATA
23 .dynamic 000001d0 00000000000559d8 00000000000559d8 000539d8 2**3
CONTENTS, ALLOC, LOAD, DATA
24 .got 000008f0 0000000000055ba8 0000000000055ba8 00053ba8 2**3
CONTENTS, ALLOC, LOAD, DATA
25 .got.plt 00000028 0000000000056498 0000000000056498 00054498 2**3
CONTENTS, ALLOC, LOAD, DATA
26 .relro_padding 00000b40 00000000000564c0 00000000000564c0 000544c0 2**0
ALLOC
27 .tm_clone_table 00000000 00000000000574c0 00000000000574c0 000544c0 2**3
CONTENTS, ALLOC, LOAD, DATA
28 .data 000009b8 00000000000574c0 00000000000574c0 000544c0 2**3
CONTENTS, ALLOC, LOAD, DATA
29 .bss 000000c8 0000000000057e78 0000000000057e78 00054e78 2**3
ALLOC
30 .comment 000000b9 0000000000000000 0000000000000000 00054e78 2**0
CONTENTS, READONLY
31 .annobin.notes 000000f7 0000000000000000 0000000000000000 00054f31 2**0
CONTENTS, READONLY
32 .gnu.build.attributes 00000144 0000000000000000 0000000000000000 00055028 2**2
CONTENTS, READONLY, OCTETS
33 .debug_abbrev 00000f4f 0000000000000000 0000000000000000 0005516c 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
34 .debug_info 00103687 0000000000000000 0000000000000000 000560bb 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
35 .debug_aranges 000078a0 0000000000000000 0000000000000000 00159742 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
36 .debug_ranges 0006dd20 0000000000000000 0000000000000000 00160fe2 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
37 .debug_str 00164ad8 0000000000000000 0000000000000000 001ced02 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
38 .debug_line 0006d6c7 0000000000000000 0000000000000000 003337da 2**0
CONTENTS, READONLY, DEBUGGING, OCTETS
This is a lot of information. We will only look at a few of the important sections.
| Section Name | Description |
|---|---|
| .text | contains all the executable code. |
| .data | stores initialized global variables. |
| .bss | holds uninitialized (zero-initialized) global variables. It takes up no space in the file; the memory is allocated and zero-filled at load time. |
| .debug_* | contain debugging information. |
| .plt | this is the procedure linkage table, used for dynamic linking. It contains entries for calls to functions that live in shared libraries. |

I want to show (and also see for myself) that section headers don’t matter once the linking is done. To confirm that, we will write a small script that removes the section headers and try to run the file afterwards. I will be using Rust to write the script but Python should be more than enough.
Note that we are only removing the section header information from the file, not the sections themselves. If you removed the sections themselves, you would lose all the code and everything else. This is what I meant earlier when I said segments and sections are two ways of dividing a file. You need one view while loading and the other while linking.
use std::fs::OpenOptions;
use std::io::{Seek, SeekFrom, Write};
fn main() -> std::io::Result<()> {
    let path = "helloworld"; // path of the test executable
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    // e_shoff (8 bytes at 0x28): offset of the section header table
    file.seek(SeekFrom::Start(0x28))?;
    file.write_all(&0u64.to_le_bytes())?;
    // e_shnum (2 bytes at 0x3C): number of section headers
    file.seek(SeekFrom::Start(0x3C))?;
    file.write_all(&0u16.to_le_bytes())?;
    // e_shstrndx (2 bytes at 0x3E): index of the section name string table
    file.seek(SeekFrom::Start(0x3E))?;
    file.write_all(&0u16.to_le_bytes())?;
    Ok(())
}
Before running it, check the number of section headers with readelf -h helloworld. Mine was 43, as seen above.
Compile the above snippet with rustc and run it.
[endless@fedora]~/Documents/codeF4ult% rustc section_stripper.rs
[endless@fedora]~/Documents/codeF4ult% ./section_stripper
[endless@fedora]~/Documents/codeF4ult% readelf -h helloworld
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2s complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Position-Independent Executable file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0x141b0
Start of program headers: 64 (bytes into file)
Start of section headers: 0 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 12
Size of section headers: 64 (bytes)
Number of section headers: 0
Section header string table index: 0
Now that there are no section headers, let’s run helloworld and see if it still works!
[endless@fedora]~/Documents/codeF4ult% ./helloworld
Hello, world
And it does. You can try the same thing with the program headers and see what happens.
Entry Point#
Entry Point is the memory address where code execution starts. You might think this would point to the main function but no, a lot of stuff needs to happen before we reach main. We are done with what a binary is and can move on to how it runs.
That is the end of the ELF file format primer.
What happens after ./a.out#
When you run this command, you get an executable file
gcc hello.c -o a.out
When you run ./a.out in the terminal, the shell forks itself, which means it creates a new child process. The child then calls a function called execve, which makes a syscall with the same name. This function takes the path of the executable, the array of command line arguments (argv) and the array of environment variables (envp) as arguments. The argument count is not passed in; the kernel counts the entries itself.
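You can play the role of the shell yourself. The sketch below uses Rust’s std::process::Command, which on Linux ends up creating a child process and exec’ing the program for us (the exact mechanism underneath, fork plus exec or posix_spawn, is an implementation detail of the standard library). The three things we hand over are the same three things execve takes: a program, an argument list and an environment. run_child is a helper name I made up, and the example assumes sh is on your PATH.

```rust
use std::process::Command;

/// Starts `program` in a child process with the given arguments and extra
/// environment variables, and returns whatever the child wrote to stdout.
fn run_child(program: &str, args: &[&str], env: &[(&str, &str)]) -> std::io::Result<String> {
    let output = Command::new(program)
        .args(args) // becomes argv[1..] of the child
        .envs(env.iter().cloned()) // added to the child's environment
        .output()?; // waits for the child to exit
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

fn main() -> std::io::Result<()> {
    // The child is `sh`, which prints an environment variable we set for it.
    let out = run_child(
        "sh",
        &["-c", "echo \"$GREETING\""],
        &[("GREETING", "hello from the child")],
    )?;
    print!("{}", out);
    Ok(())
}
```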
When you make a syscall (system call), you are handing control to the kernel and asking it to do something for you that you don’t have the permissions to do yourself, and the execve syscall is no different. The kernel takes the arguments provided by the C wrapper (of the same name, both are called execve), opens the file and reads the first chunk of it (128 bytes on older kernels, 256 on newer ones). It checks for the ELF magic bytes to decide whether the file is an ELF file or one of the other registered binary formats (binfmt). The binfmt mechanism is what lets the system hand other kinds of files, such as scripts or Java and Python programs, to the right interpreter.
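Here is a toy version of that check, just to show the idea. It only knows two formats, ELF and “#!” scripts, and the enum and function names are my own. The real kernel walks a list of registered handlers and treats the script line with more care than this.

```rust
/// What the first bytes of a file tell us about how to run it.
#[derive(Debug, PartialEq)]
enum BinaryFormat {
    Elf,
    /// A "#!" script, with the interpreter line that followed the marker.
    Script(String),
    Unknown,
}

/// Looks at the head of a file the way a (very simplified) format check would.
fn sniff(head: &[u8]) -> BinaryFormat {
    if head.starts_with(b"\x7fELF") {
        return BinaryFormat::Elf;
    }
    if head.starts_with(b"#!") {
        // The interpreter line runs from after "#!" to the first newline.
        let line_end = head.iter().position(|&b| b == b'\n').unwrap_or(head.len());
        let line = String::from_utf8_lossy(&head[2..line_end]);
        return BinaryFormat::Script(line.trim().to_string());
    }
    BinaryFormat::Unknown
}

fn main() {
    println!("{:?}", sniff(b"\x7fELF\x02\x01\x01"));
    println!("{:?}", sniff(b"#!/bin/sh\necho hi\n"));
    println!("{:?}", sniff(b"MZ\x90\x00"));
}
```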
Loading#
After confirming the file is in fact an ELF file, the kernel parses the ELF headers and looks at these
- e_type: it can be ET_EXEC (an executable linked to load at a fixed address) or ET_DYN (a shared library or a PIE, which stands for Position Independent Executable). Our helloworld is a PIE, which is why readelf reported DYN.
- e_phoff: the offset to the program header table
- e_phnum: the count of program headers
Using these fields, the kernel reads the program header table and tears down the old contents of the calling process. The calling process is the child that the shell forked, the one that called execve.
Tearing down in this case means the kernel
- destroys all virtual memory areas
- resets all signal handlers to their defaults
- closes all file descriptors which are marked close-on-exec
- replaces the memory descriptor with a blank one
This means everything related to the old program in the child process is destroyed. The process itself is still alive, but as a blank slate, and the executable will be mapped into this fresh address space.
The kernel iterates through the program headers to find the PT_LOAD segments.
For each PT_LOAD header, the kernel calls the elf_map() function, which uses do_mmap() underneath to map the segment into virtual memory with the permissions specified in its flags.
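To get a feel for what “mapping a segment” involves, here is a rough sketch of the arithmetic, using the four LOAD entries from the readelf -l output earlier. It assumes 4 KiB pages and ignores the random base address a PIE gets at run time; the struct and function names are made up, and the real kernel code handles many more details.

```rust
const PAGE_SIZE: u64 = 0x1000; // assumed 4 KiB pages

/// The fields of a PT_LOAD program header that matter for mapping.
struct LoadSegment {
    vaddr: u64,
    filesz: u64,
    memsz: u64,
    flags: u32, // PF_X = 1, PF_W = 2, PF_R = 4
}

/// Page-aligned [start, end) range of virtual addresses the segment covers.
fn page_range(seg: &LoadSegment) -> (u64, u64) {
    let start = seg.vaddr & !(PAGE_SIZE - 1);
    let end = (seg.vaddr + seg.memsz + PAGE_SIZE - 1) & !(PAGE_SIZE - 1);
    (start, end)
}

/// Bytes that exist in memory but not in the file (.bss and friends).
/// These have to be zero-filled instead of read from disk.
fn zero_fill_bytes(seg: &LoadSegment) -> u64 {
    seg.memsz - seg.filesz
}

/// Renders the segment flags as an rwx-style protection string.
fn protection(flags: u32) -> String {
    let mut s = String::new();
    s.push(if flags & 4 != 0 { 'r' } else { '-' });
    s.push(if flags & 2 != 0 { 'w' } else { '-' });
    s.push(if flags & 1 != 0 { 'x' } else { '-' });
    s
}

/// The four LOAD entries copied from the readelf -l output above.
fn sample_segments() -> Vec<LoadSegment> {
    vec![
        LoadSegment { vaddr: 0x0, filesz: 0x13044, memsz: 0x13044, flags: 4 },
        LoadSegment { vaddr: 0x14050, filesz: 0x3ea10, memsz: 0x3ea10, flags: 5 },
        LoadSegment { vaddr: 0x53a60, filesz: 0x2758, memsz: 0x35a0, flags: 6 },
        LoadSegment { vaddr: 0x571b8, filesz: 0x9b8, memsz: 0xa80, flags: 6 },
    ]
}

fn main() {
    for seg in sample_segments() {
        let (start, end) = page_range(&seg);
        println!(
            "map {:#x}..{:#x} as {} ({} bytes zero-filled)",
            start,
            end,
            protection(seg.flags),
            zero_fill_bytes(&seg)
        );
    }
}
```

Notice that the last segment needs 0xc8 zero-filled bytes, which is exactly the size of .bss in the objdump output.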
Now that the executable is in memory, the next step is linking.
What is Linking again?#
Linking in the traditional sense refers to the linker gluing the compiled code from different files and folders together into one coherent blob, but that is not what we are talking about here.
Our concern here is with dynamic linking.
If every program were statically linked, each binary would be many times larger than it is now and every computer would have a lot of redundant code wasting space. So essential functionality used by almost all programs is bundled together into what’s called a shared library, like libc. It is a library of code shared by many processes.
Shared libraries like this exist on your user’s machine (and on almost every Linux machine). Your Rust or C code links with them at runtime and uses the APIs available from libc for essential functions like printf(), malloc() and much more. This is called dynamic linking, and it is handled before your code runs.
The kernel looks for the PT_INTERP segment, which contains the path to the interpreter. The interpreter here is the linker/loader that handles dynamic linking; it has nothing to do with language interpreters like Python’s. The kernel then loads this “interpreter” the same way it loaded our executable, by mapping its PT_LOAD segments into the same address space. If the program is statically linked, PT_INTERP won’t exist and this step will be skipped.
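Reading the interpreter path out of the file is simple enough to sketch. The PT_INTERP header gives an offset and a size, and the bytes at that location are a NUL-terminated path. The function name is my own, and main builds a fake file image with the path at offset 0x2e0 (where readelf showed it for helloworld) instead of opening the real binary.

```rust
/// Returns the NUL-terminated path stored in the PT_INTERP segment,
/// given the segment's file offset and size from its program header.
fn interp_path(file: &[u8], offset: usize, filesz: usize) -> Option<&str> {
    let end = offset.checked_add(filesz)?;
    let raw = file.get(offset..end)?;
    // The path must be terminated by a zero byte inside the segment.
    let nul = raw.iter().position(|&b| b == 0)?;
    std::str::from_utf8(&raw[..nul]).ok()
}

/// Builds a stand-in file image: zeroes up to 0x2e0, then the path and a NUL.
fn fake_image() -> Vec<u8> {
    let mut image = vec![0u8; 0x2e0];
    image.extend_from_slice(b"/lib64/ld-linux-x86-64.so.2\0");
    image
}

fn main() {
    let image = fake_image();
    // 0x2e0 and 0x1c are the INTERP offset and FileSiz from readelf -l.
    match interp_path(&image, 0x2e0, 0x1c) {
        Some(path) => println!("interpreter: {}", path),
        None => println!("no usable PT_INTERP path"),
    }
}
```

The 0x1c (28) bytes of FileSiz are the 27 characters of the path plus the terminating zero.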
Setting up the stack#
The kernel allocates the stack region. The stack is built with its base at the highest address of the region and its top at the lowest. This is what is meant by “the stack grows downward”. The classic reason for this design is that the stack and the heap can grow towards each other, so there is no need to fix a size for either in advance. If the stack is smaller, it leaves more space for the heap and vice versa. If both grew in the same direction, we would have to place one above the other and estimate a maximum size for the lower one. Estimate wrong and it either overflows into its neighbour or leaves wasted space.
- The first things to be pushed onto the (now) empty stack are the environment strings, each of them null-terminated.
- Next are the argument strings, which are what you passed on the command line, like ./a.out --name ndL3ss.
- After that, the kernel generates 16 bytes from its CSPRNG (Cryptographically Secure Pseudo-Random Number Generator) and writes them onto the stack below the strings. The auxiliary vector will point to them later as AT_RANDOM.
- Now we pad the stack to align it to 16 bytes. The x86-64 ABI requires this alignment (the ABI, or Application Binary Interface, is the set of conventions that lets separately compiled programs and the OS, in this case Linux, work together). So, if at this point our stack was 46 bytes long, the padding would be two bytes of zeroes, which would make the size of the stack 48 bytes, a multiple of 16. We do it now because nothing that comes after this breaks the alignment.
- Now the kernel pushes the auxiliary vector. This is a bunch of information that the interpreter (like ld-linux.so) needs to do its job, and the kernel sets it up here.
- Next the kernel pushes a NULL, which is 8 bytes of zero, then pushes one 8-byte pointer for each environment string, pointing to the strings at the base of the stack from earlier. The pointers are pushed in reverse, so when the stack is popped, envp[0] pops first and the rest follow in order.
- Next it does the same thing with the argument vector. One NULL, then pointers to all the arguments in reverse order, so the first one (argv[0]) pops first.
- Finally the kernel pushes a single 8-byte integer, which is argc. If you run ./a.out with no arguments, this would be 1.
- rsp is set to point at argc, which is now the top of the stack (the stack grows downward, so it pops first).
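The list above is easier to picture as data. This sketch models the pointer area of the finished stack as a list of 8-byte words, read from the lowest address (where rsp points) upwards, plus the padding arithmetic from the alignment step. The function names are my own and the addresses in main are made-up placeholders, not real stack addresses.

```rust
/// Padding needed to bring `used` bytes up to a multiple of 16.
fn pad_to_16(used: u64) -> u64 {
    (16 - used % 16) % 16
}

/// The pointer area of the initial stack, lowest address first:
/// argc, argv pointers, NULL, envp pointers, NULL, auxv pairs, AT_NULL pair.
fn pointer_area(argv: &[u64], envp: &[u64], auxv: &[(u64, u64)]) -> Vec<u64> {
    let mut words = Vec::new();
    words.push(argv.len() as u64); // argc, where rsp points
    words.extend_from_slice(argv); // argv[0], argv[1], ...
    words.push(0); // NULL terminating argv
    words.extend_from_slice(envp); // envp[0], envp[1], ...
    words.push(0); // NULL terminating envp
    for &(key, value) in auxv {
        words.push(key);
        words.push(value);
    }
    words.push(0); // AT_NULL key
    words.push(0); // AT_NULL value
    words
}

fn main() {
    // The example from the text: 46 bytes used so far, so 2 bytes of padding.
    println!("padding for 46 bytes: {}", pad_to_16(46));

    // Placeholder addresses standing in for the strings higher up the stack.
    let argv = [0x7ffd_0000_2000u64, 0x7ffd_0000_2008];
    let envp = [0x7ffd_0000_2100u64];
    let auxv = [(6u64, 4096u64)]; // AT_PAGESZ = 6
    for (i, word) in pointer_area(&argv, &envp, &auxv).iter().enumerate() {
        println!("rsp + {:>2}: {:#x}", i * 8, word);
    }
}
```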
The stack is set up. The kernel now sets rip (the instruction pointer) to the dynamic linker’s entry point and drops to userspace.
The interpreter reads the auxiliary vector from the stack (step 5) to find our program and link its libraries so functions like printf() and malloc() can run. That is out of scope today, but when it is done, it jumps to _start, which we will talk about next.
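You can peek at the auxiliary vector of a running process yourself, because Linux exposes it as /proc/self/auxv. The sketch below assumes a 64-bit machine, where every entry is a pair of 8-byte integers, and it only names a few of the keys. The function names are my own.

```rust
use std::fs;

/// Splits raw auxv bytes into (key, value) pairs, stopping at AT_NULL (0).
/// Assumes a 64-bit process, where each entry is two native-endian u64s.
fn parse_auxv(raw: &[u8]) -> Vec<(u64, u64)> {
    let mut entries = Vec::new();
    for pair in raw.chunks_exact(16) {
        let mut key = [0u8; 8];
        let mut value = [0u8; 8];
        key.copy_from_slice(&pair[..8]);
        value.copy_from_slice(&pair[8..]);
        let key = u64::from_ne_bytes(key);
        if key == 0 {
            break; // AT_NULL marks the end of the vector
        }
        entries.push((key, u64::from_ne_bytes(value)));
    }
    entries
}

/// Names for a few well-known auxv keys.
fn key_name(key: u64) -> &'static str {
    match key {
        3 => "AT_PHDR",
        5 => "AT_PHNUM",
        6 => "AT_PAGESZ",
        7 => "AT_BASE",
        9 => "AT_ENTRY",
        25 => "AT_RANDOM",
        _ => "(other)",
    }
}

fn main() {
    match fs::read("/proc/self/auxv") {
        Ok(raw) => {
            for (key, value) in parse_auxv(&raw) {
                println!("{:>2} {:<10} {:#x}", key, key_name(key), value);
            }
        }
        Err(err) => eprintln!("could not read /proc/self/auxv: {}", err),
    }
}
```

AT_PHDR, AT_PHNUM and AT_ENTRY are how the interpreter finds our program’s headers and entry point, and AT_BASE is where the interpreter itself was loaded.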
_start#
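_start is the actual entry point of your program, the address that the e_entry field in the ELF header points to. You don’t write it yourself. It comes from a startup object file shipped with the C library (crt1.o, or Scrt1.o for PIE executables, on glibc systems) that the linker (rust-lld or ld, which link the object files generated by rustc and gcc) adds to the executable at build time.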
What _start does is to take the stack layout left by kernel and turn it into a function call to __libc_start_main.
This is what _start looks like
0000000000014050 <_start>:
14050: f3 0f 1e fa endbr64
14054: 31 ed xor %ebp,%ebp
14056: 49 89 d1 mov %rdx,%r9
14059: 5e pop %rsi
1405a: 48 89 e2 mov %rsp,%rdx
1405d: 48 83 e4 f0 and $0xfffffffffffffff0,%rsp
14061: 50 push %rax
14062: 54 push %rsp
14063: 45 31 c0 xor %r8d,%r8d
14066: 31 c9 xor %ecx,%ecx
14068: 48 8d 3d d1 01 00 00 lea 0x1d1(%rip),%rdi # 14240 <main>
1406f: ff 15 4b 18 04 00 call *0x4184b(%rip) # 558c0 <__libc_start_main@GLIBC_2.34>
14075: f4 hlt
Let’s go through this snippet step by step.
xor %ebp,%ebp
This XORs ebp with itself, which is the usual way to set a register to zero (writing the 32-bit ebp also clears the upper half of rbp). rbp is the frame pointer, which normally links each stack frame to its caller’s frame. _start is the first code that runs in the program and has no caller, so the ABI asks for the frame pointer to be zeroed to mark the outermost frame.
mov %rdx,%r9
This moves the value of rdx into r9. On entry, rdx holds a pointer to a cleanup function supplied by the dynamic linker. r9 is the sixth argument register, so this passes it on to __libc_start_main, which uses it later.
pop %rsi
rsp at the moment points to argc, which we talked about earlier when setting up the stack. It is popped into rsi, the second argument register, so rsi now holds argc and rsp points to argv[0].
mov %rsp,%rdx
This moves the value of rsp into rdx, the third argument register. Since rsp points to argv[0], rdx is now argv.
and $0xfffffffffffffff0,%rsp
This performs a bitwise AND between that number and rsp, which clears the lowest four bits and so rounds rsp down to a multiple of 16. This is done to re-align the stack to 16 bytes after we popped off argc, which was 8 bytes.
push %rax
This is junk data added to keep the alignment after the next step, which pushes 8 bytes onto the stack, so we need another 8 to stay aligned. Now if you’re thinking that the pop %rsi and the push %rsp (which is next) should cancel each other’s misalignment and there should be no need for an alignment adjustment like the past two instructions, I don’t know what to say to that. Hopefully somebody other than me reads this, has an answer and lets me know. Maybe the AND operation from earlier is there because _start doesn’t assume that the kernel hands it an aligned stack.
push %rsp
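This pushes the value of rsp itself onto the stack. glibc’s startup code describes this as providing the highest stack address to the user code: it travels on the stack as the seventh argument to __libc_start_main. Together with the junk push above, it also leaves the stack aligned to 16 bytes for the call.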
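One way to sanity-check the alignment question is to trace rsp by hand through the four instructions that touch it. The x86-64 System V ABI wants rsp to be a multiple of 16 at the moment of a call instruction. The sketch below is only arithmetic on a made-up starting address, not real register access, and the function name is my own.

```rust
/// Follows rsp through `pop %rsi`, `and $-16,%rsp`, the optional
/// `push %rax`, and `push %rsp`, returning its value right before the call.
fn rsp_before_call(rsp_at_entry: u64, with_junk_push: bool) -> u64 {
    let mut rsp = rsp_at_entry;
    rsp += 8; // pop %rsi: argc leaves the stack
    rsp &= !0xf; // and $0xfffffffffffffff0,%rsp: round down to 16
    if with_junk_push {
        rsp -= 8; // push %rax: 8 bytes of junk
    }
    rsp -= 8; // push %rsp: the stack end argument
    rsp
}

fn main() {
    // Placeholder entry values: one 16-byte aligned, one off by 8.
    for &entry in &[0x7ffd_0000_1000u64, 0x7ffd_0000_1008] {
        let with = rsp_before_call(entry, true);
        let without = rsp_before_call(entry, false);
        println!(
            "entry {:#x}: with junk push rsp % 16 = {}, without = {}",
            entry,
            with % 16,
            without % 16
        );
    }
}
```

Whatever rsp was on entry, the AND forces it to a multiple of 16, so the pop no longer has anything to cancel. From there it takes two 8-byte pushes, not one, to land on a multiple of 16 again, and that is the job of the junk push.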
xor %r8d,%r8d
xor %ecx,%ecx
These zero the two registers that carry the fourth and fifth arguments to __libc_start_main (rcx and r8).
lea 0x1d1(%rip),%rdi
This loads the address of main into rdi using RIP-relative addressing. This is the first argument to __libc_start_main.
call *0x4184b(%rip)
This calls __libc_start_main. This function doesn’t normally return. It calls main(), and when that returns, it calls exit().
hlt
This is a safety net in case __libc_start_main does return. hlt halts the CPU and is a privileged instruction, so executing it in userspace triggers a fault and the kernel kills the process.
What does __libc_start_main() do?#
- It computes the envp address from argv and argc (the environment pointers sit right after argv’s NULL terminator on the stack), because unlike argv, it isn’t handed over as a separate argument.
- It runs constructor functions. This is code that needs to run before main, for example setup that allocates or initializes something main relies on.
- It registers cleanup functions so that when main returns, destructor functions run and buffers get flushed.
- It sets up threading infrastructure.
- It calls main with the arguments and environment variables, and when main returns it passes the return value to exit().
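The steps above can be sketched as a toy model. This is not glibc’s code, just the same order of events with strings standing in for pointers: None plays the role of a NULL terminator, and every name here is invented for illustration.

```rust
/// Splits the word array the kernel left behind into argv and envp.
/// `words` holds argv entries, a NULL (None), envp entries, and a final NULL.
fn split_argv_envp<'a>(argc: usize, words: &[Option<&'a str>]) -> (Vec<&'a str>, Vec<&'a str>) {
    let argv: Vec<&'a str> = words.iter().take(argc).filter_map(|w| *w).collect();
    // envp starts right after argv's NULL terminator, at index argc + 1.
    let rest = words.get(argc + 1..).unwrap_or(&[]);
    let envp: Vec<&'a str> = rest
        .iter()
        .take_while(|w| w.is_some())
        .filter_map(|w| *w)
        .collect();
    (argv, envp)
}

/// Toy stand-in for the flow of __libc_start_main: records each step in `log`
/// and returns the status that would be handed to exit().
fn start_main_sketch(
    main: fn(&[&str], &[&str]) -> i32,
    argc: usize,
    words: &[Option<&str>],
    log: &mut Vec<String>,
) -> i32 {
    let (argv, envp) = split_argv_envp(argc, words);
    log.push("run constructors".to_string());
    log.push("register cleanup functions".to_string());
    let status = main(&argv, &envp);
    log.push(format!("main returned {}", status));
    log.push("exit: run cleanup functions and flush buffers".to_string());
    status
}

/// A pretend main that just reports how many arguments it received.
fn demo_main(argv: &[&str], _envp: &[&str]) -> i32 {
    argv.len() as i32
}

fn main() {
    let words = [Some("./a.out"), Some("--name"), None, Some("HOME=/root"), None];
    let mut log = Vec::new();
    let status = start_main_sketch(demo_main, 2, &words, &mut log);
    for line in &log {
        println!("{}", line);
    }
    println!("exit status: {}", status);
}
```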
Now let us recap.
execve takes the arguments from the shell or GUI and loads the executable into memory, replacing the contents of the calling process. The kernel then sets up the stack with the environment variables, arguments and auxiliary vector. _start inherits the stack from the kernel and calls __libc_start_main, which does a lot of setup and then calls main(). main() does its thing and calls whatever functions it wants to call, each of which creates another stack frame. When any function is done, it returns to its caller and its stack frame collapses. This flows all the way back to __libc_start_main, which calls exit() when main() returns, and that terminates the program.