
GCC Compilation Phases

When you run gcc main.c -o myprogram, GCC runs four distinct phases to transform your source code into an executable. Understanding each phase helps with debugging, optimization, and working with build systems.

The four phases are:

  1. Preprocessing — text substitution before compilation
  2. Compilation — translating C to assembly
  3. Assembly — translating assembly to machine code (object files)
  4. Linking — combining object files into an executable

Phase 1: Preprocessing

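The preprocessor (cpp) handles directives that begin with #. It runs before any actual compilation. Its job is purely textual: it inlines #include files, expands macros, evaluates conditional directives such as #ifdef, and strips comments.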

Invoking the Preprocessor

You can run just the preprocessor step with:

cpp main.c -o main.i
# or equivalently:
gcc -E main.c -o main.i

The output .i file contains the fully preprocessed source — all #include files inlined, all macros expanded, no comments.

Conditional Compilation

A common use case is enabling debug output only in debug builds:

#ifdef DEBUG_MODE
    printf("x = %d\n", x);
#endif

To enable this at compile time:

gcc -DDEBUG_MODE main.c -o myprogram

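The -D flag predefines the macro with the value 1, so -DDEBUG_MODE is equivalent to writing #define DEBUG_MODE 1 at the top of the file. To give the macro a different value, use -DNAME=value.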
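
One common way to use this flag, sketched here under the assumption that debug output should vanish entirely from release builds, is to wrap the logging call in a macro. DEBUG_LOG and add_and_trace are hypothetical names introduced for illustration; they do not come from the tutorial's main.c.

```c
#include <stdio.h>

/* DEBUG_LOG is a hypothetical helper: it prints to stderr when the file is
   compiled with -DDEBUG_MODE and expands to a no-op otherwise. */
#ifdef DEBUG_MODE
#define DEBUG_LOG(...) fprintf(stderr, __VA_ARGS__)
#else
#define DEBUG_LOG(...) ((void)0)
#endif

/* Illustrative function: returns a + b, tracing the result in debug builds. */
int add_and_trace(int a, int b)
{
    int sum = a + b;
    DEBUG_LOG("sum = %d\n", sum);
    return sum;
}
```

Built without -DDEBUG_MODE, the call to DEBUG_LOG is removed by the preprocessor, so the function behaves the same but prints nothing.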

Viewing Macro Expansions

The -E output lets you see exactly what the compiler receives after preprocessing — useful for debugging complex macros.
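
As an illustration of the kind of bug that -E output makes visible, consider a macro that forgets to parenthesize its argument. The macro and function names below are hypothetical and chosen only for this sketch.

```c
/* Hypothetical macros for illustration. The first omits parentheses,
   so operator precedence applies to the expanded text. */
#define SQUARE_UNSAFE(x) x * x
#define SQUARE_SAFE(x)   ((x) * (x))

/* Expands to: return 1 + 2 * 1 + 2; */
int square_unsafe_of_sum(void)
{
    return SQUARE_UNSAFE(1 + 2);
}

/* Expands to: return ((1 + 2) * (1 + 2)); */
int square_safe_of_sum(void)
{
    return SQUARE_SAFE(1 + 2);
}
```

Running gcc -E on a file containing these definitions would show the first call expanded to 1 + 2 * 1 + 2, which evaluates to 5 rather than the intended 9.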


Phase 2: Compilation

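The compiler proper takes the preprocessed source (.i file) and translates it into assembly language (.s file). This is where the source is parsed and type-checked, optimizations are applied, and target-specific assembly is generated.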

Stopping After Compilation

gcc -S main.c -o main.s

The -S flag tells GCC to stop after the compilation step, producing an assembly .s file.

Reading the Assembly

The assembly output contains human-readable (though dense) instructions. On x86-64, using GCC's default AT&T syntax, a trivial main might look like this:

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $5, -4(%rbp)
    movl    -4(%rbp), %eax
    popq    %rbp
    ret

Understanding this output is useful for performance analysis and confirming that your compiler optimizations are working as expected.
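
As a rough illustration, a function like the following would typically produce a body matching the listing above when compiled with gcc -S at the default -O0 on x86-64. The name load_and_return is hypothetical (in the listing the function is main, so the label differs), and exact output varies with GCC version, target, and flags.

```c
/* Hypothetical stand-in for the function in the listing. At -O0 the local
   variable is stored on the stack and then reloaded into %eax to be returned. */
int load_and_return(void)
{
    int x = 5;   /* roughly: movl $5, -4(%rbp)   */
    return x;    /* roughly: movl -4(%rbp), %eax */
}
```

With -O2, the store and reload typically disappear and the body reduces to loading 5 directly into %eax, which is a simple way to confirm that an optimization took effect.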


Phase 3: Assembly

The assembler (as) translates assembly language into machine code — binary instructions the CPU can execute — and packages the result into an object file (.o).

Object files use a standard format (ELF on Linux, Mach-O on macOS, COFF on Windows). They contain machine code and data organized into sections, a symbol table, and relocation entries.

Producing Object Files

gcc -c main.c -o main.o
# or, from assembly:
as main.s -o main.o

Inspecting Object Files

The objdump tool lets you inspect the contents of an object file:

objdump -d main.o      # disassemble
objdump -t main.o      # show symbol table
objdump -r main.o      # show relocation entries

Relocation entries are important — they mark the locations in the object file where the linker must fill in real addresses for external symbols (functions and variables defined in other translation units).
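
To make relocation entries concrete, here is a minimal sketch of a translation unit that calls a function it does not define. The name name_length is hypothetical; strlen comes from the C library, so the object file can only record that the call needs an address filled in later.

```c
#include <stddef.h>
#include <string.h>

/* strlen is declared in <string.h> but defined in the C library, not in
   this file. The argument is a runtime pointer, so the compiler is less
   likely to fold the call away at compile time. */
size_t name_length(const char *name)
{
    return strlen(name);
}
```

After gcc -c, running objdump -r on the resulting object file would typically list an entry naming strlen, and objdump -t would show strlen as undefined. The exact relocation type depends on the target.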


Phase 4: Linking

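The linker (ld, usually invoked via gcc) takes one or more object files and combines them into a final executable. Its jobs include resolving symbol references across object files and libraries, applying relocations, and laying out the final executable.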

Linking Multiple Object Files

gcc main.o utils.o -o myprogram
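
As a sketch of where two such object files might come from, the listing below shows a hypothetical main.c side and utils.c side together so that it compiles as a single file. In a real project each part would live in its own file and be compiled separately with gcc -c. The names clamp_to_range and normalized_reading are illustrative and not taken from the tutorial.

```c
/* --- shared declaration (would normally live in a header such as utils.h) --- */
int clamp_to_range(int value, int low, int high);

/* --- main.c side: only the declaration is visible here, so main.o would
       hold an undefined reference to clamp_to_range --- */
int normalized_reading(int raw)
{
    return clamp_to_range(raw, 0, 100);
}

/* --- utils.c side: the definition the linker matches that reference to --- */
int clamp_to_range(int value, int low, int high)
{
    if (value < low)  return low;
    if (value > high) return high;
    return value;
}
```

If utils.o were left off the link command, the linker would report an undefined reference to clamp_to_range, because nothing on the command line defines it.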

Linking with Libraries

gcc main.o -lm -o myprogram      # link with libm (math library)
gcc main.o -L. -lmylib -o prog   # link with a library in the current directory

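Static vs Dynamic Linking

With static linking, the library code is copied into the executable at link time. With dynamic linking, the executable only records which shared libraries it needs, and they are loaded when the program starts. GCC links dynamically by default where shared libraries are available; passing -static requests a fully static executable.
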
Viewing Link Dependencies

ldd myprogram     # list dynamic library dependencies
nm myprogram      # list symbols (defined and undefined)

Putting It All Together

You can perform all four phases manually:

# 1. Preprocess
gcc -E main.c -o main.i

# 2. Compile to assembly
gcc -S main.i -o main.s

# 3. Assemble to object file
as main.s -o main.o

# 4. Link
gcc main.o -o myprogram

Or let GCC handle all phases in one command (the most common approach):

gcc main.c -o myprogram

Understanding these phases makes it much easier to troubleshoot build errors, apply targeted optimizations, and reason about binary size and dependencies.