GCC Compilation Phases

When you run gcc main.c -o myprogram, GCC runs four distinct phases to transform your source code into an executable. Understanding each phase helps with debugging, optimization, and working with build systems.

The four phases are:

Preprocessing: text substitution before compilation
Compilation: translating C to assembly
Assembly: translating assembly to machine code (object files)
Linking: combining object files into an executable

Phase 1: Preprocessing

The preprocessor (cpp) handles directives that begin with #. It runs before any actual compilation. Its job is purely textual:

#include: replaces the directive with the contents of the specified file
#define: defines macros; every occurrence in the code is replaced with the expansion
#ifdef / #ifndef / #endif: conditional compilation; sections of code are included or excluded based on whether a macro is defined
Comments: removed from the output (each comment is replaced by a single space)

Invoking the Preprocessor

You can run just the preprocessor step with:

cpp main.c -o main.i
# or equivalently:
gcc -E main.c -o main.i

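The output .i file contains the fully preprocessed source: all #include files are inlined, all macros are expanded, and comments are gone. GCC also adds linemarkers (lines such as # 1 "main.c") so that later diagnostics can point back to the original file and line.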

Conditional Compilation

A common use case is enabling debug output only in debug builds:

#ifdef DEBUG_MODE
    printf("x = %d\n", x);
#endif

To enable this at compile time:

gcc -DDEBUG_MODE main.c -o myprogram

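The -D flag predefines a macro before the file is read. -DDEBUG_MODE is equivalent to writing #define DEBUG_MODE 1 at the top of the file; use -DNAME=value to give the macro a different value.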

Viewing Macro Expansions

The -E output shows exactly what the compiler receives after preprocessing, which is useful for debugging complex macros.

Phase 2: Compilation

The compiler proper takes the preprocessed source (.i file) and translates it into assembly language (.s file). This is where:

Syntax checking happens
Type checking occurs
Optimizations are applied (if -O flags are set)
The high-level C constructs are mapped to the instruction set of the target CPU

Stopping After Compilation

gcc -S main.c -o main.s

The -S flag tells GCC to stop after the compilation step, producing an assembly .s file.

Reading the Assembly

The assembly output contains human-readable (though dense) instructions. On x86-64, unoptimized output in GCC's default AT&T syntax looks something like this:

main:
    pushq   %rbp
    movq    %rsp, %rbp
    movl    $5, -4(%rbp)
    movl    -4(%rbp), %eax
    popq    %rbp
    ret

Understanding this output is useful for performance analysis and confirming that your compiler optimizations are working as expected.

Phase 3: Assembly

The assembler (as) translates assembly language into machine code (binary instructions the CPU can execute) and packages the result into an object file (.o).

Object files use a platform-specific standard format (ELF on Linux, Mach-O on macOS, PE/COFF on Windows). They contain:

The compiled machine code for the translation unit
A symbol table listing all functions and variables defined or referenced
Relocation information (placeholders for addresses that will be filled in during linking)

Producing Object Files

gcc -c main.c -o main.o
# or, from assembly:
as main.s -o main.o

Inspecting Object Files

The objdump tool lets you inspect the contents of an object file:

objdump -d main.o      # disassemble
objdump -t main.o      # show symbol table
objdump -r main.o      # show relocation entries

Relocation entries are important: they mark the locations in the object file where the linker must fill in real addresses for external symbols (functions and variables defined in other translation units).

Phase 4: Linking

The linker (ld, usually invoked via gcc) takes one or more object files and combines them into a final executable. Its jobs include:

Symbol resolution: match each reference to an external symbol with its definition (e.g., a call to printf is matched with the definition in the C library)
Relocation: fill in the addresses that were left as placeholders during assembly
Library linking: incorporate code from static libraries (.a files) or record references to dynamic libraries (.so files) for runtime linking

Linking Multiple Object Files

gcc main.o utils.o -o myprogram

Linking with Libraries

gcc main.o -lm -o myprogram      # link with libm (math library)
gcc main.o -L. -lmylib -o prog   # link with a library in the current directory

Static vs Dynamic Linking

Static linking (-static): library code is copied into the executable. The binary is self-contained but larger.
Dynamic linking (default): the executable records which shared libraries it needs, and the dynamic loader maps them into the process at run time. Multiple programs can share a single copy of the library's code in memory.

Viewing Link Dependencies

ldd myprogram     # list dynamic library dependencies
nm myprogram      # list symbols (defined and undefined)

Putting It All Together

You can perform all four phases manually:

# 1. Preprocess
gcc -E main.c -o main.i

# 2. Compile to assembly
gcc -S main.i -o main.s

# 3. Assemble to object file
as main.s -o main.o

# 4. Link
gcc main.o -o myprogram

Or let GCC handle all phases in one command (the most common approach):

gcc main.c -o myprogram

Understanding these phases makes it much easier to troubleshoot build errors, apply targeted optimizations, and reason about binary size and dependencies.