What happens when you type gcc main.c
GCC stands for “GNU Compiler Collection”. GCC is an integrated distribution of compilers for several major programming languages. These languages currently include C, C++, Objective-C, Objective-C++, Fortran, Ada, Go, and D; Java support (GCJ) was removed in GCC 7.
We will go through each phase that the source code passes through, from the moment the gcc command is run until there is a file that can be executed.
1. Preprocessing:
Preprocessing is the first stage of the C build process, in which all the preprocessor directives are evaluated.
- The input file for this stage is a *.c file.
- The output file is a *.i (preprocessed) file.
- The preprocessor strips out comments from the input C file, evaluates the preprocessor directives (the lines that start with #) by making substitutions, and then produces pure C code without any preprocessor directives.
- Note that an error introduced in the preprocessor stage is normally hard to locate, because the output of the preprocessor goes directly into the compiler. The compiler will most likely report the error at the line where the preprocessor directive or macro was used.
2. Compiler:
In this stage the compiler converts the C code into architecture-specific assembly. This conversion is not a one-to-one mapping of lines; instead, each C operation is decomposed into several assembly operations, each of which performs a very basic task.
- The input file for this stage is a *.i file.
- The output file is a *.s or *.asm file.
During this stage the compiler also checks the code for semantic errors. Types of semantic errors:
- Undeclared variables: a variable is used without being declared.
- Unavailable variables: a variable is declared, but not in the scope where it is used.
- Incompatible types: for example, if a name is looked up and turns out to refer to a struct, that name cannot be an operand of an addition, because adding a struct to something else is meaningless. Note that in C a char is an integer type, so adding to a character is legal.
3. Assembler:
In this stage the assembly code that is generated by the compiler gets converted into object code by the assembler.
- The input file for this stage is a *.s or *.asm file.
- The output file is a *.o or *.obj file.
- Note that many compilers nowadays can generate object code directly, without the need for an independent assembler.
- The output of this stage is an object file that contains opcodes and data sections.
After code generation is finished, the code and data are allocated into sections. Each section holds a different kind of information and is identified by its name or by the attributes of the information stored in it (for example, .text for code, .data for initialized data, and .bss for zero-initialized data).
4. Linker:
In this stage the different object files that are generated by the assembler are combined into one relocatable file by the linker.
- The input files for this stage are the *.o files and the C standard libraries.
- The output file is a relocatable file.
- While combining the object files together, the linker performs the following operations:
  - Symbol resolution.
  - Relocation.
1 Symbol resolution:
In a multi-file program, if there are any references to labels defined in another file, the assembler marks these references as “unresolved”. When these files are passed to the linker, the linker determines the values of these references from the other object files and patches the code with the correct values. If the linker does not find a definition of one of these labels in any object file or library, it reports a linking error such as “undefined reference to” followed by the symbol name.
If the linker finds the same symbol defined in two object files, it reports a “multiple definition” error.
2 Relocation:
Relocation is the process of changing addresses already assigned to labels. This will also involve patching up all references to reflect the newly assigned address.
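Primarily, relocation is performed for the following two reasons: section merging and section placement. In section merging, sections with the same name from different object files are combined into one, so the labels inside them move. In section placement, each merged section is given its final start address.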
5. Locator:
In this stage a locator assigns physical addresses to the relocatable file that is produced by the linker.
- The input files for this stage are the relocatable file and the linker script file.
- The output file is an executable file.
- A locator is a tool that performs the conversion from a relocatable program to an executable binary image.
- The linker script file provides the locator with the required information about the actual memory layout, and the locator then performs the conversion to produce a single executable binary file.
- Note that the locator can be a separate tool or part of the linker. With GCC on a desktop system, the GNU linker (ld) performs this step itself.