The steps of compilation



In this article we walk through the complete compilation process in C in detail, from the moment we write a source code file until we obtain an executable binary.


What is compilation in C language?

Compilation is the process of converting one or more source code files into executable binary code for a specific hardware/software architecture.


This process involves several stages, which we study below. But first, let's define "source code" and "binary code":


Source code is the program that we as programmers write: the plain text that "tells" the computer what to do.
Executable binary code, for any compiled language and for C in particular, is binary (not text) and can be run directly on the computer. The distinction matters because one of the intermediate products of compilation is object code, which is also binary but cannot be executed; it must go through the next stage, linking.



A simple example

Suppose we have the following source code ... the classic "Hello World":

/*
* File: holamundo.c
* My first "Hello World" in C
* juncotic.com
*/

#include <stdio.h>

int main(int argc, const char *argv[]){
    printf("Hola mundo\n");
    return 0;
}
On GNU/Linux systems, a simple compilation would be:
gcc holamundo.c -o holamundo
This generates a binary file called holamundo, whose description will be similar to the following:

diego@cryptos:/tmp$ file holamundo
holamundo: ELF 64-bit LSB pie executable x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.2.0, BuildID[sha1]=141789414df18ece4fa044cf91f5e776bf75f959, not stripped
diego@cryptos:/tmp$
This means that the output file holamundo (named by the "-o" option, for "output") is of type ELF (Executable and Linkable Format), the executable format on GNU/Linux, comparable in role to the .exe format on Windows.
We can run it without problems:

diego@cryptos:/tmp$ ./holamundo
Hola mundo

Compiling step by step

Let's now go through the compilation process step by step and see what a C compiler does internally.

Preprocessing

The first thing the compiler does is preprocess the source file: it interprets all the preprocessing directives we have used, such as #define, #include, #ifdef, etc., and it also removes all the comments we have written in the file (each comment is replaced by a single space).
In the particular case of our holamundo.c, it will include the stdio.h file (the standard input/output header) and remove the comments.
Let's preprocess our example:
gcc -E holamundo.c -o holamundo.i
The "-E" option tells the compiler (gcc) to stop after preprocessing, and "-o" writes the output to the holamundo.i file. The .i extension is conventionally used for preprocessed C files.
holamundo.i is still source code, but if we look at its content we will find something similar to this:

[....]
extern int pclose (FILE *__stream);

extern char *ctermid (char *__s) __attribute__ ((__nothrow__ , __leaf__));
# 840 "/usr/include/stdio.h" 3 4
extern void flockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));

extern int ftrylockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__)) ;

extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 868 "/usr/include/stdio.h" 3 4

# 8 "holamundo.c" 2

# 9 "holamundo.c"
int main(int argc, const char *argv[]){

printf("Hola mundo\n");

return 0;
}

The top of the file contains much more text, the result of expanding stdio.h: all of it is generated by the #include <stdio.h> directive. With more than one #include we would see the contents of all the included headers combined. The lines beginning with "#" followed by a number are line markers, which record the file and line each piece of text came from.
At the end of the file is the code we know, our "Hello World", now without the comments.
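Our holamundo.c only exercises #include, so here is a minimal sketch of the other directives mentioned above. The file name macros.c and the names GREETING, SQUARE, ENGLISH, square and greeting are made up for this illustration and are not part of the original example.

```c
/* File: macros.c (hypothetical example file) */

#define GREETING "Hola mundo"     /* object-like macro: replaced textually */
#define SQUARE(x) ((x) * (x))     /* function-like macro: expanded with its argument */

/* After preprocessing, the body reads: return ((n) * (n)); */
int square(int n) {
    return SQUARE(n);
}

/* Only one branch of the #ifdef survives preprocessing. */
const char *greeting(void) {
#ifdef ENGLISH                    /* defined only if we pass -DENGLISH to gcc */
    return "Hello world";
#else
    return GREETING;
#endif
}
```

Running gcc -E macros.c should print both functions with the macros already expanded, the #else branch selected and the comments gone; adding -DENGLISH should select the other branch instead.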

Compilation

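The next step is to compile our code. The result is non-executable binary code, called object code, conventionally stored in a file with the ".o" extension. With gcc, the "-c" option covers two sub-steps: compilation proper, which translates the preprocessed C into assembly (it can be inspected on its own with "gcc -S"), and assembly, which turns that assembly into machine code.
Let's compile:
gcc -c holamundo.i -o holamundo.o
If we check the file type, we see an ELF binary, but a relocatable one rather than an executable like the one we obtained earlier.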

diego@cryptos:/tmp$ file holamundo.o
holamundo.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped
diego@cryptos:/tmp$
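To make the idea of a relocatable object a bit more concrete, the following sketch shows the three kinds of symbols an object file usually carries. The file name symbols.c and the function names double_it, compute and report are assumptions for this illustration, not part of the original example.

```c
/* File: symbols.c (hypothetical example file) */
#include <stdio.h>

/* Internal linkage: visible only inside this file. */
static int double_it(int n) {
    return 2 * n;
}

/* External linkage: defined here and exported by the object file. */
int compute(int n) {
    return double_it(n) + 1;
}

/* printf is only declared in stdio.h; its code lives in the C library,
 * so in the object file it remains an unresolved reference until linking. */
void report(int n) {
    printf("compute(%d) = %d\n", n, compute(n));
}
```

Note that the file has no main and still compiles with gcc -c symbols.c -o symbols.o. Listing the object's symbols with a tool such as nm would typically show compute and report as defined, double_it as local and printf as undefined, which is why the object cannot run until it is linked.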

Linking

The last step, which makes this object code executable, is to link the object with the libraries it uses.
In this case printf, declared in the stdio.h header, is defined in the standard C library, which gcc links automatically, so nothing extra is needed for it. Linking against a library of our own works the same way, and to illustrate it we can package an object file into a static library:

ar -cvr libholamundo.a holamundo.o

The file libholamundo.a is an archive that contains a copy of holamundo.o. In a real project it would hold the objects whose functions other programs need to link against.
It is a ".a" file, from the English "archive", and it is a static library, as opposed to dynamic libraries, which on GNU/Linux systems are ".so" files, from "Shared Object", the equivalent of Windows ".DLL" files.
Now we link our object, telling gcc where to find the library:
gcc -Wall holamundo.o -L/tmp/ -lholamundo -o holamundo
Here we have linked the holamundo.o file against the libholamundo.a library and generated the holamundo executable. The "-L" option gives a directory in which to search for libraries, while "-l" names the particular library to link, since there can be several. Because this archive only repeats our own object, the linker takes nothing from it; printf still comes from the standard C library.
If we now run "file holamundo" we will see output similar to the first one: an ELF executable.
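A static library becomes more useful when it holds functions that the main program does not define itself. As a rough sketch, a library source file could look like the one below; the file name greet.c and the functions build_greeting and greet are hypothetical and do not appear in the original example.

```c
/* File: greet.c (hypothetical library source) */
#include <stdio.h>

/* Writes "Hola <name>" into buf (at most size bytes, always terminated
 * when size > 0) and returns the length the full text would need. */
int build_greeting(char *buf, size_t size, const char *name) {
    return snprintf(buf, size, "Hola %s", name);
}

/* Prints the greeting followed by a newline. */
void greet(const char *name) {
    char buf[64];
    build_greeting(buf, sizeof buf, name);
    puts(buf);
}
```

Assuming that file name, it could be compiled with gcc -c greet.c -o greet.o and archived with ar -cvr libgreet.a greet.o. A program that declares and calls greet("mundo") could then be linked with something like gcc main.o -L. -lgreet -o main, and this time the linker would pull greet.o out of the archive because main.o needs it.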


If you found this article interesting, feel free to comment and, if you wish, contribute more information.

-- This is a job for Holberton School by Marco Sózaro. --

