Last modified on March 21, 2012, at 22:59

Compiler

This is an old revision of this page, as edited by RolandPlankton (Talk | contribs) at 22:59, March 21, 2012. It may differ significantly from current revision.

A compiler is a computer program which translates source code written in a high-level programming language into executable machine code. The act of doing this is called compilation.

There are compilers for the more common programming languages available on a wide variety of different computer architectures. This means that a program written in one of these languages is portable and can generally be used on any of the different machines just by using the appropriate compiler; note that there may be occasional problems with machine-dependencies, especially if there are substantial differences in the accuracy and size of numbers on the different machines.

Some compilers just perform a fast simple-minded translation from high-level source code to low-level machine language. Many compilers have one or more optional optimisation levels. Such optimisations generally make the resulting machine code 'better' (for some definition of 'better' such as 'faster' or 'smaller' or 'less power consumption'); higher speed may be important for frequently-used long-running programs; less memory usage may be important for embedded computers used within other devices; lower power consumption may be important for battery-powered computers (e.g. laptops). As the level/amount of optimisation increases the compiler tends to spend more and more time doing the optimisation. The word 'optimisation' is perhaps misleading - perfect optimisation is a very hard problem (NP-complete or even undecidable), so a conceptually better term would be 'improvement'.

Outline of the compilation process

In the freely available online book "Basics of Compiler Design"[1], Professor Torben Mogensen identifies seven phases within a compiler, with the various phases being performed in one or more passes over the program. A diagram illustrating these (and other optional optimisation) phases is shown on the last page of Professor Frank Pfenning's "Lecture Notes on Compiler Design: Overview"[2]

Lexical analysis: convert the source code into a sequence of tokens, such as variable names, keywords, numbers, and special symbols such as '+'.
Syntax analysis or parsing: arrange the tokens into a parse tree or syntax tree which represents the structure of the program.
Semantic analysis: annotate the parse tree with semantic actions, and perform various consistency checks; a symbol table is used to keep track of names declared by the programmer.
Intermediate code generation: use the annotated parse tree to generate code in some simple machine-independent intermediate language.
Register allocation: map the symbolic names used in the intermediate code on to the registers available in the target machine.
Assembly code generation: translate the intermediate code into assembly language for the target machine.
Assembly and linking: translate the assembly language into executable machine code.

References

  1. Basics of Compiler Design. Retrieved on 2012-03-19.
  2. Lecture Notes on Compiler Design: Overview (pdf). Retrieved on 2012-03-19.