A compiler is a computer program which translates source code written in a high-level programming language into executable machine code. The act of doing this is called compilation.
There are compilers for the more common programming languages available on a wide variety of different computer architectures. This means that a program written in one of these languages may be portable and can often be used on any of the different machines just by using the appropriate compiler; note that there may be occasional problems with machine-dependencies, especially if there are substantial differences in the accuracy and size of numbers on the different machines.
Some compilers just perform a fast simple-minded translation from high-level source code to low-level machine language, while other compilers may optionally take (much) longer to perform the compilation by trying to improve or optimise the resulting machine code.
History
The first use of the term 'compiler', in relation to program translation, is credited to Grace Hopper[1], based on the legal meaning of compilation, since her translator for the A-0 language merely put together various snippets of pre-built code.
The first translator which matched the current computer definition of a compiler, was the Fortran compiler for the IBM 704 computer. The Fortran manual was published in October 1956, and the compiler itself was made available to users in April 1957.[2]
Computer users nowadays may not realise the severe technical constraints which applied to programming in the mid 1950's. Modern personal computers have the entire central processing unit on a single chip (integrated circuit), have main memories (RAM) containing many hundred of millions of bytes, and have processing speeds of several million instructions per second. The IBM 704 computer (fairly advanced for its time) had a central processor made up of many thousands of electronic valves (each the size of a light bulb), had a main memory of only four thousand 36-bit words, and had a processing speed of perhaps forty thousand instructions per second.
Outline of the compilation process
In the freely available online book "Basics of Compiler Design"[3], Professor Torben Mogensen identifies seven phases within a compiler, with the various phases being performed in one or more passes over the program. A diagram illustrating these (and other optional optimisation) phases is shown on the last page of Professor Frank Pfenning's "Lecture Notes on Compiler Design: Overview"[4]
- Lexical analysis: convert the source code into a sequence of tokens, such as variable names, keywords, numbers, and special symbols such as '+'.
- Syntax analysis or parsing: arrange the tokens into a parse tree or syntax tree which represents the structure of the program.
- Semantic analysis: annotate the parse tree with semantic actions, and perform various consistency checks; a symbol table is used to keep track of names declared by the programmer.
- Intermediate code generation: use the annotated parse tree to generate code in some simple machine-independent intermediate language.
- Register allocation: map the symbolic names used in the intermediate code on to the registers available in the target machine.
- Assembly code generation: translate the intermediate code into assembly language for the target machine.
- Assembly and linking: translate the assembly language into executable machine code.
Optimisation
Many compilers have one or more optional optimisation levels. As the level/amount of optimisation increases the compiler tends to spend more and more time doing the optimisation. The word 'optimisation' is perhaps misleading - perfect optimisation is a very hard problem (NP-complete or even undecidable), so a conceptually better term would be 'improvement'.
Such optimisations generally make the resulting machine code 'better', for some definition of 'better':
- faster: higher speed may be important for frequently-used long-running programs.
- smaller: less memory usage may be important for embedded computers used within other devices.
- low power: less power consumption may be important for battery-powered computers (e.g. laptops).
Note that it is often only possible to optimise for a single criterion, since such optimisation may well make other criteria worse; it has been accepted for decades that many programs have a space/time trade-off so that reducing memory usage will increase execution time and vice versa.
The bulk of the machine-independent optimisation takes place after the intermediate code is generated, effectively by re-arranging and modifying this code. Some machine-dependent optimisation may take place during register allocation. The bulk of the machine-dependent optimisation takes place during the generation of assembly language; this often includes extensive peep-hole optimisation to replace a group of assembler instructions by some shorter or simpler group.
References
- ↑ The World's First COBOL Compilers. Retrieved on 2012-03-24.
- ↑ Twenty five years of Fortran (pdf). Retrieved on 2012-03-27.
- ↑ Basics of Compiler Design. Retrieved on 2012-03-19.
- ↑ Lecture Notes on Compiler Design: Overview (pdf). Retrieved on 2012-03-19.