Last modified on April 1, 2012, at 16:51

Compiler

This is an old revision of this page, as edited by RolandPlankton (Talk | contribs) at 16:51, April 1, 2012. It may differ significantly from current revision.

A compiler is a computer program which translates source code written in a high-level programming language into executable machine code. The act of doing this is called compilation.

There are compilers for the more common programming languages available on a wide variety of different computer architectures. This means that a program written in one of these languages may be portable and can often be used on any of the different machines just by using the appropriate compiler; note that there may be occasional problems with machine-dependencies, especially if there are substantial differences in the accuracy and size of numbers on the different machines.

Some compilers just perform a fast simple-minded translation from high-level source code to low-level machine language, while other compilers may optionally take (much) longer to perform the compilation by trying to improve or optimise the resulting machine code.

Many programs, when first written, contain errors, so it is important that compilers detect these errors and provide a clear report as to what the error is and where it was detected (unfortunately the point of detection can be in some cases be many lines away from the actual source of the error).

History

The first use of the term 'compiler', in relation to program translation, is credited to Grace Hopper,[1] based on the legal meaning of compilation, since her translator for the A-0 language merely put together various snippets of pre-built code.

The first translator which matched the current computer definition of a compiler, was the Fortran compiler for the IBM 704 computer. The Fortran manual was published in October 1956, and the compiler itself was made available to users in April 1957.[2]

Computer users nowadays may not realise the severe technical constraints which applied to programming in the mid 1950's. Modern personal computers have the entire central processing unit on a single chip (integrated circuit), have main memories (RAM) containing many hundred of millions of bytes, and have processing speeds of several million instructions per second. The IBM 704 computer (fairly advanced for its time) had a central processor made up of many thousands of electronic valves (each the size of a light bulb), had a main memory of only four thousand 36-bit words, and had a processing speed of perhaps forty thousand instructions per second (modern mobile phones are more powerful than this).

Most compilers in the 1950's and 1960's were written in low-level assembly languages; there wasn't much choice at that time since the computer hardware wasn't capable of supporting compilers written in high-level languages. The use of high-level languages for writing compilers became more mainstream with the advent of Pascal in 1970[3] and C between 1969 and 1973.[4]

Some compilers are written in the language which they translate (self-hosting), though some sort of bootstrapping process is needed to get the first version of such a compiler working. The first self-hosting compiler was for Lisp in 1962.[5] Both the Pascal and C compilers mentioned above are self-hosting.

Outline of the compilation process

In the freely available online book "Basics of Compiler Design"[6], Professor Torben Mogensen identifies seven phases within a compiler, with the various phases being performed in one or more passes over the program. A diagram illustrating these (and other optional optimisation) phases is shown on the last page of Professor Frank Pfenning's "Lecture Notes on Compiler Design: Overview"[7]

Lexical analysis: convert the source code into a sequence of tokens, such as variable names, keywords, numbers, and special symbols such as '+'.
Syntax analysis or parsing: arrange the tokens into a parse tree or syntax tree which represents the structure of the program; check that the structure is valid.
Semantic analysis: annotate the parse tree with semantic actions, and perform various consistency checks; a symbol table is used to keep track of names declared by the programmer.
Intermediate code generation: use the annotated parse tree to generate code in some simple machine-independent intermediate language.
Register allocation: map the symbolic names used in the intermediate code on to the registers available in the target machine.
Assembly code generation: translate the intermediate code into assembly language for the target machine.
Assembly and linking: translate the assembly language into executable machine code.

Optimisation

Many compilers have one or more optional optimisation levels. As the level/amount of optimisation increases the compiler tends to spend more and more time doing the optimisation. The word 'optimisation' is perhaps misleading - perfect optimisation is a very hard problem (NP-complete or even undecidable), so a conceptually better term would be 'improvement'.

Such optimisations generally make the resulting machine code 'better', for some definition of 'better':

  • faster: higher speed may be important for frequently-used long-running programs.
  • smaller: less memory usage may be important for embedded computers used within other devices.
  • low power: less power consumption may be important for battery-powered computers (e.g. laptops).

Note that it is often only possible to optimise for a single criterion, since such optimisation may well make other criteria worse; it has been accepted for decades that many programs have a space/time trade-off so that reducing memory usage will increase execution time and vice versa.

The bulk of the machine-independent optimisation takes place after the intermediate code is generated, effectively by re-arranging and modifying this code. Some machine-dependent optimisation may take place during register allocation. The bulk of the machine-dependent optimisation takes place during the generation of assembly language; this often includes extensive peep-hole optimisation to replace a group of assembler instructions by some shorter or simpler group.

References

  1. Harold "Bud" Lawson and Howard Bromberg. The World's First COBOL Compilers. Retrieved on 2012-03-24.
  2. J A N Lee. Twenty five years of Fortran (pdf). Retrieved on 2012-03-27.
  3. Pascal-P: The portable Pascal compiler. Retrieved on 2012-03-31.
  4. Dennis M. Ritchie. The Development of the C Language. Retrieved on 2012-03-31.
  5. Tim Hart and Mike Levin. AI Memo 39-The new compiler. Retrieved on 2012-03-29.
  6. Basics of Compiler Design. Retrieved on 2012-03-19.
  7. Lecture Notes on Compiler Design: Overview (pdf). Retrieved on 2012-03-19.