[ Compiler Construction ]
First of all, what is a compiler?
A compiler is a computer program which converts your source code into assembly code.
Source code is code written in a programming language like C++, C, or Java, which are usually termed high-level languages.
The compiler takes your source code, checks it for errors, and if none are found, converts it into assembly code.
Assembly code is code written in an assembly language, which is a low-level language.
Construction
Basically, it is a two-pass compiler:
a Front End and a Back End.
The Front End is composed of two parts:
Scanner
Parser
The Back End is composed of many parts;
a few of them are:
Code Optimization
Instruction Selection
Register Allocation
The Scanner is also called a lexical analyzer, or tokenizer.
Lexical analysis is the process of converting a sequence of characters into a sequence of tokens.
A token is composed of two things:
<token type , value>
Example: consider this expression in the C++ programming language:
sum = 3 + 2;
sum  -- identifier
=    -- assignment operator
3    -- number
+    -- addition operator
2    -- number
;    -- end of statement
So, for instance, the token for 3 is <number, 3>.
How does it work?
The Scanner has some basic definitions of the token types for the language we are working on.
First we define a regular language for the Scanner and then derive regular expressions from it.
Then we convert them into a Non-deterministic Finite Automaton (NFA), then into a Deterministic Finite Automaton (DFA), and then minimize it.
A regular language can be described by expressions such as:
[0-9]   -- any one digit from 0 to 9
[a-z]*  -- the * means zero or more repetitions (a loop over lowercase letters)
a|b     -- either a or b
These are regular expressions defined for our language.
We then build a finite automaton from these expressions.
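For example, a pattern for identifiers such as [a-z]([a-z]|[0-9])* boils down to a small two-state DFA (a quick sketch):
S0 (start)     : on a lowercase letter, go to S1; anything else means this is not an identifier.
S1 (accepting) : on a letter or digit, stay in S1; anything else ends the token, and the scanner emits an identifier.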
Lex is a tool which is used to generate this Scanner.
Coding the Scanner (a Lex specification):
%{
#include "tokendef.h"
%}
D    [0-9]
L    [a-zA-Z_]
id   {L}({L}|{D})*
%%
void    { return TOK_VOID; }
int     { return TOK_INT; }
{D}+    { return TOK_NUM; }
while   { return TOK_WHILE; }
if      { return TOK_IF; }
else    { return TOK_ELSE; }
%%
tokendef.h:
#define TOK_VOID   1
#define TOK_INT    2
#define TOK_WHILE  3
#define TOK_IF     4
#define TOK_ELSE   5
#define TOK_NUM    6   /* numeric literal */
.cpp file
#include <iostream>
#include <FlexLexer.h>
using namespace std;

int main() {
    yyFlexLexer lex;
    int tc = lex.yylex();          // yylex() returns the code of the next token
    while (tc != 0) {              // 0 means end of input
        cout << tc << " , " << lex.YYText() << endl;
        tc = lex.yylex();
    }
}
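To build and run this (assuming the Lex spec above is saved as scanner.l and the driver as main.cpp; these file names are just for illustration), flex's C++ mode generates the yyFlexLexer class used above:
flex -+ scanner.l
g++ lex.yy.cc main.cpp -o scanner
./scanner < some_source_file
Depending on your flex version you may also need %option noyywrap in the spec, or -lfl when linking.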
Parser
The Parser is used to create an Intermediate Representation (IR) of these tokens.
One form of this IR is the Abstract Syntax Tree (AST),
which is derived from the Parse Tree.
The Parser deals with syntax and semantics.
It basically checks whether your expression belongs to your programming language or not.
Let's take an example to explain this further. Consider the expression:
fi(a=20;a<78;a++){
As per the syntax it should be "if" instead of "fi",
and the line is not terminated correctly either.
This is what we call syntax.
The Scanner will happily pass "fi" along as an identifier; it is the Parser that flags the error.
What is semantics?
Linguistic semantics is the study of meaning used to understand human expression through language.
Other forms of semantics include the semantics of programming languages.
Let's take an example to explain this.
Suppose you assign a value to a variable "a"
without ever declaring it;
our compiler should throw a semantic error.
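For instance, a fragment like the following (a minimal sketch) should be rejected:

int main() {
    a = 5;        // error: 'a' was never declared
    return 0;
}

The Scanner and the Parser are both fine with this line; it is the semantic check that reports 'a' as undeclared.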
The whole concept of the Parser is based on a CFG,
which is a Context-Free Grammar.
It is a formal grammar in which every production rule is of the form
V --> w
where V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals.
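For example, these are valid productions (a toy grammar, just for illustration):
expr --> ( expr )
expr --> expr + expr
expr --> id
Here expr plays the role of V, and strings like "( expr )" or "id" are the w on the right-hand side.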
There are certain rules defined for CFGs which help us build the Parser.
There are various parsing techniques:
top-down parsing
bottom-up parsing
predictive parsing
recursive descent parsing
Backus-Naur Form (BNF) notation was used to write down the syntax rules of programming languages.
These rules are based on Context-Free Grammars.
The way of doing this is to take your start symbol and keep deriving:
at each step you rewrite a nonterminal, expanding either the leftmost one (a leftmost derivation) or the rightmost one (a rightmost derivation).
Example:
goal --> expr
and this expr is then derived further.
Let's take an example:
the expression x - 2 * y.
We derive it step by step:
goal --> expr
expr --> expr op term
     --> ...
Depending on which rules we pick while expanding,
one derivation ends up grouping the expression as (x - 2) * y,
while a different derivation of the same string groups it as x - (2 * y).
Now, in the case of a programming language,
our expressions should respect operator precedence (the BODMAS rule).
For this we define levels in the grammar, as shown below.
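A common way to define the levels is the classic precedence grammar (a sketch; it is not spelled out above, but it matches the expr / term / factor names used later):
expr   --> expr + term  |  expr - term  |  term
term   --> term * factor  |  term / factor  |  factor
factor --> ( expr )  |  number  |  id
* and / sit one level below + and -, so they bind tighter, and factor produces the actual values.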
Top-down Parsing
In this method, to construct our parse tree we follow a path from the root down to the leaves.
Let us take an example,
the same expression used before:
x - 2 * y
We start the construction from the root.
This is our rule for defining the grammar:
goal --> expr
expr --> expr + term   [here we are taking + just to try out a rule]
Later we will see that we have not applied the correct rule, and then we will change our operator.
How will we do this? By backtracking.
Backtracking means going back and trying a different rule.
So now we change our operator to - wherever it was +.
The lowest level in such a rule is always factor;
the actual values (numbers and identifiers) are produced from factor.
This technique only works for right-recursive grammars;
it fails when the grammar is left-recursive, because the parser keeps expanding the same nonterminal forever.
An example of a left-recursive rule:
term --> term * factor
Now we have to convert this left-recursive grammar into a right-recursive one:
expr  --> term expr'
expr' --> + term expr'
expr' --> - term expr'
expr' --> e   [e is the epsilon sign, the empty string]
This is done by introducing the epsilon production, and the same trick works for term, as shown below.
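The left-recursive term rule can be rewritten the same way (a sketch, assuming the usual term/factor levels):
term  --> factor term'
term' --> * factor term'
term' --> / factor term'
term' --> e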
Predictive Parsing
This technique is called Predictive Parsing because we predict the next rule from the next token.
It is based on the LL(k) property:
L --> left-to-right scan of the input
L --> leftmost derivation
k --> the number of lookahead tokens.
It can be a fairly involved technique to use by hand.
The LL(1) property (the k = 1 case) holds
when, for two rules A --> a and A --> b,
FIRST(a) ∩ FIRST(b) = ∅, i.e. the two right-hand sides can never start with the same token.
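For example, take the expr' rules from the previous section:
expr' --> + term expr'
expr' --> - term expr'
FIRST of the first right-hand side is { + } and FIRST of the second is { - }.
The sets do not overlap, so one token of lookahead is enough to decide which rule to apply; that is exactly what makes this grammar LL(1).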
This property is what Recursive Descent Parsing relies on.
In recursive descent we write a function (or, below, a class) for each nonterminal, e.g. expr(), term(), eprime().
We use an OOP language, C++, so that we can use objects and inheritance to keep things simple.
The first thing we do is make a superclass NonTerminal; every nonterminal will then be a subclass of this superclass.
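The classes below use a Scanner, a TreeNode, and token codes like PLUS and MINUS; here is one minimal way they could be defined (just a sketch, so the fragments have something to build against):

#include <cstddef>
#include <vector>

// Token codes used by the parser fragments (assumed values).
enum { PLUS = 1, MINUS, TIMES, NUM, ID };

// A minimal AST node: an operator or token code with optional children.
struct TreeNode {
    int op;
    TreeNode* left;
    TreeNode* right;
    TreeNode(int op, TreeNode* l = nullptr, TreeNode* r = nullptr)
        : op(op), left(l), right(r) {}
};

// A minimal Scanner interface: wraps an already-tokenized stream.
class Scanner {
public:
    explicit Scanner(std::vector<int> toks) : tokens(toks) {}
    int nextToken() { return pos < tokens.size() ? tokens[pos] : 0; }  // peek at the current token
    void advance() { ++pos; }                                          // consume the current token
private:
    std::vector<int> tokens;
    std::size_t pos = 0;
};

// A tiny error handler, referenced later in the parsing code.
inline void syntaxError(Scanner*) { throw "syntax error"; }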
class NonTerminal {
public:
    NonTerminal(Scanner* sc) {
        s = sc;
        tree = nullptr;
    }
    virtual ~NonTerminal() {}
    virtual bool isPresent() = 0;      // try to match this nonterminal in the input
    TreeNode* AST() { return tree; }   // the AST built while matching
protected:
    Scanner* s;
    TreeNode* tree;
};
Our grammar rules:
goal  --> expr
expr  --> term expr'
expr' --> + term expr'
       |  - term expr'
       |  e
We have to make a subclass for each nonterminal.
class Expr : public NonTerminal {
public:
    Expr(Scanner* sc) : NonTerminal(sc) {}
    bool isPresent();
};

// expr --> term expr'
bool Expr::isPresent() {
    Term* op1 = new Term(s);
    if (!op1->isPresent())               // an expression has to start with a term
        return false;
    tree = op1->AST();
    Eprime* op2 = new Eprime(s, tree);   // hand over the tree built so far
    if (op2->isPresent())
        tree = op2->AST();
    return true;
}
Subclass for Eprime
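Eprime takes the scanner and the tree built so far, so its class skeleton could look like this (a sketch; only the isPresent() body is spelled out below):

class Eprime : public NonTerminal {
public:
    Eprime(Scanner* sc, TreeNode* left) : NonTerminal(sc), exprSoFar(left) {}
    bool isPresent();
private:
    TreeNode* exprSoFar;   // the expression parsed so far, i.e. the left operand
};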
// expr' --> + term expr'  |  - term expr'  |  e
bool Eprime::isPresent() {
    int op = s->nextToken();
    if (op == PLUS || op == MINUS) {
        s->advance();                             // consume the operator
        Term* op2 = new Term(s);
        if (!op2->isPresent())
            syntaxError(s);                       // an operator must be followed by a term
        TreeNode* t2 = op2->AST();
        tree = new TreeNode(op, exprSoFar, t2);   // exprSoFar is the left operand
        Eprime* op3 = new Eprime(s, tree);
        if (op3->isPresent())
            tree = op3->AST();
        return true;
    }
    else return false;                            // epsilon case: nothing more to add
}
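To see how the pieces fit together, a small driver could look like this (a sketch; it assumes the remaining subclasses such as Term are written in the same style, and uses the Scanner sketched earlier):

int main() {
    // token codes for "x - 2 * y", using the assumed codes from earlier
    Scanner sc({ID, MINUS, NUM, TIMES, ID});
    Expr e(&sc);
    if (e.isPresent()) {
        TreeNode* ast = e.AST();   // root of the AST for the whole expression
        // walk or print the tree from here
    }
    return 0;
}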
More to come . . .