SUPER: A Generator of Efficient Compilers

Anatoly Bezdushny and Vladimir Serebriakov

Computing Centre of the Academy of Sciences,

Vavilova 40, Moscow, 177333, USSR.


Abstract. This paper presents the principal features of the SUPER system: namely, attributed context-free grammars (cfg) in extended BNF, subgrammars, and attributed ambiguous tree grammars for code generation. A subgrammar is a cfg that has a set of start symbols and a set of terminals that are either nonterminals of other subgrammars or proper terminals. A compiler description consists of a set of attributed subgrammar descriptions, each of which has a definition of its interface, comprising the exported/imported objects, and a description of the attributed cfg and/or the procedures and functions used in the attribute rules. Code selection is performed by means of ambiguous tree grammars. The target instructions are described by attributed productions of a tree grammar. Productions are supplied with cost-attribute rules to compute cost values, which allow the code generator to select a minimal-cost derivation tree according to specified criteria. Once a derivation tree is selected, the derivation-tree attributes are evaluated, and the corresponding instructions are emitted.

1. Introduction

SUPER (which is the Russian abbreviation for Syntax Directed Translation) is a compiler generating system developed at the Computing Centre of the Academy of Sciences in Moscow. Previous versions of the system have been implemented on several computers: PDP-11, Intel-8086, MC-68020, and VAX-11. Several compilers have been produced with the system, e.g. Modula-2 compilers for these computers.

SUPER is intended to provide support for all phases of the compiler implementation process. System development is guided by the following aims:

- the system should achieve the level of a production-quality software product;

- it should generate efficient compilers;

- the system should be usable in practical compiler implementation projects;

- it should be portable, modular and simple to use.

The system is a programming environment for an attribute-based programming language. Each tool of the environment generates a set of host-language modules. The generated modules, invariant modules, and user-defined modules form the set of components for the implementation of a compiler. The compiler is assembled from these modules in accordance with a user-defined compiler structure.

This paper is devoted to the principal features of the system input language, which supports attributed context-free grammars in extended BNF, subgrammars, and ambiguous tree grammars for code generation. A subgrammar is a cfg that has a set of start symbols and a set of terminals that are either nonterminals of other subgrammars or proper terminals. An attributed subgrammar is a subgrammar augmented with attributes and attribute rules.

A compiler description consists of a set of attributed subgrammar descriptions, each of which has two parts:

- a definition of the interface that comprises a definition of its start symbols, terminals and imported or exported constants, types, files and functions;

- a description of the attributed productions in extended BNF and/or procedures and functions used in the attribute rules.

Code selection is described by means of ambiguous tree grammars. The target instructions are described by attributed productions of a tree grammar. Productions are supplied with cost-attribute rules to compute cost values, which allow the code generator to select a minimal-cost derivation tree according to specified criteria. Once a derivation tree is selected, the derivation-tree attributes are evaluated, and the corresponding instructions are emitted.

The paper is organized as follows. Section 2 is devoted to the principal features of the system input language. In Section 3 we consider code selection described by means of ambiguous tree grammars.

1.1. Notation

We shall use extended BNF in the description of the system input language. Brackets "[" and "]" are used to mark an optional construct. Parentheses "(" and ")" stand for a sequence of instances of a repetitive construct. A repetitive construct can be followed by a separating construct that separates the elements of the sequence. The repetitive and separating constructs are divided by a slash "/". Alternative constructs are separated by a vertical bar "|" and surrounded by angular brackets "<" and ">". Nonterminal symbols and token classes (terminal symbols that have an infinite set of literal representations) are denoted by identifiers. Separators (terminals with a single literal representation) are strings enclosed in apostrophes.

2. The system input language

The description of a language to be implemented must be written in the system input language. This description consists of a set of attributed subgrammar descriptions. Informally, an attributed subgrammar is a separately compiled component of the full attribute grammar. An attribute grammar can be divided into attributed subgrammars on the basis of its syntactic as well as semantic properties.

2.1. Attributed subgrammars

A Context-Free SubGrammar (CFSG) is a four-tuple SG=(Vn,Vt,P,Vs). The set Vn of nonterminal symbols consists of a set Vo of symbols declared in the subgrammar and a set Vi of symbols imported from other subgrammars. The set Vt of terminal symbols contains a set Vnt of proper terminals and a set Vpt of pseudo-terminals of the subgrammar. Vs is the set of start symbols of the subgrammar; it is a subset of Vo. P is a set of productions of the form X0::=X1 X2 ... Xm, where m>=0, X0 in Vo and Xj in Vn+Vt for 1<=j<=m. The right-hand side of a production is called an alternative of X0.
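The four-tuple above can be modelled directly; the following Python sketch (class and field names are illustrative, not part of SUPER) checks the side conditions stated in the definition: Vs is a subset of Vo, left-hand sides come from Vo, and right-hand sides are drawn from Vn + Vt.

```python
# A direct Python model of the four-tuple SG = (Vn, Vt, P, Vs); the class
# and field names are illustrative, not part of the SUPER input language.
from dataclasses import dataclass

@dataclass
class Subgrammar:
    own: set           # Vo: nonterminals declared in this subgrammar
    imported: set      # Vi: nonterminals imported from other subgrammars
    proper: set        # Vnt: proper terminals of the subgrammar
    pseudo: set        # Vpt: pseudo-terminals of the subgrammar
    starts: set        # Vs: the start symbols
    productions: list  # pairs (X0, [X1, ..., Xm])

    def check(self):
        """Check Vs <= Vo, lhs in Vo, rhs symbols in Vn + Vt."""
        vn = self.own | self.imported
        vt = self.proper | self.pseudo
        assert self.starts <= self.own
        for lhs, rhs in self.productions:
            assert lhs in self.own
            assert all(x in vn | vt for x in rhs)
        return True
```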

The start symbols and pseudo-terminals of a subgrammar can be imported by other subgrammars and appear in the alternatives of their nonterminals. In one of the subgrammars that form a grammar and import a pseudo-terminal, we can either define its alternatives or redefine it as a proper terminal. In the first case the pseudo-terminal is a nonterminal of the grammar.

An attributed subgrammar is a subgrammar augmented with attributes and attribute rules. It is used to specify the properties of the corresponding language component. The subgrammar description consists of constant declarations, type definitions, procedure/function definitions, terminal and nonterminal symbol definitions, and the alternatives of its nonterminals together with the associated attribute rules.

In the SUPER system there are three compilation units: a start subgrammar, an interface of a subgrammar and a subgrammar declaration.

The start subgrammar has to contain definitions of the grammar start symbol and its alternatives. The start subgrammar has no exported objects (symbols, constants, types, procedures). It imports objects from other subgrammars, each of which consists of a pair of compilation units: the interface of the subgrammar and the subgrammar declaration.

The interface of a subgrammar contains definitions of the objects that are exported from it. All declared objects are exported. All start symbols and pseudo-terminals of a subgrammar have to be declared in its interface unit. The pseudo-terminals also have to be enumerated in the header of the interface unit. The subgrammar interface unit has the following form:

SubgrammarInterface ::= 'INTERFACE' Class 'SUBGRAMMAR' SubgrammarName
[ '(' PseudoTerminalsEnumeration ')' ] ';' [( Import ';' )]
[( < SymbolsDefinition | ObjectsDefinition > )]
'END' SubgrammarName '.'
Import ::= [ 'FROM' UnitName ] 'IMPORT' ImportedObjects
Class ::= [ AttributeGrammarClass ] [ SyntaxGrammarClass ]
AttributeGrammarClass ::= < 'L_ATTRIBUTED' | 'ONE_VISIT' | 'ORDERED' >
SyntaxGrammarClass ::= < 'LL1' | 'TREE' >

The subgrammar declaration unit contains definitions of the alternatives of the subgrammar start symbols and of the nonterminals declared in the subgrammar declaration unit. The definition of the alternatives of a nonterminal specifies the nonterminal's context-free properties and associates attribute rules with its productions. The subgrammar declaration unit and the start subgrammar unit are described as follows:

StartSubgrammar ::= Class 'GRAMMAR' SubGrammar
SubgrammarDefinition ::= 'SUBGRAMMAR' SubGrammar
SubGrammar ::= SubgrammarName ';' [( Import ';' )]
[( <SymbolsDefinition |ObjectsDefinition |AlternativesDefinition> )]
[ 'INITIALIZE' ( [ Operator ] / ';') ]
'END' SubgrammarName '.'

Example 2.1. Subgrammar units:

File StatementG.spd contains:

INTERFACE L_ATTRIBUTED LL1 SUBGRAMMAR StatementG (NonStandardST);
SYMBOL Statement, NonStandardST::.
DEFINE => VAR CheckNonStdST :PROC; =<
END StatementG.

File StatementG.spr contains:

SUBGRAMMAR StatementG;
FROM ErrLib IMPORT ErrNum;
FROM ExpressionG IMPORT Expression;
SYMBOL StatList, BoolExpr ::.

FOR BoolExpr ::. ::= Expression =>
_( IF $Expression.type<>TBOOLEAN THEN ErrNum(45)
END; _); =<
FOR StatList ::. ::= ( [Statement] / ';' ) => =<
FOR Statement
::= 'WHILE' BoolExpr 'DO' StatList
'END' => =<
::= 'REPEAT' StatList 'DO' BoolExpr => =<
. . .
::= NonStandardST => CheckNonStdST; =<
END StatementG.

File Example1.spr contains:

L_ATTRIBUTED LL1 GRAMMAR Example1;
FROM SYSTEM IMPORT WORD;
FROM SUPERLIB IMPORT NULPROC;
FROM IdHashing IMPORT HashId, FoundId, identifier;
FROM DeclarationG IMPORT PTDESCR, Olabel;
FROM ExpressionG IMPORT Expression;
FROM StatementG IMPORT Statement, NonStandardST,
CheckNonStdST;
DEFINE => PROCEDURE CreateLabel . . . =<
FOR Program (AXIOM) ::.
::= . . .
FOR NonStandardST ::= 'GOTO' identifier =>
<IdentCode:WORD;>
$'GOTO':
$identifier:
_( IF FoundId( $NonStandardST.IdentCode ) THEN
IF PTDESCR( $NonStandardST.IdentCode )^.mode <> Olabel
THEN ErrNum(78) END
ELSE
CreateLabel( $NonStandardST.IdentCode );
HashId ( $NonStandardST.IdentCode );
END; _);
=< . . .
INITIALIZE
CheckNonStdST :=NULPROC;
END Example1.

File Example2.spr contains:

L_ATTRIBUTED LL1 GRAMMAR Example2;
FROM ExpressionG IMPORT Expression;
FROM StatementG IMPORT Statement, NonStandardST,
CheckNonStdST;
SYMBOL NonStandardST(VOID).
FOR Program (AXIOM) ::.
::= . . .
END Example2.

2.2. Symbol definition

We need a type concept to define types suitable for the abstraction of the attributes' meaning. The meta-language is a typed language. The domain of each attribute is defined by its type. An objects definition makes it possible to define types, constants, and procedures. All symbols of an attribute grammar, and their attributes, must be declared by a symbol definition. A symbol definition introduces a symbol name, its attributes, and their types. It associates attributes of the given types with the corresponding symbols.

SymbolsDefinition ::=
'SYMBOL' [( (SymbolId [SymbolMode]/',') [SymbolProperties] '.')]
SymbolProperties ::=
< AttributesDefinition | TerminalRepresentations >
AttributesDefinition::=
'::' [( (AttributeId[AttributeMode] / ',') ':' Type / ';')]
TerminalRepresentations ::= '::' ( string / ',' )
SymbolMode ::=
'('(<'AXIOM'|'TOKEN'|'GLOBAL'|'VOID'|OperatorArity>/',')')'
OperatorArity ::= 'ARITY' number

It is possible to specify a symbol mode in a symbol definition. The symbol mode TOKEN indicates that a terminal symbol is a token class. Lexical items, called tokens, are divided into two groups: separators and token classes. A separator has a single literal representation, e.g. ':=', 'TYPE'. A token class represents an infinite set of literal strings, e.g. identifiers, numbers. In an attribute grammar each token class is described by its name and the names and types of its attributes. The values of token-class attributes are defined outside the attribute grammar.
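The separator / token-class distinction can be illustrated by a small classifier (a hypothetical Python sketch; the token-class names and regular expressions are assumptions, not SUPER code):

```python
# An illustrative sketch of the separator / token-class distinction; the
# token-class names and regular expressions are assumptions, not SUPER code.
import re

SEPARATORS = {':=', 'TYPE', ';', '.'}     # tokens with a single literal form
TOKEN_CLASSES = [                         # tokens with infinitely many forms
    ('number', re.compile(r'\d+')),
    ('identifier', re.compile(r'[A-Za-z_]\w*')),
]

def classify(lexeme):
    """Return (group, lexeme): separators are checked first, so keywords
    such as 'TYPE' are not classified as identifiers."""
    if lexeme in SEPARATORS:
        return ('separator', lexeme)
    for name, pattern in TOKEN_CLASSES:
        if pattern.fullmatch(lexeme):
            # attribute values of a token class (e.g. the numeric value of
            # a number) would be computed here, outside the attribute grammar
            return (name, lexeme)
    raise ValueError('unknown lexeme: %r' % lexeme)
```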

If a context-free subgrammar is a tree subgrammar then each terminal symbol is an operator with fixed arity. The arity of a terminal is specified in its definition. Nonterminal symbols are 0-arity operators.

If a nonterminal does not have the symbol mode GLOBAL then its attributes have to be used locally (local attributes); otherwise remote access to the attributes is allowed (global or remote attributes). Informally, if remote access to attributes is allowed, instances of the attributes can be accessed inside a subtree of the nonterminal.

Given a derivation tree, nodes of the tree are labeled with symbols, and instances of attributes are attached to the nodes. Let the attributes of a nonterminal X be remote attributes, let node N be labeled with X and have direct ancestor M, and let X ::= Y1 Y2 ...Yn and Z0 ::= Z1 ... X ...Zk be the productions applied at nodes N and M, respectively. Instances of remote attributes have visibility scopes that are described as follows:

- if a subtree of N does not have another node labeled with X then the visibility scope of the attribute instances of N is the subtree of M;

- if there are other nodes N1,N2,...,Ni labeled with X in a subtree of N then the visibility scope of the attribute instances of N is the subtree of M with the exception of subtrees of direct descendants of N1,N2,...,Ni.

The grammar start symbol is a nonterminal symbol with the mode AXIOM. The grammar start symbol does not appear in the alternatives of any nonterminal. The attributes of the grammar start symbol are global attributes; therefore, their attribute instances can be accessed inside any derivation tree.

The symbol mode VOID is used to indicate a void symbol. A production is considered undeclared if a void symbol appears in it. The system ignores void symbols and void productions. This possibility can be used to discard a pseudo-terminal (see Example 2.1).

A symbol definition without a SymbolProperties part allows a symbol mode to be added to an already defined or imported symbol. For example, we can redefine a pseudo-terminal as a proper terminal. The symbol mode GLOBAL cannot be used in a symbol redefinition.

2.3. Definition of alternatives

A subgrammar description unit contains a set of definitions of alternatives for the subgrammar start symbols and the nonterminals declared in the subgrammar declaration unit. The definition of a nonterminal's alternatives associates attribute rules with the productions for the nonterminal. The productions can be described in extended BNF. Brackets are used to mark an optional construct. Parentheses denote a sequence of instances of a repetitive construct. Alternative constructs are surrounded by angular brackets.

AlternativesDefinition ::=
'FOR' NonterminalId [ [SymbolMode] [AttributesDefinition] '.' ]
[ 'CRITERIA' AttributesDefinition '=>' ( AttributeRule ';' ) '=<']
( Alternative )
Alternative ::=
Production '=>' [ AttributeRulePart ] '=<'
[ 'COST' '=>' ( AttributeRule ';' ) '=<']
Production ::=
['RULE' RuleLabel] [NonterminalId] '::=' [( Item )]
Item ::=
< SYMBOL | '[' (Item) ']'
| '<' ( [(Item)] / '|' ) '>'
| '(' (Item) [ '/' (Item) ] ')' >
SYMBOL ::= < string | SymbolId >

A set of attribute rules is associated with each nonterminal alternative. Attribute rules specify how to compute the values of attribute occurrences for symbols appearing in the corresponding nonterminal alternative.

AttributeRulePart ::=
[RuleAttributesDefinition] [( AttributeRules )]
RuleAttributesDefinition ::=
'<' ( (AttributeId/',') ':' Type ';' ) '>'

An attribute name identifies an attribute of a symbol appearing in the corresponding production.

AttributeName ::= SymbolName '.' AttributeId
SymbolName ::= ['$'] '$' SymbolId [Qualifier]
Qualifier ::= < '^' | '[' number ']' >

A symbol name either represents a symbol occurring in the corresponding production or stands for explicit upward access to remote attribute instances. If the same symbol appears several times in the production, these occurrences are distinguished by indexing the symbol name: X[1] denotes the first occurrence of the symbol X in the alternative, X[i] the i-th occurrence of X, and X[0] stands for the left-hand side nonterminal. If the symbol does not appear in the associated production, or if the symbol name is qualified by '^', then the symbol name refers to a remote attribute.

If the production right-hand side is divided into optional, repetitive and alternative constructs by the brackets ("[]", "(/)" and "<|>", respectively), then the corresponding attribute-rules part has to be divided into groups of attribute rules by labels of the form "$" bracket ":". The order of label enumeration has to be the same as the order of bracket enumeration. The attribute rules of the groups are associated with the constructs surrounded by the corresponding brackets.

AttributeRules ::=
[(< AttributeRule ';' | '$' SYMBOL ':' |
'$<:' ( AttributeRules / '$|:' ) '$>:' |
'$[:' AttributeRules [ '$|:' AttributeRules ] '$]:' |
'$(:' AttributeRules [ '$/:' AttributeRules ] '$):'
>)]

The attribute rules associated with alternative constructs are enclosed by the labels "$<:" and "$>:". The label "$|:" separates the attribute rules concerning the various alternatives. The labels "$[:" and "$]:" delimit the attribute rules associated with an optional construct. The attribute rules which have to be evaluated if the construct is not used are written after "$|:". An optional construct:

Z ::= u [w] v => Fu $[: Fw $|: Fe $]: Fv =<

is transformed to the alternative construct:

Z ::= u <w|> v => Fu $<: Fw $|: Fe $>: Fv =<

The attribute rules surrounded by "$(:" and "$):" connect attribute instances of the repetitive construct.

Let "{" and "}" stand for one of the bracket pairs, and "${:" and "$}:" for the corresponding labels. If an attribute-rules part

Fu ${: Fw $}: Fv

is associated with a production

Z ::= u {w} v

then the set Fw of attribute rules is associated with the construct {w}, and the sets Fu, Fv of attribute rules with the rest of the production. Constructs may contain other constructs; therefore, constructs have to be processed repeatedly until no construct is applicable any more.

The construct {w} is treated as a special form: one or more productions for an implicit nonterminal X are introduced in place of {w}. The modified attribute rules Fw and some additional attribute-transfer rules are associated with these productions.

An attribute d of a symbol A of a repetitive construct (w) is denoted by $A.d and $$A.d. Let us consider a sequence of instances of a repetitive construct in left-to-right order.

In attribute rules associated with the repetitive construct, $A.d represents instances of attribute d attached to the current element of a sequence of instances of the repetitive construct w (a w-sequence), and $$A.d specifies attribute instances attached to the next element of a w-sequence. The information flow specified by the attribute rules $A.a:=F($$B.b,...) and $$A.a:=F($B.b,...) goes strictly right to left and left to right, respectively. All denotations of attributes that are used as parameters of the function F must be the same.

In attribute rules associated with an alternative construct placed inside a repetitive construct w, $A.d stands for the attribute instance attached to the first element of the w-sequence and $$A.d for the attribute instance attached to the last element of the w-sequence.

The meaning of these denotations depends on the position of the attribute rule in the attribute-rules part. For example, assume that a construct w does not contain repetitive constructs, that a nonterminal A occurs in w, and that an alternative of a nonterminal Z is defined as follows

Z ::= u1 ( u2 < u3 ( u4 < w > v4 ) v3 > v2 ) v1 =>
Fu1 $(:Fu2 $<:Fu3 $(:Fu4 $<:Fw $>:Fv4 $):Fv3 $>:Fv2 $):Fv1 =<

then

in rules        $A.d stands for the          $$A.d stands for the
                attribute d of A of          attribute d of A of

Fw,Fu4,Fv4      the current element of       the next element of
                the current w-sequence       the current w-sequence

Fu3,Fv3         the first element of         the last element of
                the current w-sequence       the current w-sequence

Fu2,Fv2         the last element of          the first element of
                the current w-sequence       the next w-sequence

Fu1,Fv1         the first element of         the last element of
                the first w-sequence         the last w-sequence

The system transforms productions in extended BNF to productions in BNF.

So a production

Z ::= u {w} v => Fu ${: Fw $}: Fv =<

becomes

Z ::= u X v

and the following additional productions are entered:

X ::= w1 | w2 | ... | wk   for "<w1|w2|...|wk>"
X ::= w Y                  for "(w)" and "(w/t)"
Y ::= w Y |                for "(w)"
Y ::= t w Y |              for "(w/t)"

In accordance with production transformations, attribute rules Fu,Fw,Fv are also transformed. Attributes are associated with the nonterminals X and Y. Attribute transfer rules are added to Fu,Fw,Fv.

The transformation process goes from the innermost construct outward. The construct {w} of an alternative u{w}v is replaced by the nonterminal X if w contains grammar symbols only. The alternative u{w}v becomes uXv, which can itself be viewed as u'{w'}v', where w' does not contain constructs, and the process continues with {w'}.
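The production-level part of this transformation for repetitive constructs can be sketched as follows (a Python sketch; the helper names fresh and expand_repetition are illustrative, not part of SUPER):

```python
# A sketch of the production transformation for repetitive constructs
# described above: "(w)" and "(w/t)" are replaced by a fresh nonterminal X
# with an auxiliary nonterminal Y.  Helper names are illustrative.
_counter = [0]

def fresh(base):
    """Generate a fresh nonterminal name (X1, Y2, ...)."""
    _counter[0] += 1
    return '%s%d' % (base, _counter[0])

def expand_repetition(w, t=None):
    """Return (X, productions) replacing the construct (w) or (w/t);
    w and t are lists of grammar symbols."""
    x, y = fresh('X'), fresh('Y')
    prods = [(x, list(w) + [y])]                    # X ::= w Y
    if t is None:
        prods.append((y, list(w) + [y]))            # Y ::= w Y
    else:
        prods.append((y, list(t) + list(w) + [y]))  # Y ::= t w Y
    prods.append((y, []))                           # Y ::= (empty)
    return x, prods
```

For example, expanding the construct (Z/',') of Example 2.3.2 yields the productions X ::= Z Y, Y ::= ',' Z Y and Y ::= (empty).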

For each attribute d of each symbol A of w, a corresponding attribute Ad is associated with the nonterminals X and Y. For each attribute h of each symbol B of u or v, a corresponding attribute Bh is associated with the nonterminals X and Y. The attributes Ad and d, and Bh and h, have the same types. In the attribute rules Fu,Fv the names of attributes d of symbols A of w are replaced by the names of the corresponding attributes Ad of X, and in the attribute rules Fw the attributes h of B of u,v are replaced by the corresponding attributes Bh of X.

Attribute transfer rules connecting the attributes d and Ad, h and Bh are added to Fw and Fu,Fv. The values of an attribute d of A and the corresponding attribute Ad of X are connected as follows: either the value of the attribute d is equal to the value of the corresponding attribute Ad, or the value of the synthesized attribute Ad of X is undefined if d is not used in Fu,Fv,Fw. Therefore, attributes of symbols of one alternative can appear in attribute rules of another one. For example,

A ::= <C|> =>
$<: $C.a:=1; $|: $C.a:=0; $>: $A.a:=$C.a; =<

If a symbol A appears in a repetitive construct, then for each synthesized attribute d of A a corresponding attribute Ab is associated with each symbol X (Y) that is entered for the constructs containing A. The attribute Ab is used to transfer values of the attribute d of A of the next or the last element of a sequence of instances of the repetitive construct. Therefore, the denotation $$A.d of an attribute d of A of a repetitive construct that appears in one alternative can be used in attribute rules of another one. For example,

A ::= <(C)|> =>
$C.a:=0; $<: $$C.a:=$C.a+1; $|: $$C.a:=$C.a; $>: $A.a:=$C.a; =<

The attributed grammar AG1 in extended BNF, which describes a sequence of lists containing either 'a' or 'b', is shown in Example 2.3.1. The low bytes of the attributes d of A and B are used to count the number of instances of the corresponding letters, and the high bytes to count the number of lists in the sequence. The attribute grammar AG2, which is AG1 transformed into BNF, is shown in Example 2.3.2.

Example 2.3.1. Attribute grammar AG1 in extended BNF:

FOR A ::d : INTEGER. ::= 'a' => =<
FOR B ::d : INTEGER. ::= 'b' => =<
FOR C ::h : INTEGER. ::= 'c' => $C.h:=256; =<
FOR D ::a,b : INTEGER.
::= [( ([< A | B >] / ',') C )] '.' =>
$A.d :=0; $D.b := $B.d;
$[:$(:$(:$[:$<: $$A.d:= $A.d+1; $B.d :=$$B.d;
$|: $$A.d:= $A.d; $B.d :=$$B.d+1; $>:
$|: $$A.d:= $A.d; $B.d :=$$B.d;
$]:$): $$A.d:= $A.d+$C.h; $B.d :=$$B.d+$$C.h;
$): $|: $$A.d:= $A.d; $B.d :=$$B.d;
$]: $D.a :=$$A.d; $$B.d:=0; =<

Example 2.3.2. The attribute grammar AG2 in BNF corresponding to AG1:

FOR T :: Ad,Ab,Bd,Bb:INTEGER. ::= A (*<A|*) =>
(* $$A.d :=$A.d+1; $B.d :=$$B.d; *)
$T.Ab:=$T.Ad+1; $T.Bd:=$T.Bb;
$A.d :=$T.Ad; =<

T ::= B (*|B>*) =>
(* $$A.d :=$A.d; $B.d :=$$B.d+1; *)
$T.Ab:=$T.Ad; $T.Bd:=$T.Bb+1;
$B.d :=$T.Bb; =<

FOR Z :: Ad,Ab,Bd,Bb:INTEGER. ::= T (*([T])*) =>
(* The following rules appear due to extracting the attribute
dependencies specified by rules $A.a:=F($$B.b...), $$A.a:=F($B.b...)
of an alternative construct placed inside a repetitive construct.
$$T.Ad:=$T.Ab; $T.Bb:=$$T.Bd; *)
$Z.Ab:=$T.Ab; $T.Bb:=$Z.Bb;
$T.Ad:=$Z.Ad; $Z.Bd:=$T.Bd; =<

Z ::= (*([T])*) =>
(* $$T.Ad:=$T.Ad; $T.Bd:=$$T.Bd; *)
$Z.Ab:=$Z.Ad; $Z.Bd:= $Z.Bb; =<

FOR X :: Ad,Ab,Bd,Bb:INTEGER. ::= Z Y (*(Z/',')*) =>
(* $$Z.Ad:=$Z.Ab; $Z.Bb:=$$Z.Bd; see above *)
$Y.Ad:=$Z.Ab; $Z.Bb:=$Y.Bd;
$X.Ab:=$Y.Ab; $Y.Bb:=$X.Bb;
$Z.Ad:=$X.Ad; $X.Bd:=$Z.Bd; =<

FOR Y :: Ad,Ab,Bd,Bb:INTEGER. ::= ',' Z Y (*(Z/',')*) =>
(* $$Z.Ad:=$Z.Ab; $Z.Bb:=$$Z.Bd; see above *)
$Y[1].Ad:=$Z.Ab; $Z.Bb:=$Y[1].Bd;
$Y[0].Ab:=$Y[1].Ab; $Y[1].Bb:=$Y[0].Bb;
$Z.Ad:=$Y[0].Ad; $Y[0].Bd:=$Z.Bd; =<

Y ::= (*(Z/',')*) =>
$Y.Ab:=$Y.Ad; $Y.Bd:=$Y.Bb; =<

FOR V :: Ad,Ab,Bd,Bb:INTEGER. ::= X C W (*(X C)*) =>
(* $$X.Ad:=$X.Ab+$C.h; $X.Bb:=$$X.Bd+$$C.h;
X.Ad,X.Bd have been replaced by X.Ab,X.Bb to indicate that X
corresponds to a repetitive construct containing the symbols A,B. *)
$W.Ad:=$X.Ab+$C.h; $X.Bb:=$W.Bd+$C.h;
$V.Ab:=$W.Ab; $W.Bb:=$V.Bb;
$X.Ad:=$V.Ad; $V.Bd:=$X.Bd; =<

FOR W :: Ad,Ab,Bd,Bb:INTEGER. ::= X C W (*(X C)*) =>
(* $$X.Ad:=$X.Ab+$C.h; $X.Bb:=$$X.Bd+$$C.h; *)
$W[1].Ad:=$X.Ab+$C.h; $X.Bb:=$W[1].Bd+$C.h;
$W[0].Ab:=$W[1].Ab; $W[1].Bb:=$W[0].Bb;
$X.Ad :=$W[0].Ad; $W[0].Bd:=$X.Bd; =<

W ::= (*(X C)*) =>
$W.Ab:=$W.Ad; $W.Bd:=$W.Bb; =<

FOR U :: Ad,Ab,Bd,Bb:INTEGER. ::= (*[V]*) =>
(* $$V.Ad:=$V.Ab; $V.Bb:=$$V.Bd; *)
$U.Ab:=$U.Ad; $U.Bd:=$U.Bb; =<

U ::= V (*[V]*) =>
(* $D.a :=$$V.Ad; $$V.Bd:=0; *)
$U.Ab:=$V.Ab; $V.Bb:=$U.Bb;
$V.Ad:=$U.Ad; $U.Bd:=$V.Bd; =<

FOR D :: a,b : INTEGER. ::= U '.' =>
(* $U.Ad:=0; $D.b :=$U.Bd; *)
$U.Ad:=0; $D.b :=$U.Bd;
(* $D.a :=$$U.Ad; $$U.Bd:=0; *)
$D.a :=$U.Ab; $U.Bb:=0; =<

We see that attribute grammars in extended BNF are much more compact and transparent than the corresponding attribute grammars in BNF; this is especially evident for repetitive constructs. Moreover, by using extended BNF, many recursive constructs can be implemented by iterative statements, and such attribute dependencies allow efficient memory allocation.

3. The use of the parsing method for code generation in the compiler writing system SUPER

3.1. Introduction

Most recent systems that generate compiler back ends treat code generation as a tree-rewriting process. Machine instructions are described by tree-rewriting rules consisting of a replacement node, a tree template and code-generation actions. Code selection is the process of finding subtrees of the input tree that match tree templates and rewriting each matched subtree as the corresponding replacement node. The process reduces the input tree to a single node. The machine code is generated by the actions associated with the tree-rewriting rules applied in the process. The sequence of matched subtrees is called a cover of the input tree. The process is usually nondeterministic, since more than one template can match a subtree or a portion of it. Cost values are associated with the tree-rewriting rules to indicate the quality of the generated code. The cost of a cover of the input tree is the sum of the costs of the covering rules. The cost values are used to resolve ambiguities in the selection of a cover.

Several implementations of code-generator generating systems use a general pattern matching technique and dynamic-programming algorithms to determine a minimal cover [11,12]. These systems generate fast and efficient compiler back-ends. We give a theoretical foundation of such techniques.

Graham and Glanville [6,7,8] use LR parsing techniques to find the cover of the input tree. The replacement node of a tree-rewriting rule together with its tree template written in prefix order is interpreted as a production of a context-free grammar (a tree grammar). The set of such productions is a context-free grammar that is used to build an LR parser. The cover can be found by parsing a linearized prefix form of the input tree. The machine code is generated during the reductions of the parsing process. Several difficulties have to be overcome with the LR parsing approach. An LR machine grammar can have many thousands of productions, so the parser tables can be too large. The grammars are normally ambiguous, so specialized techniques have to be built into the parser generator to resolve ambiguities. Moreover, the determinism of the parsing process does not allow the selection of optimal code.

The well-known method of Earley can be used to parse general context-free languages. The code generator then produces optimal code, but it takes a lot of time and space: Earley's algorithm is well known to take O(n^3) time and O(n^2) space.

The resulting code generator accepts a tree-oriented language. The input sentence is parsed, and a minimal cost derivation tree is selected according to specified cost-attribute rules and the criteria. Following this, the derivation-tree attributes are evaluated through the top-down left-to-right walk of the minimal-cost derivation tree, and corresponding instructions are emitted.

In the remainder of this section we first show how to describe code generators by using attribute tree grammars. Then we investigate the algorithm of Cocke, Younger and Kasami and modify it to accept only tree grammars. Finally, an algorithm which determines minimal-cost derivation trees is given.

3.2. Code generation by attributed tree grammars

A compiler front end analyzes the input program and translates it into an intermediate representation (IR). The task of a back end is to transform the IR into machine code. Many factors determine the choice of an appropriate IR. We assume that the IR is a sequence of prefix expressions. Each operator has a fixed arity and may have attributes whose values are calculated in an earlier compiler phase.

The prefix expression in which the operator a0 is applied to expressions e1,e2,...,ek is a0 e1 e2...ek, where Arity(a0)=k. If a0...an is a prefix expression and the operator a0 has arity k, then there is only one decoding of a0...an as a0 e1 e2...ek, where e1,e2,...,ek are prefix expressions. So for each position i in the prefix expression a0...an we can compute the position i+j such that ai...am = ai e1 e2...ek, where m=i+j-1 and k=Arity(ai). The string ai...am is called an ai-expression, and j is the length of the ai-expression.
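The decoding just described amounts to a simple recursion over the arities; the following Python sketch computes the length j of the ai-expression starting at position i (the function name and arity table are illustrative):

```python
# A sketch of decoding a prefix expression, given the fixed arities: the
# length j of the a_i-expression starting at position i is computed by a
# recursion over Arity(a_i).  Function and table names are illustrative.
def expr_length(ops, arity, i=0):
    """Return the length of the prefix (sub)expression beginning at ops[i]."""
    j = 1                             # count the operator a_i itself
    for _ in range(arity[ops[i]]):    # then each of its k = Arity(a_i) operands
        j += expr_length(ops, arity, i + j)
    return j
```

For the prefix expression of Figure 3.2.2 below, the whole string is one ':='-expression of length 12, and the '+'-expression at position 2 has length 3.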

For each prefix expression, a corresponding expression tree can be built. If the operator a0 is applied to prefix expressions e1,e2,...,ek in the prefix expression z=a0 e1 e2...ek (Arity(a0)=k), then its expression tree T0 is shown in Figure 3.2.1, where T1,T2,...,Tk are the expression trees for the expressions e1,e2,...,ek, respectively.

(a0) T0
/ | \
/ | \
/ | \
/\ / \ ... /\
/T1\ /T2 \ /Tk\
---- ----- ----

Figure 3.2.1: Expression tree for z=a0 e1 e2...ek

Figure 3.2.2 shows an example prefix expression and expression tree for a:=a+5, where a is a local variable stored on the stack at offset CONST(a) from a stack pointer stored in register BASE(a). The attribute values of operators are indicated in parentheses. The @ operator makes its argument a memory address. The ^ operator returns the contents of a memory location.

:=
/ \
/ \
@ +
| / \
+ CONST(5) ^
/ \ |
CONST(a) BASE(a) @
|
+
/ \
CONST(a) BASE(a)

:= @ + CONST(a) BASE(a) + CONST(5) ^ @ + CONST(a) BASE(a)

Figure 3.2.2: Expression tree for a:=a+5

A code generator processes a prefix expression and generates machine code for it. An algorithm that reads a prefix expression and produces code for each operator separately generates bad code, since one machine instruction usually performs, in addition to its proper action, an address calculation and a memory fetch.

So the code generator has to choose several operators to produce one machine instruction. To achieve this, we use attributed tree grammars.

The tree grammar G=(N,T,P,S) fulfills the following conditions:

(1) Each operator a of T has a fixed arity Arity(a)>=0.

(2) The right-hand side of each production A->w of P is a prefix expression with operators from T and operands from N.

A machine instruction is described by a production of an attributed grammar. The machine code is emitted by the actions associated with the productions. Nonterminals in the production right-hand side represent intermediate results calculated earlier by other instructions. The production left-hand side nonterminal represents the result of the instruction. Figure 3.2.3 contains the productions needed to process the expression of Figure 3.2.2.

Operator ::= ':=' '@' Reg Reg =>
emit: MOV Reg[2],@Reg[1] =<
Reg ::= '@' '+' CONST Reg =>
emit: MOV CONST(Reg[1]),Reg[0] =<
Reg ::= '+' Reg Reg =>
emit: ADD Reg[1],Reg[0]; Reg[2]=Reg[0] =<
Reg ::= '+' CONST Reg =>
emit: ADD #CONST,Reg[0]; Reg[1]=Reg[0] =<
Reg ::= '^' Reg =>
emit: MOV @Reg[1],Reg[0] =<
Reg ::= CONST =>
emit: MOV #CONST,Reg[0] =<
Reg ::= BASE => =<

Figure 3.2.3: Productions needed to process the expression of Figure 3.2.2

The code generator parses an input prefix expression and builds a derivation tree. The derivation-tree attributes are then evaluated during a traversal of the tree, and instructions are emitted in accordance with the actions of the productions applied in it.

However, the tree grammars that are used to describe the target machine are usually ambiguous grammars: there are several possible derivations for a given input sentence. Cost-attributes associated with nonterminals and cost-attribute rules associated with productions are used to indicate qualities of the code, for example, code size and execution time. The cost-attributes are evaluated according to the cost-attribute rules during the parsing of a prefix expression. Ambiguities during parsing are resolved in accordance with cost criteria. Each nonterminal is supplied with a cost criterion. Using the cost-attribute values of the nonterminal, the criterion determines the desired production to be applied to the nonterminal. The production for the grammar start symbol, selected in accordance with the start-symbol criterion, determines a single derivation tree. This method allows the desired cheapest derivation tree to be found.

Once the derivation tree has been determined, the derivation-tree attributes are evaluated during a top-down left-to-right traversal, and the corresponding instructions are emitted.

3.3. General parsing algorithm of Cocke, Younger and Kasami

Let G=(N,T,P,S) be a context-free grammar (CFG). V denotes the set N+T. Elements of N are denoted by the capitals A,B,C,D,E; elements of T by a,b,c,d; and elements of V by the capitals X,Y,Z. Elements of V* are denoted by u,v,w,x,y and elements of T* by z. The symbol e stands for the empty string.

Let G be an e-free CFG and z=a[1] a[2]...a[n] be a sentence to be recognized. We assume that the CFG is e-free, unless otherwise stated. The algorithm of Cocke, Younger and Kasami was originally presented for grammars in Chomsky normal form; it can be modified to accept CFGs in general form [13,14]. The algorithm builds a triangular recognition table h with elements h[i,j], where 1<=i<=n and 0<=j<=n-i+1. Each element h[i,j] contains items of the form [A->u.v]:

h[i,j]={[A->u.v] | A->uv in P and u =>* a[i]...a[i+j-1]}

That is, the j symbols of the input sentence z starting at the i-th symbol are derived from the prefix u of the production's right-hand side. The sentence z is in L(G) iff some item [S->w.] is in h[1,n].

The initial step of the algorithm is the construction of the sets h[i,0]. The main part of the algorithm is the construction of h[i,j] from the previously constructed sets h[i,k] and h[i+k,j-k], for every 1<=i<=n, 1<=j<=n-i+1 and 0<=k<=j-1. The final step tests whether a completed production of the start symbol S is a member of h[1,n].

Algorithm 1:

VAR a :ARRAY[1..n] OF Tterminal;
    h :ARRAY[1..n],[0..n] OF SET OF Titem;
BEGIN
  FOR i:=1 TO n DO (* compute the sets h[i,0] *)
    h[i,0]:={ [A->.w] | A->w in P, w=a[i] w' } +
            { [B->.u] | B->u in P, u=Au' and [A->.w] in h[i,0] }
  END;
  FOR every 1<=i<=n, 1<=j<=n-i+1 DO (* initialize h[i,j] *)
    h[i,j]:={}
  END;
  (* h[1,1],h[2,1],...,h[1,2],...,h[1,n] are computed *)
  FOR j:=1 TO n DO (* increasing length *)
    FOR i:=1 TO n-j+1 DO (* for each allowed position *)
      h[i,j]:=h[i,j]+{ [A->ua.v] | [A->u.av] in h[i,j-1],
                       a in T and a=a[i+j-1] };
      FOR k:=0 TO j-1 DO (* closure *)
        h[i,j]:=h[i,j]+{ [A->uB.v] | [A->u.Bv] in h[i,k],
                         B in N and [B->w.] in h[i+k,j-k] }
      END
    END
  END;
  Test whether some [S->w.] is a member of h[1,n]
END
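Algorithm 1 can be sketched in runnable Python. This is an illustration under stated assumptions, not SUPER code: the e-free grammar is a list of (lhs, rhs) pairs, nonterminals are spelled in upper case, terminals in lower case, and an item [A->u.v] is a pair (production index, dot position). A small fixpoint loop is added around the closure step so that chain-like completions within one set h[i,j] are not missed.

```python
# A sketch of the general Cocke-Younger-Kasami recognizer with dotted items.

def recognize(P, S, a):
    n = len(a)
    is_nt = lambda X: X[0].isupper()
    # h[i][j] for 1<=i<=n, 0<=j<=n-i+1; h[i][0] holds the predicted items
    h = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(1, n + 1):                       # compute h[i,0]
        changed = True
        while changed:                              # closure over leading nonterminals
            changed = False
            for p, (A, rhs) in enumerate(P):
                first = rhs[0]
                starts = (first == a[i - 1] or
                          any(P[q][0] == first for q, d in h[i][0] if d == 0))
                if starts and (p, 0) not in h[i][0]:
                    h[i][0].add((p, 0)); changed = True
    for j in range(1, n + 1):                       # increasing length
        for i in range(1, n - j + 2):               # each allowed position
            for p, d in h[i][j - 1]:                # scan the terminal a[i+j-1]
                rhs = P[p][1]
                if d < len(rhs) and rhs[d] == a[i + j - 2]:
                    h[i][j].add((p, d + 1))
            changed = True
            while changed:                          # closure over subderivations
                changed = False
                for k in range(j):
                    for p, d in list(h[i][k]):
                        rhs = P[p][1]
                        if d < len(rhs) and is_nt(rhs[d]):
                            done = any(P[q][0] == rhs[d] and e == len(P[q][1])
                                       for q, e in h[i + k][j - k])
                            if done and (p, d + 1) not in h[i][j]:
                                h[i][j].add((p, d + 1)); changed = True
    return any(P[p][0] == S and d == len(P[p][1]) for p, d in h[1][n])

P = [('S', ('a', 'S')), ('S', ('b',))]              # L(G) = a*b
assert recognize(P, 'S', ['a', 'a', 'b'])
assert not recognize(P, 'S', ['a', 'b', 'a'])
```

The triangular table is materialized as a full n x n array of sets for simplicity; only the cells with j<=n-i+1 are ever used, matching the O(n*n) space bound stated below.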

Computing the sets h[i,1],...,h[i,n-i+1] for each position i, the algorithm recognizes all derivations of a[i]...a[k] (k<=n). It inserts the item [A->w.] into the set h[i,k-i+1] if some derivation A =>* a[i]...a[k] is recognized.

The algorithm constructs n+(n-1)+...+2+1=n*(n+1)/2 sets h[i,j]. Its space complexity is O(n*n) where n is the length of the string to be recognized. The number of objects in the set h[i,j] is O(|P|). The algorithm does O(j) steps to process all set pairs (h[i,k],h[i+k,j-k]) (k<j) computing h[i,j]. This means that the time complexity of the algorithm is O(n*n*n).

Computing the sets h[i,j] (j<=n-i+1), the algorithm solves two problems. The first problem is to find productions A->u a[p] w a[r]...a[q] v (u a[p]=>*a[i]...) that match a substring a[i]...a[p]...a[r]...a[q]...a[k] of the input sentence a[1] a[2]...a[n], where a[p],a[r],...,a[q] are the terminals of the production's right-hand side and u,w,...,v in N*. That is, the first process tests whether there is a derivation

u' a[p] w' a[r]...a[q] v' =>* a[i]...a[p]...a[r]...a[q]...a[k]

for some u',...,v' in N* with |u'|=|u|,...,|v'|=|v|. To solve the second problem, the algorithm tests whether there is a derivation

A => z0 B z1 ... C zm =>* z0 z1' z1 ... zm' zm = a[i]...a[k]

such that A => z0 B z1 ... C zm is the matched production and

B =>* z1' ... C =>* zm'

are some subderivations that are recognized during the computation of sets h[p,q] (p>i and q<k-i+1).

The items [A->u.v] of h[i,m] (v<>e) are used during the computation of sets h[i,j] (m<=j) to solve the first problem. The items [A->w.] of h[i,m] are used to solve the second problem when sets h[l,j] (l<=i,m<=j) are computed.

In the following modification of the algorithm we collect the items [A->w.] for each position i in a separate set r[i], to separate the data of the first and the second processes and to simplify the search for the subderivations that correspond to the matching production. Instead of the triangular recognition table h[i,j], a linear table h'[j] is used to find the matched production. The algorithm processes the positions i from n down to 1, so that a derivation A =>* a[l]...a[j] is recognized only after all possible derivations B =>* a[i]...a[m] with i>l have been recognized.

Algorithm 2:

VAR
a :ARRAY[1..n] OF Tterminal;
h':ARRAY[0..n] OF SET OF Titem;
r :ARRAY[1..n] OF SET OF Tpair;
(* The type Tpair describes pairs (production,length), such that
   for each item [A->w.] of h[i,j] (of Alg.1) the pair (A->w,j)
   is a member of r[i] *)
BEGIN
FOR i:=1 TO n DO r[i]:={} END;
FOR i:=n DOWNTO 1 DO
h'[0]:={ [A->.w] | A->w in P, w=a[i] w' } +
{ [B->.u] | B->u in P, u=Au' and [A->.w] in h'[0] };
FOR j:=1 TO n-i+1 DO h'[j]:={} END;
FOR j:=1 TO n-i+1 DO
FOR each item [A->u.av] in h'[j-1] (*matching production*)
such that a in T and a=a[i+j-1] DO
IF v<>e THEN h'[j]:=h'[j]+{[A->ua.v]}
ELSE r[i]:=r[i]+{(A->ua,j)}
END
END;
FOR k:=0 TO j-1 DO (*look for subderivations*)
FOR each item [A->u.Bv] in h'[k] such that
B in N and some pair (B->w,j-k) in r[i+k] DO
IF v<>e THEN h'[j]:=h'[j]+{[A->uB.v]}
ELSE r[i]:=r[i]+{(A->uB,j)}
END
END
END
END (* FOR j *)
END; (* FOR i *)
Test whether (S->w,n) is a member of r[1]
END

The sets r[i] have sizes O(n). This means that the space and time complexities are the same as for the previous algorithm.

The quality of the algorithm can be improved by means of the following strategy.

For each set h[i,j] we can pre-compute the values k1,...,km such that only the pairs (h[i,k1],h[i+k1,j-k1]),...,(h[i,km],h[i+km,j-km]) are used to compute h[i,j]. We can then replace the loop on k of the algorithm by a call to a procedure COMP(i,j) that computes, for each i and j,

h[i,j]:=F(h[i,k1],h[i+k1,j-k1],...,h[i,km],h[i+km,j-km])

using the corresponding pre-computed values k1,...,km. The algorithm stores the pre-computed values k1,...,km in some structure before parsing starts; the procedure retrieves them during parsing.

This is worthwhile only if the time complexity of the pre-computation does not exceed O(n*n) and a fixed amount of time suffices to store and to retrieve the information.

If the pre-computation can be done in O(n) time, only O(n) sets k1,...,km can be computed. This means that only O(n) sets h[i,j] have to be computed, and the time complexity of the corresponding modification of the algorithm becomes O(n).

3.4. Parsing tree grammars

Tree grammars are used to describe the target machine instructions. Each string that is derived from some nonterminal A of a tree grammar is a prefix expression. So, if S =>* a1...an, then for any two subderivations A =>* ai...ak and B =>* ai...al the equality k=l holds, and k-i+1 is the length of the prefix expression starting at ai. Therefore, instead of using the table h to match productions, we find the production that matches the substring a[i]...a[k] of the input sentence a[1]...a[n] by a left-to-right walk over a[i]...a[k], using pre-computed lengths of prefix expressions to jump from one matched terminal to the next terminal to be matched.
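The length pre-computation can be sketched in Python. This is an illustration, not SUPER code; the arity table matches the operators of Figure 3.2.2.

```python
# Compute l[i], the length of the prefix expression starting at position i,
# in one right-to-left pass: l[i] = 1 plus the lengths of the Arity(a[i])
# argument expressions that follow it.

ARITY = {':=': 2, '+': 2, '@': 1, '^': 1, 'CONST': 0, 'BASE': 0}

def expr_lengths(a):
    n = len(a)
    l = [0] * n
    for i in range(n - 1, -1, -1):       # right-to-left, so l[j] (j>i) is known
        j, length = i + 1, 1
        for _ in range(ARITY[a[i]]):
            step = l[j]                  # length of the next argument expression
            length += step
            j += step                    # jump over that argument
        l[i] = length
    return l

z = [':=', '@', '+', 'CONST', 'BASE', '+', 'CONST', '^', '@', '+', 'CONST', 'BASE']
assert expr_lengths(z)[0] == len(z)      # the whole sentence is one expression
```

During matching, algorithm 3 below uses exactly these jumps (j:=j+l[j]) to skip from a matched nonterminal position to the next terminal of the production.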

Chain productions are independent of operators, so they have to be tested separately. The application of one chain production may depend on another, so chain productions have to be tested repeatedly until no further chain production is applicable. We assume that there are no cycles in chain-production applications.
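The repeated testing of chain productions is a simple fixpoint computation. A Python sketch, with an assumed representation (productions as (lhs, rhs) pairs, chain productions as (C, A) pairs):

```python
# Close a set of applicable productions under chain productions C -> A:
# keep adding C -> A while some production with left-hand side A is
# applicable and C -> A itself is not yet recorded.  The loop terminates
# because the set of recorded productions can only grow.

def close_chain(rules, chain):
    """rules: set of (lhs, rhs) pairs; chain: set of (C, A) chain productions."""
    changed = True
    while changed:
        changed = False
        for C, A in chain:
            if any(lhs == A for lhs, _ in rules) and (C, (A,)) not in rules:
                rules.add((C, (A,)))
                changed = True
    return rules

r = close_chain({('A', ('a',))}, {('C', 'A'), ('D', 'C')})
assert ('C', ('A',)) in r and ('D', ('C',)) in r      # D -> C needs C -> A first
```

The second assertion shows why a single pass is not enough: D -> C only becomes applicable after C -> A has been added.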

Algorithm 3:

VAR
a:ARRAY[1..n] OF Tterminal;
r:ARRAY[1..n] OF SET OF Tproduction;
(* For each item [A->w.] of h[i,j](of Alg.1) the element (A->w)
is a member of r[i] *)
l:ARRAY[1..n] OF INTEGER;
(* l[i] is the length of a[i]-expression *)
h":Titem;
(* item that is used to match production *)

BEGIN

(* Pre-computation k1,...,km*)
Compute the length l[i] of a[i]-expression for each position i.
(* Recognize input sentence*)
FOR i:=n DOWNTO 1 DO
FOR each production A -> a[i] y of P DO
j:=i+1;
IF l[i]>1 THEN (*The first terminal a[i] is already matched*)
h":=[A->a[i].y];
REPEAT (*Find a[i]y that match a[i]..a[i+l[i]-1]*)
Let h"=[A->u.Xv]
IF X in T THEN
IF X=a[j] THEN j:=j+1 ELSE exit END
ELSE (* X in N *)
IF some X->w in r[j] THEN j:=j+l[j] ELSE exit END
END;
h":=[A->uX.v]; (*Jump to the next symbol to be matched*)
UNTIL j=i+l[i];
END (*l[i]>1*);
IF j=i+l[i] THEN r[i]:=r[i]+{(A->a[i]y)} END
END (*for*);
(* Match the chain rules *)
WHILE exists a production C->A of P such that
some element (A->w) in r[i]
and (C->A) not in r[i]
DO r[i]:=r[i]+{(C->A)}
END
END; (* for i *)
Test whether (S->w) is a member of r[1]
END

The sets r[i] have sizes O(|P|). It is evident that the time and space complexity of the algorithm for tree grammars is O(n).

Let G be a tree grammar. For each sentence z of L(G) an expression tree can be built. We can rewrite algorithm 3 so that it processes expression trees instead of strings. The following algorithm traverses an expression tree in depth-first right-to-left order and finds, in a bottom-up manner, the subtrees that match the right-hand sides of productions. We use right-to-left order only to agree with the computation of h[i,j] from n down to 1; a left-to-right traversal of the expression tree could be used just as well.

The procedure MATCHED(n,[A->u.Xvy]) traverses, in depth-first order, the part of the tree that starts at node n. It tests whether the subtree matches the prefix expression Xv of the matching production A->uXvy and searches for the subderivations that correspond to the matching production. Thus, if the production matches the subtree, then the string corresponding to the subtree is derived from A, and the production can be applied in this derivation. We assume that nonterminals are 0-arity operators. The applicable productions are stored directly in the tree nodes. The data structure that represents a tree node has the following form:

PTnode=^Tnode;
Tnode=RECORD
op:Tterminal;
son:ARRAY[1..MaxArity] OF PTnode;
rules:SET OF Tproduction
END;

The algorithm for recognizing expression trees is the following:

Algorithm 4:

The corresponding parts of algorithm 3 are shown in comments.

VAR root:PTnode;
PROCEDURE PARSE(n:PTnode);
PROCEDURE MATCHED(n:PTnode; h":Titem);
VAR matching:BOOLEAN;
BEGIN
Let h"=[A->u.Xvy], v= v1 v2 ... vm, m=Arity(X)
IF m=0 THEN (*IF l[i]=1*)
IF X in T THEN
IF X=n^.op (* match production *) (*IF X=a[j]*)
THEN return(TRUE) ELSE return(FALSE) END
ELSE (*X in N*) (* look for subderivations *)
IF some production X->w in n^.rules
THEN return(TRUE) ELSE return(FALSE) END
END

ELSE (* m>0 *) (*IF l[i]>1*)
IF X=n^.op THEN (*IF X=a[j]*)
matching:=TRUE;
FOR i:=1 TO m DO (*j:=j+l[j] *)
matching:=matching AND (*UNTIL j=i+l[i]*)
MATCHED(n^.son[i],[A->uXv'.vi v"y])
END;
return(matching) (*h":=[A->uX.v]*)
ELSE return(FALSE) END
END
END; (* MATCHED *)

BEGIN
FOR i:=Arity(n^.op) DOWNTO 1 DO (*FOR i:=n DOWNTO 1*)
PARSE(n^.son[i]) END;
n^.rules:={};
FOR each production A->bu of P such that b=n^.op DO
IF MATCHED(n,[A->.bu]) (*IF j=i+l[i]*)
THEN n^.rules:=n^.rules+{(A->bu)} END
END;

(* Match the chain rules *)
WHILE exists a production C->A of P such that
some element (A->w) in n^.rules
and (C->A) not in n^.rules
DO n^.rules:=n^.rules+{(C->A)}
END
END;

BEGIN
(* Pre-computation *)
Build the expression tree for a sentence z to be recognized;
root:= pointer to expression tree;
(* Recognize input sentence*)
PARSE(root);
Test whether some production S->w is a member of root^.rules;
END

The output of algorithm 4 is the expression tree decorated with the sets of applicable productions.

The tree grammars can be ambiguous grammars, so there are several possible derivations for a given input sentence. We can build all possible derivations using the decorated expression tree of algorithm 4, or the applicable-production table r and the expression-length table l of algorithm 3.

3.5. Minimal cost derivation tree

The tree grammars that are used to describe the target machine are usually ambiguous grammars. There are several possible derivations for a given input sentence. To generate code, it is needed to select a single derivation tree. The tree has to represent desired qualities of the code, e.g. code size and execution time.

A derivation tree can be selected according to specified cost-attributes, cost-attribute rules and criteria. Cost-attributes are associated with grammar nonterminals; cost-attribute rules are associated with grammar productions. The attributes of every nonterminal A are attached to every node n of an expression tree t, and their values are computed in parallel with the parsing of the input sentence. The attribute values reflect properties of the derivations from A.

Assume that for the node n an applicable production p:A::=z0 X1 z1...Xk zk is detected, where zi in T* (0<=i<=k) and Xj in N (1<=j<=k). The cost-attribute rules corresponding to the production p are evaluated as follows. According to the applicable production, there are descendants n1,...,nk of n which correspond to the nonterminals X1,...,Xk, respectively. Cost-attribute values are computed in bottom-up order. First, all cost-attributes are initialized with the undefined value. It is assumed that the cost-attribute values for all descendants n1,...,nk have already been computed. If an attribute rule

A.a:=f(Xi.b,Xj.c,...) for 1<=i,j<=k

is associated with p then the attribute rule instance

n.a:=f(ni.b,nj.c,...)

is evaluated. The computed value is the value of the attribute a of the nonterminal A in the node n for the production p. There may be several applicable productions for A. To choose a single one, a cost criterion is used: each nonterminal is supplied with its own cost criterion. The criterion determines the desired production to be applied to the nonterminal, using the previously and the currently computed cost-attribute values of the nonterminal A. If it decides that the currently tested applicable production is better than the previously investigated one, or the cost-attribute values of A are still undefined, it assigns the currently computed values to the cost-attributes of A and stores the corresponding production in the node n. At the end of parsing, the production stored for the grammar start symbol defines a single derivation tree.
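The selection can be sketched in Python with a single integer cost per nonterminal and the criterion "smaller cost is better". This is an illustration, not SUPER code: the pattern representation, the function names and the production set (which mimics part of Figure 3.2.3) are all assumptions.

```python
INF = float('inf')

# A pattern (op, sub1, ..., subk) matches a node with that operator; a bare
# string names a nonterminal and matches any subtree that derives it.
# A production is (lhs, pattern, cost); a node is (op, [sons]).

def select(root, P):
    tables = {}                                  # id(node) -> {A: (cost, prod)}

    def match_cost(pat, node):
        if isinstance(pat, str):                 # nonterminal: best cost at son
            return tables[id(node)].get(pat, (INF, None))[0]
        op, *subs = pat
        if node[0] != op or len(subs) != len(node[1]):
            return INF
        return sum(match_cost(p, s) for p, s in zip(subs, node[1]))

    def walk(node):
        for s in node[1]:
            walk(s)                              # bottom-up order
        best = {}
        changed = True
        while changed:                           # rerun so chain productions close
            changed = False
            for prod in P:
                A, pat, c = prod
                if isinstance(pat, str):         # chain production A -> pat
                    below = best.get(pat, (INF, None))[0]
                else:
                    below = match_cost(pat, node)
                if c + below < best.get(A, (INF, None))[0]:
                    best[A] = (c + below, prod)  # criterion: smaller is better
                    changed = True
        tables[id(node)] = best

    walk(root)
    return tables[id(root)]

P = [('Reg', ('+', ('CONST',), 'Reg'), 1),       # ADD #CONST,Reg
     ('Reg', ('+', 'Reg', 'Reg'), 1),            # ADD Reg,Reg
     ('Reg', ('CONST',), 1),                     # MOV #CONST,Reg
     ('Reg', ('BASE',), 0)]                      # base already in a register
tree = ('+', [('CONST', []), ('BASE', [])])
assert select(tree, P)['Reg'][0] == 1            # ADD #CONST,Reg wins over
                                                 # MOV #CONST + ADD (cost 2)
```

The stored (cost, production) pairs per nonterminal correspond to the CostAttr and Production fields of the node record described below.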

To modify algorithm 4 to determine a single derivation tree, we have to add cost-attributes, attribute rules and cost criteria to it. Since the undefined value is used as a cost-attribute value, we can omit the separate test for the subderivations that correspond to the matching production. The data structure of a tree node then has the following form:

PTnode=^Tnode;
Tnode=RECORD
  op:Tterminal;
  son:ARRAY[1..MaxArity] OF PTnode;
  nonterm:ARRAY[Tnonterminal] OF RECORD
    CostAttr  : ADDRESS;
    Production: Tproduction
  END;
  OperatorAttributes: ...
END;

In general the field CostAttr points to an area allocated for the cost-attributes of the corresponding nonterminal. If the required area fits into one machine word, it is allocated directly in this field. Each operator may have attributes whose values were calculated in an earlier compiler phase; these values are stored in the field OperatorAttributes.

The body of the procedure PARSE becomes

BEGIN
  FOR i:=Arity(n^.op) DOWNTO 1 DO PARSE(n^.son[i]) END;
  FOR each A of N DO WITH n^.nonterm[A] DO
    CostAttr:=UndefinedValue;
    Production:=Undefined
  END END;
  FOR each production A->bu of P such that b=n^.op DO
    IF MATCHED(n,[A->.bu]) THEN
      ComputeCostAttribOf(A,n,(A->bu));
      EvaluateCriterionOf(A,n^.nonterm[A].CostAttr);
      IF (A->bu) is better than the previously
         investigated production for A THEN
        Update(n^.nonterm[A].CostAttr);
        n^.nonterm[A].Production:=(A->bu)
      END
    END
  END;

  (* Match the chain rules *)
  WHILE exists a production C->A of P that may improve on
        the previously investigated production for C DO
    ComputeCostAttribOf(C,n,(C->A));
    EvaluateCriterionOf(C,n^.nonterm[C].CostAttr);
    IF (C->A) is better THEN
      Update(n^.nonterm[C].CostAttr);
      n^.nonterm[C].Production:=(C->A)
    END
  END
END;

Once the derivation tree has been determined, the derivation-tree attributes are evaluated during a top-down left-to-right walk of the minimal-cost derivation tree, and the corresponding instructions are emitted. To traverse the derivation tree, the evaluator starts at the root of the expression tree with the grammar start symbol. Given a node n and a nonterminal A, it uses the production p:A::=z0 X1 z1...Xk zk saved in the node n to determine the corresponding nodes n1,...,nk and nonterminals X1,...,Xk. Then it traverses recursively each node ni with the nonterminal Xi.

The algorithms that are used in the BEG system [11] and the TWIG system [12] to find a minimal cover are similar to the modified algorithm 4. The BEG system incorporates the matching algorithm directly into the generated code generator; the TWIG system uses a pattern-matching automaton. Both systems select a single cover of the IR tree by means of a simple fixed cost criterion.

4. Conclusion

The SUPER system supports the efficient implementation of compiler front ends and back ends. We have shown the principal features of the system input language: namely attributed context-free subgrammars in extended BNF, and attributed tree grammars for code generation.

We have investigated the algorithm of Cocke, Younger and Kasami and modified it to accept tree grammars. An algorithm which determines minimal-cost derivation trees has been given. We believe the main advantage of our approach to code generation is the use of standard parsing and attribute techniques for the generation of code, together with attributed tree grammars for specifications.

References

1. Aho A.V., Corasick M.J. Efficient string matching: An aid to bibliographic search. Comm. ACM, V.18, N.6, 1975.

2. Ganapathi M., Fischer C.N., Hennessy J. Retargetable compiler code generation. ACM Comp. Surveys, V.14, N.4, 1982.

3. Chow F.C., Ganapathi M. Intermediate languages in compiler construction - a bibliography. ACM SIGPLAN Not., V.18, N.11, 1983.

4. Hoffmann C.M., O'Donnell M.J. Pattern matching in trees. J. ACM, V.29, N.1, 1982.

5. Wasielew S.G. A compiler writing system with optimization capabilities for complex order structures. Ph.D. Th., Northwestern University, Evanston, 1972.

6. Glanville R.S. A machine independent algorithm for code generation and its use in retargetable compilers. Ph.D. Th., Univ. of California, Berkeley, 1977.

7. Graham S.L., Glanville R.S. A new method for compiler code generation. Proc. of the 5th ACM Symp. on Principles of Programming Languages, ACM, 1978.

8. Graham S.L., Henry R.R., Schulman R.A. An experiment in table driven code generation. ACM SIGPLAN Not., V.17, N.6, 1982.

9. Ganapathi M., Fischer C.N. Description-driven code generation using attribute grammars. Proc. of the 9th ACM Symp. on Principles of Programming Languages, ACM, 1982.

10. Christopher T.W., Hatcher P.J., Kukuk P.C. Using dynamic programming to generate optimal code in Graham-Glanville style code generation. ACM SIGPLAN Not., V.19, N.6, 1984.

11. Emmelmann H., Schroer F.W., Landwehr R. BEG - a generator for efficient back ends. ACM SIGPLAN Not., V.24, N.7, 1989.

12. Aho A.V., Ganapathi M., Tjiang S.W. Code generation using tree matching and dynamic programming. ACM Trans. on Programming Languages and Systems, V.11, N.4, 1989.

13. Graham S.L., Harrison M.A., Ruzzo W.L. An improved context-free recognizer. ACM Trans. on Programming Languages and Systems, V.2, 1980.

14. Harrison M.A. Introduction to Formal Language Theory. Addison-Wesley, Reading, Mass., 1978.

15. Bezdushny A. The use of a general dynamic programming parsing method for code generation. Computing Centre of the Academy of Sciences of the USSR, 1990.

16. Bezdushny A., Serebriakov V. The input language of the compiler writing system SUPER. Computing Centre of the Academy of Sciences of the USSR, 1986.

17. Bezdushny A., Serebriakov V. SUPER: A practical attributed compiler writing system. Proc. Bilateral Workshop on Compiler Construction, University of Twente, Enschede, The Netherlands, 1990.

18. Bezdushny A. Code generation in the compiler writing system SUPER. Computing Centre of the Academy of Sciences of the USSR, 1991.

19. Bezdushny A., Serebriakov V. The extension of the input language of the compiler writing system SUPER. Computing Centre of the Academy of Sciences of the USSR, 1991.

20. Alblas H. A characterization of attribute evaluation in passes. Acta Informatica 16, 1981.

21. Ganzinger H., Ripken K., Wilhelm R. Automatic generation of optimizing multipass compilers. Information Processing 77, 1977.

22. Räihä K., Saarinen M., Soisalon-Soininen E., Tienari M. The compiler writing system HLP. Dept. of Computer Science, Report A-1978-2, 1978.

23. Kastens U., Hutt B., Zimmermann E. GAG: A practical compiler generator. Lecture Notes in Computer Science 141, Springer, 1982.

24. Knuth D. Semantics of context-free languages. Math. Systems Theory 2, 1968.