A common task in language translators is to generate target language text or a text based set of error messages. Macro processors provide a convenient mechanism for generating textual output and we adopt a macro language as a domain specific meta language for text attributes and show its use in an example translator for a simple language. Translators often generate complex internal data structures which are queried during language translation. In type checking, an internal representation of program defined types is generated and an environment structure is created for associating type and variable names with their types. This is a very specific domain and the definition of attributes used in solving this sub problem can benefit from being written in a domain specific language. We describe a simplified version of such a language in our example. Another area in which domain specific languages are useful is in program optimization and parallelization. The process often represents the source program in the form of flow graphs, data dependency graphs, and program dependency graphs [7]. In this domain a language with built in graph construction and analysis operations can simply solution specifications [21].
Since we have many distinct problem sub domains in language processing tool specification, it is only natural to specify their solutions in domain specific languages. These languages raise the level of abstraction by providing data types and control structures for the specific items in a domain. This frees the specification writer from dealing with implementation details and leaves him or her free to concentrate on the problem solution, not on its encoding in a general purpose language. The use of domain specific meta languages in the definitions of attributes can thus significantly simplify attribute grammar specifications.
The aim of this research is to explore possible meta languages for defining various semantic aspects of programming language extensions, called intentions, for the Intentional Programming (IP) system [19] under development at Microsoft Research. An intention can be seen as a production and definitions of the appropriate attributes which allow it to be added into an existing language framework. The aim of IP is to allow applications programmers to extend their programming languages to meet their particular needs in a given problem domain. When a programmer defines a new intention, the syntactic representation and its semantics must be specified. Thus, we are interested in exploring different meta languages which the programmer might find useful in defining intentions. The system described here provides a convenient mechanism for designing and implementing such meta languages.
We describe some related work in simplifying the specifications of language processors in Section Related work. Section Motivating example provides a motivating language translation problem. In Section Meta languages we give its solution in the form of an attribute grammar which uses three meta languages for defining attributes: a general purpose language similar to those in most attribute grammar systems, a macro language for defining the textual target code, and a simple language designed to specify an environment. Section The tools briefly describes how domain specific meta languages can easily be implemented and integrated into our system. Section Discussion provides a discussion.
Attribute grammars are usually specified with all attribute definitions for a production's non-terminals written together with the production. Instead of decomposing attribute grammars by production, they can also be decomposed by attribute, that is, writing all definitions for one or more related attributes together [1,6,5]. Most attribute grammar systems [9,11, 12,18] provide methods for grouping attribute definitions by attribute by textually regrouping definitions by production before analysis of the attribute specifications. Grouping definitions by attribute allows one to concentrate on a specific sub problem solved by an attribute's definitions.
In a method proposed by Dueck and Cormack [6] attribute definition templates are used to automatically generate several attribute definitions. Definition templates are associated with a production pattern that is matched against the productions in a context free grammar. Those which match are given attribute definitions generated from the templates by the matching information. Eli [9] provides similar facilities but also allows different types of reference mechanisms to attribute values in remote nodes in the tree so that a definition is not restricted to referencing attributes on the parent and child nodes only.
Macro processors have often been used to implement language translators [23, 4,22,17,21]. Essentially, one defines a macro for each construct in the source language. The body of the macro expands into the translation of the construct in the target language. Some macro processors provide powerful extensions to perform rudimentary semantic analysis tasks on the source program. Tanenbaum [20], for instance, used the symbol table facilities of the ML/I macro processor to hold the type and run-time address of the program variables in a source program. Macro processors are however rarely used as a single solution in writing language translators since the extensions provided for handling non-textual data are inadequate for the complex semantics of modern languages. Macro processors are an appropriate mechanism for generating textual attribute values, but processes in other domains should be written in a more appropriate language.
x
is visible in the smallest enclosing
scope except for any inner blocks which also define an identifier
x
. A concrete syntax for this language is given below:
program: <Program> = "program" <Block> "." list: <Block> = <StmtList> slist1: <StmtList> = <Stmt> slist2: <StmtList> = <StmtList> <Stmt> local: <Stmt> = "[" <Block> "]" dec: <Stmt> = "dec" "id" use: <Stmt> = "use" "id"An example program may be:
program use a use b dec b [ dec a use a use b ] dec a use a .The use of
a
in the inner block refers to the inner declaration
of a
while the use of Enter
statement which is labelled by the lexical level of the block and the
number of local variables defined in the block. The exit of a block
has an Exit
statement labelled by only the lexical level. A
variable reference is indicated by a Ref
statement which is
labelled by the lexical level and offset of the variable's
declaration. The offset of a variable declaration is the order in
which it appears in the block. Our example program above would be
mapped into the following target program:
Enter 0 2, Ref 0 0, Ref 0 1, Enter 1 1, Ref 1 0, Ref 0 1, Exit 1, Ref 0 0, Exit 0
code
- a synthesized text attribute defining the target
code, level
- an inherited integer attribute defining
the lexical or block-nesting level, and env
- an
inherited attribute defining the environment that can be queried by
operations id_level
and id_offset
to give,
respectively, the level and offset of a variable's declaration. A
third operation number_locals
returns the number of local
variable declarations in a block.
We present the solution in an aspect oriented manner [5] such that attribute definitions are grouped by attribute or aspect, not production. Thus, all the definitions of an attribute appear together in a single file. When defining attributes with different meta languages grouping by attribute instead of production makes the specification easier to read. In this simple example we will attach the semantic functions to the concrete syntax rules. This allows us generate a parser directly from the grammar rules and to avoid the hassle of mapping from the concrete to abstract syntax.
We first present the definitions of the level
attribute
written in a general purpose meta language to introduce our basic
framework and notations. This language is essentially
Haskell [3] with embellished naming conventions for
referencing attributes associated with terminal and nonterminal
symbols. It is similar to the attribute definition languages used in
other systems. We then present the definitions of the code
attributes using a domain specific macro language. These definitions
compare favorably to the equivalent definitions written in the general
purpose meta language. Finally we present the definitions of the
environment attribute env
in its domain specific language.
level
attribute is an inherited attribute which
defines the lexical level of the blocks in the source program. Since
it is an inherited attribute, we can use a default copy rule which
copies inherited attribute values from parent to child nodes if no
other rule is specified. Thus, we need only write attribute
definitions for the program
and local
productions:
/\ level Haskell program: <Program> = "program" <Block> "." <Block>.level is 0 local: <Stmt> = "{" <Block> "}" <Block>.level is <Stmt>.level + 1The first line gives the name of the attribute being defined (
level
) and the domain specific language used to write
the definitions (a Haskell-like meta language). Attributes of a
nonterminal are referenced by following the name of the nonterminal in
angle brackets by a dot (.) and the attribute name. If more
than one nonterminal in a production have the same name, the name of
the non-terminal is followed by an integer indicating its order in the
production.
code
attribute. When using macros, one does not
write the commands to build up the textual value of an attribute, but
instead writes a macro body which is expanded by plugging attribute
values of terminals and nonterminals into the holes in the macro
body. These holes are the formal parameters of the macro and are just
the references in the macro to attribute values of terminals and
nonterminals written in the same style as above.
When using macros to define a synthesized attribute of the nonterminal
on the left hand side of the production we can drop the attribute
reference to the left of the is
keyword seen above so
that all of the text between the production and the following
production is seen as the body of the macro defining the attribute's
value. Since we group attribute definitions by attribute, all
definitions in a single file define the same attribute, thus there is
no confusion. The specification for the unary productions
program
, slist1
, and local
are not shown
since they simply copy the code
attribute of the single child to
the parent.
/\ code Text list: <Block> = <StmtList> Enter <Block>.level ~number_locals <StmtList>.env <StmtList>.level~ <StmtList>.code Exit <Block>.level slist2: <StmtList> = <StmtList> <Stmt> <StmtList>_2.code <Stmt>.code use: <Stmt> = "use" "id" Ref ~id_level <Stmt>.env "id".lex~ \ ~id_offset <Stmt>.env "id".lex~ dec: <Stmt> = "dec" "id" empty_stringThe definition of the
code
attribute for the <Block>
nonterminal of the list
rule is the three line macro shown above
beginning with Enter
and ending on the last non-blank line
above the following rule. At compile time, this macro is expanded by
filling the formal parameters (the attribute references) with their
values. The generated text begins with an Enter
statement,
followed by the lines of code of the statement list (each indented two
spaces as shown in the macro) and concludes with an Exit
statement. This text becomes the value of the code
attribute of
<Block>
. (Note that the backslash (\
) is a line
continuation symbol.)
The Enter
statement, which needs the number of local
variables in the block, and the Ref
statement, which
needs the level and offset of the referenced variable, both take
parameters which are not stored directly as attributes. These values
are extracted from data structures stored in attributes and thus we
need some common interface language which a macro can use to
extract the required information. We naturally choose as this
interface the general purpose meta language used in Section Lexical level. Inside a macro, phrases in this
language are written between tildes (~
) and are
expressions which return the desired value to be copied into the
expanded text of the macro. In the list
macro above, the
phrase
~number_locals <StmtList>.env <StmtList>.level~is written in the general purpose attribute definition language and makes use of the
number_locals
function to compute the number
of local variables defined in the block with the level
of the
statement list. Similarly, the use
macro uses the common
interface language to extract from the environment <Stmt>.env
the level and offset of the identifier whose lexeme is given by
"id".lex
.
The macro language provides a domain specific meta language which is
clearly a better choice than a general purpose specification language.
If we had written the definition of the code
attribute for the
slist2
production using the general purpose meta language, it
would have looked like the following where ++
is string
concatenation.
slist2: <StmtList> = <StmtList> <Stmt> <StmtList>.code is <StmtList>_1.code ++ "\n" ++ <Stmt>.codeThe definition for
list
is even more daunting:
list: <Block> = <StmtList> <Block>.code is "Enter " ++ show <Block>.level ++ " " ++ show (number_locals <StmtList>.env <StmtList>.level) ++ "\n" ++ <StmtList>.code ++ "\nExit " ++ show <Block>.levelThese are much more difficult to comprehend than the domain specific macro versions in which we see exactly the structure of the target code text we want to generate. In the general language definition, we have to study the code to understand what the target code will be. (The
show
function converts a value to its textual
representation.) Another significant advantage of the macro
specification language is that the white space (spaces, tabs, new
lines) between text and attribute values is inserted automatically
into the expanded text. Also, it only inserts white space between
parameters if they are both not the empty string. For example, since
the dec
rule generates the empty string as the code
attribute's value, if the <Stmt>
in the slist2
rule is a
declaration we would not want the new line between the formal
parameters in the macro body to be inserted into the generated code.
This would lead to spurious blank lines in the final target code. The
macro language ensures that this does not happen. Adding this feature
to the general purpose specifications would make them even more
complicated. The macro meta language lets us write specifications
which say exactly what we want the target code to look like and
shields us from the many complexities one must deal with in a general
purpose specification language.
env
using a
domain specific language tailored to our example. This language is
used to define the environment of a block and allows remote
contributions of declared local variables to an environment. Since
env
is an inherited attribute, we can again use the default copy
rule to copy the attribute value from parents to children. Thus we
need only specify attribute definitions for env
for the rules
program
and list
.
/\ env EnvLang program: <Program> = "program" <Block> "." <Block>.env is empty environment list: <Block> = <StmtList> <StmtList>.env is add locals at <StmtList>.level to <Block>.env dec: <Stmt> = "dec" "id" contribute "id".lex to localsThese rules define the environment of the outer most block to be empty, and the environment of inner blocks by adding an ordered list of local variables at a specific lexical level to the enclosing environment. The
contribute
statement in the dec
rule
adds the lexeme of the identifier ("id".lex
) to a list of local
variables which will be added to the environment at the enclosing
block. When the id_level
and id_offset
operations query
an environment with an identifier name, they search each set of local
variables in the environment by decreasing lexical level until the
identifier is found. The contribute
statement is similar to the
remote attribute access methods in Eli [9] and allows us to
avoid creating another attribute to pass local variable names up to
the nearest enclosing block up the tree. Behind the scenes the meta
language may implement contribute
by defining a hidden
synthesized attribute to pass local variables up to the definition of
the enclosing environment, but it is not the concern of the person
writing attribute definitions. We are simply seeking, in this
admittedly simple example, to raise the level of abstraction when
dealing with the environment.
The generated Haskell program is written in an aspect oriented style [5] in which all the code defining a particular attribute is held in a single extensible record structure called an aspect. Each aspect can be type checked, parameterized and separately compiled. All the aspect definitions, one for each attribute, are then combined to generate the Haskell function mapping input programs to attribute values. Since our attribute grammar specifications are decomposed by attribute and thus attribute definitions of an individual attribute are kept in a separate file, it is straight forward to write translators which map attribute definitions into the desired Haskell aspect code which is then merged into the final target program.
In addition to the Haskell code for each attribute, there is a main set of enclosing functions generated by a translator from the productions which combine the attribute definition functions into the main function implementing the attribute grammar. These functions are also generated by a translator which maps the set of productions into this set of functions. The heart of our system is collection of loosely integrated translators: one which maps the source language syntax to the main structural code of the attribute grammar and one for each domain specific meta language which maps attributes defined by that language into their Haskell code implementation. All the generated functions are combined into a single Haskell module which implements the the language processing tool defined by the attribute grammar.
Since each meta language requires a translator as its implementation, it is only natural to define and implement all of these translators using this same set of tools. Thus, for each meta language we define an attribute grammar specifying the translation from attribute definitions written in that meta language into their Haskell function implementation. All meta language processors have the same enclosing syntactic structure, sketched below:
<Aspect> = <SpecList> <SpecList> = <SpecList> <Spec> <SpecList> = <Spec> <Spec> = "rulename" <BNF_Rule> <MetaLanguageFragment>The
<MetaLangaugeFragment>
construct contains the entire
attribute specification associated with a production. Thus the meta
language has complete control over the syntax and semantics of its
components.
Because we can automatically generate meta language processors from their specifications it is not difficult to extend a meta language with new features or to build a completely new one, as we did for the environment attribute. Also, it is very easy to add new meta language translators to the system and thus integrate new domain specific meta languages into our attribute grammar system.
2. A. Aho, R. Sethi, and J. Ullman. Compilers - Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, 1986.
3. R. S. Bird. Introduction to Functional Programming in Haskell. International Series in Computer Science. Prentice Hall, 1998.
4. P. Brown. Using a macro processor to aid softare implementation. Computer Journal, 12(4):327-331, Novermber 1969.
5. O. de Moor, S. Peyton-Jones, and E. Van Wyk. Aspect oriented compilers. In First International Symposium on Generative and Component-Based Software Engineering, 1999.
6. D. D. P. Dueck and G. V. Cormack. Modular attribute grammars. The Computer Journal, 33(2):164-172, 1990.
7. J. Ferrante, K. Ottenstein, and J. Warren. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9(3):319-349, July 1987.
8. H. Ganzinger and R. Giegerich. Attribute coupled grammars. SIGPLAN Notices, 19:157-170, 1984.
9. R. W. Gray, V. P. Heuring, S. P. Levi, A. M. Sloane, and W. M. Waite. Eli: A complete, flexible compiler construction system. Communications of the ACM, 35:121-131, 1992.
10. T.~Johnsson. \newblock Attribute grammars as a functional programming paradigm. \newblock In G.~Kahn, editor, {\em Functional Programming Languages and Computer Architecture}, volume 274 of {\em Lecture Notes in Computer Science}, pages 154-173. Springer-Verlag, 1987.
11. M. Jourdan, D. Parigot, Juli{\'e}, O. Durin, and C. Le Bellec. Design, implementation and evaluation of the FNC-2 attribute grammar system. In Conference on Programming Languages Design and Implementation, pages 209-222, 1990. Published as ACM SIGPLAN Notices, 25(6).
12. U. Kastens, B. Hutt, and E. Zimmermann. GAG: A Practical Compiler Generator, volume 141 of Lecture Notes in Computer Science. Springer Verlag, 1982.
13. U. Kastens and W. M. Waite. Modularity and reusability in attribute grammars. Acta Informatica, 31:601-627, 1994.
14. D. E. Knuth. Semantics of context-free languages. Mathematical Systems Theory, 2:127-146, 1968.
15. K. Koskimies, K. Raiha, and M. Sarjakoski. Compiler construction using attribute grammars. SIGPLAN Notices, 17(6):153-159, 1982.
16.
M. Kuiper and S. D. Swierstra.
Using attribute grammars to derive efficient functional programs.
In Computing Science in the Netherlands CSN '87, 1987.
Available from: ftp://ftp.cs.ruu.nl/pub/RUU/CS/techreps/CS-1986/1986-16.ps.gz
.
17. J. Lee. Macro-processors as compiler code generators. Master's Thesis, The University of Iowa, Department of Computer Science, Iowa City, IA 52242, 1990.
18. T. W. Reps and T. Teitelbaum. The Synthesizer Generator: A system for constructing language-based editors. Texts and Monographs in Computer Science. Springer-Verlag, 1989.
19.
C. Simonyi.
Intentional programming: Innovation in the legacy age.
Presented at IFIP Working group 2.1. Available from URL
http://www.research.microsoft.com/ip/
, 1996.
20. A. Tanenbaum. A general purpose macroprocessor as a poor man's compiler compiler. IEEE Transactions Software Engineering, 2:121-125, June 1976.
21. E. Van Wyk. Semantic Processing by Macro Processors. PhD Thesis, The University of Iowa, Iowa City, Iowa, 52240 USA, July 1998.
22. W. Waite. Building a mobile programming system. Computer Journal, 13:28-31, February 1970.
23.
M. Wilkes. An experiment with a self-compiling compiler for a
simple list processing language.
Annual Review of Automatic Programming, 4:1-48, 1964.
* This work is funded by Microsoft Research.