This page documents cc.ast, the most important file in Elsa. Here I mainly document the big ideas and the tricky points; cc.ast itself should be consulted for the details, and you should probably be looking at that file as you read this page (classes are documented in the same order as they appear in cc.ast).
Note that some of the AST classes, but not all, have source location information (SourceLoc; smbase/srcloc.h). Generally I've just put location info wherever I've happened to need it; there's no clear strategy. Now that SourceLoc is just one word (it used to be three), I might put it everywhere.
The entire AST is a list of TopForms, collected into a TranslationUnit. A TopForm is something that appears at toplevel, or (once they're implemented) at toplevel in a namespace.
All function definitions, whether at toplevel or inside class bodies, get a Function AST node. The function's name and parameters are encoded in the nameAndParams Declarator. Also included are constructor member initializers, and constructor exception handlers, if any.
A Declaration is a TypeSpecifier and then some Declarators, plus some optional keywords (DeclFlags) like static or extern.
An ASTTypeId is like a Declaration, but there's only one Declarator and no DeclFlags. It's used for function parameters and for the types that appear in the cast syntax.
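To make the difference between the two concrete, here is a minimal sketch. The struct names, the flag constants, and the idea of keeping each declarator as plain text are all assumptions made for illustration; the real classes in cc.ast carry considerably more structure.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical flag bits standing in for DeclFlags.
enum : unsigned { FLAG_NONE = 0, FLAG_STATIC = 1, FLAG_EXTERN = 2 };

// A declarator kept as plain text in this sketch, e.g. "* x".
struct SketchDeclarator {
    std::string text;
};

// Declaration: optional keywords, one TypeSpecifier, several Declarators.
struct SketchDeclaration {
    unsigned flags;
    std::string typeSpecifier;
    std::vector<SketchDeclarator> declarators;
};

// ASTTypeId: one TypeSpecifier, exactly one Declarator, no flags.
struct SketchTypeId {
    std::string typeSpecifier;
    SketchDeclarator declarator;
};

std::string render(const SketchDeclaration &d) {
    assert(!d.typeSpecifier.empty());  // every Declaration starts with a TypeSpecifier
    std::string out;
    if (d.flags & FLAG_STATIC) out += "static ";
    if (d.flags & FLAG_EXTERN) out += "extern ";
    out += d.typeSpecifier;
    for (std::size_t i = 0; i < d.declarators.size(); ++i) {
        out += (i == 0 ? " " : ", ");
        out += d.declarators[i].text;
    }
    return out + ";";
}

std::string render(const SketchTypeId &t) {
    return t.typeSpecifier + " " + t.declarator.text;
}
```

In this model, "static int * x, * * y;" is one SketchDeclaration with two declarators, while a function parameter such as "int * p" is a SketchTypeId.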
A PQName, a possibly-qualified name, is usually just a string (actually a StringRef, a pointer into the StringTable; see ast/strtable.h). However, it might also have qualifiers, those names that can appear before the "::" symbol. To complicate matters further, sometimes the names have template arguments, and sometimes the names refer to operators.
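A rough model of that shape, for illustration only: the names NamePart and SketchPQName are hypothetical, and std::string stands in for StringRef. Each component may carry template arguments, and the final component may be an operator name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of one component of a possibly-qualified name.
struct NamePart {
    std::string name;                       // identifier, or e.g. "operator+"
    std::vector<std::string> templateArgs;  // empty if there are none
};

struct SketchPQName {
    std::vector<NamePart> qualifiers;  // the components that precede each "::"
    NamePart last;                     // the final, unqualified component
};

std::string render(const NamePart &p) {
    assert(!p.name.empty());
    std::string out = p.name;
    if (!p.templateArgs.empty()) {
        out += "<";
        for (std::size_t i = 0; i < p.templateArgs.size(); ++i) {
            if (i > 0) out += ", ";
            out += p.templateArgs[i];
        }
        out += ">";
    }
    return out;
}

// Join the qualifiers and the final component with "::".
std::string render(const SketchPQName &n) {
    std::string out;
    for (const NamePart &q : n.qualifiers) out += render(q) + "::";
    return out + render(n.last);
}
```

With no qualifiers this degenerates to the common case of a bare name; with qualifiers A and B<int> and a final component operator+, it renders as "A::B<int>::operator+".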
TypeSpecifiers are the first part of Declarations. They mainly correspond to AtomicTypes in the terminology of cc_type: built-in types, enums, structs, classes, and unions. However, via typedefs (TS_name), they can actually refer to constructed types (like pointers) also.
Members are elements in a class definition. Typical members are data members or method prototypes (MR_decl), or inline definitions of methods (MR_func). MR_publish is obscure, corresponding to an "access declaration" [cppstd Section 11.3].
The C/C++ syntax for declarators is probably the strangest part of the language to someone new to parsing it. Declarators are the things that come after TypeSpecifiers in Declarations:
  int                         *   x       ,     *   *   y      ;
   |
  TypeSpecifier          <-- Declarator ->  <-- Declarator -->
  <------------------------ Declaration ----------------------->

But they also have a recursive structure, represented in my AST as IDeclarators:

  int   *      *      y                ;
                      |
                   D_name
            <---- D_pointer ---->
      <--------- D_pointer ---------->

  int  *   *    (    *       func        ) (int, int) ;
                              |
                            D_name
                    <-- D_pointer -->
                <----- D_grouping ------->
                <-------------- D_func ------------->
         <------------- D_pointer ------------------>
       <------------- D_pointer -------------------->
Now, what really makes them screwy is that they're inside out! Taking the last example above, func is being declared to be a pointer to a function which returns a pointer to pointer to an integer--you read it from right to left. The type checker (cc_tcheck.cc) sorts the types out into a more reasonable representation, but they start like this.
(By the way: declarators are inside-out not because Kernighan and Ritchie are evil or stupid, but because they wanted the syntax of declarations to mirror the syntax of expressions, to try to make the language easier to learn.)
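The inside-out structure can be made concrete with a small sketch. Everything here is an assumption for illustration: IDeclSketch and the dName/dPointer/dGrouping/dFunc helpers are hypothetical stand-ins for the real IDeclarator classes, and this is not the algorithm cc_tcheck.cc uses. The idea is to start from the type named by the TypeSpecifier and walk from the outermost declarator node inward, wrapping the type at each step until the name is reached.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the IDeclarator hierarchy.
struct IDeclSketch {
    enum Kind { Name, Pointer, Func, Grouping };
    Kind kind;
    std::string text;                   // Name: the identifier; Func: the parameter list
    std::unique_ptr<IDeclSketch> base;  // the declarator nested inside; null for Name
};
using IDeclPtr = std::unique_ptr<IDeclSketch>;

IDeclPtr makeNode(IDeclSketch::Kind kind, std::string text, IDeclPtr base) {
    IDeclPtr d(new IDeclSketch);
    d->kind = kind;
    d->text = std::move(text);
    d->base = std::move(base);
    return d;
}
IDeclPtr dName(std::string id) { return makeNode(IDeclSketch::Name, std::move(id), nullptr); }
IDeclPtr dPointer(IDeclPtr base) { return makeNode(IDeclSketch::Pointer, "", std::move(base)); }
IDeclPtr dGrouping(IDeclPtr base) { return makeNode(IDeclSketch::Grouping, "", std::move(base)); }
IDeclPtr dFunc(IDeclPtr base, std::string params) {
    return makeNode(IDeclSketch::Func, std::move(params), std::move(base));
}

// 'type' starts as the TypeSpecifier's type and is wrapped once per node,
// outermost node first, until the declared name is reached.
std::string describe(const IDeclSketch *d, std::string type) {
    while (d->kind != IDeclSketch::Name) {
        if (d->kind == IDeclSketch::Pointer) {
            type = "pointer to " + type;
        } else if (d->kind == IDeclSketch::Func) {
            type = "function (" + d->text + ") returning " + type;
        }
        // Grouping parentheses only steer the parse; they do not change the type.
        assert(d->base != nullptr);
        d = d->base.get();
    }
    return d->text + ": " + type;
}
```

For the last diagram, the tree is dPointer(dPointer(dFunc(dGrouping(dPointer(dName("func"))), "int, int"))), and describing it with the starting type "int" yields "func: pointer to function (int, int) returning pointer to pointer to int".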
OperatorName just stores the various operator-induced names. The getOperatorName function then flattens them down to a string anyway. The tricky one is ON_conversion, which can't be canonically flattened, so it has to be special-cased in code that consumes OperatorNames.
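Why conversion operators resist flattening can be shown with a toy model. The types and functions below (OpNameSketch, flatten, sameOperator) and the placeholder string are hypothetical, not the cc.ast classes; comparing target types as written text is also a simplification, since real code would have to compare types.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplified model of an operator-induced name.
struct OpNameSketch {
    enum Kind { Ordinary, Conversion };
    Kind kind;
    std::string spelling;    // Ordinary: the operator token, e.g. "+"
    std::string targetType;  // Conversion: the type converted to, as written
};

// Flatten to a string. Every conversion operator collapses to the same
// placeholder, because its identity depends on a type, not on a token.
std::string flatten(const OpNameSketch &n) {
    if (n.kind == OpNameSketch::Conversion) return "conversion-operator";
    assert(!n.spelling.empty());
    return "operator" + n.spelling;
}

// A consumer that must tell names apart has to special-case conversions.
bool sameOperator(const OpNameSketch &a, const OpNameSketch &b) {
    if (a.kind != b.kind) return false;
    if (a.kind == OpNameSketch::Conversion) return a.targetType == b.targetType;
    return a.spelling == b.spelling;
}
```

Here "operator int" and "operator char *" flatten to the same string, so only the special case in sameOperator distinguishes them.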
Statement represents statements; it's pretty straightforward. Condition is the thing between the parentheses in conditionals. Handler is what comes after try.
Expression represents expressions. Again, straightforward.
Notice that E_stringLit may contain a continuation. That's how string literal concatenation is implemented: in the parser. This is done because the lexer does no interpretation of any kind of literal, but concatenation semantics would require that it do so (e.g. "\xA" "B" denotes two characters, 0x0A followed by 'B', and not the single character "\xAB" that pasting the uninterpreted spellings together would produce).
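The ordering issue can be demonstrated with a small sketch. The functions decodeOne and decodeConcatenated are hypothetical helpers, not Elsa code, and only \x escapes are handled. Each piece is interpreted on its own and the results are joined afterwards, which is the order that concatenation semantics require.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Decode the body of one literal (the text between the quotes).
// Only \x escapes are interpreted in this sketch; a \x escape consumes
// every hex digit that follows it.
std::string decodeOne(const std::string &src) {
    std::string out;
    std::size_t i = 0;
    while (i < src.size()) {
        if (src[i] == '\\' && i + 1 < src.size() && src[i + 1] == 'x') {
            i += 2;
            // A \x escape needs at least one hex digit.
            assert(i < src.size() && std::isxdigit(static_cast<unsigned char>(src[i])));
            unsigned value = 0;
            while (i < src.size() && std::isxdigit(static_cast<unsigned char>(src[i]))) {
                int c = std::tolower(static_cast<unsigned char>(src[i]));
                unsigned digit = (c >= 'a') ? static_cast<unsigned>(c - 'a' + 10)
                                            : static_cast<unsigned>(c - '0');
                value = value * 16 + digit;
                ++i;
            }
            out.push_back(static_cast<char>(value));
        } else {
            out.push_back(src[i++]);
        }
    }
    return out;
}

// Interpret each piece separately, then join the decoded results.
std::string decodeConcatenated(const std::vector<std::string> &pieces) {
    std::string out;
    for (const std::string &p : pieces) out += decodeOne(p);
    return out;
}
```

Decoding the pieces \xA and B separately gives two characters, whereas decoding the pasted spelling \xAB gives one.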
Initializers are what come after IDeclarators in Declarations; they're the "3" in "int x = 3;". They can be nested, as IN_compound.
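The nesting can be pictured with a minimal tree. InitSketch and its helpers are hypothetical names for illustration (only IN_compound is taken from the text above), and expressions are kept as plain strings.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model: an initializer is either a single expression or a
// brace-enclosed list of further initializers (the IN_compound case).
struct InitSketch {
    bool isCompound;
    std::string expr;                  // used when !isCompound
    std::vector<InitSketch> elements;  // used when isCompound
};

InitSketch leaf(std::string expr) {
    InitSketch i;
    i.isCompound = false;
    i.expr = std::move(expr);
    return i;
}

InitSketch compound(std::vector<InitSketch> elements) {
    InitSketch i;
    i.isCompound = true;
    i.elements = std::move(elements);
    return i;
}

// Print the initializer back out in source form.
std::string render(const InitSketch &i) {
    assert(i.isCompound || i.elements.empty());  // a leaf has no elements
    if (!i.isCompound) return i.expr;
    std::string out = "{";
    for (std::size_t k = 0; k < i.elements.size(); ++k) {
        if (k > 0) out += ", ";
        out += render(i.elements[k]);
    }
    return out + "}";
}

// Brace nesting depth: 0 for a plain expression.
int depth(const InitSketch &i) {
    if (!i.isCompound) return 0;
    int deepest = 0;
    for (const InitSketch &e : i.elements) deepest = std::max(deepest, depth(e));
    return 1 + deepest;
}
```

The "3" in "int x = 3;" is a leaf, while the initializer in "int a[2][2] = {1, {2, 3}};" is a compound containing a leaf and another compound.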
InitLabel is part of an intent to support the extended "labeled" initializer syntax of C99, partially supported by gcc. However, Elsa currently supports none of it.
Representation of templates in the AST is straightforward. Elsa does not (yet?) expand or instantiate templates, however.