The Java AST.

General Design

This section identifies the assumptions and design choices underlying the entire API.

Tree decorations: Properties

The Tree framework provides a facility for decorating nodes that should intentionally remind callers of properties in java.lang.System: getProperty(S), setProperty(SO), clearProperty(S). All Tree subtypes allow the caller to store an arbitrary Object value with a specific Tree instance under a String key.

Tree, not a DAG

The source model must strictly be a tree. It may not be a dag (directed-acyclic graph).

The consequence is that no element may have a "parent" relationship to more than one element. Also, element A may not have a "sibling" relationship with element B unless A and B share the same parent.

Consider element C and element P. If P.getChildren().contains(C), then C.getParent().equals(P) must be true.

Furthermore, sibling order is defined by source file location. If sibling C occurs before D in the source representation, then C should occur before D in the sibling list.

No outside side effects

Here, an "outside side effect" means an operation upon another file's contents. The typical example is compilation. Resolving the type of a type declaration's superclass almost always requires loading other classes, usually java.lang.Object. Therefore, the Tree API may not have a method that resolves a type declaration's superclass.

The main reason for this choice is to avoid having Tree method calls block on file I/O.

Cloning trees

A Tree object may not be moved from one FileT to another. That is, a Tree object is tied to a FileT object for its entire lifetime. We do this to prevent the undo problems that arise when a Tree object is unlinked from one FileT and linked to another and then the unlink operation is undone.

In place of a traditional move operation, we provide a cloning facility for Tree objects. A client calls cloneSelf(FileT) on a Tree object to clone its entire subtree into the provided FileT. The cloned subtree is unattached, i.e. has no parent, and the client must then add it to the tree in the desired location.

Write operations

The FileT presents a parsed view into the contents of a source file. Changing the FileT causes the contents of underlying source file to be changed. For example, if a caller adds an ImportT to a FileT's List of import declarations, then, on commit, an import declaration is added to the underlying source file.

Mutating a FileT that is not writable results in an IllegalStateException. A FileT is marked as read-only or writable by its owning TreeManager. Please see javax.ide.model.java.source.write.TreeManager for more detail.

Final Tree subtypes

Here, a "final Tree subtype" refers to a Tree interface with no further public subinterfaces. To illustrate, let's consider method and constructor declarations. According to this design choice, there are two solutions.

1. Have one type, MethodT, that represents both method declarations and constructor declarations. The idea comes from class files where constructors are treated just like methods. In fact, it is desirable to have a type that represents both method and constructor declarations since there are a number of client operations (particularly refactoring) that execute the same code for both method and constructor declarations.
2. Have two types, ConstructorT and MethodT, which represent disjoint sets. All constructor declarations will be represented by ConstructorT and all method declarations that are not constructor declarations will be represented by MethodT. This is how reflection does things. The major downside to this is that there is no clear name we can give to the common supertype of ConstructorT and MethodT. As noted above, it is desirable to have a common supertype for methods and constructor declarations.

If we were to have the two types where ConstructorT extends MethodT, then this would not meet the design choice because then method declarations that are not constructor declarations would not be an instance of a "final Tree subtype".

The benefit is that clients are able to dispatch according to the run-time (and sometimes the compile-time) type of a Tree object via method overloading and instanceof checks.

The downside is two-fold. One, this choice generates that many more types in the API, thus producing clutter. Two, an implementation is required to have a separate implementation for each Tree subtype (in order for instanceof checks to work correctly).

Minimal multiple inheritance

There should be no multiple inheritance except for the following: HasBlockT, HasModifiersT, HasNameT, MemberT, BlockElementT, and VariableT. This produces a very rigid tree structure.

The benefit is that the tree structure becomes highly predictable, predictable enough that the client and implementation gets that extra type checking from the compiler. For example, a ListExpressionT has only ExpressionTs as its children (as do most other ExpressionT subtypes).

One downside is that we get classes like AnnotationExpressionT. An AnnotationExpressionT represents both an AnnotationT and an ExpressionT but is in a position where an ExpressionT is expected (it is an child of a ListExpressionT). The "Final Tree subtype" rule dictates that we can't have a generic Tree instance that is both an AnnotationT and an ExpressionT. The other notable examples are: BlockStatementT and TypeExpressionT.

Another downside is that this rule introduces more nodes than is strictly dictated by the grammar thereby increasing the memory footprint of a Tree representation.

Specific Design

This section identifies the assumptions and design choices underlying specific classes in this API.

TypeArgumentT

Consider the following type reference:

  p.OuterType.InnerType

A TypeReferenceT represents the entire type reference. It has a TypeReferenceT for the qualifying type "p.OuterType", NameT for "InnerType", and a TypeArgumentT for "T2". We cannot use TypeArgumentT for "T2" because then there would be two TypeReferenceT children and the above type reference is indistinguishable from:

  InnerType, T2>

Hence, we have more general class TypeArgumentT instead of WildcardTypeT.

InfixExpressionT

Some InfixExpressionT can have multiple ExpressionT operands even though the operator itself is a binary infix operator. In particular, because most traversals are written recursively, long String concatenations (>1000 addends) would cause stack overflows, such as the one found in java.text.Normalizer in one of the 1.4.* releases.

Consider a subtree of 1000 ADD operations. Parsing generates a tree with depth ~1000. One possible solution is to balance (or approximately) balance the generated tree. This approarch is problematic because then the resolved type of ADD operators in one subtree depends on the resolved type of ADD operators in a different subtree which goes against the vein of the rule of this being strictly a tree not a DAG.

Therefore, the solution we've come up with is to have a single ADD Tree with 1000 addends. For consistency, all binary infix operators that are both associative and commutative are treated this way. These include:

  + * && & || | ^

Though this causes a problem when refactoring of "b + c" in the expression "a + b + c + d", the same problem is present in a balanced tree.

Miscellaneous

Code conventions

We ask that all source text should respect the 79-character line width so that lines do not wrap when displayed on an 80-character wide terminal. All Tree subtypes shall be suffixed with "T", short for Tree.
All ExpressionT subtypes shall be suffixed with "ExpressionT".
All StatementT subtypes shall be suffixed with "StatementT" or "ClauseT".
With these conventions, a client can differentiate between different Tree subhierarchies at a glance.

Collections and Lists

The following rules apply wherever Collections and Lists are returned by Tree subclasses.

The Collections and Lists are themselves mutable if and only if the Tree instance(s) they are based on is/are mutable. For example, consider non-null Tree elements C and P. The following two calls are equivalent: C.addSelf(P) and P.getChildren().add(C). Iterators are likewise mutable if the owning Collections or Lists are themselves mutable.

In addition, all methods in the java.util.Collections class should perform correctly. For example, a client should be able to call Collections.sort(file.getImports(), comparator) and expected the file's list of imports to become sorted according to the input comparator.

The same Collection/List instance does not need to be returned each time. That is, for FileT instance f, f.getImports() == f.getImports() does not necessarily have to be true.

Note: Should a guarantee of Collection/List equality be made? Consider file1.getImports().equals(file2.getImports()) for different FileT instances, file1 and file2. If file1 and file2 have different import names, then equals(O) should clearly return false. But, if they have equivalent import names, should equals(O) return true? How about f.getImports().equals(f.getImports())?

Dummy (synthetic) instances

Certain constructs must always be present in the Tree, even if there is no matching source representations. Consider, for example, the interfaces declaration of a type declaration. It is permissible to omit the interfaces declaration (implicitly "java.lang.Object"). ClassT.getInterfacesClause() must always return a non-null InterfacesT object. If no interfaces are declared in the source file, then a synthetic one should be generated and returned by ClassT.getInterfacesClause(). This synthetic InterfacesT is mutable if and only if the owning ClassT is mutable. Adding a TypeReferenceT to this InterfacesT element should automatically "promote" the InterfacesT element to be a real element instead of just a synthetic one.

No guarantee of equality is made of these synthetic elements. If there is no interfaces declaration and a client calls ClassT.getInterfacesClause() twice (receiving instances A and B), then A.equals(B) does not necessarily have to be true. However, A.getParent().equals(B.getParent()) must be true.

Note: Consider parent P with a synthetic child C. Clearly, C.getParent().equals(P) must be true. However, should C be considered to be a part of P's children list? That is, should P.getChildren().contains(C) be true? package javax.ide.model.java.source.tree;