The Java AST.
General Design
This section identifies the assumptions and design choices underlying
the entire API.
Tree decorations: Properties
The Tree framework provides a facility for decorating nodes that should
intentionally remind callers of properties in java.lang.System
:
getProperty(S)
, setProperty(SO)
,
clearProperty(S)
. All Tree subtypes allow the caller to
store an arbitrary Object value with a specific Tree instance under a
String key.
Tree, not a DAG
The source model must strictly be a tree. It may not be a dag
(directed-acyclic graph).
The consequence is that no element may have a "parent" relationship to
more than one element. Also, element A may not have a "sibling"
relationship with element B unless A and B share the same parent.
Consider element C and element P. If
P.getChildren().contains(C)
, then
C.getParent().equals(P)
must be true.
Furthermore, sibling order is defined by source file location. If
sibling C occurs before D in the source representation, then C should
occur before D in the sibling list.
No outside side effects
Here, an "outside side effect" means an operation upon another file's
contents. The typical example is compilation. Resolving the type of a
type declaration's superclass almost always requires loading other classes,
usually java.lang.Object. Therefore, the Tree API may not have a method
that resolves a type declaration's superclass.
The main reason for this choice is to avoid having Tree method calls block
on file I/O.
Cloning trees
A Tree object may not be moved from one FileT to another. That is, a
Tree object is tied to a FileT object for its entire lifetime. We do
this to prevent the undo problems that arise when a Tree object is
unlinked from one FileT and linked to another and then the unlink
operation is undone.
In place of a traditional move operation, we provide a cloning
facility for Tree objects. A client calls
cloneSelf(FileT)
on a Tree object to clone its entire
subtree into the provided FileT. The cloned subtree is unattached,
i.e. has no parent, and the client must then add it to the tree in the
desired location.
Write operations
The FileT presents a parsed view into the contents of a source file.
Changing the FileT causes the contents of underlying source file to be
changed. For example, if a caller adds an ImportT to a FileT's List of
import declarations, then, on commit, an import declaration is added
to the underlying source file.
Mutating a FileT that is not writable results in an
IllegalStateException. A FileT is marked as read-only or writable by
its owning TreeManager. Please see
javax.ide.model.java.source.write.TreeManager
for more detail.
Final Tree subtypes
Here, a "final Tree subtype" refers to a Tree interface with no further
public subinterfaces. To illustrate, let's consider method and
constructor declarations. According to this design choice, there are two
solutions.
- 1. Have one type, MethodT, that represents both method
declarations and constructor declarations. The idea comes from class
files where constructors are treated just like methods. In fact, it is
desirable to have a type that represents both method and constructor
declarations since there are a number of client operations
(particularly refactoring) that execute the same code for both method
and constructor declarations.
- 2. Have two types, ConstructorT and MethodT, which represent
disjoint sets. All constructor declarations will be represented by
ConstructorT and all method declarations that are not constructor
declarations will be represented by MethodT. This is how reflection
does things. The major downside to this is that there is no clear
name we can give to the common supertype of ConstructorT and
MethodT. As noted above, it is desirable to have a common supertype
for methods and constructor declarations.
If we were to have the two types where ConstructorT extends MethodT, then
this would not meet the design choice because then method declarations that
are not constructor declarations would not be an instance of a "final Tree
subtype".
The benefit is that clients are able to dispatch according to the run-time
(and sometimes the compile-time) type of a Tree object via method
overloading and instanceof checks.
The downside is two-fold. One, this choice generates that many more
types in the API, thus producing clutter. Two, an implementation is
required to have a separate implementation for each Tree subtype (in
order for instanceof checks to work correctly).
Minimal multiple inheritance
There should be no multiple inheritance except for the following:
HasBlockT, HasModifiersT, HasNameT, MemberT, BlockElementT, and
VariableT. This produces a very rigid tree structure.
The benefit is that the tree structure becomes highly predictable,
predictable enough that the client and implementation gets that extra
type checking from the compiler. For example, a ListExpressionT has
only ExpressionTs as its children (as do most other ExpressionT
subtypes).
One downside is that we get classes like AnnotationExpressionT. An
AnnotationExpressionT represents both an AnnotationT and an
ExpressionT but is in a position where an ExpressionT is expected (it
is an child of a ListExpressionT). The "Final Tree subtype" rule
dictates that we can't have a generic Tree instance that is both an
AnnotationT and an ExpressionT. The other notable examples are:
BlockStatementT and TypeExpressionT.
Another downside is that this rule introduces more nodes than is
strictly dictated by the grammar thereby increasing the memory
footprint of a Tree representation.
Specific Design
This section identifies the assumptions and design choices underlying
specific classes in this API.
TypeArgumentT
Consider the following type reference:
p.OuterType.InnerType
A TypeReferenceT represents the entire type reference. It has a
TypeReferenceT for the qualifying type "p.OuterType", NameT for
"InnerType", and a TypeArgumentT for "T2". We cannot use TypeArgumentT
for "T2" because then there would be two TypeReferenceT children and
the above type reference is indistinguishable from:
InnerType, T2>
Hence, we have more general class TypeArgumentT instead of
WildcardTypeT.
InfixExpressionT
Some InfixExpressionT can have multiple ExpressionT operands even
though the operator itself is a binary infix operator. In particular,
because most traversals are written recursively, long String
concatenations (>1000 addends) would cause stack overflows, such as
the one found in java.text.Normalizer
in one of the 1.4.*
releases.
Consider a subtree of 1000 ADD operations. Parsing generates a tree
with depth ~1000. One possible solution is to balance (or
approximately) balance the generated tree. This approarch is
problematic because then the resolved type of ADD operators in one
subtree depends on the resolved type of ADD operators in a different
subtree which goes against the vein of the rule of this being strictly
a tree not a DAG.
Therefore, the solution we've come up with is to have a single ADD
Tree with 1000 addends. For consistency, all binary infix operators
that are both associative and commutative are treated this way. These
include:
+ * && & || | ^
Though this causes a problem when refactoring of "b + c" in the
expression "a + b + c + d", the same problem is present in a balanced
tree.
Miscellaneous
Code conventions
We ask that all source text should respect the 79-character line width so
that lines do not wrap when displayed on an 80-character wide terminal.
All Tree subtypes shall be suffixed with "T", short for Tree.
All ExpressionT subtypes shall be suffixed with "ExpressionT".
All StatementT subtypes shall be suffixed with "StatementT" or
"ClauseT".
With these conventions, a client can differentiate between different
Tree subhierarchies at a glance.
Collections and Lists
The following rules apply wherever Collections and Lists are returned by
Tree subclasses.
The Collections and Lists are themselves mutable if and only if the
Tree instance(s) they are based on is/are mutable. For example, consider
non-null Tree elements C and P. The following two calls are equivalent:
C.addSelf(P)
and P.getChildren().add(C)
.
Iterators are likewise mutable if the owning Collections or Lists are
themselves mutable.
In addition, all methods in the java.util.Collections class should perform
correctly. For example, a client should be able to call
Collections.sort(file.getImports(), comparator)
and expected
the file's list of imports to become sorted according to the input
comparator.
The same Collection/List instance does not need to be returned each time.
That is, for FileT instance f, f.getImports() == f.getImports() does not
necessarily have to be true.
Note: Should a guarantee of Collection/List equality be made? Consider
file1.getImports().equals(file2.getImports())
for different
FileT instances, file1 and file2. If file1 and file2 have different import
names, then equals(O) should clearly return false. But, if they have
equivalent import names, should equals(O) return true? How about
f.getImports().equals(f.getImports())
?
Dummy (synthetic) instances
Certain constructs must always be present in the Tree, even if there is
no matching source representations. Consider, for example, the interfaces
declaration of a type declaration. It is permissible to omit the interfaces
declaration (implicitly "java.lang.Object").
ClassT.getInterfacesClause()
must always return a non-null
InterfacesT object. If no interfaces are declared in the source file, then
a synthetic one should be generated and returned by
ClassT.getInterfacesClause()
. This synthetic InterfacesT is
mutable if and only if the owning ClassT is mutable. Adding a TypeReferenceT
to this InterfacesT element should automatically "promote" the InterfacesT
element to be a real element instead of just a synthetic one.
No guarantee of equality is made of these synthetic elements. If there is
no interfaces declaration and a client calls
ClassT.getInterfacesClause()
twice (receiving instances A and B),
then A.equals(B)
does not necessarily have to be true. However,
A.getParent().equals(B.getParent())
must be true.
Note: Consider parent P with a synthetic child C. Clearly,
C.getParent().equals(P)
must be true. However, should
C be considered to be a part of P's children list? That is, should
P.getChildren().contains(C)
be true?
package javax.ide.model.java.source.tree;