Discussing the design, decisions and discovery involved in developing Mutability Detector, an open-source analysis tool for Java.

Friday 30 September 2011

A better way of categorising immutability?

Currently Mutability Detector categorises a class's immutability as one of the following:


public static enum IsImmutable {
    COULD_NOT_ANALYSE,
    DEFINITELY,
    MAYBE,
    DEFINITELY_NOT;
}

For example:

java.util.Date is DEFINITELY_NOT immutable.
    Reasons:
        (... some reasons omitted for brevity ...)
        Field [fastTime] can be reassigned within method [setTime]

Putting aside the general error case of COULD_NOT_ANALYSE, the three main categories are kind of... unsatisfying.

For example, if your class has a primitive array field, it will be classed as MAYBE immutable. That verdict exists because the tool is naive and doesn't do the kind of analysis it could, which was generating too many false positives. At the time I thought it was a pragmatic (read "weasel") way to have my cake and eat it too - I got to reduce false positives, without the effort of making the analysis more powerful. The problem is that it's not particularly helpful. When Joe Hypothetical runs the analysis, what is he supposed to conclude from seeing that his class is "maybe immutable"?

Another example is the DEFINITELY category. In the discussion following a previous blog post, a commenter pointed out that a class I had considered DEFINITELY immutable could, under certain threading conditions, be seen to mutate. The categories were not being used to communicate certain, specific, well-defined, and ultimately useful information.

So I've been thinking about having different categories that will hopefully be more useful. I've borrowed a couple from Java Concurrency in Practice (JCIP), following its ideas and semantics, because the book is so darn good.

  • IMMUTABLE 
  • EFFECTIVELY_IMMUTABLE
  • NOT_IMMUTABLE
  • COULD_NOT_ANALYSE

Immutable is the strictest category of immutability. All fields are final and the class is final. All fields are immutable (a 'turtles all the way down' kind of arrangement). Instances of this class can be published in any way, under any threading conditions, as the Java Memory Model guarantees that the constructor's writes to a final field happen-before any reads of that field (provided the 'this' reference doesn't escape during construction). Implicitly thread safe. Issues include: that old favourite - lazily loading fields that are expensive to compute. Detecting benign data races is Hard. I may have to hard code some common cases, e.g. java.lang.String/java.math.BigDecimal.
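
To make the strict category concrete, here's a minimal sketch. Both classes and all their names are hypothetical, made up for illustration, and not taken from Mutability Detector. Temperature is the easy case. Name shows the lazy-loading problem: it looks immutable from the outside, but it caches a value in a non-final field, much like java.lang.String does with its hash code.

```java
// Hypothetical example classes; names are illustrative only.

// Final class, all fields final, all field types immutable.
final class Temperature {
    private final double kelvin;   // primitive
    private final String label;    // java.lang.String is itself immutable

    Temperature(double kelvin, String label) {
        this.kelvin = kelvin;
        this.label = label;
    }

    double kelvin() { return kelvin; }
    String label() { return label; }

    // A "modifier" returns a new instance rather than changing this one.
    Temperature withKelvin(double newKelvin) {
        return new Temperature(newKelvin, label);
    }
}

// The awkward case: observably immutable, but one field is not final
// because its value is computed lazily and cached.
final class Name {
    private final String value;
    private int hash;              // 0 means "not computed yet"

    Name(String value) {
        this.value = value;
    }

    int cachedHash() {
        int h = hash;              // read the field once into a local
        if (h == 0) {
            // Deterministic, so racing threads all compute the same value.
            h = value.hashCode();
            hash = h;
        }
        return h;
    }
}
```

A check that simply requires every field to be final would reject Name, even though callers can never observe it changing. That's the kind of benign data race the analysis would have to recognise or special-case.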

Instances of effectively immutable classes can be safely shared across threads, as long as they are safely published (JCIP covers what 'safely published' means in more detail). The class doesn't need to be final. Fields don't need to be final, but they must be assigned only in the constructor, or in private methods called only by the constructor. Fields should all be immutable, effectively immutable, or mutable-but-never-mutated, meaning, for example, that a field can be of type java.util.List, or an array, and as long as it isn't mutated after construction, it's fine. Letting a reference to a mutable instance escape, e.g. returning the array field from a method call without copying it first, counts as mutation. Issues include: the 'method called only from constructor' clause allows for serialisation... but should it? Also, confidently identifying fields which are mutable but never mutated is non-trivial.
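
As a rough sketch of what would fall into this category, consider the hypothetical Route class below (the names are mine, for illustration only). The class isn't final, its fields aren't final and have mutable types, but they are assigned only from a private method called by the constructor and never touched again. Copying the constructor arguments is my own assumption about how to make sure nobody else holds a reference that could mutate them. RouteHolder shows one of the safe publication idioms JCIP lists, a volatile field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; names are illustrative only.
class Route {                          // not final
    private List<String> stops;        // not final, mutable type
    private int[] distances;           // not final, mutable type

    Route(List<String> stops, int[] distances) {
        init(stops, distances);
    }

    // Private, and called only from the constructor.
    private void init(List<String> stops, int[] distances) {
        // Copies, so callers cannot mutate our state through their references.
        this.stops = new ArrayList<>(stops);
        this.distances = distances.clone();
    }

    // Accessors read the fields but never mutate them or let them escape.
    int stopCount() { return stops.size(); }

    String stop(int index) { return stops.get(index); }

    int totalDistance() {
        int sum = 0;
        for (int d : distances) {
            sum += d;
        }
        return sum;
    }
}

// Safe publication: other threads obtain the Route through a volatile field.
class RouteHolder {
    private static volatile Route current;

    static void publish(Route route) { current = route; }

    static Route current() { return current; }
}
```

Without the volatile (or another safe publication idiom), a reader thread isn't guaranteed to see the fully constructed Route, which is exactly the difference between this category and the strict one.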

Not immutable is everything else: interfaces and abstract classes, and classes whose fields can be reassigned after construction, whose mutable fields are published, or which assign a mutable type (e.g. List or an array) to a field. The analysis will still need to get better at detecting valid and common patterns, but the more esoteric the code gets, the more likely the analysis is just going to throw its hands up in the air in desperation and admit mutability.
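
Two hypothetical classes (again, names invented for illustration) show a couple of the ways a class can land here. Counter has a field that can be reassigned after construction. Scores looks stricter, with a final class and a final field, but it publishes its internal array, so callers can change its state from outside.

```java
// Hypothetical examples; names are illustrative only.

// Field can be reassigned after construction.
class Counter {
    private int count;

    void increment() { count++; }

    int count() { return count; }
}

// Final class, final field, but the mutable array is published.
final class Scores {
    private final int[] values;

    Scores(int[] values) {
        this.values = values.clone();
    }

    // Returns the internal array itself rather than a copy.
    int[] values() { return values; }

    int first() { return values[0]; }
}
```

Writing to the array returned by values() changes what first() returns afterwards, so a Scores instance can be seen to mutate even though none of its fields is ever reassigned.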

This scheme would represent a slight change in direction for the analysis. I can remember that when the code was thrown together as part of a uni assignment, the requirements for being immutable were very strict, and since then I've tried to improve the analysis to allow for more leniency. Now I'm starting to think that strictness may be more of a strength - particularly since it would be possible to manually 'override' the result using a flexible and fluent API available in unit test assertions.

So not a particularly groundbreaking or earth-shattering change suggestion, but hopefully something that could be more useful, more communicative and more broadly understood. As usual, comments and suggestions welcome, thanks for reading.