Discussing the design, decisions and discovery involved in developing Mutability Detector, an open-source analysis tool for Java.

Sunday 3 October 2010

Version 0.02 of Mutability Detector now available.

... because nothing says "Progress!" like the increment by one-hundredth of a major release.

Sarcasm aside, in this new release, I definitely see a further step towards "Actually useful". I'll use this blog entry to demonstrate some of the improvements, discuss the planned features for the next release, and list what's still keeping that v1.0 release out of reach.

Unit Testing Support
I consider one of the major barriers to the adoption of a static analysis tool is the 'bootstrapping' cost: the time and resource required to configure and run the analysis as part of the build. There's also the issue of adapting the results to suit your codebase - trimming the false positives, and keeping the results to a meaningful and manageable level. With this problem in mind, I decided to stand on the shoulders of giants, to reuse, rather than reinvent.

JUnit, and its ilk (TestNG being the most prominent rival) have long been a weapon in the arsenal of software development. Without attempting to be controversial, any Java development shop worth its salt will have unit tests regularly run as part of a continuous integration build. The beauty of JUnit is that while it is suitable for large teams with large projects, it also works for a single developer with a small project (about a third of the Java source files in Mutability Detector are JUnit test classes, run regularly through an IDE).

With widespread adoption and an ecosystem of knowledge and tools available, it seems a sensible option to hook into that, rather than trying to roll my own. The v0.02 release of Mutability Detector enables this.

By simply adding the jar to your classpath, it is possible to write this kind of test case:

import static org.mutabilitydetector.junit.MutabilityAssert.assertImmutable;

@Test
 public void theClassIsImmutable() {
    assertImmutable(MyNewClass.class);
}

Which will fail if, for example, someone later comes along and adds a setter method.

That's all it takes. And there's several advantages that come with this approach:

  • There's no lengthy configuration: just add a jar to your development classpath, there's certainly no need to add a runtime dependency on Mutability Detector, unless you want to (there is an API for doing runtime checks).
  • There's no need to annotate your code: meaning nothing except test code has to be cleaned up if you decide to stop using the jar; no dependency on a third-party annotation; and no nasty "magic" comments to enable or disable the check. Developers who employ TDD will be familiar with the idea that test code is the documentation, there's no need for redundant source annotations. 
  • There's also no separate build phase: since the checks are made as part of running the unit tests, if you already do that, there's nothing more to configure. But, if you don't regularly run unit tests don't worry, the option to run the tool from the command line is still there, and always will be.
  • It's extremely selective: you won't have to prune all the results to get the ones for classes you actually care about, the analysis will only be run on classes that you specify as intending to be immutable. This means no noisy false positives.

However, like anything else, there are downsides and caveats, and I'd be remiss for not mentioning them.

  • The assertion mechanism for checking immutability assumes a JUnit-like approach. That is, failed checks will throw a (subclass of) AssertionError on the thread the test is run on. If your unit testing library of choice does not use this kind of mechanism, the tool may not be suitable for you.
  • The flipside of being selective is that it may require a lot of busywork to go around your codebase adding in single-line test methods which only call assertImmutable(). I intend to address this problem (somehow) but support for lessening this burden is not here yet.
  • The unit test task may require more memory than previously. I chose the option of speed over memory footprint, using a single analysis session across all unit tests run in the current JVM process. This prevents the need to re-analyse classes several times, but it does mean that more memory is used to cache the results. Unfortunately, this is not something that I have been able to benchmark in a real-world scenario, but Mutability Detector can analyse the entire rt.jar of Sun's 1.6.12 JDK (~17,000 classes) with less than 96Mb of memory, which gives a decent indication of the memory footprint.

The support for unit testing immutability is the most significant addition to Mutability Detector, and, in my humble opinion, represents a useful and novel way of introducing static analysis into development workflow.


Other Changes With This Release
Along with this change, there are several other improvements:
  • Better detection of setter methods, covering some of the false results described in an earlier post.
  • Support for reporting only on specified classes with a --classlist option. Run java -jar MutabilityDetector.jar -help for further detail.
  • The --match option for the command line interface is now implemented.
  • Improvements to the way classes are loaded, to reduce the amount of PermGen space required.


But, before I get ahead of myself, I must point out, there is a very good reason why this release is not close to a v1.0: java.lang.String is still reported as mutable.

What's Next?
Obviously the analysis of java.lang.String is a problem: since first releasing the source code I have been researching and considering ways to correctly handle the case. Since this is the major stumbling block to a v1.0, and I want to get it right, the work on this is ongoing. Ideas and thoughts are most welcome.

Besides that, there are another couple of items of work which I will be working on in the near future (as time and other commitments permit, of course).

  • Implement the EscapedThisReferenceChecker described in an earlier post.
  • Expose an API within the MutabilityAssert methods to allow more fine grained control to override potential false positives. For example, if your class does have one method which lazy inits a field, and the tool isn't at the level of sophistication required to detect it, there is the mechanism to say "In regards to reassigning this particular field, in this method, don't worry, I know what I'm doing". It's inevitable that there's always going to be code which can fool the tool, this should help ease that pain a little.
  • Investigate the following features: an Eclipse plugin; a FindBugs plugin; a nicer report from the command line tool (perhaps a nice HTML page with the capability of filtering and searching the results); and an ant task to wrap the command line interface. Some of these features may be worthwhile, some may not, so it would be prudent to consider them.
  • Improve the tool to be much smarter about detecting mutations. For example, having a mutable field makes the owning class mutable, whether that field is mutated or not. It should be possible for an immutable class to have mutable fields - as long as they are not actually mutated.
  • The usual bug fixes and incremental improvements.

Hopefully I've demonstrated the main improvements over the last version, and possibly convinced you to try the tool out. As always, if there are any comments, questions or flames, feel free to post to the blog, or email me.

So that's v0.02: roll on v0.03!

Sunday 5 September 2010

Detecting Setter Methods Ain't So Easy After All

Who could have thought that detecting if a class has setter methods could be so complicated?

Up until a few days ago, the SetterMethodChecker was the most simple checker of all. The meat of the checker could be summed up with the following code:

   if("<init>".equals(this.methodName)) return; // We're not concerned with the constructor 
            
    if(opcode == Opcodes.PUTFIELD) {
         // declare the class as mutable, as a field reference can be changed
    }

This simple check was effective for the most part. It correctly detected the mutability of classes such as this:

     class MutableWithSetterMethod {
          private int foo;

          // constructor, and getter, etc

          public void setFoo(int newFoo) { this.foo = foo; }
     }

This is fine for the most part, but if classes stray from this basic pattern, even just a little, the checker can be fooled into thinking an immutable class... isn't. In my quest for realistic examples of immutable classes, I came across a fairly common example: java.math.BigDecimal from Sun's JDK[1]. The SetterMethodChecker raises the following issues:

     Field [precision] can be reassigned within method [subtract]
     Field [scale] can be reassigned within method [divide]
     Field [scale] can be reassigned within method [divide]
     Field [precision] can be reassigned within method [negate]
     Field [precision] can be reassigned within method [precision]
     Field [precision] can be reassigned within method [setScale]
     Field [precision] can be reassigned within method [scaleByPowerOfTen]
     Field [stringCache] can be reassigned within method [toString]
     Field [intVal] can be reassigned within method [inflate]
     Field [intCompact] can be reassigned within method [readObject]
     Field [intVal] can be reassigned within method [stripZerosToMatchScale]
     Field [scale] can be reassigned within method [stripZerosToMatchScale]
     Field [precision] can be reassigned within method [stripZerosToMatchScale]
     Field [intCompact] can be reassigned within method [stripZerosToMatchScale]
     Field [intVal] can be reassigned within method [roundThis]
     Field [intCompact] can be reassigned within method [roundThis]
     Field [scale] can be reassigned within method [roundThis]
     Field [precision] can be reassigned within method [roundThis]
     Field [scale] can be reassigned within method [dropDigits]

Nineteen examples of how BigDecimal is mutable, apparently. Not really of course, BigDecimal is immutable, but the code is complicated enough to fool the checker. There are three patterns of field reassignment which a) retain immutability b) are common use cases and c) require more sophistication to detect. I hope to fix the SetterMethodChecker to recognise these patterns and handle them accordingly.

Valid Setter Method Patterns


Reassigning fields in private methods called only from the constructor.
As detailed above, the method 'roundThis' reassigns four fields of BigDecimal. However, the method is private, and only ever called from a constructor - in 7 of the 18 available BigDecimal constructors. This pattern was also discovered in the JodaTime libraries, which use a private setFields() method for serialisation. If the mutating method is guaranteed[2] to only be called before the constructor completes, the class can be safely called immutable.

Solution? A setter method makes the class mutable unless it is private, and called only from the constructor.

Reassigning fields on references to an object other than 'this'.
A common idiom for immutable types is, rather than having setter methods mutating internal state, is to construct new instances, modified from the original as required, and returned from the method. A common example is the likes of String.toUpperCase(), which, rather than mutating the char array of this String, copies it, converting the chars to upper case, and returns a new String instance. The toUpperCase() is not the most representative example, as it misses a common case - where an instance is created, and fields are reassigned after construction[3]. Consider the following code (an immutable equivalent of the earlier snippet):

     public class ImmutableWithNoSetterMethod {
          private int foo;
          public ImmutableWithNoSetterMethod(int foo) {    this.foo = foo; }
            
          public int foo() { return this.foo; }
            
          public static ImmutableWithNoSetterMethod newAndMutatedInstance() {
               ImmutableWithNoSetterMethod toReturn = new ImmutableWithNoSetterMethod(0);
               toReturn.foo = 42; // mutating an instance after construction
               return toReturn;
          }
     }

Here the newly created instance is mutated after construction - that's hardly a model of immutability! However, it's safe in this example, and in the case of BigDecimal. So how is this safe? Well, the instance is modified with private access, and it's done before the reference is published. So perhaps this case falls in the grey area between immutability and thread-safety. I would argue that, since non-final fields don't qualify for the Java Memory Model's view of immutability anyway, that this kind of pattern qualifies for any useful definition of immutability.

In the code above, despite the mutation of the already constructed object, there is no way for a client of the code to view a change in the object. Multithreaded code is not an issue for the newAndMutatedInstance() method, since nothing is published to multiple threads, and concurrent calls to the method will each construct their own stack, with no shared references[4]. Thus there is no danger in mutating the local variable toReturn. However, the SetterMethodChecker is fooled. Partly because the check is very basic: it looks for the PUTFIELD opcode, but doesn't investigate which object is having its field reassigned. Thus the code this.foo = 42 and toReturn.foo = 42 are treated equally.

Solution? Check that the field reassignment is performed on the 'this' object, either directly or through something reached by the this reference's object graph (i.e. `this.foo.bar.value = 42;` is still a mutation). This checker also has to collaborate with the (as yet unwritten) EscapedThisReferenceChecker to ensure that the reference isn't published to multiple threads before it's mutated. So if the ImmutableWithNoSetterMethod constructor or the newAndMutatedInstance() method is modified to call a method which shares the the reference before the mutation, the class will be dubbed mutable.

Reassignment of fields as a lazy-initialisation and caching strategy.
This is the main reason[5] that java.lang.String is found to be mutable. The result of hashCode() is computed lazily, and cached. But to the SetterMethodChecker, it just makes the class look mutable. The same goes for BigDecimal: the result of toString() is only computed when called the first time. The code for this pattern in BigDecimal is almost as simple an example as can be made:

     class BigDecimal // etc {
          private volatile transient String stringCache = null;
        
          // etc
        
          public String toString() {
               if (stringCache == null)
                    stringCache = layoutChars(true);
               return stringCache;
          }
     }

This example can be generalised, in psuedo-code, to the following idiom:

     class {
          private X field; // initialised to default value 
                                   // whatever that is for type X
        
          getField(){
               if(field == default value) {
                    compute value of field
               }                return field;           }      }

This kind of pattern satisfies immutability - the object can not seen to change by its clients. String's hashCode() and BigDecimal's toString() will always return the same result, even though the object was not fully formed on completion of the constructor. So what appears to be downright mutability... isn't. The SetterMethodChecker is fooled once again. I'm almost feeling sorry for it.

Solution? The scope surrounding a field reassignment should be investigated to find if there is a "if equal to default value" condition guarding the change. This isn't as satisfying as it should be, since fields could technically be initialised to an "uncalculated" marker (e.g. private String myField = "UN-INITIALISED";). Though I predict the complexity of detecting this pattern doesn't lie with what is being compared to, I'm happy planning the trade-off that fields must be initialised to the default value to qualify for the lazy initialisation exemption.

A Note On Lazy Initialisation and Data Races

If you've been reading about lazy initialisation (hello to someone other than me!) you may have been thinking "Well, String's hashCode() and BigDecimal's toString() aren't synchronised, surely there's a potential data race here?". If you have, you are right. Technically, myBigDecimal.toString() could be called across concurrently executing threads, and the second caller could execute the comparison while the first is computing, but has not yet assigned a result to the stringCache field (shown below in the diagram).







This could result in stringCache being assigned to multiple times. However, it is safe (I think). This example can be described as a "benign data race": yes the field can be reassigned, but only to the same value, each and every time. String's hashCode() is calculated with only immutable state, it doesn't matter that there is a data race to assign the field, the result will always be the same. Same goes for BigDecimal's toString(). Now, I say I only "think" this is safe, because there's a whole lotta stuff in the Java Memory Model (which I have investigated, but hasn't "sunk in" yet) that can throw spanners in the works. Because there is no synchronisation of either method, there is no guarantees of happens-before in terms of the assignment. This can mean that while the field is being assigned, other threads could view the field with stale data (i.e. a reference is still null) or even worse, the reference points to an object which isn't in a fully constructed state (i.e. fields are not as they will be when the constructor completes). As far as I can tell, BigDecimal and String are protected from this because the field assigned in the benign data race is immutable, and the Java Memory Model has some guarantees around visibility in that area. Or, in the case of BigDecimal, because the field has the volatile modifier - I'm not sure yet. Needless to say I can't commit to a solution for this problem until I can be confident of the safety.

So, we have three patterns which I think are common enough to need to be handled correctly before I can release a v1.0 of Mutability Detector. They've been added to the list. As I said, who could have thought a check for a setter method could be so complicated?


[1] Sun JDK 1.6 Update 12
[2]
Ignoring reflection as a possibility for calling methods.
[3]
toUpperCase() uses a String constructor which allows setting all the necessary changes in one call.
[4] This is what's known as "stack confinement", discussed in the excellent Java Concurrency In Practice.
[5] There is another reason, but it's caused by the primitiveness of the tool, rather than what I'd call a complicated problem.

Tuesday 13 July 2010

EscapedThisReferenceChecker - A Sketch Of A Checker To Come

In a previous entry I discussed some of the rules for immutability, and how they are enforced in Mutability Detector through a set of "checkers". So, for example, where a method can change the reference of an object's field (i.e. a setter method) there is a checker (named SetterMethodChecker) which can detect this rule and render a class definitely not immutable.

This blog entry is aimed at discussing another checker which doesn't exist, but should, before a 1.0 release can be considered.

EscapedThisReferenceChecker

This can be a bit tricky to explain, the first hit for "immutability escape constructor" explains it pretty well. Basically, if you're constructing an object, and someone can get hold of a reference to it before the constructor has completed, there are no guarantees about what that part-constructed object which actually consist of. A code example will help. We'll start with an immutable class, and to keep a theme going I'll call the class IAmImmutable. For those unfamiliar with java.lang.String, it's necessary to know that it is immutable too.


public final class IAmImmutable {
  public final String aField;

  public IAmImmutable(String aField) {
    this.aField = aField;
  }
}


A good example of an immutable class. To show how the reference escapes, we'll show a near identical class, MyThisReferenceEscapes.

public final class MyThisReferenceEscapes {
  public final String aField;

  public MyThisReferenceEscapes(String aField) {
    this.aField = aField;
    ReferencePublishingService.publish(this);
  }
}

In this example, the "this" reference escapes before the constructor is complete, as it is passed to a method in another class. Because of the semantics of the Java memory model, within the publish() method, the reference to the MyThisReferenceEscapes instance could have it's field set, or it might not, there's no guarantees of behaviour. Even if the "this" reference escapes in the last line, after all the fields have been assigned, it doesn't matter, it's still escaped in an incomplete state. Since the ReferencePublishingService could do just about anything with that reference, it could lead to the instance of MyThisReferenceEscapes changing across threads, at different times. With this vulnerability, there is no guarantee that the MyThisReferenceEscapes class is immutable.

I envisage that this kind of checker should be relatively straight forward - once the rules are decided. The "this" can't be passed as a parameter, but only to other classes? Static methods in the same class? Instance methods? It also can't be assigned to fields in another class, but again, how about as a static field in the same class? How about as an instance field?
I'm not entirely sure about these things, so I'll have a think about it, then pap the thought off to my subconscious, and see if it comes up with any bright ideas.

Monday 5 July 2010

Immutability and subclassing: are they mutually exclusive?

In an earlier post I introduced the Mutability Detector, and how Josh Bloch's rules were used in its development. One point I mentioned was that Josh's rules are a bit on the strict side. I'm using this blog entry to clarify what I meant by that, and get some of the ol' wheels turning on the question: are immutability and subclassing mutually exclusive?

One of the immutability rules from Effective Java is:
2. Ensure that the class can’t be extended.


I'd like to make the case that yes, while preventing your class being extended is not going harm the immutability status it isn't always necessary. There's also the issue to consider that declaring your class final may have benefits, somewhere, at some point, in the hazy world of the Java memory model.

I'll start by explaining when it is necessary.

For example, given the following class:

public class IAmImmutable {
private String label;
public IAmImmutable(String label) {
this.label = label;
}
public String getLabel() {
return this.label;
}
}


I'm pretty sure everyone would consider this immutable - an instance of IAmImmutable cannot change after construction. However, it fails one of Josh's rules: the class can be extended. So you could have a situation like:

public class ImInUrClassRuiningUrImmutability extends IAmImmutable {
... // similar constructor
public void setLabel(String label) {
this.label = label;
}
}


So you could have a class which depends on IAmImmutable. Perhaps you're composing your big immutable objects out of smaller immutable ones. Like this:

public class ComposedOfImmutables {
public final IAmImmutable firstField;
public final SecondImmutableClass secondField;

public ComposedOfImmutables(IAmImmutable first, SecondImmutableClass second) {
firstField = first;
secondField = second;
}
}


All is well: ComposedOfImmutables, because it's being built on the solid foundation of immutable instances, is easier to understand, debug, reason about, and almost trivial to use in multithreaded applications. That is, however, until IAmImmutable is subclassed, and the menace that is the immutability-ruining LOLclass is free to pollute the otherwise immutable world around it. An instance of ImInUrClassRuiningUrImmutability, because it extends IAmImmutable, can be passed as a parameter to ComposedOfImmutables, corrupted it with mutability. Like so:

ImInUrClassRuiningUrImmutability mutableThing = new ImInUrClassRuiningUrImmutability("lol");
ComposedOfImmutables supposedlyImmutable = new ComposedOfImmutables(mutableThing, ...);

publishImmutableThingToManyThreads(supposedlyImmutable);

supposedlyImmutable.firstField.setLabel("hr i goez - ruinin ur immutability");


This is the case where it's important that IAmImmutable cannot be extended. It's a consumer thing. The existence of a mutable subclass does not mean IAmImmutable instantly becomes mutable. You can still new them up and pass them around multiple threads with joyful abandon. But... ComposedOfImmutables depends on the instance it receives in the constructor being immutable. The onus of immutability in this case does, and should lie, with ComposedOfImmutables.

You may be asking, what's the problem? IAmImmutable obviously wants to be immutable. It lives to be used immutably (is that a word?), why not just prompt the developer to prevent subclassing?. Well, it becomes an issue for static code checking tools, like Mutability Detector. In the past year I spent my honours year project steeped in the stew of static bug checkers, with their incomplete analysis, unsoundness, and swathes of false positives. My research in that area has convinced me that any static analysis tool has to be very, very careful of reporting errors that get in the users way. If it's not reporting real problems they want to fix, it's wasting their time.

[Edit: looking back after the comments made by Seb Charrot, this conclusion was really misleading. Hopefully this revision makes more sense.]

That's why it's my opinion that Josh's rule is too strict. And it's the reason that Mutability Detector will not emit errors that IAmImmutable is in fact mutable, it will instead emit warnings. Instead of an "omg teh buildz broken!", it says "You know, there's a danger here. Gather round to hear why...". However, Mutability Detector has, and will, tell you that ComposedOfImmutables is definitely not immutable. If guarantees can't be made about the runtime instances passed to a class, the foundations it is built on are too weak to support it, and immutability cannot be claimed.

So while most of the time immutability and subclassing are incompatible, hopefully this post has demonstrated that they are not always mutually exclusive.

Sunday 4 July 2010

An Introduction To Mutability Detector (Part 1 of n)

(n still to be decided)

Mutability Detector is a tool which analyses Java code, and tells you if instances of a given class are immutable or mutable. Having been a google code project for around six months, some recent discussions (one on the issues page, and another in the mailing list for Project Lombok, in an unrelated discussion) have brought a little bit of attention to the project. I thought it would be useful to give an overview of the usage of Mutability Detector, and how it can be used to recognise mutability in your classes.

Mutability Detector works by applying a set of checkers to each class. Each checker looks for a particular way of rendering a class as mutable, and the results from each checker are combined to give the overall "mutability result" of the class. The checkers closely relate to a set of rules for achieving immutability. These began as the rules defined by Josh Bloch in the excellent Effective Java, 2nd Edition. Josh lists five rules (in Item 15), these are:

1. Don’t provide any methods that modify the object’s state (known as mutators).
2. Ensure that the class can’t be extended.
3. Make all fields final.
4. Make all fields private.
5. Ensure exclusive access to any mutable components.

These formed a good basis for an everyday tool for ensuring your classes are immutable. But I found them to be a bit on the strict side (more on that in a later blog entry). When first writing the tool, those rules were combined with the following:

1. It should not be possible to reassign any field. (this is almost implicit in Josh's rules 1, 3 & 4)
2. An immutable object should consist of immutable fields.
This implies three sub-rules. If a field is not constructed directly (i.e. passed as a parameter, or as the result of a method call):
i. It must not be possible to subclass the type.
ii. It should be declared as a concrete type.
iii. If the previous two cannot be achieved, make a copy of in the constructor and assign that to the field.

Some of these rules are pretty opaque for understanding what the tool actually detects (and in the past few days, I've found them to be a bit lacking). What might help understand Mutability Detector is some of the patterns of code which are detected and reported. Listed here is each checker, some example code and the results of applying the checker to that code.


FinalClassChecker
public class MutableByNotBeingFinalClass {

}
MutableByNotBeingFinalClass is MAYBE immutable
Is not declared final, and thus may be immutable.

PublishedNonFinalFieldChecker

public class MutableByHavingPublicNonFinalField {
public String name;

public MutableByHavingPublicNonFinalField(String name) {
this.name = name;
}
}
MutableByHavingPublicNonFinalField is DEFINITELY_NOT immutable
Field [name] is visible outwith this class, and is not declared final.

SetterMethodChecker

public final class MutableByHavingSetterMethod {
private String name;
public void setName(String name) {
this.name = name;
}
}
MutableByHavingSetterMethod is DEFINITELY_NOT immutable
Field [name] can be reassigned within method [setName]
AbstractTypeToFieldChecker
public final class MutableByAssigningAbstractTypeToField {
private AbstractStringContainer nameContainer;

public MutableByAssigningAbstractTypeToField(AbstractStringContainer abstractNameContainer) {
nameContainer = abstractNameContainer;
}

abstract class AbstractStringContainer {
protected final String name = "my name";
}
final class StringContainer extends AbstractStringContainer {
public String someMutableField;
}
}
MutableByAssigningAbstractTypeToField is DEFINITELY_NOT immutable
Field [nameContainer] can have an abstract type (AbstractStringContainer) assigned to it.


(Here the problem is, that while AbstractStringContainer is immutable, StringContainer is not, and either could be passed as a parameter to the constructor. Thus we can never be sure that the field we're assigning to references an immutable object or not. Though this could be relaxed to render types like these as MAYBE immutable.)

InherentTypeMutabilityChecker

This checker is used more to help decide the immutability status for client classes. For example, if a class has a field which is inherently mutable, this affects the containing class.

public class MutableByHavingArrayTypeAsField {
private final String names[];
}
MutableByHavingArrayTypeAsField is DEFINITELY_NOT immutable
Field [names] is a primitive array.

This currently quite limiting - the field 'names' may not actually be mutated, yet the containing class would be declared as mutable. This is definitely something to work on.


Some of these rules aren't complete. Sometimes they don't interact too well with certain classes (inheritance is hairy issue). Common patterns, such as lazily loaded fields, aren't handled at all (thus java.lang.String is supposedly mutable). So there's a lot to do, but hopefully this entry has provided a decent introduction to Mutability Detector.