Wednesday, 11 February 2009

Notes on the Java Memory Model and Garbage Collection

Understanding the Java Memory Model is key to understanding critical Java behaviour, including Garbage Collection, thread synchronization, the volatile keyword, and class and object management. The diagram below is the starting point for the items listed in this blog entry.



  • Class loading, into the Heap (shared between threads), is carried out in the following sequence:
    1. On start-up, the JVM uses the Bootstrap Loader to load the core classes it needs to operate. These classes are read from rt.jar.
    2. The ExtClassLoader loads classes from the extension directories specified by the system property java.ext.dirs.
    3. All other required classes are loaded by the AppClassLoader from the locations specified in the CLASSPATH. These classes are loaded lazily, as they are first needed by the running threads. Each loader first delegates a request to its parent, so a class is always loaded by the highest loader in this chain that can find it.
  • Garbage Collection aims to free heap memory used by objects that can't be reached from a root object.
  • Root objects include:
    • Local variables on the stack
    • Parameters of methods on the stack
    • JNI native references
    • All classes loaded by the bootstrap Loader
  • Note the different types of references (see this Sun page for details), going from strongest to weakest reachability:
    1. An object is strongly reachable if it can be reached by some thread without traversing any reference objects. A newly-created object is strongly reachable by the thread that created it.
    2. An object is softly reachable if it is not strongly reachable but can be reached by traversing a soft reference.
    3. An object is weakly reachable if it is neither strongly nor softly reachable but can be reached by traversing a weak reference. When the weak references to a weakly-reachable object are cleared, the object becomes eligible for finalization.
    4. An object is phantom reachable if it is neither strongly, softly, nor weakly reachable, it has been finalized, and some phantom reference refers to it.
    5. Finally, an object is unreachable, and therefore eligible for reclamation, when it is not reachable in any of the above ways.
  • Retained Set (of an object X) - the set of objects that would be garbage collected if X were garbage collected, including X itself
  • Retained Size (of X) - the total memory that would be released once that retained set was collected
  • Garbage Collection is based on the generational heap model outlined in the diagram above (where the green boxes, Eden, Survivor 1 and Survivor 2, make up the Young Generation):
    1. New objects are allocated in Eden.
    2. Once Eden is "full", a minor collection runs and the surviving objects are copied into one of the survivor spaces. The vast majority of objects never make it into a survivor space, because they die soon after creation.
    3. At each subsequent minor collection, the live objects in Eden and in the occupied survivor space are copied into the other, empty survivor space, so the two survivor spaces swap roles each time.
    4. Objects that survive a number of minor collections (the tenuring threshold, set with -XX:MaxTenuringThreshold) are promoted, i.e. copied, into the tenured space.
  • The proportion of the heap made available to each generation can be controlled on starting the JVM as follows:
    • -XX:NewRatio=3 will provide a ratio between the Young and Tenured generations of 1 to 3
    • -XX:SurvivorRatio=6 will provide a ratio between each Survivor space and Eden of 1 to 6.
  • Java 1.5 provides 4 garbage collectors, allowing a compromise between throughput and pause times:
    • The Default Collector (the serial collector, -XX:+UseSerialGC)
    • The Throughput Collector (Parallel GC, -XX:+UseParallelGC)
    • The Concurrent Low Pause Collector (Concurrent GC, -XX:+UseConcMarkSweepGC)
    • The Incremental Low Pause Collector (-Xincgc; not supported in future releases)
  • To measure basic GC performance, start the JVM with the -verbose:gc argument. As explained by Pete Freitag, the output [GC 325407K->83000K(776768K), 0.2300771 secs] means:
    • GC - indicates a minor collection (young generation). Had it said Full GC, that would indicate a major collection (tenured generation).
    • 325407K - The combined size of live objects before garbage collection.
    • 83000K - The combined size of live objects after garbage collection.
    • (776768K) - the total available space, not counting the space in the permanent generation, which is the total heap minus one of the survivor spaces.
    • 0.2300771 secs - the time the collection took
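
The class-loading sequence in the first bullet can be observed from code by walking up a class's loader chain. This is a minimal sketch assuming a Java 8 or later runtime. From Java 9 onwards the extension loader (and java.ext.dirs) was replaced by the platform class loader, so the name of the middle entry differs between versions. The bootstrap loader is always represented by null in the API.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {
    /** Walks from the loader that defined {@code type} up through its parents. */
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent(); // each loader delegates to this parent first
        }
        chain.add("bootstrap"); // the bootstrap loader shows up as null, so name it explicitly
        return chain;
    }

    public static void main(String[] args) {
        // Core classes such as java.lang.String are defined by the bootstrap loader.
        System.out.println("String: " + loaderChain(String.class));
        // This class comes from the classpath, so its chain starts at the application loader.
        System.out.println("LoaderChain: " + loaderChain(LoaderChain.class));
    }
}
```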
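
The reference types behind those reachability levels live in the java.lang.ref package. The small sketch below makes two points. First, while an object is still strongly reachable, soft and weak references return it, and a phantom reference (by design) never does. Second, once the last strong reference is dropped, System.gc() only requests a collection. Whether the weak reference has actually been cleared and enqueued when the program checks is therefore up to the JVM.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

public class Reachability {
    /** While {@code referent} is still strongly held, reports what each reference type returns. */
    static boolean[] whileStronglyReachable(Object referent) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        SoftReference<Object> soft = new SoftReference<>(referent);
        WeakReference<Object> weak = new WeakReference<>(referent, queue);
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return new boolean[] {
            soft.get() == referent, // not cleared while a strong reference exists
            weak.get() == referent, // likewise for weak references
            phantom.get() == null   // PhantomReference.get() always returns null
        };
    }

    public static void main(String[] args) throws InterruptedException {
        Object referent = new Object(); // strongly reachable via this local variable (a root)
        boolean[] r = whileStronglyReachable(referent);
        System.out.println("soft=" + r[0] + " weak=" + r[1] + " phantomNull=" + r[2]);
        // prints: soft=true weak=true phantomNull=true

        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(referent, queue);
        referent = null;  // the object is now only weakly reachable
        System.gc();      // a request only; clearing is not guaranteed
        Reference<?> enqueued = queue.remove(1000); // wait up to one second
        System.out.println("weak cleared: " + (weak.get() == null)
                + ", enqueued: " + (enqueued == weak));
    }
}
```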
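
Retained set and retained size are the figures that heap-dump analysers report per object. The sketch below shows one way to compute them on a toy object graph. It is not how any particular tool does it: Eclipse Memory Analyzer, for example, derives them from a dominator tree. Here, the retained set of X is simply whatever is reachable from the roots now but is no longer reachable once X is treated as gone. The graph, object names and shallow sizes are invented for illustration.

```java
import java.util.*;

public class RetainedSet {
    /** Objects reachable from the roots, pretending {@code removed} is gone (null removes nothing). */
    static Set<String> reachable(Set<String> roots, Map<String, List<String>> refs, String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> toVisit = new ArrayDeque<>();
        for (String root : roots) {
            if (!root.equals(removed)) toVisit.push(root);
        }
        while (!toVisit.isEmpty()) {
            String obj = toVisit.pop();
            if (!seen.add(obj)) continue; // already visited
            for (String target : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                if (!target.equals(removed)) toVisit.push(target);
            }
        }
        return seen;
    }

    /** Objects reachable now that become unreachable once x itself is collected. */
    static Set<String> retainedSet(String x, Set<String> roots, Map<String, List<String>> refs) {
        Set<String> retained = reachable(roots, refs, null);
        retained.removeAll(reachable(roots, refs, x));
        return retained;
    }

    /** Sum of the shallow sizes of everything in the retained set. */
    static long retainedSize(Set<String> retained, Map<String, Long> shallowBytes) {
        long total = 0;
        for (String obj : retained) total += shallowBytes.get(obj);
        return total;
    }

    /** Hypothetical heap: root -> A, B; A -> C, D; B -> C. */
    static Map<String, List<String>> sampleRefs() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("A", "B"));
        refs.put("A", Arrays.asList("C", "D"));
        refs.put("B", Arrays.asList("C"));
        return refs;
    }

    public static void main(String[] args) {
        Set<String> roots = Collections.singleton("root");
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("root", 16L); sizes.put("A", 16L); sizes.put("B", 16L);
        sizes.put("C", 64L); sizes.put("D", 32L);
        Set<String> retainedByA = retainedSet("A", roots, sampleRefs());
        // C survives because B still refers to it, so A retains only itself and D.
        System.out.println("retained set of A = " + new TreeSet<>(retainedByA)
                + ", retained size = " + retainedSize(retainedByA, sizes) + " bytes");
        // prints: retained set of A = [A, D], retained size = 48 bytes
    }
}
```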
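
The four generational steps can be mimicked with a toy model of Eden, two survivor spaces and a tenured space. This is a simplified sketch, not HotSpot's implementation, and it makes three simplifications:
  • Liveness is passed in rather than traced from roots.
  • The tenuring threshold of 3 is an arbitrary assumption (HotSpot's equivalent is -XX:MaxTenuringThreshold).
  • It ignores the premature promotion that happens when a survivor space overflows.

```java
import java.util.*;

public class GenerationalSim {
    static final int TENURING_THRESHOLD = 3; // hypothetical value for this toy model

    final List<String> eden = new ArrayList<>();
    List<String> fromSurvivor = new ArrayList<>(); // survivor space currently holding objects
    List<String> toSurvivor = new ArrayList<>();   // empty survivor space, target of the next copy
    final List<String> tenured = new ArrayList<>();
    private final Map<String, Integer> ages = new HashMap<>(); // minor collections survived

    /** Step 1: new objects are allocated in Eden. */
    void allocate(String id) {
        eden.add(id);
        ages.put(id, 0);
    }

    /** Steps 2-4: copy live objects out of Eden and the 'from' space, promoting old ones. */
    void minorGc(Set<String> live) {
        List<String> candidates = new ArrayList<>(eden);
        candidates.addAll(fromSurvivor);
        for (String id : candidates) {
            if (!live.contains(id)) { // dead objects are simply not copied
                ages.remove(id);
                continue;
            }
            int age = ages.get(id) + 1;
            if (age >= TENURING_THRESHOLD) { // survived enough collections: promote
                tenured.add(id);
                ages.remove(id);
            } else {
                ages.put(id, age);
                toSurvivor.add(id);
            }
        }
        eden.clear();
        fromSurvivor.clear();
        // The survivor spaces swap roles: the one just filled becomes 'from' next time.
        List<String> emptied = fromSurvivor;
        fromSurvivor = toSurvivor;
        toSurvivor = emptied;
    }

    public static void main(String[] args) {
        GenerationalSim heap = new GenerationalSim();
        heap.allocate("a"); heap.allocate("b"); heap.allocate("c");
        heap.minorGc(new HashSet<>(Arrays.asList("a")));      // b and c die young
        heap.allocate("d");
        heap.minorGc(new HashSet<>(Arrays.asList("a", "d")));
        heap.minorGc(new HashSet<>(Arrays.asList("a")));      // a reaches the threshold
        System.out.println("survivor=" + heap.fromSurvivor + " tenured=" + heap.tenured);
        // prints: survivor=[] tenured=[a]
    }
}
```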
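
Here is a worked example of the two ratios. Assume a hypothetical JVM started with java -Xms1024m -Xmx1024m -XX:NewRatio=3 -XX:SurvivorRatio=6, i.e. a fixed 1 GB heap, with the permanent generation not counted:
  • The Young generation gets 1/(3+1) of the heap.
  • The Young generation holds Eden plus two survivor spaces, so each survivor space gets 1/(6+2) of it.
The sketch only does that arithmetic. A real JVM also rounds to alignment boundaries and, with some collectors, resizes the generations adaptively, so actual sizes will differ slightly.

```java
public class GenerationSizing {
    /** Returns {young, tenured, eden, eachSurvivor} in MB for the given heap and ratios. */
    static long[] split(long heapMb, int newRatio, int survivorRatio) {
        long young = heapMb / (newRatio + 1);        // young : tenured = 1 : newRatio
        long tenured = heapMb - young;
        long survivor = young / (survivorRatio + 2); // eden = survivorRatio x survivor, plus 2 survivors
        long eden = young - 2 * survivor;
        return new long[] { young, tenured, eden, survivor };
    }

    public static void main(String[] args) {
        long[] s = split(1024, 3, 6);
        System.out.println("young=" + s[0] + "MB tenured=" + s[1] + "MB eden=" + s[2]
                + "MB survivor=" + s[3] + "MB (each)");
        // prints: young=256MB tenured=768MB eden=192MB survivor=32MB (each)
    }
}
```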
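
To check which of these collectors a running JVM is actually using, the java.lang.management API (available since Java 5) exposes one GarbageCollectorMXBean per collector. This is a minimal sketch, and the names printed depend on the JVM and the collector. On HotSpot, for example, the serial collector reports "Copy" and "MarkSweepCompact", and the throughput collector reports "PS Scavenge" and "PS MarkSweep". Note also that getCollectionCount() and getCollectionTime() may return -1 where a JVM doesn't track them.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class ActiveCollectors {
    /** One line per collector the running JVM exposes, with its collection count and total time. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections, "
                    + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe()) {
            System.out.println(line);
        }
    }
}
```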
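
The -verbose:gc line explained above can be picked apart mechanically, which is handy when skimming long GC logs. This is a minimal sketch that assumes the classic one-line format quoted in this post. Later JVMs (for example Java 9+ with unified logging) print differently formatted lines that this pattern will not match. The class and field names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VerboseGcLine {
    // Matches only the classic format, e.g. "[GC 325407K->83000K(776768K), 0.2300771 secs]".
    private static final Pattern FORMAT =
            Pattern.compile("\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), ([\\d.]+) secs\\]");

    final boolean full;   // "Full GC" = major collection, "GC" = minor collection
    final long beforeK;   // size before the collection
    final long afterK;    // size after the collection
    final long totalK;    // total available space (excluding perm gen and one survivor space)
    final double seconds; // time the collection took

    private VerboseGcLine(boolean full, long beforeK, long afterK, long totalK, double seconds) {
        this.full = full;
        this.beforeK = beforeK;
        this.afterK = afterK;
        this.totalK = totalK;
        this.seconds = seconds;
    }

    static VerboseGcLine parse(String line) {
        Matcher m = FORMAT.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("Unrecognised GC line: " + line);
        }
        return new VerboseGcLine(m.group(1).equals("Full GC"),
                Long.parseLong(m.group(2)), Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)), Double.parseDouble(m.group(5)));
    }

    /** Memory reclaimed by this collection, in kilobytes. */
    long freedK() {
        return beforeK - afterK;
    }

    public static void main(String[] args) {
        VerboseGcLine gc = parse("[GC 325407K->83000K(776768K), 0.2300771 secs]");
        System.out.println((gc.full ? "major" : "minor") + " collection freed " + gc.freedK() + "K");
        // prints: minor collection freed 242407K
        System.out.println("pause: " + gc.seconds + " s");
    }
}
```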

Thursday, 11 December 2008

If only I'd over-engineered it!

Sometimes you come back to an application and can only weep at the lack of foresight, when a little of it would have made implementing the required new feature a doddle! But you've no time to weep, just as you had no time back in the day to make the application dance to any tune imaginable. You also know that Agile methodology tells you to code what is needed and no more, and yet you're always asking yourself: wouldn't it be nice if you could predict the future and just squeeze in that little bit of extra abstraction? In my current predicament, I can honestly say that there was no way I could have imagined replacing the source of my real-time feed with a completely different API. And given that the new API is only intended for an internal audience, with perhaps the odd external client demonstration, was it really an oversight? After all, making the receiving code abstract enough to cope with a simple replacement of the input feed took two weeks, time I simply didn't have on the original project. And yet, was there anything I could have done that would have made my life simpler now without compromising the original deadline? In fact, I'm going to propose the following guidelines (admittedly Java-centric) to help strike a balance between over-engineering and delivering what is needed now:
  • Always code to interfaces. This is extremely obvious but, yet again, the last API I worked on didn't, and the resulting experience was painful to say the least. However, there's no need to create interfaces for objects intended purely as data containers.
  • Use dependency injection. That is, keep coupling between classes light and ask yourself whether a class really needs to create the objects it uses. Inject them if at all possible.
  • Use factories. If an object has to be created, consider using a Factory object to create it instead. This can be as simple as an Enum that lists the various types and associates a creation method with each one.
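
Taken together, the three guidelines might look something like this for a real-time feed. Everything here is hypothetical: TickSource, ReplaySource, InternalApiSource, FeedType and TickRecorder are invented names, and the "feeds" just emit fixed strings. It is a sketch of the shape, not the code from the project described. The three guidelines map onto it as follows:
  • The receiving code (TickRecorder) depends only on an interface.
  • It gets its source injected through the constructor.
  • The Enum acts as the factory, so replacing the feed means adding an Enum constant rather than reworking the consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical feed abstraction: consumers never see a vendor API directly. */
interface TickSource {
    void subscribe(String symbol, Consumer<String> onTick);
    void start();
}

/** Hypothetical stand-in for the original feed: replays one fixed tick per subscription. */
class ReplaySource implements TickSource {
    private final List<String> symbols = new ArrayList<>();
    private final List<Consumer<String>> listeners = new ArrayList<>();
    public void subscribe(String symbol, Consumer<String> onTick) { symbols.add(symbol); listeners.add(onTick); }
    public void start() {
        for (int i = 0; i < symbols.size(); i++) listeners.get(i).accept(symbols.get(i) + " 100.0");
    }
}

/** Hypothetical replacement feed with a different internal mechanism. */
class InternalApiSource implements TickSource {
    private final List<Runnable> pending = new ArrayList<>();
    public void subscribe(String symbol, Consumer<String> onTick) { pending.add(() -> onTick.accept(symbol + " 101.5")); }
    public void start() { pending.forEach(Runnable::run); }
}

/** Enum factory: each constant carries its own creation mechanism. */
enum FeedType {
    REPLAY { TickSource create() { return new ReplaySource(); } },
    INTERNAL_API { TickSource create() { return new InternalApiSource(); } };

    abstract TickSource create();
}

/** Receives its TickSource through the constructor (dependency injection) instead of creating it. */
class TickRecorder {
    private final TickSource source;
    final List<String> received = new ArrayList<>();
    TickRecorder(TickSource source) { this.source = source; }
    void record(String symbol) { source.subscribe(symbol, received::add); source.start(); }
}

public class FeedDemo {
    public static void main(String[] args) {
        for (FeedType type : FeedType.values()) {
            TickRecorder recorder = new TickRecorder(type.create());
            recorder.record("ABC");
            System.out.println(type + " -> " + recorder.received);
        }
        // prints: REPLAY -> [ABC 100.0]
        //         INTERNAL_API -> [ABC 101.5]
    }
}
```

TickRecorder never changes when a new FeedType is added. The cost of the extra abstraction is one interface and one Enum, which fits within the "no more than needed" spirit of the guidelines.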

There's nothing new in these suggestions, and I'll keep adding to them as and when.

Tuesday, 18 November 2008

You need an architect ...

Having recently worked on two projects, both of which had become distressed and each of which embraced a very different design methodology, I thought it'd be interesting to look at what went wrong in each case. The first project started with a "top-down" architectural design exercise, which was later implemented by another dev team. The other was a "bottom-up" proof-of-concept exercise intended to deliver demonstrable functionality in a short space of time with virtually no design effort. The top-down project produced a set of grand-sounding documents specifying, in some detail, the desired (SOA) architectural framework and an outline of how the first pilot application should be implemented. The bottom-up project resulted in code.

When the top-down project's pilot application was implemented, the visionary plans got lost in the process of making sure the required functionality was delivered on time and on budget. Unfortunately, when it came time to add another application to the framework, the owners found that, because of their cost-cutting, none of the services created to support the pilot app could be re-used for anything else. The bottom-up project, on the other hand, hit the keyboards from the start and produced a tangled code base that didn't come close to delivering anything, on time or otherwise.

The top-down project lost its way because the lauded design principle, separating services from the presentation tier, wasn't properly considered in the frenzy to get code out the door. The bottom-up project went wrong for a similar reason: there were no design principles at all, leading to confused and broken code. In both cases, common functionality for use across many use cases simply didn't exist. Although each project had started from a different position, both ended up with the same problem. It seems to me that the missing ingredient in both cases was direction during the development phase. I was left wondering why such a crucial resource, architectural guidance, had been left out. While it seems to me that the upfront cost of some form of architectural oversight will pay for itself in a very short space of time, it'd be interesting to know how to convince paying clients of that!