Friday, 11 March 2011

Garbage collection tuning in Java 5.0


Memory management is a major factor in software application performance. In allocation-heavy programs, more time can be spent allocating and deallocating memory than performing actual data computation.
While C++ offers direct control over when memory is allocated and freed up, Java attempts to abstract memory management by using garbage collection to reclaim memory that the program no longer needs. However, the "pause" associated with garbage collection has been the central argument against using Java when real-time performance is required.
Garbage collection is typically a periodic process that pauses normal program execution to analyze object references and reclaim memory that was allocated but can no longer be accessed by reference. In large Java applications, the pause for garbage collection can last several seconds, which is enough time to disrupt any type of real-time communication or control system.
Consequently, the memory abstraction provided by garbage collection requires some developers to think more carefully about memory management. Even though Java does not provide the same level of control over memory deallocations as C++, programming patterns can still make a huge difference in the memory performance of Java applications.
In this article, I will briefly review the Java 5.0 capabilities for tuning garbage collection.

Principles of garbage collection in Java 5.0

The goal of ergonomics, a new feature in Java 1.5, is to provide good performance from the JVM with a minimum of command-line tuning. Ergonomics attempts to select the best combination of garbage collector, heap size, and runtime compiler for an application.
When does the choice of a garbage collector matter to the user? For many applications, it doesn't matter. That is, the application can perform within its specifications in the presence of garbage collection with pauses of modest frequency and duration. An example where this is not the case would be a large application that scales well to a large number of threads, processors, sockets, and a lot of memory.
An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
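The reachability idea can be illustrated with a toy model. The sketch below is not how the JVM is implemented; the ReachabilitySketch and Node names are hypothetical and exist only for this illustration. It marks everything reachable from a set of roots and treats whatever is left over as garbage.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Toy model of reachability analysis; an illustration, not the JVM's implementation. */
final class ReachabilitySketch {

    /** A hypothetical heap object that holds references to other objects. */
    static final class Node {
        final String name;
        final List<Node> references = new ArrayList<Node>();

        Node(String name) {
            this.name = name;
        }
    }

    /** Returns every node reachable from the given roots (the live set). */
    static Set<Node> mark(List<Node> roots) {
        // Identity-based set: two distinct objects are never treated as the same node.
        Set<Node> live = Collections.newSetFromMap(new IdentityHashMap<Node, Boolean>());
        Deque<Node> pending = new ArrayDeque<Node>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (live.add(current)) {
                // First visit: follow the outgoing references of this node.
                pending.addAll(current.references);
            }
        }
        return live;
    }

    /** Returns the heap objects that were not marked, that is, the garbage. */
    static List<Node> sweep(List<Node> heap, Set<Node> live) {
        List<Node> garbage = new ArrayList<Node>();
        for (Node node : heap) {
            if (!live.contains(node)) {
                garbage.add(node);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.references.add(b);
        c.references.add(d); // c and d reference each other,
        d.references.add(c); // but no root reaches either of them

        List<Node> heap = new ArrayList<Node>();
        Collections.addAll(heap, a, b, c, d);
        List<Node> roots = new ArrayList<Node>();
        roots.add(a);

        for (Node node : sweep(heap, mark(roots))) {
            System.out.println("garbage: " + node.name);
        }
    }
}
```

Note that the marking work grows with the number of live objects, which is exactly the cost that generational collection tries to avoid paying on every collection.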
Starting with Java 2, the virtual machine has incorporated a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work. The most important of these observed properties is so-called infant mortality: many objects "die" soon after being allocated. Iterator objects, for example, are often alive only for the duration of a single loop. To optimize for this scenario, memory is managed in generations, that is, memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in the generation for younger objects, the young generation, and most objects die there because of infant mortality.
If the garbage collector has become a bottleneck, you may wish to customize the generation sizes. Check the verbose garbage collector output, and then explore the sensitivity of your individual performance metric to the garbage collector parameters.
At initialization, a maximum address space is virtually reserved but not allocated to physical memory unless it is needed. The complete address space reserved for object memory can be divided into the young and tenured generations. The young generation consists of eden plus two survivor spaces. Objects are initially allocated in eden. One survivor space is empty at any time and serves as the destination of the next copying collection, which moves any live objects from eden and the other survivor space into it. Objects are copied between survivor spaces in this way until they are old enough to be tenured, that is, copied to the tenured generation. A third generation closely related to tenured is the permanent generation. This generation is special because it holds data needed by the virtual machine to describe objects that do not have an equivalent at the Java language level. For example, objects describing classes and methods are stored in the permanent generation.
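One way to see these generations on a running JVM is the java.lang.management API introduced in Java 5.0. The sketch below lists the memory pools the JVM reports; the class name MemoryPoolReport is made up for this example. The pool names are an assumption you should verify on your own JVM, because they depend on the JVM version and the collector in use (newer JVMs, for instance, report a Metaspace pool where older ones reported a permanent generation).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

/** Lists the memory pools (generations and other areas) reported by the running JVM. */
final class MemoryPoolReport {

    /** Returns one human-readable line per valid memory pool. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            String kind = pool.getType() == MemoryType.HEAP ? "heap" : "non-heap";
            // getMax() returns -1 when the maximum is undefined.
            String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / 1024) + "K";
            lines.add(pool.getName() + " (" + kind + "): used=" + (usage.getUsed() / 1024)
                    + "K, committed=" + (usage.getCommitted() / 1024) + "K, max=" + max);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Running it with different collector flags is a quick way to check which pools correspond to eden, the survivor spaces, and the tenured generation on your JVM.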

Performance considerations

There are two metrics of performance of a Java application (and garbage collection in particular): throughput and pauses. Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time. Throughput includes time spent in allocation (but tuning for speed of allocation is generally not needed). Pauses are the times when an application appears unresponsive because garbage collection is occurring.
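As a rough illustration of the throughput definition, the sketch below estimates it from the JVM's own counters. The class name GcThroughput is hypothetical. Treat the result as an approximation: getCollectionTime() reports approximate accumulated collection time per collector, and for concurrent collectors that figure is not the same as time the application was paused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Rough throughput estimate: the percentage of total time not spent in garbage collection. */
final class GcThroughput {

    /** Throughput as a percentage, given time spent in GC and total elapsed time. */
    static double throughputPercent(long gcMillis, long totalMillis) {
        if (totalMillis <= 0 || gcMillis < 0 || gcMillis > totalMillis) {
            throw new IllegalArgumentException("need 0 <= gcMillis <= totalMillis and totalMillis > 0");
        }
        return 100.0 * (totalMillis - gcMillis) / totalMillis;
    }

    /** Sums the approximate accumulated collection time over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        long gcTime = Math.min(totalGcMillis(), uptime);
        System.out.println("GC time: " + gcTime + " ms of " + uptime + " ms uptime");
        System.out.println("Approximate throughput: " + throughputPercent(gcTime, uptime) + "%");
    }
}
```

For example, 50 ms of collection time over 1,000 ms of run time corresponds to a throughput of 95%.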
Some users are sensitive to other considerations as well. For instance, footprint is the working set of a process, measured in pages and cache lines. On systems with limited physical memory or many processes, footprint may dictate scalability. Promptness is the time between when an object becomes dead and when the memory becomes available; this is an important consideration for distributed systems, including remote method invocation (RMI).
In general, a particular generation sizing is a trade-off between these considerations. For example, a very large young generation may maximize throughput, but it does so at the expense of footprint, promptness, and pause times. You can minimize young generation pauses by using a small young generation at the expense of throughput.
When you want to improve the performance of your application on machines with larger numbers of processors, you should use the throughput collector. You can enable the throughput collector by using the command-line flag -XX:+UseParallelGC. You can control the number of garbage collector threads with the command-line option -XX:ParallelGCThreads=<desired number>. The maximum pause time goal is specified with the command-line flag -XX:MaxGCPauseMillis=<nnn>. This is interpreted as a hint to the throughput collector that pause times of <nnn> milliseconds or less are desired. There are also several options for adjusting generation sizes, such as -XX:YoungGenerationSizeIncrement=<nnn> for the young generation and -XX:TenuredGenerationSizeIncrement=<nnn> for the tenured generation.
If your application would benefit from shorter garbage collector pauses and can afford to share processor resources with the garbage collector when the application is running, I suggest using the concurrent low pause collector. A concurrent collection will start if the occupancy of the tenured generation grows above the initiating occupancy (i.e., the percentage of the current heap that is used before a concurrent collection is started). The initiating occupancy by default is set to about 68%. You can set it with the parameter -XX:CMSInitiatingOccupancyFraction=<nn> where <nn> is a percentage of the current tenured generation size. You can use the concurrent collector in a mode in which the concurrent phases are done incrementally. This mode (referred to here as "i-cms") divides the work done concurrently by the collector into small chunks of time, which are scheduled between young generation collections. This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors.

Fine-tuning garbage collection

The command-line argument -verbose:gc prints information at every collection. If it is switched on, you will see output similar to the following at every garbage collection:
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
There are two minor collections and one major one (Full GC). In each line, the numbers before and after the arrow are the heap occupancy before and after the collection, the number in parentheses is the total available heap space, and the last figure is the time the collection took. The flag -XX:+PrintGCDetails prints additional information about the collections. The flag -XX:+PrintGCTimeStamps will additionally print a timestamp at the start of each collection. Listing A is what you will see when both flags are set.
Additionally, the information is shown for a major collection delineated by Tenured. Here, the collection reduced the tenured generation usage to about 10% and took approximately 0.13 seconds.
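If you want to process -verbose:gc output yourself, lines in the format shown earlier can be parsed with a regular expression. This is a minimal sketch that assumes exactly that line format; the GcLogLine class is hypothetical, and output produced with -XX:+PrintGCDetails or by other JVM versions has a different layout that this pattern does not handle.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Parses one line of basic -verbose:gc output, e.g. "[GC 325407K->83000K(776768K), 0.2300771 secs]". */
final class GcLogLine {

    // Groups: 1 = kind, 2 = occupancy before, 3 = occupancy after, 4 = heap size, 5 = duration.
    private static final Pattern LINE = Pattern.compile(
            "\\[(Full GC|GC) (\\d+)K->(\\d+)K\\((\\d+)K\\), (\\d+\\.\\d+) secs\\]");

    final boolean fullGc;
    final long beforeKb;
    final long afterKb;
    final long heapKb;
    final double seconds;

    private GcLogLine(boolean fullGc, long beforeKb, long afterKb, long heapKb, double seconds) {
        this.fullGc = fullGc;
        this.beforeKb = beforeKb;
        this.afterKb = afterKb;
        this.heapKb = heapKb;
        this.seconds = seconds;
    }

    /** Returns the parsed line, or null if the text does not match the expected format. */
    static GcLogLine parse(String text) {
        Matcher m = LINE.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        return new GcLogLine("Full GC".equals(m.group(1)),
                Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)),
                Long.parseLong(m.group(4)),
                Double.parseDouble(m.group(5)));
    }

    /** Memory reclaimed by this collection, in kilobytes. */
    long reclaimedKb() {
        return beforeKb - afterKb;
    }

    public static void main(String[] args) {
        GcLogLine line = parse("[GC 325407K->83000K(776768K), 0.2300771 secs]");
        System.out.println((line.fullGc ? "major" : "minor") + " collection reclaimed "
                + line.reclaimedKb() + "K in " + line.seconds + " secs");
    }
}
```

For the first sample line, the collection reclaimed 325407K - 83000K = 242407K.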
A number of parameters affect generation size. At initialization of the virtual machine, the entire space for the heap is reserved. You can specify the size of the space reserved with the -Xmx option. If the value of the -Xms parameter is smaller than the value of the -Xmx parameter, not all of the reserved space is immediately committed to the virtual machine. The different parts of the heap (permanent generation, tenured generation, and young generation) can grow to the limit of the virtual space as needed.
By default, the virtual machine grows or shrinks the heap at each collection to try to keep the proportion of free space to live objects within a specific range. This target range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and -XX:MaxHeapFreeRatio=<maximum>, and the total size is bounded below by -Xms and above by -Xmx. Unless you have problems with pauses, try granting as much memory as possible to the virtual machine. The default maximum heap size (64MB) is often too small. You can find descriptions of other VM options on Sun's Web site.
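The grow-or-shrink rule can be modeled in a few lines. The sketch below is a simplification under stated assumptions: the HeapFreeRatio class and its resizeDecision method are invented for illustration, the real JVM applies the rule per generation and within the -Xms and -Xmx bounds, and the 40 and 70 passed in main are example values for the two ratio parameters rather than values read from the running JVM.

```java
/** Simplified model of the free-ratio rule behind -XX:MinHeapFreeRatio and -XX:MaxHeapFreeRatio. */
final class HeapFreeRatio {

    enum Action { GROW, SHRINK, KEEP }

    /** Free space as a percentage of the committed space. */
    static double freePercent(long freeBytes, long committedBytes) {
        if (committedBytes <= 0 || freeBytes < 0 || freeBytes > committedBytes) {
            throw new IllegalArgumentException("need 0 <= freeBytes <= committedBytes and committedBytes > 0");
        }
        return 100.0 * freeBytes / committedBytes;
    }

    /** Grow when too little space is free, shrink when too much is free, otherwise keep the size. */
    static Action resizeDecision(double freePercent, int minFreeRatio, int maxFreeRatio) {
        if (freePercent < minFreeRatio) {
            return Action.GROW;
        }
        if (freePercent > maxFreeRatio) {
            return Action.SHRINK;
        }
        return Action.KEEP;
    }

    public static void main(String[] args) {
        Runtime runtime = Runtime.getRuntime();
        long committed = runtime.totalMemory(); // heap currently committed
        long free = runtime.freeMemory();       // free part of the committed heap
        long max = runtime.maxMemory();         // upper bound, roughly the -Xmx value
        double percent = freePercent(free, committed);
        System.out.println("committed=" + committed / 1024 + "K, free=" + free / 1024
                + "K, max=" + max / 1024 + "K");
        // 40 and 70 are example ratio values chosen for this illustration.
        System.out.println("free=" + percent + "% -> " + resizeDecision(percent, 40, 70));
    }
}
```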
You can also set a proportion of the heap dedicated to the young generation. The bigger the young generation, the less often minor collections occur. However, for a bounded heap size, a larger young generation implies a smaller tenured generation, which will increase the frequency of major collections. The optimal choice depends on the lifetime distribution of the objects allocated by the application. The young generation size is controlled by NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and tenured generation is 1:3. If desired, the parameter SurvivorRatio can be used to tune the size of the survivor spaces, but this is often not as important for performance. For example, -XX:SurvivorRatio=6 sets the ratio between each survivor space and eden to 1:6. Unless you find problems with excessive major collection or pause times, grant plenty of memory to the young generation.
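The arithmetic behind NewRatio and SurvivorRatio is easy to get wrong, so here is a small worked sketch. The GenerationSizing class is hypothetical, the heap size is an example value, and the JVM itself rounds and aligns the real sizes, so treat the results as estimates.

```java
/** Worked arithmetic for -XX:NewRatio and -XX:SurvivorRatio (estimates; the JVM aligns real sizes). */
final class GenerationSizing {

    /** young : tenured = 1 : newRatio, so the young generation is heap / (newRatio + 1). */
    static long youngSize(long heap, int newRatio) {
        return heap / (newRatio + 1);
    }

    /** The tenured generation takes the rest of the heap. */
    static long tenuredSize(long heap, int newRatio) {
        return heap - youngSize(heap, newRatio);
    }

    /** survivor : eden = 1 : survivorRatio with two survivor spaces, so each is young / (survivorRatio + 2). */
    static long survivorSize(long young, int survivorRatio) {
        return young / (survivorRatio + 2);
    }

    /** Eden is what remains of the young generation after the two survivor spaces. */
    static long edenSize(long young, int survivorRatio) {
        return young - 2 * survivorSize(young, survivorRatio);
    }

    public static void main(String[] args) {
        long heapMb = 4800; // example heap size in MB, excluding the permanent generation
        long young = youngSize(heapMb, 3);
        System.out.println("NewRatio=3: young=" + young + "MB, tenured=" + tenuredSize(heapMb, 3) + "MB");
        System.out.println("SurvivorRatio=6: eden=" + edenSize(young, 6)
                + "MB, each survivor space=" + survivorSize(young, 6) + "MB");
    }
}
```

With a 4800MB heap, NewRatio=3 gives a 1200MB young generation and a 3600MB tenured generation; SurvivorRatio=6 then splits the young generation into a 900MB eden and two 150MB survivor spaces.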
In addition to the default serial collector, Java 5.0 implements three other garbage collectors. The throughput collector uses a parallel version of the young generation collector. It is used if the -XX:+UseParallelGC option is passed on the command line. The concurrent low pause collector is used if the -Xincgc or -XX:+UseConcMarkSweepGC option is passed on the command line. In this case, the application is paused for short periods during the collection. The incremental low pause collector is used only if -XX:+UseTrainGC is passed on the command line. It will not be supported in future releases, but if you want more information, please see Sun's documentation on using this collector. (Note: Do not use -XX:+UseParallelGC with -XX:+UseConcMarkSweepGC.)
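To guard against combining the two collectors by accident, you can inspect the arguments the JVM was started with. The following is a sketch with a hypothetical CollectorFlagCheck class. It only sees flags passed explicitly on the command line, not defaults chosen by ergonomics, and it assumes that when a flag is repeated the last occurrence wins.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

/** Checks JVM start-up arguments for the unsupported parallel + concurrent collector combination. */
final class CollectorFlagCheck {

    /** True if the boolean -XX flag is switched on; assumes the last occurrence wins. */
    static boolean enabled(List<String> jvmArgs, String flagName) {
        boolean on = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + flagName)) {
                on = true;
            } else if (arg.equals("-XX:-" + flagName)) {
                on = false;
            }
        }
        return on;
    }

    /** True if both the throughput collector and the concurrent collector are requested. */
    static boolean conflicting(List<String> jvmArgs) {
        return enabled(jvmArgs, "UseParallelGC") && enabled(jvmArgs, "UseConcMarkSweepGC");
    }

    public static void main(String[] args) {
        // Arguments passed to this JVM, such as -Xmx or -XX options.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        if (conflicting(jvmArgs)) {
            System.out.println("Warning: -XX:+UseParallelGC and -XX:+UseConcMarkSweepGC are both set");
        }
    }
}
```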

Conclusion

Garbage collection can become a bottleneck in different applications depending on the requirements of the applications. By understanding the requirements of the application and the garbage collection options, it is possible to minimize the impact of garbage collection.

Link for further details:


Analyzing the Garbage Collection Log

Tagtraum industries (http://tagtraum.com) provides a free utility, gcviewer, for analyzing garbage collection logs generated by the JVM. Load the garbage collection log into gcviewer and determine which issue is occurring based on the descriptions in Table 3-1.
Table 3-1 Garbage Collection Performance Issues

| Issue | Symptoms in Garbage Collection Log | Impact of the Issue |
| --- | --- | --- |
| Insufficient total (heap) memory allocated | Memory usage trends upwards and reaches the top of the total memory allocated. | Reduces the performance or potentially crashes the AquaLogic User Interaction product. |
| Excessive total (heap) memory allocated | Memory usage peaks much lower than total memory allocated. | Can cause a slowdown across all applications on the server. The application server or AquaLogic User Interaction product is taking up too much of the system memory. |
| Insufficient young generation memory allocated | Sawtoothed memory usage. | Reduces the performance of the AquaLogic User Interaction product. This represents excessive minor garbage collector runs, which increases the number of objects in the tenured generation. Objects in the tenured generation are more resource intensive when called. |

Resolving Garbage Collection Performance Issues

Resolving the issues described in Analyzing the Garbage Collection Log is a matter of adjusting the JVM memory settings and reanalyzing the garbage collection log. Table 3-2 shows what memory settings to adjust for each issue. For details on how to adjust these settings for each supported application server and standalone AquaLogic User Interaction product, see Java Virtual Machine Configuration.
Table 3-2 Garbage Collection Performance Issue Resolution

| Issue | Resolution | JVM Memory Parameter |
| --- | --- | --- |
| Insufficient total (heap) memory allocated | Increase total heap memory allocation until memory usage stays reasonably below total memory. | Increase -Xmx |
| Excessive total (heap) memory allocated | Decrease total heap memory allocation until memory usage is reasonably, but not excessively, below total memory. | Decrease -Xmx |
| Insufficient young generation memory allocated | Increase young generation memory allocation until the memory usage trend is horizontal. | Adjust -XX:NewRatio |

Java Memory Switches

The following are Java memory switches used to tune JVM garbage collection. Use these switches in conjunction with the instructions specific to your application server or AquaLogic User Interaction product.
  • -Xloggc:<path/filename>
    This switch turns on garbage collection logging for the JVM. Replace <path/filename> with the location where the garbage collection log should be generated.
  • -Xms and -Xmx
    These switches set the minimum (-Xms) and maximum (-Xmx) heap size for the JVM. The JVM adjusts the heap size based on object usage, bounded by these two switches. Setting these switches to the same value increases predictability by removing the ability of the JVM to adjust the heap size.
    Caution: Fixing the heap size to a specific value requires special attention to memory tuning.
  • -XX:NewRatio
    This switch sets the ratio of the young generation to the tenured generation. For example, -XX:NewRatio=3 would mean that the tenured generation is 3x the size of the young generation, or, in other words, the young generation is one quarter of the heap and the tenured generation is three-quarters of the heap.

How We Solved our Garbage Collection Pausing Problem

-XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewSize=1200m -XX:SurvivorRatio=16
As Greg says:
  • -XX:+DisableExplicitGC - some libs call System.gc(). This is usually a bad idea and could explain some of what we saw.
  • -XX:+UseConcMarkSweepGC - use the low pause collector.
  • -XX:NewSize=1200m -XX:SurvivorRatio=16 - the black magic part. Tuning these requires empirical observation of your GC log, either from verbose gc or jstat (a JDK 1.5 tool). In particular, the 1200m new size is 1/4 of our heap size of 4800MB.
