Friday, 8 July 2011

Why use Calendar instead of new Date(...)?


Originally, Date was intended to contain all logic concerning dates, but the API designers eventually realized that the API they had so far was woefully inadequate and could not be cleanly extended to deal correctly with issues such as timezones, locales, different calendars, daylight savings times, etc.
So they created Calendar to deal with all that complexity, and relegated Date to a simple timestamp, deprecating all its functionality that dealt with formatting, parsing and individual date fields.
Incidentally, the deprecated methods such as the Date(int, int, int) constructor now call Calendar internally, so if you see a difference in speed, you're doing something wrong when calling Calendar.
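A minimal sketch of the replacement (the date values are only for illustration): instead of the deprecated Date(int, int, int) constructor with its 1900-offset year and zero-based month, build the date through Calendar's explicit fields:

```java
import java.util.Calendar;
import java.util.Date;

public class CalendarDemo {
    public static void main(String[] args) {
        // Deprecated style: new Date(111, 6, 8) -- the year is an offset
        // from 1900 and the month is zero-based, exactly the kind of trap
        // Calendar was created to clean up.

        // Calendar style: explicit, named fields; timezone/locale aware.
        Calendar cal = Calendar.getInstance();
        cal.set(2011, Calendar.JULY, 8);   // year, month constant, day
        Date date = cal.getTime();         // Calendar still yields a Date timestamp
        System.out.println(date);
    }
}
```

Note that Calendar still hands back a plain Date at the end, which is consistent with Date's relegation to a simple timestamp.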

Monday, 25 April 2011

Synchronization, Volatile and Atomic ... together - when and why ?

Concurrent Coding:-
To exploit the power of multiprocessor systems, applications are generally structured using multiple threads. But as anyone who's written a concurrent application can tell you, simply dividing up the work across multiple threads isn't enough to achieve good hardware utilization -- you must ensure that your threads spend most of their time actually doing work, rather than waiting for more work to do, or waiting for locks on shared data structures.

Using Synchronization -

The traditional way to coordinate access to shared fields in the Java language is to use synchronization, ensuring that all access to shared fields is done holding the appropriate lock. With synchronization, you are assured (assuming your class is properly written) that whichever thread holds the lock that protects a given set of variables will have exclusive access to those variables, and changes to those variables will become visible to other threads when they subsequently acquire the lock. The downside is that if the lock is heavily contended (threads frequently ask to acquire the lock when it is already held by another thread), throughput can suffer, as contended synchronization can be quite expensive. (Public Service Announcement: uncontended synchronization is now quite inexpensive on modern JVMs.)
Another problem with lock-based algorithms is that if a thread holding a lock is delayed (due to a page fault, scheduling delay, or other unexpected delay), then no thread requiring that lock may make progress.
Using Volatile -                                                                                            
Volatile variables can also be used to store shared variables at a lower cost than that of synchronization, but they have limitations. While writes to volatile variables are guaranteed to be immediately visible to other threads, there is no way to render a read-modify-write sequence of operations atomic, meaning, for example, that a volatile variable cannot be used to reliably implement a mutex (mutual exclusion lock) or a counter.
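A hedged sketch of that limitation (the class name is illustrative): volatile works well for a one-writer visibility flag, but it cannot make a read-modify-write such as `++` atomic:

```java
public class VolatileDemo {
    // Legitimate use: a stop flag written by one thread, read by another.
    private volatile boolean running = true;

    public void stop() { running = false; }      // write is immediately visible
    public boolean isRunning() { return running; }

    // Broken use: ++ is a read-modify-write, so two threads can read the
    // same value, both add one, and one increment is silently lost.
    private volatile int count = 0;

    public void unsafeIncrement() { count++; }   // NOT atomic despite volatile
    public int getCount() { return count; }
}
```

Under contention, `unsafeIncrement` can lose updates even though every individual read and write of `count` is visible to all threads.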
Using Atomic - 
The first processors that supported concurrency provided atomic test-and-set operations, which generally operated on a single bit. The most common approach taken by current processors, including Intel and Sparc processors, is to implement a primitive called compare-and-swap, or CAS. (On Intel processors, compare-and-swap is implemented by the cmpxchg family of instructions. PowerPC processors have a pair of instructions called "load and reserve" and "store conditional" that accomplish the same goal; similar for MIPS, except the first is called "load linked.")
A CAS operation includes three operands -- a memory location (V), the expected old value (A), and a new value (B). The processor will atomically update the location to the new value if the value that is there matches the expected old value, otherwise it will do nothing. In either case, it returns the value that was at that location prior to the CAS instruction. (Some flavors of CAS will instead simply return whether or not the CAS succeeded, rather than fetching the current value.) CAS effectively says "I think location V should have the value A; if it does, put B in it, otherwise, don't change it but tell me what value is there now."
The natural way to use CAS for synchronization is to read a value A from an address V, perform a multistep computation to derive a new value B, and then use CAS to change the value of V from A to B. The CAS succeeds if the value at V has not been changed in the meantime.
Instructions like CAS allow an algorithm to execute a read-modify-write sequence without fear of another thread modifying the variable in the meantime, because if another thread did modify the variable, the CAS would detect it (and fail) and the algorithm could retry the operation. The sample code below illustrates the behavior (but not the performance characteristics) of the CAS operation; the value of CAS is that it is implemented in hardware and is extremely lightweight (on most processors):
Concurrent algorithms based on CAS are called lock-free, because threads do not ever have to wait for a lock (sometimes called a mutex or critical section, depending on the terminology of your threading platform). Either the CAS operation succeeds or it doesn't, but in either case, it completes in a predictable amount of time.
Atomic variable classes
Until JDK 5.0, it was not possible to write wait-free, lock-free algorithms in the Java language without using native code. With the addition of the atomic variable classes in the java.util.concurrent.atomic package, that has changed. The atomic variable classes all expose a compare-and-set primitive (similar to compare-and-swap), which is implemented using the fastest native construct available on the platform (compare-and-swap, load linked/store conditional, or, in the worst case, spin locks). Nine flavors of atomic variables are provided in the java.util.concurrent.atomic package: AtomicInteger; AtomicLong; AtomicReference; AtomicBoolean; array forms of atomic integer, long, and reference; and atomic marked reference and stamped reference classes, which atomically update a pair of values.
The atomic variable classes can be thought of as a generalization of volatile variables, extending the concept of volatile variables to support atomic conditional compare-and-set updates. Reads and writes of atomic variables have the same memory semantics as read and write access to volatile variables.
While the atomic variable classes might look superficially like a lock-based counter such as SynchronizedCounter, the similarity is only superficial. Under the hood, operations on atomic variables get turned into the hardware primitives that the platform provides for concurrent access, such as compare-and-swap.
Atomic variables in java.util.concurrent
"Nearly all the classes in the java.util.concurrent package use atomic variables instead of synchronization, either directly or indirectly. Classes like ConcurrentLinkedQueue use atomic variables to directly implement wait-free algorithms, and classes like ConcurrentHashMap use ReentrantLock for locking where needed. ReentrantLock, in turn, uses atomic variables to maintain the queue of threads waiting for the lock."
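Those primitives are exposed directly, so a shared counter needs no lock at all; a small sketch with AtomicInteger (thread and iteration counts are arbitrary), whose incrementAndGet maps down to the CAS retry loop described above:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        final AtomicInteger counter = new AtomicInteger(0);

        // Ten threads each increment 1,000 times; no synchronized needed.
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 1000; j++)
                        counter.incrementAndGet();  // atomic CAS loop inside
                }
            });
            threads[i].start();
        }
        for (Thread t : threads)
            t.join();

        System.out.println(counter.get());  // always 10000 -- no lost updates
    }
}
```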
Sample Code For CAS -
Code illustrating the behavior (but not performance) of compare-and-swap
public class SimulatedCAS {
    private int value;

    public synchronized int getValue() { return value; }

    public synchronized int compareAndSwap(int expectedValue, int newValue) {
        int oldValue = value;
        if (value == expectedValue)
            value = newValue;
        return oldValue;
    }
}

Implementing a counter with compare-and-swap
public class CasCounter {
    private SimulatedCAS value = new SimulatedCAS(); // initialized so the CAS target exists

    public int getValue() {
        return value.getValue();
    }

    public int increment() {
        int oldValue = value.getValue();
        while (value.compareAndSwap(oldValue, oldValue + 1) != oldValue)
            oldValue = value.getValue();
        return oldValue + 1;
    }
}





Wednesday, 6 April 2011

Integer Overflow and Underflow in Java...


You could imagine that when you have only 2 bit places you are counting (adding 1 each time):
 00
 01
 10
 11
100
But the last result needs 3 bits, so it gets cut down to "00" again. So there is your "overflow": you're back at 00. Depending on what the bits mean, this can mean several things, but most of the time it means you are going from the highest value to the lowest (11 to 00).

So:
System.out.println(Integer.MAX_VALUE + 1 == Integer.MIN_VALUE);
System.out.println(Integer.MIN_VALUE - 1 == Integer.MAX_VALUE);
prints true twice.
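Since the wraparound is silent, code that needs to detect it must test before adding; a minimal sketch (the method name is my own):

```java
public class OverflowCheck {
    // Returns true if a + b would fall outside the int range.
    static boolean addWouldOverflow(int a, int b) {
        if (b > 0) return a > Integer.MAX_VALUE - b; // would wrap past MAX_VALUE
        if (b < 0) return a < Integer.MIN_VALUE - b; // would wrap past MIN_VALUE
        return false;                                // b == 0 never overflows
    }

    public static void main(String[] args) {
        System.out.println(addWouldOverflow(Integer.MAX_VALUE, 1)); // true
        System.out.println(addWouldOverflow(1, 1));                 // false
    }
}
```

The comparisons are rearranged (MAX_VALUE - b instead of a + b) precisely so the check itself cannot overflow.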

Monday, 4 April 2011

What is the use of serialVersionUID?


During object serialization, the default Java serialization mechanism writes metadata about the object, which includes the class name, field names and types, and the superclass. This class definition is stored as part of the serialized object, and it enables the deserialization process to reconstitute the objects and map the stream data onto the class attributes with the appropriate types.
Every time a class is serialized, the Java serialization mechanism automatically computes a hash value for it. ObjectStreamClass's computeSerialVersionUID() method passes the class name, sorted member names, modifiers, and interfaces to the secure hash algorithm (SHA), which returns a hash value. This serialVersionUID is also called the suid.
So when the serialized object is retrieved, the JVM first evaluates the suid of the serialized class and compares that suid value with the one of the loaded class. If the suid values match, the object is said to be compatible with the class and is de-serialized; if not, an InvalidClassException is thrown.
Changes to a serializable class can be compatible or incompatible. Following is the list of changes which are compatible:
  • Add fields
  • Change a field from static to non-static
  • Change a field from transient to non-transient
  • Add classes to the object tree
List of incompatible changes:
  • Delete fields
  • Change class hierarchy
  • Change non-static to static
  • Change non-transient to transient
  • Change type of a primitive field
So, if no suid is present, the JVM generates a new suid even when only compatible changes were made, resulting in an exception when an object serialized by a prior release is used.
The only way to get rid of the exception is to recompile and redeploy the application.
If we explicitly mention the suid using the statement:

private static final long serialVersionUID = <integer value>;
then, if any of the mentioned compatible changes are made, the class need not be recompiled. For incompatible changes, however, there is no alternative to recompiling.
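A minimal sketch (the class and its fields are made up for illustration): declaring the suid explicitly pins the version, so compatible changes such as adding a field will not break deserialization of previously written streams:

```java
import java.io.Serializable;

public class Employee implements Serializable {
    // Pin the version explicitly so the JVM does not recompute it;
    // compatible changes (e.g. adding a field) then keep old streams readable.
    private static final long serialVersionUID = 1L;

    private String name;
    private int id;

    public Employee(String name, int id) {
        this.name = name;
        this.id = id;
    }

    public String getName() { return name; }
    public int getId() { return id; }
}
```

Round-tripping an instance through ObjectOutputStream/ObjectInputStream reconstitutes the fields; if a later version of Employee keeps serialVersionUID = 1L and only adds fields, streams written by this version still deserialize.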

Saturday, 19 March 2011

Why String is final...

Being final guarantees that instances of String are read-only. (The String class implements read-only objects, but if it were not final it would be possible to write a subclass of String which permitted instances to be changed.) Strings need to be read-only for security reasons, and security is the most compelling reason: before String was changed to be final (while Java 1.0 was still in beta), there was a race condition which could be used to subvert security restrictions. It involved one thread changing a pathname to a file after another thread had checked that access was permitted and was about to open it.

Friday, 11 March 2011

Garbage collection tuning in Java 5.0


Memory management is a major factor that affects software application performance. More time is usually spent allocating and deallocating memory than performing actual data computation.
While C++ offers direct control over when memory is allocated and freed up, Java attempts to abstract memory management by using garbage collection to reclaim memory that the program no longer needs. However, the "pause" associated with garbage collection has been the central argument against using Java when real-time performance is required.
Garbage collection is typically a periodic process that pauses normal program execution to analyze object references and reclaim memory that was allocated but can no longer be accessed by reference. In large Java applications, the pause for garbage collection can last several seconds, which is enough time to disrupt any type of real-time communication or control system.
Consequently, the memory abstraction provided by garbage collection requires some developers to think more carefully about memory management. Even though Java does not provide the same level of control over memory deallocations as C++, programming patterns can still make a huge difference in the memory performance of Java applications.
In this article, I will briefly review Java 5.0 capabilities in the tuning of garbage collection.

Principles of garbage collection in Java 5.0

The goal of a new Java 1.5 feature called ergonomics is to provide good performance from the JVM with a minimum of command-line tuning. Ergonomics attempts to match the best selection of proper garbage collector, heap size, and runtime compiler for an application.
When does the choice of a garbage collector matter to the user? For many applications, it doesn't matter. That is, the application can perform within its specifications in the presence of garbage collection with pauses of modest frequency and duration. An example where this is not the case would be a large application that scales well to a large number of threads, processors, sockets, and a lot of memory.
An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
Starting with Java 2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection. While naive garbage collection examines every live object in the heap, generational collection exploits several empirically observed properties of most applications to avoid extra work. The most important of these observed properties is so-called infant mortality: plenty of objects "die" soon after being allocated. Iterator objects, for example, are often alive only for the duration of a single loop. To optimize for this scenario, memory is managed in generations, or memory pools holding objects of different ages. Garbage collection occurs in each generation when the generation fills up. Objects are allocated in a generation for younger objects, or the young generation, and most objects die there because of infant mortality.
If the garbage collector has become a bottleneck, you may wish to customize the generation sizes. Check the verbose garbage collector output, and then explore the sensitivity of your individual performance metric to the garbage collector parameters.
At initialization, a maximum address space is virtually reserved but not allocated to physical memory unless it is needed. The complete address space reserved for object memory can be divided into the young and tenured generations. The young generation consists of eden plus two survivor spaces. Objects are initially allocated in eden. One survivor space is empty at any time and serves as a destination of the next, copying collection of any live objects in eden and the other survivor space. Objects are copied between survivor spaces in this way until they are old enough to be tenured, or copied to the tenured generation. A third generation closely related to tenured is the permanent generation. This generation is special because it holds data needed by the virtual machine to describe objects that do not have an equivalent at the Java language level. For example, objects describing classes and methods are stored in the permanent generation.

Performance considerations

There are two metrics of performance of a Java application (and garbage collection in particular): throughput and pauses. Throughput is the percentage of total time not spent in garbage collection, considered over long periods of time. Throughput includes time spent in allocation (but tuning for speed of allocation is generally not needed). Pauses are the times when an application appears unresponsive because garbage collection is occurring.
Some users are sensitive to other considerations as well. For instance, footprint is the working set of a process, measured in pages and cache lines. On systems with limited physical memory or many processes, footprint may dictate scalability. Promptness is the time between when an object becomes dead and when the memory becomes available; this is an important consideration for distributed systems, including remote method invocation (RMI).
In general, a particular generation sizing is a trade-off between these considerations. For example, a very large young generation may maximize throughput, but it does so at the expense of footprint, promptness, and pause times. You can minimize young generation pauses by using a small young generation at the expense of throughput.
When you want to improve the performance of your application with larger numbers of processors, you should use the throughput collector. You can enable the throughput collector by using the command-line flag -XX:+UseParallelGC. You can control the number of garbage collector threads with the ParallelGCThreads command-line option -XX:ParallelGCThreads=<desired number>. The maximum pause time goals are specified with the command-line flag -XX:MaxGCPauseMillis=<nnn>. This is interpreted as a hint to the throughput collector that pause times of <nnn> milliseconds or less are desired. There are plenty of generation-size adjustment options, such as -XX:YoungGenerationSizeIncrement=<nnn> for the young generation and -XX:TenuredGenerationSizeIncrement=<nnn> for the tenured generation.
If your application would benefit from shorter garbage collector pauses and can afford to share processor resources with the garbage collector when the application is running, I suggest using the concurrent low pause collector. A concurrent collection will start if the occupancy of the tenured generation grows above the initiating occupancy (i.e., the percentage of the current heap that is used before a concurrent collection is started). The initiating occupancy by default is set to about 68%. You can set it with the parameter -XX:CMSInitiatingOccupancyFraction=<nn> where <nn> is a percentage of the current tenured generation size. You can use the concurrent collector in a mode in which the concurrent phases are done incrementally. This mode (referred to here as "i-cms") divides the work done concurrently by the collector into small chunks of time, which are scheduled between young generation collections. This feature is useful when applications that need the low pause times provided by the concurrent collector are run on machines with small numbers of processors.

Fine-tuning garbage collection

The command-line argument -verbose:gc prints information at every collection. With this flag switched on, you will see output similar to the following at every garbage collection:
[GC 325407K->83000K(776768K), 0.2300771 secs]
[GC 325816K->83372K(776768K), 0.2454258 secs]
[Full GC 267628K->83769K(776768K), 1.8479984 secs]
There are two minor collections and one major one (Full GC). The flag -XX:+PrintGCDetails prints additional information about the collections. The flag -XX:+PrintGCTimeStamps will additionally print a timestamp at the start of each collection. Listing A is what you will see when both flags are set.
Additionally, the information is shown for a major collection delineated by Tenured. The tenured generation usage was reduced here to about 10% and took approximately 0.13 seconds.
A number of parameters affect generation size. At initialization of the virtual machine, the entire space for the heap is reserved. You can specify the size of the space reserved with the -Xmx option. If the value of the -Xms parameter is smaller than the value of the -Xmx parameter, not all of the reserved space is immediately committed to the virtual machine. The different parts of the heap (permanent generation, tenured generation, and young generation) can grow to the limit of the virtual space as needed.
By default, the virtual machine grows or shrinks the heap at each collection to try to keep the proportion of free space to live objects at each collection within a specific range. This target range is set as a percentage by the parameters -XX:MinHeapFreeRatio=<minimum> and -XX:MaxHeapFreeRatio=<maximum>, and the total size is bounded below by -Xms and above by -Xmx. Unless you have problems with pauses, try granting as much memory as possible to the virtual machine. The default size (64MB) is often too small. You can find descriptions of other VM options on Sun's Web site.
You can also set a proportion of the heap dedicated to the young generation. The bigger the young generation, the less often minor collections occur. However, for a bounded heap size, a larger young generation implies a smaller tenured generation, which will increase the frequency of major collections. The optimal choice depends on the lifetime distribution of the objects allocated by the application. The young generation size is controlled by NewRatio. For example, setting -XX:NewRatio=3 means that the ratio between the young and tenured generation is 1:3. If desired, the parameter SurvivorRatio can be used to tune the size of the survivor spaces, but this is often not as important for performance. For example, -XX:SurvivorRatio=6 sets the ratio between each survivor space and eden to 1:6. Unless you find problems with excessive major collection or pause times, grant plenty of memory to the young generation.
Java 5.0 has implemented three different garbage collectors. The throughput collector uses a parallel version of the young generation collector. It is used if the -XX:+UseParallelGC option is passed on the command line. The concurrent low pause collector is used if the -Xincgc or -XX:+UseConcMarkSweepGC option is passed on the command line. In this case, the application is paused for short periods during the collection. The incremental low pause collector is used only if -XX:+UseTrainGC is passed on the command line. It will not be supported in future releases, but if you want more information, please see Sun's documentation on using this collector. (Note: Do not use -XX:+UseParallelGC with -XX:+UseConcMarkSweepGC.)

Conclusion

Garbage collection can become a bottleneck in different applications depending on the requirements of the applications. By understanding the requirements of the application and the garbage collection options, it is possible to minimize the impact of garbage collection.

Link for further details:


Analyzing the Garbage Collection Log

Tagtraum industries (http://tagtraum.com) provides a free utility, gcviewer, for analyzing garbage collection logs generated by the JVM. Load the garbage collection log into gcviewer and determine which issue is occurring based on the descriptions in Table 3-1.
Table 3-1 Garbage Collection Performance Issues

Issue: Insufficient total (heap) memory allocated
Symptoms in garbage collection log: Memory usage trends upwards and reaches the top of the total memory allocated.
Impact: Reduces the performance of, or potentially crashes, the AquaLogic User Interaction product.

Issue: Excessive total (heap) memory allocated
Symptoms in garbage collection log: Memory usage peaks much lower than the total memory allocated.
Impact: Can cause a slowdown across all applications on the server; the application server or AquaLogic User Interaction product is taking up too much of the system memory.

Issue: Insufficient young generation memory allocated
Symptoms in garbage collection log: Sawtoothed memory usage.
Impact: Reduces the performance of the AquaLogic User Interaction product. This represents excessive minor garbage collector runs, which increase the number of objects in the tenured generation; objects in the tenured generation are more resource-intensive to collect.

Resolving Garbage Collection Performance Issues

Resolving the issues described in Analyzing the Garbage Collection Log is a matter of adjusting the JVM memory settings and reanalyzing the garbage collection log. Table 3-2 shows what memory settings to adjust for each issue. For details on how to adjust these settings for each supported application server and standalone AquaLogic User Interaction product, see Java Virtual Machine Configuration.
Table 3-2 Garbage Collection Performance Issue Resolution

Issue: Insufficient total (heap) memory allocated
Resolution: Increase total heap memory allocation until memory usage stays reasonably below total memory.
JVM memory parameter: Increase -Xmx

Issue: Excessive total (heap) memory allocated
Resolution: Decrease total heap memory allocation until memory usage is reasonably, but not excessively, below total memory.
JVM memory parameter: Decrease -Xmx

Issue: Insufficient young generation memory allocated
Resolution: Increase young generation memory allocation until the memory usage trend is horizontal.
JVM memory parameter: Adjust -XX:NewRatio

Java Memory Switches

The following are Java memory switches used to tune JVM garbage collection. Use these switches in conjunction with the instructions specific to your application server or AquaLogic User Interaction product.
  • -Xloggc:<path/filename>
  • This switch turns on garbage collection logging for the JVM. Replace <path/filename> with the location where the garbage collection log should be generated.
  • -Xms and -Xmx
  • These switches set the minimum (-Xms) and maximum (-Xmx) heap size for the JVM. The JVM adjusts heap size based on object usage and bounded by these two switches. Setting these switches to the same value increases predictability by removing the ability of the JVM to adjust the heap size.
    Caution: Fixing the heap size to a specific value requires special attention to memory tuning.
  • -XX:NewRatio
  • This switch sets the ratio of the tenured generation to the young generation. For example,
    -XX:NewRatio=3
    means that the tenured generation is 3x the size of the young generation; in other words, the young generation is one quarter of the heap and the tenured generation is three quarters of the heap.
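The effect of switches like -Xms and -Xmx can be observed from inside the JVM; a small sketch using Runtime (the reported values will of course differ per configuration):

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // maxMemory() reflects -Xmx; totalMemory() is the currently
        // committed heap, which starts near -Xms and can grow toward -Xmx.
        System.out.println("max   : " + rt.maxMemory() / (1024 * 1024) + " MB");
        System.out.println("total : " + rt.totalMemory() / (1024 * 1024) + " MB");
        System.out.println("free  : " + rt.freeMemory() / (1024 * 1024) + " MB");
    }
}
```

Running this before and after changing the switches is a quick sanity check that the settings actually took effect.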

How We Solved our Garbage Collection Pausing Problem

-XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:NewSize=1200m -XX:SurvivorRatio=16
As Greg says:
-XX:+DisableExplicitGC - some libs call System.gc(). This is usually a bad idea and could explain some of what we saw.
-XX:+UseConcMarkSweepGC - use the low pause collector.
-XX:NewSize=1200m -XX:SurvivorRatio=16 - the black magic part. Tuning these requires empirical observation of your GC log, either from verbose gc or jstat (a JDK 1.5 tool). In particular the 1200m new size is 1/4 of our heap size of 4800MB.