Search This Blog

Saturday 2 March 2013

new ArrayList(????)

Writing code without using collections framework would be crazy. They are really cool part of java with a lot of nifty stuff. More importantly unlike arrays you do not have to worry about size management here. The ArrayList will grow as long as  memory is there. No need for the developer to worry about initial size and ArrayIndexOutOfBoundsException.
But what when we are actually asked to watch out our memory footprint ? Can we be memory efficient when we are using Collections ?

A standard question that arises is how to initialize the list? We cannot leave the code as below:
public List<String> getAllNames() {
    List<String> names;
    if (/*validate all preconditions satisfied*/) {
        names = new ArrayList<String>();
            /*fill the list*/   
    }
    return names;
}
This will lead to the compilation error :
The local variable names may not have been initialized
Java requires that all local variables (those that exist on the function stack) are initialized with appropriate values. So what should it be :
List<String> names = null;
List<String> names = new ArrayList<String>();
List<String> names = new ArrayList<String>(0);
List<String> names = new ArrayList<String>(size);
It is settled beyond any doubt that the list has to be initialized. You could either eagerly create the list at the very first line itself or do the smarter thing - lazy create it. Like we did in the getAllNames() method. In this case the array list is only created if we are going to need one. So in the cases where our initial conditions are not satisfied the array list will never be allocated in the heap.
This is where things get interesting. What size should we give the array list ?
The best option is to give it the exact size. For e.g.
public List<String> getTopNNames(int n) {
    List<String> names = null;
    if (/*validate all preconditions satisfied*/) {
        names = new ArrayList<String>(n);
        /*fill the list*/   
    }
    return names;
}
In this case exactly n names need to be returned and therefore only n sized collection is created. But knowing the correct value of n is never a guarantee. Even in the above case what if n was 1000 and we only had  a maximum of 100 names available ?
The most common thing that we do in case of collections is call the default constructor:
names = new ArrayList<String>();
If we look inside the Java source code (I am using 1.6)
   /**
     * Constructs an empty list with the specified initial capacity.
     *
     * @param   initialCapacity   the initial capacity of the list
     * @exception IllegalArgumentException if the specified initial capacity
     *            is negative
     */
    public ArrayList(int initialCapacity) {
        super();
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        this.elementData = new Object[initialCapacity];
    }

    /**
     * Constructs an empty list with an initial capacity of ten.
     */
    public ArrayList() {
        this(10);
    }
As can be seen the default array list created is of size 10. So we can use the default constructor without any worries when our intended list is not going to be holding anymore than 10 elements.
What happens if we try and add the 11th element ? Well nothing bad..., the element is added successfully.
Consider the add method:
public boolean add(E e) {
    ensureCapacity(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
    }
The size of the array is managed in the ensureCapacity method.
public void ensureCapacity(int minCapacity) {
    modCount++;
    int oldCapacity = elementData.length;
    if (minCapacity > oldCapacity) {
        Object oldData[] = elementData;
        int newCapacity = (oldCapacity * 3)/2 + 1;
        if (newCapacity < minCapacity)
            newCapacity = minCapacity;
        // minCapacity is usually close to size, so this is a win:
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
}
The size is increased 1.5 times. A new array is allocated and the elements are moved to that array. So if we set a size of 10 then at the 11th element :
the size changes to 10*3/2 + 1 = 16
next jump = 16*3/2 + 1 = 25
next jump = 25*3/2 + 1 = 39  and so on...
In fitting 100 elements, the JVM will have to keep allocating a new array several times over and copy the elements into the new array. For this reason if we have an idea of the size required, it is a good practice to allocate an initial value close to it. 
Some good guidelines to follow here would be:
  1. Create the collection only when you need it, else initialize to null or use the Collections.EMPTY_LIST option.
  2. If we know the exact size needed, specify that in the constructor for the collection.
  3. If we are sure that the collection won't exceed 10 elements, feel free to call the default constructor
  4. The risk involved in initializing 0 sized collections is that the frequency of creating newer arrays and copying data into them could be higher. We need to carefully consider if we are really achieving a lot of gain by going for 0 sized collections.
  5. If we had initialized the collection to a very large size in the beginning of the method and would like to cut the size down then there is the trimToSize() method.
Of course these guidelines would be applicable when working with array backed collections. These would not be a worry for LinkedList.
To be honest, these issues may not be program killer issues, but if there is an opportunity to make things a bit better, why not do it ? Could you come up with some other useful guidelines ? Any solutions you have found to make it work better? Or is this overkill ? What do you think ?

3 comments:

  1. Thanks Robin for publishing this out !!!
    1) It is always better to initialize a collection with exact size if we know it.(Yeah, we are not always guaranteed to get exact size).
    2) If we initialize the arraylist with size zero, there is a possiblity that our program will have more time and space complexity, while adding first few elements.
    So, honestly my opinion is if we do not know the exact size to initialize an arraylist, better let JVM handle the things !!!

    ReplyDelete
  2. Well the implemenation is quite changed in JDK 1.7. It is using a bitwise operator now.
    So, it is like :
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    rather than,
    int newCapacity = (oldCapacity * 3)/2 + 1;
    Also, there are few more changes while adding a new element in an arraylist.
    You can refer src.zip provoded along with JDK installations!!

    ReplyDelete