
Sunday 17 March 2013

Working with the Iterator

Before I start, let me state that I have yet to come across a valid scenario in actual application code where Hibernate's iterate() method would be useful.
To test this I first created a simple Hibernate POJO:
public class Entity {
    private String name;
    private Integer id;
    // setters and getters
}
Consider that we have a method where we need to do some operations on persistent records of this POJO:
public static void testFetching() {
    final Session session = sessionFactory.openSession();
    System.out.println("First round of operations");
    List<Entity> entities = session.createQuery("from Entity").list();
    for (Entity entity : entities) {
        System.out.println(entity);
    }
    //some code here
    System.out.println("Second round of operations");
    List<Entity> entitiesLoadedOnceAgain = session.createQuery(
            "from Entity").list();
    for (Entity entity : entitiesLoadedOnceAgain) {
        System.out.println(entity);
    }
    session.close();
}
The above code first loads a list of entities from the database. After displaying the records, it executes a second database fetch for the same query. The output is:
First round of operations
Hibernate: 
    /* 
from
    Entity */ 
    select
        entity0_.ID as ID4_,
        entity0_.NAME as NAME4_ 
    from
        Entity entity0_
com.simple.Entity@f855562
com.simple.Entity@53d439fe
com.simple.Entity@122b7db1
Second round of operations
Hibernate: 
    /* 
from
    Entity */ 
    select
        entity0_.ID as ID4_,
        entity0_.NAME as NAME4_ 
    from
        Entity entity0_
com.simple.Entity@f855562
com.simple.Entity@53d439fe
com.simple.Entity@122b7db1
As can be seen, Hibernate fired the same query twice. But if we look at the objects in the result, we can see that they are the very same instances as in the first round. The session already held these entities, so the column values fetched by the second SQL call were simply discarded.
In other words the second SQL call was a total waste.
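To make the "fetched but discarded" point concrete, here is a minimal sketch in plain Java. It is a toy model and not Hibernate's actual code: ToySession, its firstLevelCache map and the discardedRows counter are names I made up for illustration, and the table is just a map from id to name. The assumption it encodes is that a session hands back the instance it already manages for an id instead of building a new one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IdentityMapSketch {

    /** Stand-in for the mapped POJO. */
    static final class Entity {
        final int id;
        final String name;
        Entity(int id, String name) { this.id = id; this.name = name; }
    }

    /** Hypothetical toy session that keeps one managed instance per id. */
    static final class ToySession {
        private final Map<Integer, Entity> firstLevelCache = new HashMap<>();
        int discardedRows = 0;

        /** Imitates list(): every row is read in full, but known ids reuse the managed instance. */
        List<Entity> list(Map<Integer, String> table) {
            List<Entity> result = new ArrayList<>();
            for (Map.Entry<Integer, String> row : table.entrySet()) {
                Entity managed = firstLevelCache.get(row.getKey());
                if (managed == null) {
                    // First time this id is seen: build and register the instance.
                    managed = new Entity(row.getKey(), row.getValue());
                    firstLevelCache.put(row.getKey(), managed);
                } else {
                    // The row was transferred in full and then thrown away.
                    discardedRows++;
                }
                result.add(managed);
            }
            return result;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new LinkedHashMap<>();
        table.put(1, "Entity1");
        table.put(2, "Entity2");
        table.put(3, "Entity3");

        ToySession session = new ToySession();
        List<Entity> first = session.list(table);
        List<Entity> second = session.list(table);

        System.out.println("Same instance: " + (first.get(0) == second.get(0)));
        System.out.println("Rows read and discarded: " + session.discardedRows);
    }
}
```

In this model the second list() call reads all three rows again and keeps none of them, which is the waste the logs above hint at.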
Is there some way to improve on this scenario?
Can we just avoid the second query? No, that is not an option. The code in between may have removed records that were in the first list, or a new record might have been added after the first fetch.
The query has to be fired. But does all the data need to be loaded again? If this table had a lot of columns and a lot of rows, consider the amount of network traffic caused by the second database call. We need to see whether the amount of data transferred can be reduced.
That is where the iterate() method comes in.
Consider the code below:
public static void testIteratedFetching() {
    final Session session = sessionFactory.openSession();
    List<Entity> entities = session.createQuery("from Entity").list();
    for (Entity entity : entities) {
        System.out.println(entity);
    }
    // some code here
    Iterator<Entity> entitiesLoadedOnceAgain = session.createQuery(
            "from Entity").iterate();
    while (entitiesLoadedOnceAgain.hasNext()) {
        System.out.println(entitiesLoadedOnceAgain.next());
    }
    session.close();
}
Here the second list() call has been replaced by an iterate() call. Instead of a list, it returns an iterator. The output of the method is:
Hibernate: 
    /* 
from
    Entity */ 
    select
        entity0_.ID as ID4_,
        entity0_.NAME as NAME4_ 
    from
        Entity entity0_
com.simple.Entity@7aa89ce3
com.simple.Entity@122b7db1
com.simple.Entity@6548f8c8
Hibernate: 
    /* 
from
    Entity */ 
    select
        entity0_.ID as col_0_0_ 
    from
        Entity entity0_
com.simple.Entity@7aa89ce3
com.simple.Entity@122b7db1
com.simple.Entity@6548f8c8
If we look at the output, we see that:
  • The list() method fetched all the records, as before.
  • The session added these entities to its cache.
  • The call to iterate() did not behave like a select *. Instead it selected only the ids of the matching records.
  • For each of the identifiers, the session checked whether the record was in its cache. As all the records were in the cache by then, the session simply returned those records.
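The steps above can be sketched in plain Java. Again this is only a toy model under my own assumptions, not how Hibernate is implemented: ToySession, firstLevelCache and sqlLog are hypothetical names, the "SQL" strings are just labels, and the table is a map from id to name. The sketch shows an iterate() that fetches ids only and resolves each id against the cache when next() is called, falling back to a per-id load on a miss.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IterateSketch {

    /** Stand-in for the mapped POJO. */
    static final class Entity {
        final int id;
        final String name;
        Entity(int id, String name) { this.id = id; this.name = name; }
    }

    /** Hypothetical toy session that records the statements it would fire. */
    static final class ToySession {
        final Map<Integer, Entity> firstLevelCache = new HashMap<>();
        final List<String> sqlLog = new ArrayList<>();

        /** Imitates list(): one statement that reads all columns of all rows. */
        List<Entity> list(Map<Integer, String> table) {
            sqlLog.add("select ID, NAME from Entity");
            List<Entity> result = new ArrayList<>();
            for (Map.Entry<Integer, String> row : table.entrySet()) {
                Entity managed = firstLevelCache.get(row.getKey());
                if (managed == null) {
                    managed = new Entity(row.getKey(), row.getValue());
                    firstLevelCache.put(row.getKey(), managed);
                }
                result.add(managed);
            }
            return result;
        }

        /** Imitates iterate(): one ids-only statement, then a cache lookup per id. */
        Iterator<Entity> iterate(final Map<Integer, String> table) {
            sqlLog.add("select ID from Entity");
            final Iterator<Integer> ids = new ArrayList<Integer>(table.keySet()).iterator();
            return new Iterator<Entity>() {
                @Override
                public boolean hasNext() {
                    return ids.hasNext();
                }

                @Override
                public Entity next() {
                    Integer id = ids.next();
                    Entity managed = firstLevelCache.get(id);
                    if (managed == null) {
                        // Cache miss: load this single record with its own statement.
                        sqlLog.add("select ID, NAME from Entity where ID=" + id);
                        managed = new Entity(id, table.get(id));
                        firstLevelCache.put(id, managed);
                    }
                    return managed;
                }
            };
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new LinkedHashMap<>();
        table.put(1, "Entity1");
        table.put(2, "Entity2");
        table.put(3, "Entity3");

        ToySession session = new ToySession();
        session.list(table);
        Iterator<Entity> again = session.iterate(table);
        while (again.hasNext()) {
            again.next();
        }
        // Every id was already cached, so only two statements were recorded.
        for (String sql : session.sqlLog) {
            System.out.println(sql);
        }
    }
}
```

With every id already in the cache, the log holds just the full select from list() and the ids-only select from iterate(), matching the two statements in the output above.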
Thus the data columns of the rows were not loaded again. But what if a new record was added in the meantime?
private static void createRec() {
    final Session session = sessionFactory.openSession();
    Transaction transaction = session.beginTransaction();
    Entity entity1 = new Entity();
    entity1.setName("Entity4");
    session.save(entity1);
    transaction.commit();
    session.close();
}
public static void testIteratedFetching() {
    final Session session = sessionFactory.openSession();
    List<Entity> entities = session.createQuery("from Entity").list();
    for (Entity entity : entities) {
        System.out.println(entity);
    }
    // this adds a record via a separate session
    createRec();
    Iterator<Entity> entitiesLoadedOnceAgain = session.createQuery(
            "from Entity").iterate();
    while (entitiesLoadedOnceAgain.hasNext()) {
        System.out.println(entitiesLoadedOnceAgain.next());
    }
    session.close();
}
In the above code, after executing the first query, I added a new record to the database using a separate session (so as to avoid repeatable reads for the first session). If we now look at the logs:
Hibernate: 
    /* 
from
    Entity */ 
    select
        entity0_.ID as ID4_,
        entity0_.NAME as NAME4_ 
    from
        Entity entity0_
com.simple.Entity@6548f8c8
com.simple.Entity@66922804
com.simple.Entity@5815338
Hibernate: 
    /* insert com.simple.Entity
        */ 
        insert 
        into
            Entity
            (NAME) 
        values
            (?)
Hibernate: 
    /* 
from
    Entity */ 
    select
        entity0_.ID as col_0_0_ 
    from
        Entity entity0_
com.simple.Entity@6548f8c8
com.simple.Entity@66922804
com.simple.Entity@5815338
Hibernate: 
    /* load com.simple.Entity */ 
    select
        entity0_.ID as ID4_0_,
        entity0_.NAME as NAME4_0_ 
    from
        Entity entity0_ 
    where
        entity0_.ID=?
com.simple.Entity@59d0d45b
As the fourth record was not available in the session's first-level cache, Hibernate had to execute a separate select query for that record.
This is the risk with this method. You might avoid loading a lot of duplicate data, but you can also end up firing an additional select query for every record that is not already cached.
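The trade-off is easy to put into numbers. The sketch below is my own back-of-the-envelope helper, not a Hibernate API: iterateStatements is a hypothetical method, and it assumes one ids-only query plus one single-record select per uncached id, with no batch fetching configured.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IterateCostSketch {

    /** Statements for the assumed model: one id query plus one load per uncached id. */
    static int iterateStatements(Set<Integer> matchingIds, Set<Integer> cachedIds) {
        Set<Integer> misses = new HashSet<Integer>(matchingIds);
        misses.removeAll(cachedIds);
        return 1 + misses.size();
    }

    public static void main(String[] args) {
        Set<Integer> matching = new HashSet<Integer>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> cached = new HashSet<Integer>(Arrays.asList(1, 2, 3));

        // Three of four records cached: the id query plus one extra select.
        System.out.println(iterateStatements(matching, cached)); // prints 2

        // Nothing cached: the id query plus one select per record.
        System.out.println(iterateStatements(matching, new HashSet<Integer>())); // prints 5
    }
}
```

A list() call is always a single statement, so under these assumptions iterate() only pays off when nearly all of the matching records are already cached.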
Some scenarios where this feature might be useful are:
  • You have a large amount of data that rarely changes. In that case, the probability of an additional select call is very low.
  • You are working with mostly immutable data that is cached (first level or second level). The iterate method looks in both caches before hitting the database.
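The lookup order in the second scenario can be pictured with another small sketch. It is a simplification under my own assumptions, not Hibernate's cache code: the three maps and the resolve method are hypothetical, with firstLevel standing for the session cache, secondLevel for a cache shared across sessions, and database for the table itself.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheLookupSketch {
    final Map<Integer, String> firstLevel = new HashMap<>();  // per session
    final Map<Integer, String> secondLevel = new HashMap<>(); // shared across sessions
    final Map<Integer, String> database = new HashMap<>();    // stand-in for the table
    int databaseHits = 0;

    /** Resolves an id through the first-level cache, then the second-level cache, then the database. */
    String resolve(int id) {
        String value = firstLevel.get(id);
        if (value != null) {
            return value;
        }
        value = secondLevel.get(id);
        if (value == null) {
            // Both caches missed: this is the extra select.
            databaseHits++;
            value = database.get(id);
            if (value != null) {
                secondLevel.put(id, value);
            }
        }
        if (value != null) {
            firstLevel.put(id, value);
        }
        return value;
    }

    public static void main(String[] args) {
        CacheLookupSketch lookup = new CacheLookupSketch();
        lookup.database.put(1, "Entity1");
        lookup.database.put(2, "Entity2");
        lookup.secondLevel.put(2, "Entity2");

        lookup.resolve(2); // served by the second-level cache
        lookup.resolve(1); // falls through to the database
        lookup.resolve(1); // now served by the first-level cache
        System.out.println("Database hits: " + lookup.databaseHits); // prints Database hits: 1
    }
}
```

The more of the data that sits in either cache, the fewer ids fall through to the database, which is why mostly immutable, well-cached data suits iterate().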
There must be a few more. Let me know if you can think of any.
