Tic Tech Toe: July 2007

Although I have been using the Java Persistence API (JPA) for quite some time now, it remains an interesting topic to explore. I think a lot of the credit should go to the number of "may"s used in the specification. I asked Deeps why the spec uses "may" or "may not" where it should have used "will" or "will not". His reply was short but convincing.

The spec is as much for the providers as it is for application developers. There should be enough incentive left for the providers to show that they are better than others. The word "may" gives that incentive. I call this phenomenon MSM, the "Magic Sphere of May". This sphere is magical and wonderful; the only issue is that nobody knows its perimeter.

As far as JPA is concerned, the most important thing that falls within this sphere is the caching of entities. I made a small attempt to explore it. Since I have used TopLink Essentials, my observations are based on it, but I am sure Hibernate and others will not be much different.

TopLink has two levels of caching, generally termed the L1 cache and the L2 cache.

L1 cache:

The key concept of JPA is the persistence context. JPA defines a persistence context as a set of entity instances in which, for any persistent entity identity, there is a unique entity instance. It is within this persistence context that the life cycle of entities is managed. In simple words, the persistence context is like a cache that holds all the managed entities, and the EntityManager API works off this cache. That is probably why, in TopLink terminology, the persistence context is referred to as the "Unit of Work", and the cache behind it as the "Unit of Work cache".
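To make the "unique instance per identity" idea concrete, here is a minimal sketch of an identity map in plain Java. It is only a toy model: the class names and the loader callback are my own inventions for illustration, and this is not how TopLink actually implements its Unit of Work.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Toy model of a persistence context: at most one instance per identity.
// Hypothetical class, for illustration only.
class ToyPersistenceContext<K, E> {
    private final Map<K, E> managed = new HashMap<>();
    private int loads = 0;

    // Returns the managed instance for the id, invoking the loader only on a miss.
    E find(K id, Function<K, E> loader) {
        E existing = managed.get(id);
        if (existing != null) {
            return existing;
        }
        loads++;
        E loaded = loader.apply(id);
        managed.put(id, loaded);
        return loaded;
    }

    // Detaches everything, in the spirit of em.clear().
    void clear() {
        managed.clear();
    }

    int loadCount() {
        return loads;
    }
}

class ToyPersistenceContextDemo {
    public static void main(String[] args) {
        ToyPersistenceContext<Long, StringBuilder> pc = new ToyPersistenceContext<>();
        StringBuilder a = pc.find(1L, id -> new StringBuilder("Wood Table"));
        StringBuilder b = pc.find(1L, id -> new StringBuilder("Wood Table"));
        System.out.println(a == b);         // true: same instance for the same identity
        System.out.println(pc.loadCount()); // 1: the loader ran only once
    }
}
```

The second lookup never reaches the loader, which is the behaviour one hopes for from an L1 cache.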

Each EntityManager instance is associated with a persistence context. All entities are cached against their persistent identities (primary keys). Whenever we do operations like persist or merge, entities are registered with the unit of work. The unit of work then maintains a change set to remember all modifications made to a registered entity. Since any changes made to registered entities will eventually be synchronized to the database, they are also referred to as "managed entities". You can play around with the unit of work (update entities, bring in new entities, remove entities), but the db queries will be made only when you commit (or flush) the Unit of Work.

Consider this very simple example to see what we are gaining out of this.

1 EntityManager em = emf.createEntityManager();
2 em.getTransaction().begin();
3 Product table = new Product();
4 em.persist(table);//No insert query here

5 table.setName("Wood Table");// In memory change to the managed entity

6 em.getTransaction().commit(); //one insert query here
7 em.clear(); //clears the unit of work cache, entity "table" is no longer managed by the unit of work

One can see how wonderfully entities are managed behind the scenes: db queries are fired only when the state of the in-memory entity differs from the state of the entity in the db.

8 em.getTransaction().begin();
9 table = em.find(Product.class, table.getId());// one select query here (ignoring the L2 cache discussed later), entity table becomes managed again
10 table.setName("Steel Table");//In memory change to the managed entity

11 table.setName("Wood Table");//In memory entity changed back to the original state

12 em.getTransaction().commit();//no db queries here

13 em.clear();

Here, the state of the entity when it was brought into the persistence context by the finder at line 9 is the same as its state at line 12, so there is absolutely zero database interaction on commit. There is nothing in the Unit of Work change set to commit. If you are using JPA in Java SE mode, entities remain managed beyond transaction scope, until you clear or close the em. So had I not cleared the persistence context at line 7, no select query would have been fired at line 9 either, as the entity to be found would already be available in the Unit of Work cache from the previous transaction.
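One way to picture why the commit at line 12 has nothing to do is snapshot comparison: remember the state at registration time and diff it against the current state at commit. The sketch below assumes that approach; the class and method names are hypothetical, and the real provider may detect changes differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Toy change tracking: a snapshot taken at registration is compared with the
// current state at commit. Hypothetical names, not TopLink's API.
class ToyUnitOfWork {
    private final Map<String, String> snapshot = new LinkedHashMap<>();
    private final Map<String, String> current = new LinkedHashMap<>();

    // Registers an attribute with its value as read from the database.
    void register(String attribute, String value) {
        snapshot.put(attribute, value);
        current.put(attribute, value);
    }

    // In-memory change only; nothing is written yet.
    void set(String attribute, String value) {
        current.put(attribute, value);
    }

    // Attributes whose current value differs from the snapshot.
    Map<String, String> changeSet() {
        Map<String, String> changes = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : current.entrySet()) {
            if (!Objects.equals(snapshot.get(e.getKey()), e.getValue())) {
                changes.put(e.getKey(), e.getValue());
            }
        }
        return changes;
    }

    // Returns how many UPDATE statements this toy commit would issue (0 or 1).
    int commit() {
        int updates = changeSet().isEmpty() ? 0 : 1;
        snapshot.putAll(current);
        return updates;
    }
}

class ToyUnitOfWorkDemo {
    public static void main(String[] args) {
        ToyUnitOfWork uow = new ToyUnitOfWork();
        uow.register("name", "Wood Table");
        uow.set("name", "Steel Table");
        uow.set("name", "Wood Table");    // back to the original state
        System.out.println(uow.commit()); // 0: empty change set, nothing to write
    }
}
```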

L1 cache surely saves us many expensive queries.

L2 cache:

Having said that, it is very unlikely that an application will operate on just one persistence context. Normally, when a transaction is completed the EntityManager is closed, and a new persistence context is used in the next transaction. Also, an application may have several clients trying to fetch the same data. In this case the L1 cache is not sufficient, as one client cannot take advantage of another client's cache and will always fire expensive database queries to retrieve entities.

To solve this problem TopLink has one more level of caching, whose scope spans persistence contexts. This cache lives at the persistence unit level, and all EntityManagers created from the same EntityManagerFactory share it. The advantage is that when em.find() does not find the entity in its own persistence context, it looks in the L2 cache next and hits the db only when the desired entity is not found there either.

In the above example, if the session cache is enabled (it is by default), no db query is fired at line 9. This is because even though I have cleared the persistence context cache, the entity is still available in the L2 cache. If there are, say, 1000 clients asking for the same entity, the L2 cache clearly saves another 999 trips to the db.
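The 999 saved trips can be counted with a small stand-in for a shared cache. This is a rough sketch with made-up names; the "database" is just a counter, and the real session cache is of course far more involved.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Toy shared cache: one per "factory", consulted by every client.
// Hypothetical class, for illustration only.
class ToySharedCache {
    private final Map<Long, String> entries = new ConcurrentHashMap<>();
    private final AtomicInteger dbTrips = new AtomicInteger();

    // Stand-in for a SELECT; counts how often the "database" is hit.
    private String loadFromDb(Long id) {
        dbTrips.incrementAndGet();
        return "Product#" + id;
    }

    // Serves from the cache, loading from the "database" only on a miss.
    String find(Long id) {
        return entries.computeIfAbsent(id, this::loadFromDb);
    }

    int dbTrips() {
        return dbTrips.get();
    }
}

class ToySharedCacheDemo {
    public static void main(String[] args) {
        ToySharedCache shared = new ToySharedCache();
        for (int client = 0; client < 1000; client++) {
            shared.find(42L); // each client arrives with an empty L1 cache
        }
        System.out.println(shared.dbTrips()); // 1
    }
}
```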

However, there is one significant difference between obtaining an entity from the L1 cache and obtaining it from the L2 cache.

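Product table = em.find(Product.class, id); // first lookup in this persistence context
em.clear(); // empty the L1 cache so that the next find is served from the session cache
Product table1 = em.find(Product.class, id); // clone built from the session (L2) cache
Product table2 = em.find(Product.class, id); // served from the L1 cache
System.out.println("Session vs L1 "+(table1==table));
System.out.println("L1 vs L1 "+(table1==table2));

output is

Session vs L1 false

L1 vs L1 true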
...

We can see that if an entity is served from the L1 cache it is handed over as such, but when it is served from the L2 cache a clone of the entity is created first and that clone is given to the client. This means that changes made to an entity are not immediately reflected in the L2 cache. Only when a unit of work commits successfully, or when em.refresh is called explicitly, does TopLink update the session cache.
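The clone-on-read and merge-on-commit behaviour can be mimicked in a few lines. Again, this is only a sketch under my own assumptions: ToyProduct, checkOut and merge are hypothetical names, not TopLink API.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal entity stand-in with a copy method. Hypothetical class.
class ToyProduct {
    final long id;
    String name;

    ToyProduct(long id, String name) {
        this.id = id;
        this.name = name;
    }

    ToyProduct copy() {
        return new ToyProduct(id, name);
    }
}

// Toy session cache that never hands out its own instances.
class ToySessionCache {
    private final Map<Long, ToyProduct> shared = new HashMap<>();

    void put(ToyProduct p) {
        shared.put(p.id, p.copy());
    }

    // Each caller (persistence context) receives a private working copy.
    ToyProduct checkOut(long id) {
        ToyProduct original = shared.get(id);
        return original == null ? null : original.copy();
    }

    // Called on a successful "commit": publish the working copy's state.
    void merge(ToyProduct workingCopy) {
        shared.put(workingCopy.id, workingCopy.copy());
    }
}

class ToySessionCacheDemo {
    public static void main(String[] args) {
        ToySessionCache cache = new ToySessionCache();
        cache.put(new ToyProduct(1L, "Wood Table"));

        ToyProduct client1 = cache.checkOut(1L);
        ToyProduct client2 = cache.checkOut(1L);
        System.out.println(client1 == client2);      // false: separate clones

        client1.name = "Steel Table";                // uncommitted change
        System.out.println(cache.checkOut(1L).name); // Wood Table

        cache.merge(client1);                        // "commit"
        System.out.println(cache.checkOut(1L).name); // Steel Table
        System.out.println(client2.name);            // Wood Table: client2 holds a stale copy
    }
}
```

The last line shows the flip side of cloning: a copy handed out before the commit knows nothing about it.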

Obviously, changes made to an entity obtained by client 1 are not visible to client 2, as both clients are operating on different copies of the same entity. If one is not careful, various parts of the application may be working on stale data from the session cache. This is also why it becomes absolutely necessary for applications to maintain the inverse side of a relationship; otherwise the cached copy of the entity on the other side will not reflect the change.

Another issue is that if there are, say, 1000 clients fetching an entity, there will be 1001 copies of the entity in memory: one in the L2 cache and 1000 in L1 caches, a different cloned copy for each entity manager. If your application is running out of memory, this could be one of the reasons. If there are entities which change very frequently, it would be better to disable the shared cache for them. In TopLink you can do this by specifying a property in your application's persistence.xml, replacing EntityName with the name of your entity:

<property name="toplink.cache.shared.EntityName" value="false" />

Since it appears the L2 cache will remain in the "MSM", it would be better to analyse its use case in your own application.

Saturday, July 21, 2007

Toplink caching
