Tic Tech Toe

Wednesday, April 21, 2010

GIDS Day 2.

"Small Cap, Mid Cap, Large Cap .. it doesn't matter what cap a stock is, add it to your portfolio as far as it brings a value to you."

This is the promo line of a popular stock market analysis program on television.

My take from GIDS is quite similar.

"Presentation tier, Application tier, Data tier.. it doesn't matter which tier a technology is, add it to your knowledge base as long as this knowledge brings value to your software".

A few years back, anyone attending a web conference would hear 10 presenters give 100 different definitions of Web 2.0 and Ajax. But today I saw the same presenter discuss Flash in one session and the power of cloud computing in another. Google and Ajax have made end users so demanding that developers are forced to move beyond their comfort zone and learn jQuery and cloud computing concepts at the same time.

Coming back to GIDS, it started with Marty Hall's session on "Rich Internet Application", where he began with what Ajax is and then went on to give examples using plain jQuery, JSF 2.0 and GWT.

Scott Davis then decided to move beyond the curly braces and discussed Web 2.0 concepts. It was a great and enjoyable presentation, but to be honest I wanted to see some Grails code within those hated curly braces. He promised there will be enough of it tomorrow.

These were followed by parallel sessions, and one could choose to attend whichever one liked the most.

I attended one of the less crowded presentations, "Longevity of Scalable System" by Yahoo's Nishad Kamat. He covered how to increase the shelf life of software in the current competitive and rapidly changing technical ecosystem, and how to resolve the familiar dilemma between home-grown and imported technology.

Matthew McCullough's intro to Hadoop was another good one to follow: a good introduction to the MapReduce algorithm and the problem it is designed to solve. I am not sure I understood all of it, but very soon I am going to download Hadoop and try to develop some simple searches on my Tomcat log files.

I wanted to attend the "NoSQL:The Shift to a Non-relational world" presentation, but thanks to the volcanic ash the presenter could not make it to Bangalore. Ramesh Srinivasraghavan from Adobe, who had earlier talked about Flex, took his slot and gave an introduction to the cloud as an environment and the challenges it brings to web development.

McCullough ended the day with an important session on web debugging tools. He discussed tcpdump, netstat, curl, Wireshark, MODI, JMeter, JASH etc. I was surprised that he didn't talk about Fiddler and our ever so simple Live HTTP Headers.

Judging from the kind of presentations and the audience responses, one can say that developer curiosity has shifted from how to build a Web 2.0 application to how to scale a Web 2.0 application.

The organization of the event was very professional. I just wish more time had been given for questions and networking. I will try to put up a detailed plus-and-minus post after tomorrow's presentations. But I am happy to have spent my time and money on an event like this.

Saturday, July 21, 2007

Toplink caching

Although I have been using the Java Persistence API (JPA) for quite some time now, it remains an interesting topic to explore. I think a lot of the credit for that should go to the number of "may"s used in the specification. I asked Deeps why the spec uses "may" or "may not" where it could have used "will" or "will not". His reply was short but convincing.

The spec is as much for the providers as it is for application developers. There should be enough incentive left for the providers to show that they are better than the others, and the word "may" gives that incentive. I call this phenomenon MSM, the "Magic Sphere of May". This sphere is magical and wonderful; the only issue is that nobody knows its perimeter.

As far as JPA is concerned, the most important thing that falls inside this sphere is the caching of entities. I made a small attempt to explore it. Since I have used TopLink Essentials, my observations are based on it, but I am sure Hibernate and others will not be much different.

TopLink has two levels of caching, generally termed the L1 cache and the L2 cache.

L1 cache:

The key concept of JPA is the persistence context. JPA defines a persistence context as a set of entity instances in which, for any persistent entity identity, there is a unique entity instance. It is within this persistence context that the life cycle of entities is managed. In simple words, the persistence context is like a cache that holds all the managed entities, and the EntityManager API works off this cache. That is probably why, in TopLink terminology, the persistence context is referred to as the "Unit of Work" and the cache managing it as the "Unit of Work cache".

Each EntityManager instance is associated with a persistence context. All entities are cached against their persistent identities (their primary keys). Whenever we do operations like persist or merge, entities are registered with the unit of work. The unit of work then maintains a change set to remember all modifications made to a registered entity. Since any changes made to registered entities will eventually be synchronized to the database, they are also referred to as "managed entities". You can play around with the unit of work (update entities, bring in new entities, remove entities), but the DB queries will be made only when you commit (or flush) the Unit of Work.

Consider this very simple example to see what we are gaining out of this.

1 EntityManager em = emf.createEntityManager();
2 em.getTransaction().begin();
3 Product table = new Product();
4 em.persist(table);//No insert query here

5 table.setName("Wood Table");// In memory change to the managed entity

6 em.getTransaction().commit(); //one insert query here
7 em.clear(); //clears the unit of work cache, entity "table" is no more managed by unit of work

One can see how wonderfully entities are managed behind the scenes: DB queries are fired only when the state of the in-memory entity differs from the state of the entity in the DB.

8 em.getTransaction().begin();
9 table = em.find(Product.class, table.getId());// one select query here, entity table again becomes managed
10 table.setName("Steel Table");//In memory change to the managed entity

11 table.setName("Wood Table");//In memory entity changed back to the original state

12 em.getTransaction().commit();//no db queries here

13 em.clear();

Here, since the state of the entity when it was brought into the persistence context by the finder at line 9 is the same as at line 12, there is absolutely zero database interaction on commit. There is nothing in the Unit of Work change set to commit. If you are using JPA in Java SE mode, entities remain managed beyond the transaction scope, until you clear or close the em. So had I not cleared the persistence context at line 7, no select query would have been fired at line 9 either, as the entity to be found would already be available in the Unit of Work cache from the previous transaction.

L1 cache surely saves us many expensive queries.
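
To make the change-set idea concrete, here is a minimal sketch in plain Java. It is a toy model and not TopLink code: ToyProduct, ToyUnitOfWork and the generated SQL string are all hypothetical names I made up, and I am assuming the provider detects changes by comparing each managed entity with a snapshot taken when it was registered. Setting the name to "Steel Table" and back to "Wood Table" leaves nothing to write, just like lines 10 to 12 above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy model only: a hypothetical stand-in for a provider's unit of work, not TopLink code.
class ToyProduct {
    final long id;
    String name;

    ToyProduct(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class ToyUnitOfWork {
    // Managed instances keyed by primary key (the "L1 cache").
    private final Map<Long, ToyProduct> managed = new HashMap<>();
    // Name of each entity as it was when registered or last committed.
    private final Map<Long, String> snapshot = new HashMap<>();

    // Register an entity that was just read from the database.
    void register(ToyProduct p) {
        managed.put(p.id, p);
        snapshot.put(p.id, p.name);
    }

    // Compare current state with the snapshot and return the SQL that would be issued.
    List<String> commit() {
        List<String> sql = new ArrayList<>();
        for (ToyProduct p : managed.values()) {
            if (!Objects.equals(snapshot.get(p.id), p.name)) {
                sql.add("UPDATE PRODUCT SET NAME = '" + p.name + "' WHERE ID = " + p.id);
                snapshot.put(p.id, p.name); // the new state becomes the baseline
            }
        }
        return sql;
    }
}

class ChangeSetDemo {
    public static void main(String[] args) {
        ToyUnitOfWork uow = new ToyUnitOfWork();
        ToyProduct table = new ToyProduct(1L, "Wood Table");
        uow.register(table);

        table.name = "Steel Table";
        table.name = "Wood Table"; // back to the original state
        System.out.println("statements after no net change: " + uow.commit().size());

        table.name = "Steel Table"; // a real change this time
        System.out.println("statements after a real change: " + uow.commit().size());
    }
}
```

The only point of the sketch is that what matters at commit time is the net difference between the managed entity and its baseline, not the number of setter calls in between.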

L2 cache

Having said that, it is very unlikely that an application will operate on just one persistence context. Normally, when a transaction is completed the EntityManager is closed, and a new persistence context is used in the next transaction. An application may also have several clients trying to fetch the same data. In this case the L1 cache is not sufficient, as one client cannot take advantage of another client's cache and will always fire expensive database queries to retrieve entities.

To solve this problem, TopLink has one more level of caching whose scope spans persistence contexts. This cache lives at the persistence unit level, and all EntityManagers created from the same EntityManagerFactory share it. The advantage is that em.find() will search the L2 cache first and hit the DB only when the desired entity is not found there.

In the above example, if I enable the session cache (it is enabled by default), no DB query is fired at line 9. This is because even though I have cleared the persistence context cache, the entity is still available in the L2 cache. If there are, say, 1000 clients asking for the same entity, the L2 cache clearly saves another 999 trips to the DB.
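
The "999 trips saved" arithmetic can be sketched with a toy model in plain Java. Again, this is not TopLink code: ToyDatabase, ToySharedCache and ToyEntityManager are hypothetical classes, and the only thing I am assuming is the lookup order described above (L1 first, then the shared cache, then the database).

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes that mimic two cache levels, not TopLink code.
class ToyDatabase {
    int selects = 0; // number of simulated SELECT round trips

    String loadName(long id) {
        selects++;
        return "Wood Table";
    }
}

class ToySharedCache {
    // One instance per "factory", shared by every entity manager (the "L2 cache").
    private final Map<Long, String> names = new HashMap<>();
    private final ToyDatabase db;

    ToySharedCache(ToyDatabase db) {
        this.db = db;
    }

    String find(long id) {
        // Go to the database only on a cache miss.
        return names.computeIfAbsent(id, db::loadName);
    }
}

class ToyEntityManager {
    // Private to this entity manager (the "L1 cache").
    private final Map<Long, String> local = new HashMap<>();
    private final ToySharedCache shared;

    ToyEntityManager(ToySharedCache shared) {
        this.shared = shared;
    }

    String find(long id) {
        // Look in L1 first, then fall back to the shared cache.
        return local.computeIfAbsent(id, shared::find);
    }
}

class SharedCacheDemo {
    // Simulates the given number of clients, each with its own persistence context.
    static int selectsFor(int clients) {
        ToyDatabase db = new ToyDatabase();
        ToySharedCache shared = new ToySharedCache(db);
        for (int i = 0; i < clients; i++) {
            new ToyEntityManager(shared).find(1L);
        }
        return db.selects;
    }

    public static void main(String[] args) {
        System.out.println("SELECTs for 1000 clients: " + selectsFor(1000));
    }
}
```

In this toy only the very first client pays for a database round trip; every later client is served from the shared map.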

However there is one significant difference when you obtain an entity from L1 cache and when you get it from L2 cache.

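Product table = em.find(Product.class, id); // cloned from the session cache into this persistence context

em.clear(); // assumed step: detach everything so that the next find goes back to the session cache
Product table1 = em.find(Product.class, id); // a fresh clone built from the session cache
Product table2 = em.find(Product.class, id); // served from L1, the same instance as table1
System.out.println("Session vs L1 = "+(table1==table));
System.out.println("L1 vs L1 = "+(table1==table2));

output is

Session vs L1 = false

L1 vs L1 = true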
...

We can see that if an entity is served from the L1 cache it is handed over as such, but when it is served from the L2 cache a clone of the entity is created first and that clone is given to the client. This means that changes made to an entity are not immediately reflected in the L2 cache. It is only when a unit of work successfully commits, or explicitly calls em.refresh, that TopLink updates the session cache.
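
Here is a small plain-Java sketch of that clone-on-read behaviour. It is a toy model under my own assumptions, not TopLink's implementation: CachedProduct, ToySessionCache and ToyPersistenceContext are hypothetical names, and the commit step simply copies the managed state back into the shared cache.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical classes, not TopLink's implementation.
class CachedProduct {
    final long id;
    String name;

    CachedProduct(long id, String name) {
        this.id = id;
        this.name = name;
    }

    CachedProduct copy() {
        return new CachedProduct(id, name);
    }
}

class ToySessionCache {
    private final Map<Long, CachedProduct> originals = new HashMap<>();

    // Store a private copy so later changes by the caller do not leak in.
    void put(CachedProduct p) {
        originals.put(p.id, p.copy());
    }

    // Always hand out a clone so callers never touch the shared original.
    CachedProduct checkout(long id) {
        return originals.get(id).copy();
    }
}

class ToyPersistenceContext {
    private final Map<Long, CachedProduct> managed = new HashMap<>();
    private final ToySessionCache session;

    ToyPersistenceContext(ToySessionCache session) {
        this.session = session;
    }

    // Repeated finds in one context return the same managed instance.
    CachedProduct find(long id) {
        return managed.computeIfAbsent(id, session::checkout);
    }

    // On commit, push the managed state back into the shared cache.
    void commit() {
        managed.values().forEach(session::put);
    }
}

class CloneDemo {
    public static void main(String[] args) {
        ToySessionCache session = new ToySessionCache();
        session.put(new CachedProduct(1L, "Wood Table"));

        ToyPersistenceContext client1 = new ToyPersistenceContext(session);
        ToyPersistenceContext client2 = new ToyPersistenceContext(session);

        CachedProduct a = client1.find(1L);
        CachedProduct b = client2.find(1L);
        System.out.println("same context, same instance:  " + (a == client1.find(1L)));
        System.out.println("other context, same instance: " + (a == b));

        a.name = "Steel Table";
        client1.commit();
        // client2 still holds its own clone, so it keeps seeing the old name.
        System.out.println("client2 sees: " + b.name);
        System.out.println("a new context sees: " + new ToyPersistenceContext(session).find(1L).name);
    }
}
```

The design choice to clone on every checkout is what keeps one client's uncommitted edits away from everybody else, at the price of one extra copy per persistence context.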

Obviously, changes made to an entity obtained by client 1 are not visible to client 2, as both clients are operating on different copies of the same entity. If one is not careful, various parts of the application may be working on stale data from the session cache. This is also why it becomes absolutely compulsory for applications to maintain the inverse side of a relationship.

Another issue is that if there are, say, 1000 clients fetching an entity, there will be 1001 copies of the entity in memory: one in the L2 cache and 1000 in L1 caches, a different cloned copy for each entity manager. If your application is running out of memory, this could be one of the reasons. If there are entities which change very frequently, it would be better to disable the shared cache for them. In TopLink you can do this by specifying a property in your application's persistence unit configuration.

<property name="toplink.cache.shared.EntityName" value="false" />

Since it appears that the L2 cache will continue to remain in the "MSM", it is better to analyse its use case in your own application.

Monday, July 9, 2007

What is in the phrase "Hello World"

It was the summer of 2001. I was sitting before my brand new Pentium machine, desperately trying to figure something out. Trust me, it was not fun at all. Without any doubt, that was the most challenging hour of my life.

But finally, after some hundred useless hits at Ctrl+F9, the ugly blue screen was gone and I could see the string "Hello World" blinking bright against the black DOS background.
Yup, you guessed it right. It was my first computer program. Oh God! How delighted I was, looking at "Hello World". So great was my pleasure that I didn't even care that I was sitting before a stand-alone computer saying Hello World to myself!

Back then, out of some 100 billion English phrases, I chose "Hello World" because:

Mr Robert Lafore asked me to do so.
My intelligent friends had typed the same string a day before, and my pride would have been deeply hurt if I could not do the same.

Today, 6 years later, as I write this post, I am again choosing this hackneyed string to launch my blog. Maybe nobody will read it, but I can smell the aroma of the same old happiness. This time there are no complex reasons for picking "Hello World".
This time I just wanted to say
"Hello World"