WebSphere Portal v6 cluster secondary node performance issues
Posted by Vivek Agarwal on August 19, 2007
I am back writing about performance issues that we are running into with the secondary node in a WebSphere Portal v6 cluster. Some time ago I wrote about a startup performance issue on a secondary node in a cluster. At the time, we were not seeing any performance issues after server start-up. Now, however, we are seeing horrendous performance on the secondary node: running a JMeter test against it with 2 threads and a think time of 2-9 seconds pegs its CPU at 100%. Running the exact same test against the primary node with 10 threads leaves CPU utilization hovering between 10% and 20%. It is extremely puzzling to us why this should happen with the exact same hardware and the exact same network configuration. The situation on the secondary node eventually deteriorates to the point where WebSphere Portal accumulates a large number of hung threads and stops responding entirely.
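For readers who want to reproduce the shape of this load outside JMeter, here is a minimal Python sketch of a fixed number of worker threads that each issue a request and then pause for a random think time. It is not the JMeter test plan described above. The function name `run_load` and all of its parameters are hypothetical, and the request itself is left as a caller-supplied function so that no particular portal URL is assumed.

```python
import random
import threading
import time


def run_load(request_fn, threads, iterations, think_min_s, think_max_s, rng=None):
    """Call request_fn from `threads` workers, `iterations` times each.

    After every call the worker sleeps for a uniformly random think time
    between think_min_s and think_max_s seconds. Returns the list of
    per-request latencies in seconds (order is not guaranteed).
    All names here are illustrative assumptions, not part of any tool.
    """
    rng = rng or random.Random()
    latencies = []
    lock = threading.Lock()

    def worker():
        for _ in range(iterations):
            start = time.perf_counter()
            request_fn()  # caller-supplied request, e.g. an HTTP GET
            elapsed = time.perf_counter() - start
            with lock:
                latencies.append(elapsed)
            # Think time: simulate a user pausing between page views.
            time.sleep(rng.uniform(think_min_s, think_max_s))

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return latencies
```

Under these assumptions, the secondary-node scenario would correspond to `threads=2, think_min_s=2, think_max_s=9`, and the primary-node scenario to `threads=10` with the same think-time range. Comparing the returned latencies for the two nodes alongside CPU utilization gives a rough, tool-independent check of the same asymmetry.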
We had some speculative theories about what could be causing the issue. Our main suspicion was distributed caching, because we were seeing a lot of Data Replication Service (DRS) activity. We found an APAR that seemed reasonably close to our situation, but that issue was supposedly fixed in an earlier WAS version than the one we are running. We are pursuing the issue with IBM support and hope to have it resolved soon. Is anybody else running into this issue?


Edward said
Vivek, I will pass this along to my WebSphere Portal support staff to see if we can help you. Would you be open to using a WebSphere consulting firm?
Ed
Peningo Systems
Vivek Agarwal said
Ed,
I would appreciate any pointers if your folks have experienced the same issue. At this point, I do not believe we will be looking to use another consulting firm. We are trying a few things internally and also working with IBM support to resolve this issue.
Thanks, Vivek.
Kaustubh said
Vivek,
Is this issue now resolved?
Did you, by any chance, enable cache replication on the secondary cluster member? I saw a similar issue when I did that in my cluster environment, and I'm looking for a solution.
Thanks
Kaustubh