Former Member
Feb 12, 2009 at 07:47 PM

DI Job servers terminated randomly on HP-UX


Hi there,

I have a strange problem with DI 11.5.3 on an HP-UX PA-RISC server. There are 21 Job servers, each with 4 or 5 repositories attached. Every couple of days, 1 or 2 Job servers are mysteriously terminated. Which servers crash seems to be random, i.e. it could be Job server 1 yesterday and Job server 21 today. It could also be a Job server that is running a batch job, or one that is not running any job at all. Sometimes there is a core dump, but sometimes there is none.

The HP box is an extremely powerful server with plenty of CPUs and memory. After monitoring the system closely with Glance, I found that Job servers tend to crash when disk utilization stays at 100% for a long time, which may be caused by some long-running database scripts. The crashes don't start immediately, but the longer this lasts, the more Job servers fail. For example, one Job server may fail 30 minutes after the disk saturates, and another may fail after 2 hours. It's unpredictable when, and which, Job servers will crash.

My question is: why are the Job server processes killed, and not other application processes on the server? Are they killed by the OS or by the DI background service? If by the OS, is it because they run at a lower priority, or is there another reason?
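To line the terminations up precisely against the Glance disk-utilization history, one option is to poll the Job server PIDs and log the moment each one disappears. Below is a minimal sketch, assuming Python is available on the box and that you collect the Job server PIDs yourself (e.g. from `ps`). `pid_alive` and `watch` are hypothetical helper names, not part of DI. Sending signal 0 with `os.kill` only checks that a process exists. It does not reveal who terminated it, and the OS may eventually reuse a PID.

```python
import errno
import os
import time
from datetime import datetime


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists.

    Signal 0 performs only the existence/permission check; nothing is delivered.
    """
    try:
        os.kill(pid, 0)
    except OSError as exc:
        if exc.errno == errno.ESRCH:  # no such process
            return False
        if exc.errno == errno.EPERM:  # process exists but belongs to another user
            return True
        raise
    return True


def watch(pids, interval=30.0, max_polls=None):
    """Poll the given PIDs; return {pid: ISO timestamp} for each one that vanished.

    Runs until every PID is gone, or until max_polls polls have been made.
    """
    remaining = set(pids)
    gone = {}
    polls = 0
    while remaining and (max_polls is None or polls < max_polls):
        for pid in sorted(remaining):
            if not pid_alive(pid):
                gone[pid] = datetime.now().isoformat(timespec="seconds")
                print(f"{gone[pid]} PID {pid} is no longer running")
        remaining.difference_update(gone)  # stop polling PIDs already reported
        polls += 1
        if remaining:
            time.sleep(interval)
    return gone
```

For example, `watch([12345, 12346], interval=60)` (hypothetical PIDs) prints one timestamped line per Job server that dies. Those times can then be compared with the periods when the disk sat at 100%.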

Thanks,

Larry