Hi everyone,
My manager told me that our PRD server went down yesterday morning. Some users could not log in to the system, and he found that one app server was down. He restarted the app server and everything seems to be OK now, but he asked me to find out what caused the outage.
I checked the developer trace files of the app server today.
I did not find anything useful in dev_disp.old,
but some of the dev_wp*.old files contain entries like the ones below (the server has 76 work processes; the traces of work processes 0 to 19 contain entries like these):
M Tue May 28 06:55:22 2013
M ThAlarmHandler (1)
M ThAlarmHandler: set CONTROL_TIMEOUT/DP_CONTROL_JAVA_EXIT and break sql
B db_sqlbreak() = 1
M ThAlarmHandler: return from signal handler
M
M Tue May 28 06:56:22 2013
M ThAlarmHandler (2)
M ThAlarmHandler: 2. ALARM: terminate process (pid=9408, user is T138/M0)
M ThAlarmHandler: prv_action of W0: 0x2
M ThAlarmHandler: set clean state of T138/M0 to DP_TIMEOUT
M ThAlarmHandler: prv_action of W0: 0xa
M ThAlarmHandler: save snc contexts
M ThISncSaveAllContexts: save 0 snc contexts
M ThAlarmHandler: C-Stack during alarm handler
M C-STACK
(0) 0x4000000001b363b0 CTrcStack + 0x1d0 at dptstack.c:227 [dw.sapP20_D20]
(1) 0x4000000001733100 ThAlarmHandler + 0x11e0 at thxxhead.c:21417 [dw.sapP20_D20]
(2) 0x4000000001664520 DpSigAlrm + 0x220 at dpxxtool.c:2295 [dw.sapP20_D20]
(3) 0xe00000013305f440 Signal 14 (SIGALRM) delivered
(4) 0xc00000000054ee70 _semop_sys + 0x30 [/usr/lib/hpux64/libc.so.1]
(5) 0xc0000000005607e0 _semop + 0xe0 at ../../../../../core/libs/libc/shared_em_64_perf/../core/syscalls/t_semop.c:19 [/usr/lib/hpux64/libc.so.1]
(6) 0x4000000001707680 RqOsSem + 0xb0 at semux.c:1186 [dw.sapP20_D20]
(7) 0x40000000017097a0 SemRq + 0x810 at semux.c:1814 [dw.sapP20_D20]
(8) 0x4000000004cc2990 EsILock + 0x2410 at esxx.c:3449 [dw.sapP20_D20]
(9) 0x4000000004cca410 STD_EsAttach + 0x1d0 at esxx.c:2348 [dw.sapP20_D20]
(10) 0x4000000004cd5110 EsAttach + 0x90 at esxxfunc.c:874 [dw.sapP20_D20]
(11) 0x4000000004c988c0 EmContextAttach + 0x1e0 at emxx.c:932 [dw.sapP20_D20]
(12) 0x40000000018e20a0 ThCheckEmState + 0x300 at thxxmem.c:438 [dw.sapP20_D20]
(13) 0x40000000018dd780 ThRollIn + 0x380 at thxxmem.c:870 [dw.sapP20_D20]
(14) 0x400000000175bc20 ThSessionRestore + 0x180 at thxxhead.c:22129 [dw.sapP20_D20]
(15) 0x40000000017250b0 TskhLoop + 0x1210 at thxxhead.c:3542 [dw.sapP20_D20]
(16) 0x400000000171f000 ThStart + 0x5d0 at thxxhead.c:10759 [dw.sapP20_D20]
(17) 0x40000000015ab260 DpMain + 0x870 at dpxxdisp.c:1152 [dw.sapP20_D20]
(18) 0x40000000015a4b60 main + 0x80 at thxxanf.c:64 [dw.sapP20_D20]
(19) 0xc00000000006e9b0 main_opd_entry + 0x50 [/usr/lib/hpux64/dld.so]
M
M ***LOG Q02=> wp_halt, WPStop (Workproc 0 9408) [dpuxtool.c 268]
The other dev_wp*.old files only report that the work process heap memory is not sufficient and ask us to change the parameter.
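To compare the affected traces side by side, a minimal Python sketch like the following could pull out the alarm handler, db_sqlbreak and wp_halt entries together with their timestamps. It assumes the trace lines look like the excerpt above; the function name `scan_trace`, the marker list and the directory argument are my own choices for illustration, not part of any SAP tool.

```python
import glob
import os
import re
import sys

# Timestamp lines in the excerpt look like: "M Tue May 28 06:55:22 2013"
TIMESTAMP = re.compile(
    r"^M\s+([A-Z][a-z]{2} [A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*$"
)

# Substrings taken from the excerpt; extend this tuple as needed (assumption).
MARKERS = ("ThAlarmHandler (", "db_sqlbreak", "wp_halt")


def scan_trace(lines):
    """Return (timestamp, line) pairs for every line containing a marker.

    The timestamp is the most recent timestamp line seen before the match,
    or None if no timestamp line has appeared yet.
    """
    hits = []
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = TIMESTAMP.match(line)
        if match:
            current = match.group(1)
            continue
        if any(marker in line for marker in MARKERS):
            hits.append((current, line.strip()))
    return hits


if __name__ == "__main__":
    # Directory holding the dev_wp*.old files; defaults to the current one.
    work_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in sorted(glob.glob(os.path.join(work_dir, "dev_wp*.old"))):
        with open(path, errors="replace") as handle:
            for stamp, text in scan_trace(handle):
                print(f"{os.path.basename(path)}\t{stamp}\t{text}")
```

Sorting the output by timestamp should show whether all of the affected work processes hit the alarm at the same time, which may help narrow down where to look next.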
Could anyone suggest how I can find out what caused the issue?
Regards.