on 07-06-2009 3:58 PM
We are having issues with our new 7.7 instance that we have not seen with any older version. It looks to me as if vserver, or some of the dbmsrv processes, is hanging. I tried to inspect the running instance with dbstudio, but it hung as well. To get us running again quickly, I simply killed the running dbmsrv processes. After that I was able to shut down the instance and restart vserver and the instance.
Everything seems to run fine again, but the question remains why we get into this hang condition after the instance has worked for a week or so without any problems. I have checked the logs but can't find anything useful. Is there any known issue that can cause this?
Hello Simon,
1. It would be helpful to check the database activities using the 'x_cons' tool.
There are entries "Killing T120 appl. died PID:15404" in the KnlMsg file, and you wrote
"I simply killed the running dbmsrv processes." The information posted from
dbmsrv_omega.corp.invoca.ch.err may also be due to the fact that you killed the dbmsrv processes.
2. It will be easier to analyze this issue in this thread if
you could give links to the files for download, as you did in another thread.
3. If the problem can be reproduced, please collect the following information:
A) x_cons <SID> show active
< 2-3 times >
x_cons <SID> debugtask T<nnn>
< if task T<nnn> has been running for a long time or hangs; the KnlMsg file will then contain additional information >
x_cons <SID> sh all 10 10 > xcons_all.txt
< The runtime of the command will be 100 seconds and the output will be
written to the file "xcons_all.txt" >
B) ps -efe | grep sdb > sdb_processes.txt
ps -efe | grep dbmsrv > dbmsrv_processes.txt
uname -a
C) KnlMsg, KnlMsgarc, dbmsrv_omega.corp.invoca.ch.err files
Please give the link to review the collected information/files.
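The collection steps above could be wrapped in a small script so that they can be run quickly the next time the hang occurs. This is only a sketch: the function name `collect_diag`, the output file names, and the `XCONS` and `DIAG_PAUSE` variables are illustrative assumptions, not part of MaxDB. The `x_cons`, `ps` and `uname` invocations are the ones listed above. The `debugtask` step is left out because it needs a task ID chosen by hand from the "show active" output.

```shell
#!/bin/sh
# Sketch: gather the diagnostics requested above into one directory.
# XCONS is a hypothetical override for the path of the x_cons binary.
XCONS="${XCONS:-x_cons}"

collect_diag() {
    sid="$1"                                   # database name (the <SID> above)
    out="${2:-diag_$(date +%Y%m%d_%H%M%S)}"    # output directory
    mkdir -p "$out" || return 1

    # A) active tasks, sampled three times with a short pause in between
    for i in 1 2 3; do
        "$XCONS" "$sid" show active > "$out/xcons_active_$i.txt" 2>&1
        sleep "${DIAG_PAUSE:-5}"
    done
    # According to the post above, this command runs for about 100 seconds.
    "$XCONS" "$sid" sh all 10 10 > "$out/xcons_all.txt" 2>&1

    # B) process lists and platform information.
    # grep exits non-zero when nothing matches; that is not an error here.
    ps -efe | grep sdb    > "$out/sdb_processes.txt"    || true
    ps -efe | grep dbmsrv > "$out/dbmsrv_processes.txt" || true
    uname -a              > "$out/uname.txt"

    echo "$out"                                # tell the caller where the files are
}
```

Calling `collect_diag <SID>` would then leave the files in a timestamped directory, to which the KnlMsg and .err files from C) can be added by hand.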
Thank you and best regards, Natalia Khlopina
Hi Natalia,
Thanks for your comments. I'll try to reply to your points:
1) Killing the dbmsrv processes has nothing to do with the "Killing T120 appl. died PID:15404" messages. I had never killed dbmsrv processes before, yet we already had the "Killing T120 appl. died PID:15404" messages before I killed dbmsrv.
Killing the dbmsrv processes was simply to get us up and running ASAP. After killing them I was able to shut down the instance cleanly and restart it. I don't even know where those dbmsrv processes came from, because no dbmsrv processes are running during normal operation. I think it has something to do with running dbstudio (from another box).
But the errors in dbmsrv_omega.corp.invoca.ch.err may well be from killing the dbmsrv processes. I know that killing with SIGTERM is usually not the way to do it but I didn't find another way.
2) Find the logs here:
[dbmsrv_omega.corp.invoca.ch.err|http://www.invoca.ch/pub/dbmsrv_omega.corp.invoca.ch.err]
[KnlMsgArchive.txt|http://www.invoca.ch/pub/KnlMsgArchive.txt]
[KnlMsg.txt|http://www.invoca.ch/pub/KnlMsg.txt]
3) I'll run x_cons as soon as we have the same condition again. Unfortunately we're still trying to find out how to reproduce it.
We had another series of '-9400 AK Cachedirectory full' errors last night but the instance is still running and seems to do fine. I really wonder why we get them and how we could get rid of them?
Regards,
Simon
Here are the kernel logs from the time when the instance became unresponsive:
Thread 0xFC2 Task 145 2009-07-06 15:07:17 CONNECT 12633: Connect req. (INVOCADB, T145, connection obj. 0x2aabaa1d5a38, Node:'alpha.corp.invoca.ch', PID: 7789)
Thread 0xFC3 Task 155 2009-07-06 15:07:41 CONNECT 12633: Connect req. (INVOCADB, T155, connection obj. 0x2aabaa2127b8, Node:'alpha.corp.invoca.ch', PID: 7789)
Thread 0xFC3 Task 155 2009-07-06 15:07:47 CONNECT 12677: Client has released connection, T155
Thread 0xFC3 Task 155 2009-07-06 15:07:47 CONNECT 12651: Connection released (INVOCADB, T155, connection obj. 2aabaa2127b8)
Thread 0xFC1 Task 133 2009-07-06 15:08:09 CONNECT 12677: Client has released connection, T133
Thread 0xFC1 Task 133 2009-07-06 15:08:09 CONNECT 12651: Connection released (INVOCADB, T133, connection obj. 2aabaa20d0b0)
Thread 0xFC1 Task 133 2009-07-06 15:08:14 CONNECT 12633: Connect req. (INVOCADB, T133, connection obj. 0x2aabaa2127b8, Node:'alpha.corp.invoca.ch', PID: 8264)
Thread 0xFB0 Task - 2009-07-06 15:08:21 CONNECT 12629: Killing T120 appl. died PID:15404
Thread 0xFC3 Task 155 2009-07-06 15:08:29 CONNECT 12633: Connect req. (INVOCADB, T155, connection obj. 0x2aabaa20d0b0, Node:'alpha.corp.invoca.ch', PID: 8405)
Thread 0xFB0 Task - 2009-07-06 15:08:31 CONNECT 12629: Killing T120 appl. died PID:15404
Thread 0xFB0 Task - 2009-07-06 15:08:41 CONNECT 12629: Killing T120 appl. died PID:15404
Thread 0xFB0 Task - 2009-07-06 15:08:51 CONNECT 12629: Killing T120 appl. died PID:15404
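The repeated "Killing T120 appl. died" entries are easier to judge when they are counted per task and client PID. The following is a minimal sketch using awk, assuming the kernel log has been saved as plain text in the layout shown above (for example KnlMsg.txt). The function name `count_died` is illustrative.

```shell
# Sketch: count "appl. died" messages per task and PID in a KnlMsg text export.
count_died() {
    # Matching lines look like:
    #   ... CONNECT 12629: Killing T120 appl. died PID:15404
    awk '/Killing T[0-9]+ appl\. died PID:/ {
             for (i = 1; i <= NF; i++) {
                 if ($i == "Killing") task = $(i + 1)    # field after "Killing" is the task
                 if ($i ~ /^PID:/)     pid  = substr($i, 5)  # drop the "PID:" prefix
             }
             n[task " " pid]++
         }
         END { for (k in n) print k, n[k] }' "$1" | sort
}
```

Run as `count_died KnlMsg.txt`, this prints one line per task/PID pair followed by the number of occurrences. For the excerpt above it would print `T120 15404 4`.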
However, I don't think anything useful can be found in the instance log, because the instance itself seems to have worked fine.
I found some 'bad looking' messages in dbmsrv_omega.corp.invoca.ch.err. Could they be related?
PID 6387: 22: __gxx_personality_v0@@CXXABI_1.3 + 0xf2
PID 6192: 15: DBMSrvFrmReq_HandlerClassic::handleRequest(DBMSrvFrmReq_Request const&, DBMSrvFrmRep_Reply&, bool&) + 0x22bf
PID 6387: SFrame: IP: 0x000000000042dd0a (0x000000000042dc18+0xf2)
PID 6192: Symbol: _ZN27DBMSrvFrmReq_HandlerClassic13handleRequestERK20DBMSrvFrmReq_RequestR18DBMSrvFrmRep_ReplyRb
PID 6387: Source: start.S:116
PID 6192: SFrame: IP: 0x000000000053517f (0x0000000000532ec0+0x22bf)
PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
PID 6387: Module: /opt/sdb/7706/pgm/dbmsrv
PID 6192: -
PID 6192: 16: DBMSrvFrmReq_Handler::handle(DBMSrvFrmReq_Request const&, DBMSrvFrmRep_Reply&, bool&) + 0x430
PID 6192: Symbol: _ZN20DBMSrvFrmReq_Handler6handleERK20DBMSrvFrmReq_RequestR18DBMSrvFrmRep_ReplyRb
PID 6192: SFrame: IP: 0x0000000000532d30 (0x0000000000532900+0x430)
PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
PID 6192: -
PID 6192: 17: DBMSrvThread_MainThread::DBMSrvThreadMain() + 0x2a7
PID 6192: Symbol: _ZN23DBMSrvThread_MainThread16DBMSrvThreadMainEv
PID 6192: SFrame: IP: 0x000000000043eec7 (0x000000000043ec20+0x2a7)
PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
PID 6192: -
PID 6192: 18: RTEThread_Thread::CThreadMain(void*) + 0x5e
PID 6192: Symbol: _ZN16RTEThread_Thread11CThreadMainEPv
PID 6192: SFrame: IP: 0x00000000007d2a3e (0x00000000007d29e0+0x5e)
PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
PID 6192: -
PID 6192: 19: RTEThread_Thread::AppointMainThreadToThreadObject(int, SAPDBErr_MessageList&) + 0x2f0
PID 6192: Symbol: _ZN16RTEThread_Thread31AppointMainThreadToThreadObjectEiR20SAPDBErr_MessageList
PID 6192: SFrame: IP: 0x00000000007d4520 (0x00000000007d4230+0x2f0)
PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
PID 6192: -
PID 6192: 20: main + 0x97
PID 6192: SFrame: IP: 0x000000000042de57 (0x000000000042ddc0+0x97)
PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
PID 6192: -
PID 6192: 21: _dl_tls_get_addr_soft@@GLIBC_PRIVATE + 0x1d974
PID 6192: SFrame: IP: 0x000000361a61d974 (0x000000361a600000+0x1d974)
PID 6192: Module: /lib64/libc-2.5.so
PID 6192: -
PID 6192: 22: __gxx_personality_v0@@CXXABI_1.3 + 0xf2
PID 6192: SFrame: IP: 0x000000000042dd0a (0x000000000042dc18+0xf2)
PID 6192: Source: start.S:116
PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
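Because the .err file interleaves output from several dbmsrv processes (PID 6192 and PID 6387 above), it can help to look at one process at a time. The following is a minimal sketch, assuming every line carries the `PID <n>:` prefix shown above. The function name `trace_for_pid` is illustrative.

```shell
# Sketch: print only the lines written by one dbmsrv process.
trace_for_pid() {
    pid="$1"
    file="$2"
    # Keep lines that start with the literal prefix "PID <pid>:",
    # then strip the prefix so the frames read as one continuous trace.
    grep "^PID $pid:" "$file" | sed "s/^PID $pid: \{0,1\}//"
}
```

For example, `trace_for_pid 6192 dbmsrv_omega.corp.invoca.ch.err` would show frames 15 to 22 of that process without the lines from PID 6387 mixed in.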