
Unresponsive MaxDB server, instance seems alive

simon_matter
Participant
0 Kudos

We are having issues with our new 7.7 instance that we have not seen with any older version. It looks to me as if vserver, or some dbmsrv processes, are hanging. I tried to inspect the running instance with dbstudio, but it hung as well. To get us running again quickly, I simply killed the running dbmsrv processes. After that I was able to shut down the instance and restart vserver and the instance.

Everything seems to run fine again, but the question remains why we get into this hang condition after the instance has worked for a week or so without any problems. I have checked the logs but can't find anything useful. Is there a known issue that can cause this?

Accepted Solutions (1)


former_member229109
Active Contributor
0 Kudos

Hello Simon,

1. It would be helpful to check the database activity with the 'x_cons' tool.

There are entries "Killing T120 appl. died PID:15404" in the KnlMsg file, and you wrote "I simply killed the running dbmsrv processes." The information posted from dbmsrv_omega.corp.invoca.ch.err may also be due to the fact that you killed the dbmsrv processes.

2. It will be easier to analyze this issue in this thread if you could give the link to the downloaded files, as you did in another thread.

3. If the problem can be reproduced, please collect the following information:

A) x_cons <SID> show active

< run 2-3 times >

x_cons <SID> debugtask T<nnn>

< if task T<nnn> has been running for a long time or hangs; the KnlMsg file will then contain additional information >

x_cons <SID> sh all 10 10 > xcons_all.txt

< The command runs for 100 seconds and writes its output to the file "xcons_all.txt" >

B) ps -efe | grep sdb > sdb_processes.txt

ps -efe | grep dbmsrv > dbmsrv_processes.txt

uname -a

C) The KnlMsg, KnlMsgarc, and dbmsrv_omega.corp.invoca.ch.err files

Please post a link so that the collected information/files can be reviewed.
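The collection steps in A) and B) could be wrapped in a small script so they are ready to run the moment the hang reappears. The following is only a sketch: it uses exactly the commands listed above, while the function names, the `repeats` parameter, and the idea of building the list before running it are assumptions for illustration.

```python
import shlex
import subprocess


def build_collection_commands(sid, task=None, repeats=2):
    """Return the shell commands from steps A) and B) as strings.

    sid     -- database name, substituted for <SID>
    task    -- optional task id such as "T120" for the debugtask call
    repeats -- how often "show active" is run (the post suggests 2-3 times)
    """
    sid_q = shlex.quote(sid)
    cmds = [f"x_cons {sid_q} show active"] * repeats
    if task is not None:
        cmds.append(f"x_cons {sid_q} debugtask {shlex.quote(task)}")
    cmds += [
        # Runs for about 100 seconds according to the post.
        f"x_cons {sid_q} sh all 10 10 > xcons_all.txt",
        "ps -efe | grep sdb > sdb_processes.txt",
        "ps -efe | grep dbmsrv > dbmsrv_processes.txt",
        "uname -a",
    ]
    return cmds


def run_all(cmds):
    """Run each command through the shell, continuing even if one fails."""
    for cmd in cmds:
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=False)


# Hypothetical usage on the database host, as a user allowed to call x_cons:
#   run_all(build_collection_commands("INVOCADB", task="T120"))
```

Keeping the command list separate from the execution step makes it possible to review what will be run before touching a hanging production instance.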

Thank you and best regards, Natalia Khlopina

simon_matter
Participant
0 Kudos

Hi Natalia,

Thanks for your comments. I'll try to reply to your points:

1) Killing the dbmsrv processes has nothing to do with the "Killing T120 appl. died PID:15404" messages. I had never killed dbmsrv processes before, yet we already had the "Killing T120 appl. died PID:15404" messages before I killed dbmsrv.

Killing the dbmsrv processes was simply to get us up and running ASAP. After killing them I was able to shut down the instance cleanly and restart. I don't even know where those dbmsrv processes came from, because during normal operation no dbmsrv processes are running. I think it has something to do with running dbstudio (from another box).

But the errors in dbmsrv_omega.corp.invoca.ch.err may well be from killing the dbmsrv processes. I know that killing with SIGTERM is usually not the way to do it, but I didn't find another way.

2) Find the logs here:

[dbmsrv_omega.corp.invoca.ch.err|http://www.invoca.ch/pub/dbmsrv_omega.corp.invoca.ch.err]

[KnlMsgArchive.txt|http://www.invoca.ch/pub/KnlMsgArchive.txt]

[KnlMsg.txt|http://www.invoca.ch/pub/KnlMsg.txt]

3) I'll run x_cons as soon as we have the same condition again. Unfortunately we're still trying to find out how to reproduce it.

We had another series of '-9400 AK Cachedirectory full' errors last night, but the instance is still running and seems to be doing fine. I really wonder why we get them and how we could get rid of them.

Regards,

Simon

Answers (1)


simon_matter
Participant
0 Kudos

Here are the kernel logs from the time when the instance became unresponsive:

Thread  0xFC2 Task    145  2009-07-06 15:07:17     CONNECT    12633:  Connect req. (INVOCADB, T145, connection obj. 0x2aabaa1d5a38, Node:'alpha.corp.invoca.ch', PID: 7789)
Thread  0xFC3 Task    155  2009-07-06 15:07:41     CONNECT    12633:  Connect req. (INVOCADB, T155, connection obj. 0x2aabaa2127b8, Node:'alpha.corp.invoca.ch', PID: 7789)
Thread  0xFC3 Task    155  2009-07-06 15:07:47     CONNECT    12677:  Client has released connection, T155
Thread  0xFC3 Task    155  2009-07-06 15:07:47     CONNECT    12651:  Connection released (INVOCADB, T155, connection obj. 2aabaa2127b8)
Thread  0xFC1 Task    133  2009-07-06 15:08:09     CONNECT    12677:  Client has released connection, T133
Thread  0xFC1 Task    133  2009-07-06 15:08:09     CONNECT    12651:  Connection released (INVOCADB, T133, connection obj. 2aabaa20d0b0)
Thread  0xFC1 Task    133  2009-07-06 15:08:14     CONNECT    12633:  Connect req. (INVOCADB, T133, connection obj. 0x2aabaa2127b8, Node:'alpha.corp.invoca.ch', PID: 8264)
Thread  0xFB0 Task      -  2009-07-06 15:08:21     CONNECT    12629:  Killing T120 appl. died PID:15404
Thread  0xFC3 Task    155  2009-07-06 15:08:29     CONNECT    12633:  Connect req. (INVOCADB, T155, connection obj. 0x2aabaa20d0b0, Node:'alpha.corp.invoca.ch', PID: 8405)
Thread  0xFB0 Task      -  2009-07-06 15:08:31     CONNECT    12629:  Killing T120 appl. died PID:15404
Thread  0xFB0 Task      -  2009-07-06 15:08:41     CONNECT    12629:  Killing T120 appl. died PID:15404
Thread  0xFB0 Task      -  2009-07-06 15:08:51     CONNECT    12629:  Killing T120 appl. died PID:15404
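To see how often a message such as "Killing T120 appl. died" repeats, the excerpt can be parsed and counted with a few lines of Python. This is a rough sketch: the regular expression is derived only from the lines shown above and is an assumption about the general KnlMsg text layout, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Layout assumed from the excerpt:
# Thread <hex> Task <n or -> <date> <time> <label> <msg id>: <text>
LINE_RE = re.compile(
    r"^Thread\s+(?P<thread>0x[0-9A-Fa-f]+)\s+Task\s+(?P<task>\d+|-)\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<label>\S+)\s+(?P<msgid>\d+):\s+(?P<text>.*)$"
)


def parse_knlmsg(lines):
    """Return one dict per line that matches the assumed layout."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def count_messages(entries):
    """Count identical (message id, text) pairs."""
    return Counter((e["msgid"], e["text"]) for e in entries)


sample = [
    "Thread  0xFC1 Task    133  2009-07-06 15:08:09     CONNECT    12677:  Client has released connection, T133",
    "Thread  0xFB0 Task      -  2009-07-06 15:08:21     CONNECT    12629:  Killing T120 appl. died PID:15404",
    "Thread  0xFB0 Task      -  2009-07-06 15:08:31     CONNECT    12629:  Killing T120 appl. died PID:15404",
    "Thread  0xFB0 Task      -  2009-07-06 15:08:41     CONNECT    12629:  Killing T120 appl. died PID:15404",
]

for (msgid, text), n in count_messages(parse_knlmsg(sample)).most_common():
    print(n, msgid, text)
```

Run against the full KnlMsg.txt, a count like this would show whether the message for T120 keeps recurring every ten seconds for the whole duration of the hang.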

simon_matter
Participant
0 Kudos

However, I don't think anything useful can be found in the instance log, because the instance itself seems to have worked fine.

I found some 'bad looking' messages in dbmsrv_omega.corp.invoca.ch.err. Could they be related?

PID 6387: 22: __gxx_personality_v0@@CXXABI_1.3 + 0xf2

PID 6192: 15: DBMSrvFrmReq_HandlerClassic::handleRequest(DBMSrvFrmReq_Request const&, DBMSrvFrmRep_Reply&, bool&) + 0x22bf

PID 6387: SFrame: IP: 0x000000000042dd0a (0x000000000042dc18+0xf2)

PID 6192: Symbol: _ZN27DBMSrvFrmReq_HandlerClassic13handleRequestERK20DBMSrvFrmReq_RequestR18DBMSrvFrmRep_ReplyRb

PID 6387: Source: start.S:116PID 6192: SFrame: IP: 0x000000000053517f (0x0000000000532ec0+0x22bf)

PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

PID 6387: Module: /opt/sdb/7706/pgm/dbmsrv

PID 6192: -


PID 6192: 16: DBMSrvFrmReq_Handler::handle(DBMSrvFrmReq_Request const&, DBMSrvFrmRep_Reply&, bool&) + 0x430

PID 6192: Symbol: _ZN20DBMSrvFrmReq_Handler6handleERK20DBMSrvFrmReq_RequestR18DBMSrvFrmRep_ReplyRb

PID 6192: SFrame: IP: 0x0000000000532d30 (0x0000000000532900+0x430)

PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

PID 6192: -


PID 6192: 17: DBMSrvThread_MainThread::DBMSrvThreadMain() + 0x2a7

PID 6192: Symbol: _ZN23DBMSrvThread_MainThread16DBMSrvThreadMainEv

PID 6192: SFrame: IP: 0x000000000043eec7 (0x000000000043ec20+0x2a7)

PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

PID 6192: -


PID 6192: 18: RTEThread_Thread::CThreadMain(void*) + 0x5e

PID 6192: Symbol: _ZN16RTEThread_Thread11CThreadMainEPv

PID 6192: SFrame: IP: 0x00000000007d2a3e (0x00000000007d29e0+0x5e)

PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

PID 6192: -


PID 6192: 19: RTEThread_Thread::AppointMainThreadToThreadObject(int, SAPDBErr_MessageList&) + 0x2f0

PID 6192: Symbol: _ZN16RTEThread_Thread31AppointMainThreadToThreadObjectEiR20SAPDBErr_MessageList

PID 6192: SFrame: IP: 0x00000000007d4520 (0x00000000007d4230+0x2f0)

PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

PID 6192: -


PID 6192: 20: main + 0x97

PID 6192: SFrame: IP: 0x000000000042de57 (0x000000000042ddc0+0x97)

PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

PID 6192: -


PID 6192: 21: _dl_tls_get_addr_soft@@GLIBC_PRIVATE + 0x1d974

PID 6192: SFrame: IP: 0x000000361a61d974 (0x000000361a600000+0x1d974)

PID 6192: Module: /lib64/libc-2.5.so

PID 6192: -


PID 6192: 22: __gxx_personality_v0@@CXXABI_1.3 + 0xf2

PID 6192: SFrame: IP: 0x000000000042dd0a (0x000000000042dc18+0xf2)

PID 6192: Source: start.S:116

PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv
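The trace above mixes output from two dbmsrv processes (PID 6387 and PID 6192), and one line even carries fragments from both. A small helper can split the text back into one stack per process, which makes it easier to read. This is a sketch only; the `PID <n>:` prefix is taken from the excerpt and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Every fragment in the .err excerpt starts with "PID <number>:".
PID_RE = re.compile(r"PID (\d+):\s?")


def split_by_pid(text):
    """Group trace fragments by PID, keeping their original order.

    A physical line may hold several "PID n:" fragments when two
    processes wrote at the same time, so each line is cut at every prefix.
    """
    groups = defaultdict(list)
    for line in text.splitlines():
        matches = list(PID_RE.finditer(line))
        for i, match in enumerate(matches):
            if i + 1 < len(matches):
                end = matches[i + 1].start()
            else:
                end = len(line)
            fragment = line[match.end():end].strip()
            groups[match.group(1)].append(fragment)
    return dict(groups)


excerpt = (
    "PID 6387: Source: start.S:116PID 6192: SFrame: IP: 0x000000000053517f (0x0000000000532ec0+0x22bf)\n"
    "PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv\n"
    "PID 6387: Module: /opt/sdb/7706/pgm/dbmsrv\n"
)

for pid, fragments in split_by_pid(excerpt).items():
    print("PID", pid)
    for fragment in fragments:
        print("   ", fragment)
```

With the stacks separated, it becomes clearer whether both processes ended in the same request handler or whether only one of them was inside `handleRequest` when it was killed.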