Former Member

Unresponsive MaxDB server, instance seems alive

We are having issues with our new 7.7 instance that we have not seen with any older version. It looks to me as if vserver or some dbmsrv processes are hanging. I tried to look at the running instance with dbstudio, but it also hung. To get us running again quickly, I simply killed the running dbmsrv processes. After that I was able to shut down the instance and restart vserver and the instance.

Everything seems to run fine again but the question remains why we get into this hang condition after it works for a week or so without any problems. I have checked the logs but can't find anything useful. Is there any known issue which can cause this?


2 Answers

  • Best Answer
    Former Member
    Jul 07, 2009 at 09:31 PM

    Hello Simon,

    1. It would be helpful to check the database activity with the 'x_cons' tool.

    There are "Killing T120 appl. died PID:15404" entries in the KnlMsg file, and you wrote

    "I simply killed the running dbmsrv processes." The information posted from

    dbmsrv_omega.corp.invoca.ch.err may also be due to the fact that you killed the dbmsrv processes.

    2. It would be easier to analyze the issue in this thread if

    you could give a link to the downloaded files, as you did in another thread.

    3. If the case is reproduced, please collect the following information:

    A) x_cons <SID> show active

    < 2-3 times >

    x_cons <SID> debugtask T<nnn>

    < if task T<nnn> has been running for a long time or hangs. The KnlMsg file will then contain additional information. >

    x_cons <SID> sh all 10 10 > xcons_all.txt

    < The runtime of the command will be 100 seconds and the output will be

    written to the file "xcons_all.txt" >

    B) ps -efe | grep sdb > sdb_processes.txt

    ps -efe | grep dbmsrv > dbmsrv_processes.txt

    uname -a

    C) KnlMsg, KnlMsgarc, dbmsrv_omega.corp.invoca.ch.err files

    Please give the link to review the collected information/files.
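The collection steps in A) and B) could be wrapped in a small shell function so they can be run quickly the next time the hang occurs. This is only a sketch: the function name `collect_hang_info`, the output directory layout, and the `SNAPSHOT_PAUSE` variable are made up for illustration, while the `x_cons`, `ps`, and `uname` calls are the ones listed above. The `debugtask` step is left out because it needs a specific task ID from the `show active` output.

```shell
#!/bin/sh
# Sketch: gather the hang diagnostics listed above into one directory.
# collect_hang_info, its arguments and SNAPSHOT_PAUSE are hypothetical names,
# not part of MaxDB.
collect_hang_info() {
    sid=$1
    outdir=${2:-./hang_diag}
    mkdir -p "$outdir" || return 1

    # A) Snapshot the active tasks three times, a few seconds apart.
    for i in 1 2 3; do
        x_cons "$sid" show active > "$outdir/xcons_active_$i.txt" 2>&1
        sleep "${SNAPSHOT_PAUSE:-5}"
    done

    # Full x_cons output; per the answer above this runs for about 100 seconds.
    x_cons "$sid" sh all 10 10 > "$outdir/xcons_all.txt" 2>&1

    # B) Process lists and platform information.
    # grep exits non-zero when nothing matches, so that status is ignored.
    ps -efe | grep sdb > "$outdir/sdb_processes.txt" || true
    ps -efe | grep dbmsrv > "$outdir/dbmsrv_processes.txt" || true
    uname -a > "$outdir/uname.txt"
}
```

It would be called as, for example, `collect_hang_info INVOCADB /tmp/hang_diag`, using the instance name that appears in the kernel log. The files from C) still have to be copied separately.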

    Thank you and best regards, Natalia Khlopina


    • Former Member

      Hi Natalia,

      Thanks for your comments. I'll try to reply to your points:

      1) Killing the dbmsrv processes has nothing to do with the "Killing T120 appl. died PID:15404" messages. I had never killed dbmsrv processes before, but we already had the "Killing T120 appl. died PID:15404" messages before I killed dbmsrv.

      Killing the dbmsrv processes was simply to get us up and running ASAP. After killing them I was able to shut down the instance cleanly and restart it. I don't even know where those dbmsrv processes came from, because during normal operation no dbmsrv processes are running. I think it has something to do with running dbstudio (from another box).

      But the errors in dbmsrv_omega.corp.invoca.ch.err may well be from killing the dbmsrv processes. I know that killing with SIGTERM is usually not the way to do it but I didn't find another way.

      2) Find the logs here:

      [dbmsrv_omega.corp.invoca.ch.err|http://www.invoca.ch/pub/dbmsrv_omega.corp.invoca.ch.err]

      [KnlMsgArchive.txt|http://www.invoca.ch/pub/KnlMsgArchive.txt]

      [KnlMsg.txt|http://www.invoca.ch/pub/KnlMsg.txt]

      3) I'll run x_cons as soon as we have the same condition again. Unfortunately we're still trying to find out how to reproduce it.

      We had another series of '-9400 AK Cachedirectory full' errors last night but the instance is still running and seems to do fine. I really wonder why we get them and how we could get rid of them?

      Regards,

      Simon

  • Former Member
    Jul 06, 2009 at 03:00 PM

    Here are the kernel logs from the time when the instance became unresponsive:

    Thread  0xFC2 Task    145  2009-07-06 15:07:17     CONNECT    12633:  Connect req. (INVOCADB, T145, connection obj. 0x2aabaa1d5a38, Node:'alpha.corp.invoca.ch', PID: 7789)
    Thread  0xFC3 Task    155  2009-07-06 15:07:41     CONNECT    12633:  Connect req. (INVOCADB, T155, connection obj. 0x2aabaa2127b8, Node:'alpha.corp.invoca.ch', PID: 7789)
    Thread  0xFC3 Task    155  2009-07-06 15:07:47     CONNECT    12677:  Client has released connection, T155
    Thread  0xFC3 Task    155  2009-07-06 15:07:47     CONNECT    12651:  Connection released (INVOCADB, T155, connection obj. 2aabaa2127b8)
    Thread  0xFC1 Task    133  2009-07-06 15:08:09     CONNECT    12677:  Client has released connection, T133
    Thread  0xFC1 Task    133  2009-07-06 15:08:09     CONNECT    12651:  Connection released (INVOCADB, T133, connection obj. 2aabaa20d0b0)
    Thread  0xFC1 Task    133  2009-07-06 15:08:14     CONNECT    12633:  Connect req. (INVOCADB, T133, connection obj. 0x2aabaa2127b8, Node:'alpha.corp.invoca.ch', PID: 8264)
    Thread  0xFB0 Task      -  2009-07-06 15:08:21     CONNECT    12629:  Killing T120 appl. died PID:15404
    Thread  0xFC3 Task    155  2009-07-06 15:08:29     CONNECT    12633:  Connect req. (INVOCADB, T155, connection obj. 0x2aabaa20d0b0, Node:'alpha.corp.invoca.ch', PID: 8405)
    Thread  0xFB0 Task      -  2009-07-06 15:08:31     CONNECT    12629:  Killing T120 appl. died PID:15404
    Thread  0xFB0 Task      -  2009-07-06 15:08:41     CONNECT    12629:  Killing T120 appl. died PID:15404
    Thread  0xFB0 Task      -  2009-07-06 15:08:51     CONNECT    12629:  Killing T120 appl. died PID:15404
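To see how often the same "Killing T... appl. died" message repeats in a longer log, the lines can be counted per task and PID. A minimal sketch, assuming the log has been saved as plain text in the format shown above; the function name `count_kill_messages` is made up for this example.

```shell
#!/bin/sh
# Sketch: count repeated "Killing T<nnn> appl. died PID:<pid>" messages in a
# kernel log saved as text. count_kill_messages is a hypothetical helper name.
count_kill_messages() {
    awk '
        # Pick out only the message part so the timestamps do not matter.
        match($0, /Killing T[0-9]+ appl\. died PID:[0-9]+/) {
            count[substr($0, RSTART, RLENGTH)]++
        }
        END { for (msg in count) print count[msg], msg }
    ' "$1" | sort -rn
}
```

Run against a file containing only the excerpt above, `count_kill_messages KnlMsg.txt` prints `4 Killing T120 appl. died PID:15404`, which shows the same task and PID being reported every 10 seconds.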


    • Former Member

      However, I don't think anything useful can be found in the instance log, because the instance itself seems to have worked fine.

      I found some 'bad looking' messages in dbmsrv_omega.corp.invoca.ch.err. Could they be related?

      PID 6387: 22: __gxx_personality_v0@@CXXABI_1.3 + 0xf2

      PID 6192: 15: DBMSrvFrmReq_HandlerClassic::handleRequest(DBMSrvFrmReq_Request const&, DBMSrvFrmRep_Reply&, bool&) + 0x22bf

      PID 6387: SFrame: IP: 0x000000000042dd0a (0x000000000042dc18+0xf2)

      PID 6192: Symbol: _ZN27DBMSrvFrmReq_HandlerClassic13handleRequestERK20DBMSrvFrmReq_RequestR18DBMSrvFrmRep_ReplyRb

      PID 6387: Source: start.S:116

      PID 6192: SFrame: IP: 0x000000000053517f (0x0000000000532ec0+0x22bf)

      PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

      PID 6387: Module: /opt/sdb/7706/pgm/dbmsrv

      PID 6192: -


      PID 6192: 16: DBMSrvFrmReq_Handler::handle(DBMSrvFrmReq_Request const&, DBMSrvFrmRep_Reply&, bool&) + 0x430

      PID 6192: Symbol: _ZN20DBMSrvFrmReq_Handler6handleERK20DBMSrvFrmReq_RequestR18DBMSrvFrmRep_ReplyRb

      PID 6192: SFrame: IP: 0x0000000000532d30 (0x0000000000532900+0x430)

      PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

      PID 6192: -


      PID 6192: 17: DBMSrvThread_MainThread::DBMSrvThreadMain() + 0x2a7

      PID 6192: Symbol: _ZN23DBMSrvThread_MainThread16DBMSrvThreadMainEv

      PID 6192: SFrame: IP: 0x000000000043eec7 (0x000000000043ec20+0x2a7)

      PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

      PID 6192: -


      PID 6192: 18: RTEThread_Thread::CThreadMain(void*) + 0x5e

      PID 6192: Symbol: _ZN16RTEThread_Thread11CThreadMainEPv

      PID 6192: SFrame: IP: 0x00000000007d2a3e (0x00000000007d29e0+0x5e)

      PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

      PID 6192: -


      PID 6192: 19: RTEThread_Thread::AppointMainThreadToThreadObject(int, SAPDBErr_MessageList&) + 0x2f0

      PID 6192: Symbol: _ZN16RTEThread_Thread31AppointMainThreadToThreadObjectEiR20SAPDBErr_MessageList

      PID 6192: SFrame: IP: 0x00000000007d4520 (0x00000000007d4230+0x2f0)

      PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

      PID 6192: -


      PID 6192: 20: main + 0x97

      PID 6192: SFrame: IP: 0x000000000042de57 (0x000000000042ddc0+0x97)

      PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv

      PID 6192: -


      PID 6192: 21: _dl_tls_get_addr_soft@@GLIBC_PRIVATE + 0x1d974

      PID 6192: SFrame: IP: 0x000000361a61d974 (0x000000361a600000+0x1d974)

      PID 6192: Module: /lib64/libc-2.5.so

      PID 6192: -


      PID 6192: 22: __gxx_personality_v0@@CXXABI_1.3 + 0xf2

      PID 6192: SFrame: IP: 0x000000000042dd0a (0x000000000042dc18+0xf2)

      PID 6192: Source: start.S:116

      PID 6192: Module: /opt/sdb/7706/pgm/dbmsrv