Former Member

brbackup fails with error 23 after running 2 hours

Hello,

we are facing a strange problem with brbackup.

The backup (online or offline) always fails after running for 2 hours with the message "EXIT STATUS 23: socket read failed".

It does not depend on particular datafiles, only on the runtime. Splitting the backup into several streams to shorten the runtime solved the problem for a while. But the database has grown, and a few days ago the runtime reached 2 hours again, so the backups are failing.

File backups for these servers complete successfully regardless of their runtime, which is usually more than 2 hours. Only brbackup is affected.

We are using Veritas Netbackup 5.1 MP4.

I already set the debug level to the highest value, but the logs only contain the error message, without any hint as to the cause...

I also tried increasing all possible client timeout settings in the Netbackup configuration, but the only effect was that the already failed brbackup session stayed in status "running" in the Netbackup monitor 😔 until the timeout was reached.

There are also some other servers in the same backup cell whose SAP backups by brbackup complete successfully. I compared their Netbackup installation/configuration and found no difference (except client name and SID, of course).

Now I wonder whether there are any timeout settings in the SAP configuration or the SAP gateway that could be responsible for such backup aborts...

I'm afraid I am no SAP guru, so I hope to find some help here...

Many thanks in advance!!!!

ml

4 Answers

  • Former Member
    Oct 08, 2007 at 10:59 AM

    Hi Martina

    I am not aware of any timeout on the SAP side. But if you have a firewall between the SAP server and the backup media / master server, it might be the reason for the timeouts. We had the same problem with Veritas backups some years ago.

    If all your servers are in the same network, just ignore this post.

    Best regards

    Michael

  • Former Member
    Oct 09, 2007 at 01:08 PM

    Hi Martina,

    I am not very sure, but SAP backups via BRTools are controlled by the parameter file init<sid>.sap.

    Please have a look at the parameters there; they may give you some insight.

    Regards

  • Oct 09, 2007 at 01:16 PM

    There is nothing on SAP's side. This is either on the Netbackup side or on the network side.

    Can you change the time of your backup? Perhaps a router reboots itself once a day?

  • Former Member
    Oct 19, 2007 at 09:16 PM

    Hello,

    Thanks for all your input.

    I had already sorted out the utl file and the scheduling before I posted my question.

    But the first hint was the one that set me on the right track. Many thanks to Michael.

    I wasn't aware there was a firewall between backup master and clients, but this was a new idea and I forwarded it straight away to our network team to sort out.

    It took a while (nearly 2 weeks now) and several test backups to get the confirmation, but as of today it is 100% clear: the problem is firewall related.

    We also ran some backups "bypassing the firewall". They were much faster and (of course...) completely successful. This would be a good workaround, but the customer does not permit backups to be done this way.

    What was identified as the cause?

    I am no network expert, so I hope I describe it correctly...

    There was an address translation by NAT, which caused the backup to run via a different IP address (invisible in the backup logs!) than the one for which the keep alive was set. So after 2 hours the default socket idle time was reached and the connection was cut, even though the backup was still in progress. It was running via another address instead of the one where the keep alive was set and which was known and specified in the backup policy and backup client properties...

    There were also some other misconfigurations on the firewall, about which I got no further description. They are still not fixed, as the changes need permission from the customer and a planned maintenance window.

    I hope the issue will be solved soon. But we are finally on the right track.

    best regards,

    Martina

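For readers unfamiliar with the keep alive mechanism mentioned in the last post, here is a minimal Python sketch of how TCP keepalive is switched on for a socket so that an idle connection still produces traffic before a firewall or NAT idle timer expires. It is only an illustration of the general mechanism: it is not NetBackup or brbackup configuration, the helper name `enable_keepalive` and the timer values (600 s idle, 60 s interval, 5 probes) are assumptions chosen for the example, and the per-socket timer options are only available on some platforms (Linux exposes them), which is why each one is guarded.

```python
import socket


def enable_keepalive(sock, idle_s=600, interval_s=60, probes=5):
    """Turn on TCP keepalive and, where the platform allows, shorten the timers.

    Hypothetical helper for illustration; the default values are assumptions.
    """
    # Ask the kernel to send keepalive probes on this connection when it is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket timers are platform-specific, so only set the ones that exist.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idle time before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between subsequent probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Number of unanswered probes before the connection is dropped.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    # A non-zero value means keepalive is enabled on this socket.
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))
    s.close()
```

Keepalive only helps if the probes travel over the same path and address that the firewall is tracking, which is exactly what went wrong in the NAT case described above: the probes were set for one address while the data ran via another.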