
Why would HANA db disconnect take place during backup copy procedure?

May 10 at 01:23 AM

Former Member

We run ECC 6.0 EHP 7 (Suite on HANA, on premise) on HANA 1.00.122.12 on a SLES VM. We also replicate to a backup DB for DR purposes. During off hours we back up the DB to disk, then copy the backup to another storage location for final storage. During the copy (cp command), CPU and memory usage frequently spike. Unfortunately, nearly every day at the same time, there is also a momentary disconnect from the database, and any jobs running at that moment fail as a result.

The indexserver trace file often, but not always, reports a timeout and a broken connection. SM21 shows job failures, but not the same job every day. This started just over a month ago (April 3), but we cannot relate it to any change in our system environment.

We have checked VM, network, system, backup, HANA, and storage statistics and have run traces. We have opened an incident with SAP, but still have no answers. Is anyone experiencing the same or something similar?

2 Answers

Lars Breddemann
May 10 at 03:03 AM
0

You have done all the tracing, but what have you learned from it?

Were there any error messages that pointed to a cause for the disconnects?

What about the memory "escalations" during the copy process? Why is that happening? Are you using DirectIO when you copy the files? If not, you should consider this, as there is no benefit in using the file buffer memory when doing a one-off copy of the backup files.
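
To illustrate the point about not filling the file buffer during a one-off copy, here is a minimal Python sketch. It is not what the posters run, and it is not true DirectIO (O_DIRECT): as a simpler stand-in, it flushes each chunk and then asks the kernel to drop the cached pages via posix_fadvise. The helper name copy_without_cache and the chunk size are assumptions made for illustration, and the effect on a remote or network file system may differ.

```python
import os

CHUNK = 8 * 1024 * 1024  # 8 MiB per read/write; an arbitrary illustrative size


def copy_without_cache(src, dst, chunk=CHUNK):
    """Copy src to dst while advising the kernel to drop cached pages.

    Hypothetical helper: uses posix_fadvise(POSIX_FADV_DONTNEED),
    not O_DIRECT. Returns the number of bytes copied.
    """
    copied = 0
    fd_in = os.open(src, os.O_RDONLY)
    try:
        fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o640)
        try:
            # Tell the kernel the source is read sequentially.
            os.posix_fadvise(fd_in, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            while True:
                buf = os.read(fd_in, chunk)
                if not buf:
                    break
                # os.write may write fewer bytes than requested, so loop.
                view = memoryview(buf)
                while len(view) > 0:
                    written = os.write(fd_out, view)
                    view = view[written:]
                # Dirty pages cannot be dropped, so flush them first.
                os.fsync(fd_out)
                os.posix_fadvise(fd_out, copied, len(buf), os.POSIX_FADV_DONTNEED)
                os.posix_fadvise(fd_in, copied, len(buf), os.POSIX_FADV_DONTNEED)
                copied += len(buf)
        finally:
            os.close(fd_out)
    finally:
        os.close(fd_in)
    return copied
```

The per-chunk fsync trades some throughput for a bounded page cache footprint, which is the point of the exercise for a large backup file.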

Former Member May 11 at 12:07 AM
0

Hi Lars. This is Gary Conn, the DBA working with Mark.

In a nutshell, if we knew the answers to your questions, we would not be here. That said...

Since we upgraded HANA from 85.03 to 122.12, we have been getting small memory "spikes" on a semi-regular basis that we did not get before the upgrade. We are waiting on SAP to help, but so far, nothing. We have sent them the trace and RTE files as requested, but they have not found any "smoking gun".

Also, when we run our database backups, a spike is generated in memory and CPU; when we moved the backup to a different time, the spike followed. We do a local disk backup using the HANA native backup, and then we use a very basic Linux cp command to copy the files (400+ GB in total) over to a Windows server we use for storing backups (then off to tape from there). We have been doing this for the last 2.5 years with no issues, until now. The memory and CPU spikes start after the HANA backup, about 40 minutes into the cp command, and stop about 10-15 minutes later (memory goes back to normal after increasing by about 20%); the copy to the Windows server takes about 1.5 hours.

We also have synchronous HANA system replication running to a local site over a 10GB pipe, first using the new logreplay mode, then switched to delta_datashipping. I am trying to see whether the different HANA system replication modes are causing the spikes (still working on it).
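
A pattern like the one described (a temporary rise of about 20% that later returns to normal) can be located automatically in sampled memory figures, so that spike windows can be lined up against backup, copy, and job start times. The following is only a sketch under stated assumptions: the function name find_spikes, the median baseline, and the 15% threshold are illustrative choices, not anything taken from HANA or from this system.

```python
from statistics import median


def find_spikes(samples, threshold=0.15):
    """Return (start, end) timestamp pairs where the value exceeds the
    median baseline by more than `threshold` (a fraction, e.g. 0.15).

    samples: list of (timestamp, value) tuples, sorted by timestamp.
    Hypothetical helper for illustration only.
    """
    if not samples:
        return []
    baseline = median(value for _, value in samples)
    limit = baseline * (1 + threshold)
    spikes = []
    start = None
    last = None
    for ts, value in samples:
        if value > limit:
            if start is None:
                start = ts  # first sample above the limit opens a window
            last = ts
        elif start is not None:
            spikes.append((start, last))  # dropped back below: close it
            start = None
    if start is not None:
        spikes.append((start, last))  # series ended while still elevated
    return spikes
```

Using the median as the baseline keeps short spikes from shifting the reference level, which a plain mean would not.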

Have you or anyone else you know experienced this before?

Thoughts?

Thanks.

Gary Conn


Hi Gary

The move from HANA rev. 85 to 122 is rather huge, and a lot of how HANA works internally has changed in the years between these versions. So, without seeing which HANA components allocate what memory, it's hard to say why the memory spikes occur.

What's clear is that there is no direct functional connection between the use of the cp command and HANA's memory usage. From the description, it sounds as if it could be a side effect of file system buffering caused by the cp command. Are you using a Samba share to access the MS Windows system?
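
Whether file system buffering grows during the copy can be checked at the OS level. The sketch below parses /proc/meminfo on Linux and reports the share of memory held in Cached and Buffers. The function names are assumptions made for illustration, and the result reflects OS page cache only, not HANA's own allocations.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def page_cache_share(info):
    """Fraction of MemTotal currently held as Cached + Buffers."""
    cached = info.get("Cached", 0) + info.get("Buffers", 0)
    return cached / info["MemTotal"]


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live values (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Calling page_cache_share(read_meminfo()) at intervals before, during, and after the copy would show whether the page cache share rises in step with the reported spike.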

0
Former Member

We do not use a Samba share. Through further testing we have determined that the copy is not causing this issue, so we agree with your statement. We have documented that a particular job, which was already running unchanged before the upgrade, is associated with the pronounced regular spiking.

I've included an image (hana-spiking-2.png) showing the dramatic before/after difference in the operation of the HANA DB. Again, the gist of our request is to identify why HANA would register a connection loss (at any time) under higher demand. We have changed HANA parameters such as tcp_backlog and the indexserver maxchannels, but have not seen any definitive result.

We'll keep looking for answers. Thanks.

hana-spiking-2.png (410.5 kB)