Solved: SAP on RAC keepalive settings

Former Member · 10-08-2015

Hello Oracle gurus!

I have a question regarding SAP reconnect behavior in conjunction with a RAC installation.
We have created a demo system for testing purposes on top of RAC 12.1 (SAP NW 7.4 with the latest 7.42 kernel and DBSL patches).

TAF has been configured, and if the main RAC node (the one SAP is connected to) shuts down cleanly (ACPI shutdown), the SAP work processes (WPs) reconnect without problems a few seconds after the VIP moves, which is the expected behavior.

But if the main node dies without any notification, as in a sudden power-off, the SAP WPs can hang seemingly endlessly (I think 7200 sec by default), doing some internal jobs, until max_wp_runtime passes and the WP is restarted completely.

We have found some recommendations to update kernel parameters at the OS level, like this:

To improve fail over performance in a RAC cluster, consider changing the following IP kernel parameters as well:

net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries

For testing, we have changed these parameters on the server where SAP is installed:

net.ipv4.tcp_keepalive_time = 30

net.ipv4.tcp_keepalive_intvl = 10

net.ipv4.tcp_keepalive_probes = 2

In addition, the "enable=broken" setting has been added in tnsnames.
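The net.ipv4.tcp_keepalive_* values only affect sockets that have keepalive switched on, and ENABLE=BROKEN is the Oracle Net setting that requests this for the client connection. As a rough illustration of the socket-level mechanism (not SAP or Oracle code), the Python sketch below enables keepalive on a plain TCP socket and applies the same three values per socket. The function name make_keepalive_socket is hypothetical, and the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT option names are assumed to be available, as they are on Linux.

```python
import socket

def make_keepalive_socket(idle_s=30, interval_s=10, probes=2):
    """Create a TCP socket with keepalive probing enabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Without SO_KEEPALIVE the kernel never sends probes on this socket,
    # whatever the system-wide sysctl values are.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket counterparts of tcp_keepalive_time / _intvl / _probes.
    # They are looked up defensively because not every platform defines them.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),
        ("TCP_KEEPINTVL", interval_s),
        ("TCP_KEEPCNT", probes),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock

sock = make_keepalive_socket()
print("keepalive enabled:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

When the per-socket options are not set, a keepalive-enabled socket falls back to the system-wide sysctl values, which is why both the sysctl change and "enable=broken" matter here.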

So the question is: is this how SAP with Oracle should be configured (in terms of failover), or have we missed something?

Also, these values are really low, so if you have best-practice values, they are always welcome.
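To put the tested values in perspective, here is a small worked calculation in Python. It assumes an idle connection with keepalive enabled, where a dead peer is detected after tcp_keepalive_time plus tcp_keepalive_probes unanswered probes sent tcp_keepalive_intvl seconds apart. The function name is hypothetical; 7200, 75 and 9 are the usual Linux defaults.

```python
def keepalive_detection_seconds(time_s, intvl_s, probes):
    """Idle time before the first probe plus the time for all unanswered probes."""
    return time_s + intvl_s * probes

tested = keepalive_detection_seconds(30, 10, 2)           # values from this post
linux_default = keepalive_detection_seconds(7200, 75, 9)  # usual Linux defaults

print(tested)         # 50
print(linux_default)  # 7875
```

So the tested settings should detect a dead node after roughly 50 seconds of idle time, versus more than two hours with the defaults. This is only the idle-connection case: when unacknowledged data is in flight, retransmission limits such as net.ipv4.tcp_retries2 apply instead.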

stefan_koehler · 10-08-2015

Hi Sergo,

> So the question is: is this how SAP with Oracle should be configured (in terms of failover), or have we missed something?

The behavior you observe is one of the issues with SAP that I tried to describe to you previously. I really have no idea why SAP does not implement FAN, but that is a different topic. With TAF alone you have to wait for time-outs, as it is a client-side feature. I also described some further details here:

However, I am a little bit confused, as the TCP time-out should occur pretty quickly after the listener VIP has failed over (and this should also happen very quickly in case of a power-off). What is your configured DELAY between the connection attempts, and how often do you retry (RETRIES)? Have you also set TCP-related parameters in your tnsnames.ora or service configuration?

You can also enable a SQL*Net trace on the client side to check the polling and failover behavior (maybe the client does not capture the ORA-12541, etc.):

Regarding the parameters you mentioned: please check MOS ID #249213.1 (especially point 2 for "10g/11g Timeout parameters") and MOS ID #364171.1.

SAP HANA also has such issues, and you can see some SAP recommendations in SAP Note 2053504 ("You should align the actual system settings on the duration of the takeover, because the clients cannot reconnect before the takeover has been completed.")
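As a rough way to reason about RETRIES and DELAY, the Python sketch below estimates how long TAF keeps trying to reconnect and checks whether that window covers the time the VIP needs to fail over. It is an approximation that ignores how long each individual connection attempt takes. The function names and the sample values (20 retries, 3 seconds delay, 45 seconds failover) are hypothetical and not taken from this thread.

```python
def taf_retry_window_seconds(retries, delay_s):
    """Approximate time TAF keeps retrying: RETRIES attempts, DELAY seconds apart."""
    return retries * delay_s

def covers_failover(retries, delay_s, failover_s):
    """True if the retry window is at least as long as the expected failover time."""
    return taf_retry_window_seconds(retries, delay_s) >= failover_s

# Hypothetical example values, for illustration only.
print(taf_retry_window_seconds(20, 3))  # 60
print(covers_failover(20, 3, 45))       # True
print(covers_failover(5, 3, 45))        # False
```

If the window is shorter than the failover time, the client can run out of retries before the VIP is reachable again.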

Regards

Stefan
