
SAP on RAC keepalive settings

Hello Oracle gurus!

I have a question about SAP reconnect behavior in an Oracle RAC installation.
We have built a demo system for testing on top of Oracle RAC 12.1 (SAP NW 7.4 with the latest 7.42 kernel and DBSL patches).

TAF has been configured, and if the main RAC node (the one SAP is connected to) shuts down cleanly (ACPI shutdown), the SAP work processes (WPs) reconnect very well, a few seconds after the VIP has moved. This is the expected behavior.

But if the main node dies without any notification, as with a hard power-off, SAP WPs can hang endlessly (I think 7200 s by default) while doing some internal jobs, until max_wp_runtime passes and the WP is restarted completely.

We have found some recommendations to tune kernel parameters at the OS level, such as this one:

To improve fail over performance in a RAC cluster, consider changing the following IP kernel parameters as well:

net.ipv4.tcp_keepalive_time
net.ipv4.tcp_keepalive_intvl
net.ipv4.tcp_retries2
net.ipv4.tcp_syn_retries

For testing, we have changed these parameters on the server where SAP is installed:

net.ipv4.tcp_keepalive_time = 30

net.ipv4.tcp_keepalive_intvl = 10

net.ipv4.tcp_keepalive_probes = 2
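To see why these values matter, here is a rough calculation. It is a sketch, assuming the standard Linux keepalive semantics: after tcp_keepalive_time seconds of idleness the kernel sends up to tcp_keepalive_probes probes, tcp_keepalive_intvl seconds apart, and resets the connection if none is answered. The function name is hypothetical; the Linux defaults used for comparison are 7200/75/9.

```python
def keepalive_detection_seconds(keepalive_time: int, keepalive_intvl: int,
                                keepalive_probes: int) -> int:
    """Worst-case seconds until Linux resets an idle keepalive-enabled
    connection whose peer has silently died (hypothetical helper)."""
    # Idle period before the first probe, plus one interval per unanswered probe
    return keepalive_time + keepalive_intvl * keepalive_probes


# Values from the test above
tuned = keepalive_detection_seconds(30, 10, 2)
# Linux defaults: tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9
default = keepalive_detection_seconds(7200, 75, 9)
print(tuned, default)  # 50 7875
```

With the test values, an idle connection to a dead node is noticed after roughly 50 s at most instead of more than two hours. Keepalive only covers idle connections, though: while the client has unacknowledged data in flight, the retransmission limit net.ipv4.tcp_retries2 applies instead, which is presumably why that parameter appears in the recommendation as well.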

In addition, we have added "enable=broken" to the tnsnames.ora entry.
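ENABLE=BROKEN is understood to switch on TCP keepalive (SO_KEEPALIVE) on the Oracle Net socket, while the timing comes from the system-wide sysctls above, which is why both changes go together. The following Python sketch shows the same two pieces for an ordinary socket; the helper names are hypothetical and the /proc paths assume Linux.

```python
import socket
from pathlib import Path

SYSCTL_DIR = Path("/proc/sys/net/ipv4")


def read_keepalive_sysctls() -> dict:
    """Read the system-wide keepalive settings (Linux only). They apply to
    every socket that enables SO_KEEPALIVE without per-socket overrides."""
    names = ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes")
    return {name: int((SYSCTL_DIR / name).read_text()) for name in names}


def keepalive_socket() -> socket.socket:
    """Create a TCP socket with keepalive switched on, which is what
    ENABLE=BROKEN is assumed to request on the Oracle Net socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock
```

Once the test values have been applied with sysctl, read_keepalive_sysctls() on the SAP application server should report 30, 10 and 2.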

So the question is: is this how SAP with Oracle should be configured for failover, or have we missed something?

Also, these values are really low, so any best-practice values are always welcome.

1 Answer

  • Best Answer
    Posted on Oct 08, 2015 at 07:50 AM

    Hi Sergo,

    > So the question is , this is how SAP with Oracle should be configured (in terms of fail-over) or we have missed something ?


    The behavior you observe is one of the SAP issues that I tried to describe to you previously. I really have no idea why SAP does not implement FAN (Fast Application Notification), but that is a different topic. With TAF alone you have to wait for time-outs, as it is a client-side feature. I also described some further details here:

    However, I am a little bit confused, as the TCP time-out should occur pretty quickly after the listener VIP has failed over (and the VIP should also fail over very quickly in case of a power-off). What DELAY have you configured between connection attempts, and how often do you retry (RETRIES)? Have you also set TCP-related parameters in your tnsnames.ora or service configuration?
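For reference, DELAY and RETRIES belong to the FAILOVER_MODE clause of the connect descriptor. The sketch below builds such a descriptor with a hypothetical helper; the host, port, service name, TYPE=SELECT/METHOD=BASIC and the numeric values are placeholders, not settings from the poster's system. The retry window is only a rough estimate that ignores how long each connection attempt itself takes.

```python
def taf_descriptor(host: str, port: int, service: str,
                   retries: int, delay: int) -> str:
    """Build an Oracle Net connect descriptor with keepalive and TAF settings.
    All arguments are placeholders supplied by the caller (hypothetical helper)."""
    return (
        "(DESCRIPTION="
        "(ENABLE=BROKEN)"
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        "(CONNECT_DATA="
        f"(SERVICE_NAME={service})"
        f"(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES={retries})(DELAY={delay}))))"
    )


def taf_retry_window(retries: int, delay: int) -> int:
    """Approximate seconds TAF keeps trying: RETRIES attempts, DELAY seconds apart."""
    return retries * delay


print(taf_descriptor("rac-scan.example.com", 1521, "SAPSVC", retries=20, delay=15))
print(taf_retry_window(20, 15))  # 300
```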


    You can also enable a SQL*Net trace on the client side to check the polling and failover behavior (maybe the client does not catch the ORA-12541, etc.): [Oracle] A short SQL*Net research or how-to drill down network related ORA errors
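When reading such a trace, a quick way to see which network errors the client actually hit is to count the error numbers. This is a minimal sketch, assuming the trace is plain text in which errors appear either as ORA-/TNS- codes or as ns=NNNNN fields on nserror lines. The function name is hypothetical, and tracing itself would be enabled on the client, for example via TRACE_LEVEL_CLIENT in sqlnet.ora.

```python
import re
from collections import Counter
from typing import Iterable

# "ORA-12541" / "TNS-12541" style codes, or "ns=12541" fields on nserror lines
ERROR_PATTERN = re.compile(r"\b(?:ORA|TNS)-(\d{5})\b|\bns=(\d{5})\b")


def count_net_errors(lines: Iterable[str]) -> Counter:
    """Count Oracle Net error numbers found in client trace lines."""
    counts: Counter = Counter()
    for line in lines:
        # findall returns one (ora_tns, ns) tuple per match; the unused group is ""
        for ora_tns, ns in ERROR_PATTERN.findall(line):
            counts[int(ora_tns or ns)] += 1
    return counts
```

Usage: `count_net_errors(open(trace_path))`, where trace_path is whatever file the client trace was written to.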


    Regarding the parameters you mentioned, please check MOS ID #249213.1 (especially point 2, "10g/11g Timeout parameters") and MOS ID #364171.1.

    SAP HANA has similar issues, and you can find some SAP recommendations in SAP Note #2053504 ("You should align the actual system settings on the duration of the takeover, because the clients cannot reconnect before the takeover has been completed.")
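The same alignment rule can be applied to TAF on RAC: the client has to notice the dead connection and still be retrying when the surviving node is ready. Here is a hypothetical check, assuming you have measured the takeover duration on your own system and treating RETRIES * DELAY as a rough retry window:

```python
def reconnect_covers_takeover(detection_s: int, retries: int, delay: int,
                              takeover_s: int) -> bool:
    """Return True if the client is still retrying when the takeover completes.

    detection_s: time until the client notices the dead connection
                 (for example keepalive_time + keepalive_intvl * keepalive_probes)
    retries, delay: TAF RETRIES and DELAY; the last attempt happens roughly
                    detection_s + retries * delay seconds after the failure
    takeover_s: measured duration of the failover (an assumption you supply)
    """
    return detection_s + retries * delay >= takeover_s


print(reconnect_covers_takeover(50, 20, 15, 120))  # True
print(reconnect_covers_takeover(50, 2, 5, 120))    # False
```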


    Regards

    Stefan


    • Hi Sergo,

      > still have some problem with DB02 --> "statistic refresh job" this one still hangs without any logs and reconnects, but I think this is job specific, maybe DBSL bug or something like this, other jobs working well

      I am not quite sure whether "DB02 -> Statistic refresh job" executes brconnect, but in both cases DBMS_STATS is called in the background. However, PL/SQL session state is not covered or reconstructed by TAF. This is different from session or select failover.

      You can also cross-check this in the official Oracle documentation: Enabling Advanced Features of Oracle Net Services

      Server-side program variables: Server-side program variables, such as PL/SQL package states, are lost during failures, and TAF cannot recover them. They can be initialized by making a call from the failover callback.

      ... and failover callbacks are not implemented by SAP (SAP Note #1431241). Whether the application intercepts the corresponding ORA error can again be answered with a SQL*Net trace 😉
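In application terms, intercepting the error means catching the TAF-specific ORA codes and repeating the whole PL/SQL unit of work, because its package state is gone. SAP's DBSL would have to do this internally; the Python sketch below only illustrates the idea. DatabaseError and run_with_restart are hypothetical stand-ins, not a real driver API, and the approach is only valid if the unit of work can safely be repeated (after ORA-25402 the transaction must also be rolled back first).

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# TAF-related errors: ORA-25401 can not continue fetches,
# ORA-25402 transaction must roll back, ORA-25408 can not safely replay call
TAF_REPLAY_ERRORS = {25401, 25402, 25408}


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception carrying an ORA number."""

    def __init__(self, code: int):
        super().__init__(f"ORA-{code:05d}")
        self.code = code


def run_with_restart(call: Callable[[], T], max_restarts: int = 1) -> T:
    """Re-run a whole PL/SQL unit of work if TAF could not replay it.

    Assumes `call` is idempotent and performs any needed rollback itself.
    """
    attempt = 0
    while True:
        try:
            return call()
        except DatabaseError as err:
            # Propagate unrelated errors and give up after max_restarts re-runs
            if err.code not in TAF_REPLAY_ERRORS or attempt >= max_restarts:
                raise
            attempt += 1
```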

      Regards

      Stefan
