on 09-23-2013 1:35 AM
Dear All,
Happy day,
We have installed SAP SCM Server 7.0 in a Red Hat cluster environment under Red Hat Enterprise Linux 5.8 with Oracle Database 11.2. We installed four services:
Service 1: ASCS ==> the standalone enqueue server, instance number 12
Service 2: CI ==> the central instance S1C, instance number 11
Service 3: ERS ==> the enqueue replication server, instance number 10
Service 4: DB ==> the database instance S1C
running on two nodes:
Node 1: scmapp
Node 2: scmdb
with the following start sequence (ERS - ASCS - DB - CI).
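For context, each of the four services above is an rgmanager service defined in /etc/cluster/cluster.conf. A minimal sketch of what the ASCS service definition might look like is shown below; every name, address, and path here is a placeholder for illustration, not taken from the real configuration.

```
<!-- Hypothetical sketch only: names, IPs, and paths are placeholders -->
<rm>
  <service name="svc_ascs" autostart="1" recovery="relocate">
    <ip address="10.0.0.12" monitor_link="1">
      <fs name="fs_ascs" device="/dev/vg_sap/lv_ascs"
          mountpoint="/usr/sap/S1C/ASCS12" fstype="ext3">
        <!-- SAPInstance resource agent starts/stops the SAP instance -->
        <SAPInstance name="S1C_ASCS12"
                     InstanceName="S1C_ASCS12_scascs"/>
      </fs>
    </ip>
  </service>
</rm>
```

The nesting (IP, then filesystem, then instance) expresses start order within the service; the real cluster.conf attached later in this thread is authoritative.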
After a failover, the lock table is empty in some failover scenarios (not in all of them).
During the failover, the trace file of the standalone enqueue server, dev_enqrepl, contains the following error message:
[Thr 1094582240] Sun Sep 22 16:49:46 2013
[Thr 1094582240] A newly started replication server has connected
[Thr 1094582240] ***LOG GEZ=> repl. activ [encllog.cpp 501]
[Thr 1094582240] Sun Sep 22 16:49:58 2013
[Thr 1094582240] *** ERROR => ReplicationThread::Loop(): fatal error (read) on connection 1aa61b60 (hdl 89) [eniothread.c 2453]
[Thr 1094582240] ***LOG GEZ=> repl. closed [encllog.cpp 501]
Best Regards ,
Ahmed
Dear All,
For your information: I am not losing the content of the lock table when the entries were added with the following command:
> enqt pf=/usr/sap/SID/SYS/profile/SID_ERS10_scers 11 20
I only lose it if I modify database records, for example a user master record.
Best Regards
Ahmed
Hi Roman,
Excuse my delayed response; yesterday was my holiday and I was not able to access the system.
These are the steps I performed to check the lock table:
1 - Add entries to the lock table by one of the following methods:
A - >> enqt pf=/usr/sap/SID/SYS/profile/SID_ERS10_scers 11 20 ==> this adds 20 lock records of type "X" (exclusive but not cumulative) to the lock table
OR
B - Log on to the system and modify some record, for example a user master record via transaction SU01 (modifying my own user ID details); the records are locked with type "E" (exclusive). I checked the records via transaction SM12.
2 - Check the content of the lock table in all cases using the following command:
>> enqt pf=/usr/sap/SID/SYS/profile/SID_ERS10_scers 20 1 1 9999
The result is shown in the following screen.
3 - After that, relocate or restart the services one by one, deactivate the resource manager, or shut down one node.
4 - Check the content of the lock table during the process.
When I added the content with the command from step A:
>> enqt pf=/usr/sap/SID/SYS/profile/SID_ERS10_scers 11 20
I did not lose any of the content of the lock table (complete success).
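The step-2 check lends itself to a small script. The sketch below does not call enqt itself (output formats differ between releases); it only shows one way to parse saved enqt output, assuming a line such as "Number of selected entries: N" appears in it, which you should verify on your release.

```shell
#!/bin/sh
# Parse saved enqt output and report the lock-table entry count.
# The sample text stands in for real output of:
#   enqt pf=/usr/sap/SID/SYS/profile/SID_ERS10_scers 20 1 1 9999 > /tmp/enqt.out
# The "Number of selected entries" line is an assumed format.
cat > /tmp/enqt.out <<'EOF'
EnqId: 1380112831/8767
Number of selected entries: 20
EOF

entries=$(awk -F': ' '/Number of selected entries/ {print $2}' /tmp/enqt.out)
if [ "${entries:-0}" -eq 0 ]; then
  echo "lock table is EMPTY - content lost"
else
  echo "lock table holds $entries entries"
fi
```

Running this after each relocation step gives a quick pass/fail signal instead of reading the full entry list by eye.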
------------------------------------------------------------------------------------------------------------
In the case of step (B), there are some successful scenarios and some failed scenarios:
-------------------------------------------------------
successful scenarios
-----------------------------------------------------
1 - Relocate the (ASCS, ERS, DB) services on the same node, or restart them on the other node, using the following command,
and check the status of the services as in the following screen.
2 - Relocate or restart (ASCS + CI) together, either by deactivating the resource manager with the command >> service rgmanager stop, or by powering off the node holding those services.
ASCS + CI move to the other node successfully, the ERS service stops, and the lock table still keeps its content.
Even after starting the resource manager again, if both services move back to the original node together, the lock table does not lose any content.
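For reference, relocating an rgmanager service is done with clusvcadm. The dry-run sketch below only prints the commands instead of executing them; the service and node names (svc_ascs, svc_ers, svc_db, scmapp) are assumptions based on this thread, not the actual resource names.

```shell
#!/bin/sh
# Dry-run sketch: print the relocation commands instead of running them.
# clusvcadm -r <service> -m <member> relocates a service to a cluster node.
relocate() {
  echo "clusvcadm -r $1 -m $2"
}
relocate svc_ascs scmapp
relocate svc_ers scmapp
relocate svc_db scmapp
```

Dropping the echo (so clusvcadm runs for real) reproduces the relocation scenarios described above.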
---------------------------------------------------------
failed scenario
---------------------------------------------------------
Move, restart, or relocate the CI service to the other node (or the same node) without the ASCS service, and the number of entries in the lock table drops to 0.
For example:
This is the status of the services, as in the following screen,
and the status of the lock table, as in the following screen.
I then move the CI service to node scmapp with the following command.
The service moves to the other node, and the status of the lock table is as in the following screen.
Status of the services:
The same happens if I move CI + DB or CI + ERS.
You can also check dev_enqrepl and dev_enqsrv during the CI failover:
------------------------------------------------------------------------------------------------------------------------------
scmdb:s1cadm 53> more dev_enqrepl
---------------------------------------------------
trc file: "dev_enqrepl", trc level: 1, release: "720"
---------------------------------------------------
sysno 12
sid S1C
systemid 390 (AMD/Intel x86_64 with Linux)
relno 7200
patchlevel 0
patchno 430
intno 20020600
make multithreaded, Unicode, 64 bit, optimized
pid 14742
[Thr 1105469408] Sat Sep 28 16:34:31 2013
[Thr 1105469408] profile /usr/sap/S1C/SYS/profile/S1C_ASCS12_scascs
[Thr 1105469408] hostname scmdb
[Thr 1105469408] IOListener::Listen: listen on port 51216 (addr 0.0.0.0)
[Thr 1105469408] will sleep maximal 333 ms while waiting for response
[Thr 1105469408] will sleep max 5000 ms when idle
[Thr 1105469408] will wait maximal 1000 ms for input
[Thr 1105469408] *** ERROR => ReplicationThread::Loop(): fatal error (read) on connection 2aab4c0010c0 (hdl 65) [eniothread.c 2465]
[Thr 1105469408] Sat Sep 28 16:34:33 2013
[Thr 1105469408] A newly started replication server has connected
[Thr 1105469408] ***LOG GEZ=> repl. activ [encllog.cpp 501]
[Thr 1105469408] Sat Sep 28 16:34:51 2013
[Thr 1105469408] *** ERROR => ReplicationThread::Loop(): fatal error (read) on connection 2aab4c0010c0 (hdl 73) [eniothread.c 2465]
[Thr 1105469408] ***LOG GEZ=> repl. closed [encllog.cpp 501]
[Thr 1105469408] Sat Sep 28 16:35:18 2013
[Thr 1105469408] A newly started replication server has connected
[Thr 1105469408] ***LOG GEZ=> repl. activ [encllog.cpp 501]
-----------------------------------------------------------------------------------------------------------------------
dev_enqsrv
-----------------------------------------------------------------------------------------------------------------------
---------------------------------------------------
trc file: "dev_enqsrv", trc level: 1, release: "720"
---------------------------------------------------
sysno 12
sid S1C
systemid 390 (AMD/Intel x86_64 with Linux)
relno 7200
patchlevel 0
patchno 430
intno 20020600
make multithreaded, Unicode, 64 bit, optimized
pid 14742
[Thr 47008314785632] Sat Sep 28 16:34:29 2013
[Thr 47008314785632] profile /usr/sap/S1C/SYS/profile/S1C_ASCS12_scascs
[Thr 47008314785632] hostname scmdb
[Thr 47008314785632] Listen successful on port/service sapdp12
[Thr 47008314785632] EnqInitCleanupServer: Shm of enqueue table (rc = 3) does not exist, nothing to clean up
[Thr 47008314785632] Sat Sep 28 16:34:30 2013
[Thr 47008314785632] ShadowTable:attach: ShmCreate(,SHM_ATTACH,) -> 2aab3e200000
[Thr 47008314785632] EnRepClass::getReplicaData: found old replication table with the following data:
[Thr 47008314785632] Line size:744, Line count: 89144, Failover Count: 9
[Thr 47008314785632] EnqId: 1380112831/8767, last stamp: 1/380375233/3000
[Thr 47008314785632] Byte order tags: int:1159934994 char:Z
[Thr 47008314785632] initialize_global: Enqueue server started with replication functionality
[Thr 47008314785632] Enqueue: EnqMemStartupAction Utc=1380375270
[Thr 47008314785632] EnqLockTableCreate: create lock table (size = 102400000)
[Thr 47008314785632] EnqLockTableMapToLocalContext: enque/use_pfclock2 = FALSE
[Thr 47008314785632] ***LOG GEJ=> [enxxmfil.h 191]
[Thr 47008314785632] ***LOG GZZ=> /usr/sap/S1C/DVEBMGS11/log/ENQBC [enxxmfil.h 191]
[Thr 47008314785632] ***LOG GZZ=> No such file or directory [enxxmfil.h 191]
[Thr 47008314785632] Delete replication table which was attached by the enqueue server
[Thr 47008314785632] ShadowTable:destroy: ShmCleanup( SHM_ENQ_REP_SHADOW_TBL)
[Thr 47008314785632] enque/backup_file disabled in enserver environment
[Thr 47008314785632] Sat Sep 28 16:34:31 2013
[Thr 47008314785632] ***LOG GEZ=> Server start [encllog.cpp 501]
[Thr 47008314785632] Enqueue server start with instance number 12
------------------------------------------------------------------------------------------------------------
Excuse me again for the long reply.
Best Regards
Ahmed
Hi Roman,
Kindly check the following files in the attachment:
1 - cluster.conf
2 - event-node-ers.sl
3 - event-service-ers.sl
Just to avoid confusion: there are some resources configured in the cluster.conf file that are not in use by any service. I deleted four services: one for a dialog instance, one for liveCache, and two for the DAA.
thanks
Ahmed
Hi Roman,
"What method (SU01 or enqt) did you use to create lock table entries?"
I tried both (SU01 and enqt).
When the lock table entries are created with enqt, everything is OK (no issue).
--------------------------------
When the lock table entries are created by accessing DB tables via a transaction like SU01, I face the issues described above.
Kindly, can you explain the previous post in detail?
thanks
Ahmed
Dear All,
I noticed the following error in the trace file dev_enqsrv after the lock table content was lost:
---------------------------------------------------
trc file: "dev_enqsrv", trc level: 1, release: "720"
---------------------------------------------------
sysno 12
sid S1C
systemid 390 (AMD/Intel x86_64 with Linux)
relno 7200
patchlevel 0
patchno 430
intno 20020600
make multithreaded, Unicode, 64 bit, optimized
pid 25066
[Thr 47930438192992] Thu Sep 26 10:15:25 2013
[Thr 47930438192992] profile /usr/sap/S1C/SYS/profile/S1C_ASCS12_scascs
[Thr 47930438192992] hostname scmdb
[Thr 47930438192992] Listen successful on port/service sapdp12
[Thr 47930438192992] Thu Sep 26 10:15:26 2013
[Thr 47930438192992] EnqInitCleanupServer: Shm of enqueue table (rc = 3) does not exist, nothing to clean up
[Thr 47930438192992] ShadowTable:attach: ShmCreate(,SHM_ATTACH,) -> 2aab3e200000
[Thr 47930438192992] EnRepClass::getReplicaData: found old replication table with the following data:
[Thr 47930438192992] Line size:744, Line count: 89144, Failover Count: 0
[Thr 47930438192992] EnqId: 1380112831/8767, last stamp: 1/380179531/3000
[Thr 47930438192992] Byte order tags: int:1159934994 char:Z
[Thr 47930438192992] initialize_global: Enqueue server started with replication functionality
[Thr 47930438192992] Enqueue: EnqMemStartupAction Utc=1380179726
[Thr 47930438192992] EnqLockTableCreate: create lock table (size = 102400000)
[Thr 47930438192992] EnqLockTableMapToLocalContext: enque/use_pfclock2 = FALSE
[Thr 47930438192992] Thu Sep 26 10:15:28 2013
[Thr 47930438192992] Enqueue checkpointing: start restoring entries. Utc=1380179728
[Thr 47930438192992] Delete replication table which was attached by the enqueue server
[Thr 47930438192992] ShadowTable:destroy: ShmCleanup( SHM_ENQ_REP_SHADOW_TBL)
[Thr 47930438192992] *** ERROR => ShadowTable:destroy: failed to delete SHM (rc=2) [enreptbl.cpp 723]
[Thr 47930438192992] enque/backup_file disabled in enserver environment
[Thr 47930438192992] Thu Sep 26 10:15:29 2013
[Thr 47930438192992] ***LOG GEZ=> Server start [encllog.cpp 501]
[Thr 47930438192992] Enqueue server start with instance number 12
-----------
Kindly check the error.
thanks
Ahmed
Hi,
You are checking the enqueue replication server trace.
After the move of the ASCS instance to the second cluster node, you should check in the trace dev_enqsrv whether the standalone enqueue server was able to attach to the ERS shared memory without errors.
If you receive errors like:
ShadowTable:attach: ShmCreate - pool doesn't exist
*** ERROR => EnqRepAttachOldTable: failed to get information on
old replication table: rc=-1
this means that the standalone enqueue server was not able to connect to the ERS shared memory, and therefore you should check the reason in the old traces of the ERS that was running on this host.
If the standalone enqueue server was able to attach to the ERS shared memory without errors, you should find a log entry like "shared key attached".
Regards
Clebio
Hi Clebio,
I posted the error from the dev_enqrepl file in the original post:
--------------------------------------------------------------------------
---------------------------------------------------
trc file: "dev_enqrepl", trc level: 1, release: "720"
---------------------------------------------------
sysno 12
sid S1C
systemid 390 (AMD/Intel x86_64 with Linux)
relno 7200
patchlevel 0
patchno 94
intno 20020600
make multithreaded, Unicode, 64 bit, optimized
pid 3220
[Thr 1083785184] Tue Sep 24 13:42:21 2013
[Thr 1083785184] profile /usr/sap/S1C/SYS/profile/S1C_ASCS12_scascs
[Thr 1083785184] hostname scmapp
[Thr 1083785184] IOListener::Listen: listen on port 51216 (addr 0.0.0.0)
[Thr 1083785184] will sleep maximal 333 ms while waiting for response
[Thr 1083785184] will sleep max 5000 ms when idle
[Thr 1083785184] will wait maximal 1000 ms for input
[Thr 1083785184] Tue Sep 24 13:42:22 2013
[Thr 1083785184] A newly started replication server has connected
[Thr 1083785184] ***LOG GEZ=> repl. activ [encllog.cpp 501]
[Thr 1083785184] Tue Sep 24 13:43:03 2013
[Thr 1083785184] *** ERROR => ReplicationThread::Loop(): fatal error (read) on connection 1f77c450 (hdl 65) [eniothread.c 2453]
[Thr 1083785184] ***LOG GEZ=> repl. closed [encllog.cpp 501]
[Thr 1083785184] Tue Sep 24 13:44:12 2013
[Thr 1083785184] A newly started replication server has connected
[Thr 1083785184] ***LOG GEZ=> repl. activ [encllog.cpp 501]
[Thr 1083785184] Tue Sep 24 13:45:22 2013
[Thr 1083785184] *** ERROR => ReplicationThread::Loop(): fatal error (read) on connection 1f781f50 (hdl 104) [eniothread.c 2453]
[Thr 1083785184] ***LOG GEZ=> repl. closed [encllog.cpp 501]
-------------------------------------------------------------
the content of dev_enqsrv
------------------------------------------------------------
---------------------------------------------------
trc file: "dev_enqsrv", trc level: 1, release: "720"
---------------------------------------------------
sysno 12
sid S1C
systemid 390 (AMD/Intel x86_64 with Linux)
relno 7200
patchlevel 0
patchno 94
intno 20020600
make multithreaded, Unicode, 64 bit, optimized
pid 29637
[Thr 47815365664608] Tue Sep 24 14:46:56 2013
[Thr 47815365664608] profile /usr/sap/S1C/SYS/profile/S1C_ASCS12_scascs
[Thr 47815365664608] hostname scmdb
[Thr 47815365664608] Listen successful on port/service sapdp12
[Thr 47815365664608] Tue Sep 24 14:46:57 2013
[Thr 47815365664608] EnqInitCleanupServer: Shm of enqueue table (rc = 3) does not exist, nothing to clean up
[Thr 47815365664608] ShadowTable:attach: ShmCreate - pool doesn't exist
[Thr 47815365664608] initialize_global: Enqueue server started with replication functionality
[Thr 47815365664608] Enqueue: EnqMemStartupAction Utc=1380023217
[Thr 47815365664608] Enqueue Info: enque/use_pfclock2 = FALSE
[Thr 47815365664608] Tue Sep 24 14:46:58 2013
[Thr 47815365664608] ShadowTable:attach: ShmCreate - pool doesn't exist
[Thr 47815365664608] EnqRepRestoreFromReplica: failed to attach to old replication table: rc=-1
[Thr 47815365664608] enque/backup_file disabled in enserver environment
[Thr 47815365664608] Tue Sep 24 14:46:59 2013
[Thr 47815365664608] ***LOG GEZ=> Server start [encllog.cpp 501]
[Thr 47815365664608] Enqueue server start with instance number 12
------------------------------------------------
Best Regards
Ahmed
Hello,
A possible cause that I can see is that the ERS instance number is different on node 1 and node 2. Please make sure that the replication server creates the replica with the same instance number that is used by the standalone enqueue server.
Thanks
Sunny
Hi Sunny,
Thank you for your quick response.
I didn't understand this part:
"Possible cause is that the ERS instance number is different on node 1 and node 2."
How can the ERS instance number on node 2 differ from node 1? It is actually only one service, which moves between both nodes.
Maybe you mean that the enqueue instance number addressed in the replication server profile is not the same as the instance number of the running enqueue server. I checked the following parameters in the replication instance profile:
SCSID = 12
SCSHOST = scascs
which are correct.
Please correct me if I misunderstood.
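Putting the pieces together, the relevant section of an ERS instance profile per SAP Note 1249256 would look roughly like the sketch below. The values are assumptions taken from this thread (ERS instance 10, ASCS instance 12, virtual host scascs); verify them against the actual S1C_ERS10_scers profile.

```
# Sketch of an ERS instance profile (values assumed from this thread)
SAPSYSTEMNAME = S1C
INSTANCE_NAME = ERS10
SAPSYSTEM = 10      # instance number of the replication server itself
SCSID = 12          # instance number of the standalone enqueue server
SCSHOST = scascs    # (virtual) host of the ASCS instance
# Start profile: the replication server runs with the enqueue server's number
Restart_Program_00 = local $(_ER) pf=$(_PF) NR=$(SCSID)
```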
thank you
Ahmed
Hello,
The ERS service never fails over to the other node. You need to install the standalone enqueue server on node 1 and the replication server on node 2 with the same instance number. So, when node 1 fails over, the replication server on node 2 maintains the lock entries.
If you check SAP Note 1249256, it clearly states:
In the instance profile, the replication server uses a different instance number than the standalone enqueue server. To ensure that the replicate that was created in the shared memory can be found after a failover, the replication server must create the replicate with the same instance number that is used by the standalone enqueue server. You must use the configuration of the replication server to ensure this.
Thanks,
Sunny
Hi,
Can you please check the note below:
Note 1747880 - ENQU: Various corrections (ENSA)
Thanks
Rishi Abrol
Dear Sunny,
Thank you for your kind response.
Actually, the ASCS service should move to another node when the node holding the ASCS service fails, because the ASCS service also holds the message server; otherwise the system stops.
The ERS service could be restricted to one node only (not moving to the other node in case of failover, and stopping if that node fails); this is according to the new design from Red Hat. Check the following white paper.
But according to the old design from Red Hat, in the following link, the ERS service could move to the other node:
http://www.redhat.com/f/pdf/ha-sap-v1-6-4.pdf
The designs from SUSE and Oracle follow the same approach; ERS moves between both nodes:
http://www.novell.com/docrep/2012/10/sap_on_sle.pdf
Please can you check the SAP Note again, because as per my understanding of that note it is required to set the following parameters in the start profile of the replication server:
SAPSYSTEM = <instance number of the replication server>
SCSID = <instance number of the standalone enqueue server>
and
Restart_Program_00 = local $(_ER) pf=$(_PF) NR=$(SCSID)
which means different instance numbers; by the way, we cannot install two instances with the same number on one system.
Dear Sunny, correct me if I misunderstood, but please explain in more detail.
thank you
Ahmed
I agree with your first point, but I don't agree with your second point. If ERS fails over, what is the need to install an ERS instance on the second node? You are not following the SAP recommendation; it is clearly written in the SAP note.
Also, if you read the guide you referenced above, it refers to the SAP Help Portal link for setting up the enqueue replication server. Please check the link below:
http://help.sap.com/saphelp_nw2004s/helpdata/en/de/cf853f11ed0617e10000000a114084/frameset.htm
Also, check the screen below from the link above.
It clearly shows that ERS is installed with the same instance number on both hosts and that ERS does not switch over to the other node.
Thanks,
Sunny
Hi Sunny,
The diagram above says that the ERS service is installed locally on the physical host, which means that whenever the ASCS service moves to the other node, the ERS service stops on that node and starts again on the other node once the failed machine comes back up.
In my scenario it is the same, but I installed the ERS service on a virtual host: whenever the ASCS service moves to the node holding the ERS service, the ERS service either stops (if the other node is not available) or moves (if the other node is available).
thanks
Ahmed
The ERS service cannot move to the other node. It has to stop if node 1 goes down, because the enqueue replica on the other node contains the lock entries. Also, the instance number is the same on both nodes.
The same thing is written in the link given above. It's up to you whether you follow this approach or not.
Thanks,
Sunny
@Sunny
The ERS service can (and must) move to the other node if it is configured as a cluster resource (not the polling concept).
Replication and Failover - The SAP Lock Concept (BC-CST-EQ) - SAP Library
@Roman
If you read your link, it states the following:
A replication server runs on each possible failover host. Each replication server then regularly polls the HA software and checks whether it is running on the failover host currently in use. This is the host on which the enqueue server is to be restarted following a failover. The replication server running on this host is activated (status active) and starts the replication. The other replication servers do nothing (status inactive). All replication servers regularly interrogate the HA software to ascertain whether the failover host has changed (it can change at any time). If the failover host has changed, the active replication server is deactivated (it closes the connection to the enqueue server and deletes the replication table). The replication server on the current failover host, which has been inactive until now, becomes active (it connects to the enqueue server and creates a replication table).
If there is a failover, all the replication servers switch to the hold status. This does not change anything for the inactive servers. The active server closes the connection to the enqueue server but keeps the replication table. The HA software must stay in the hold status until the newly started enqueue server has read the replication table and created a new lock table.
It means that ERS is installed on each host in the HA setup. The one on the active node is in active status and the rest are inactive. As soon as a failover happens, the lock table is transferred to the ERS on the other node, which becomes active. But the ERS instance itself does not move.
Thanks,
Sunny
All you have quoted is valid for the polling concept. Please read the link to the end:
Red Hat Clustering contains a Follow Service dependency, which allows you to set up ERS as a cluster resource.
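To illustrate what such a follow-service setup might look like in cluster.conf: rgmanager's central processing mode can run event scripts that keep the ERS service following the ASCS service. The fragment below is a rough sketch only; the script names match the event-node-ers.sl / event-service-ers.sl files mentioned earlier in this thread, but the exact element syntax and paths must be taken from the Red Hat reference.

```
<!-- Rough sketch; verify syntax against the Red Hat SAP HA documentation -->
<rm central_processing="1">
  <events>
    <event class="service" name="service-ers"
           file="/usr/share/cluster/event-service-ers.sl"/>
    <event class="node" name="node-ers"
           file="/usr/share/cluster/event-node-ers.sl"/>
  </events>
  <!-- service definitions for ASCS, ERS, CI, DB follow here -->
</rm>
```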
Hi,
Have you checked the note below?
Note 1249256 - Locks are lost during failover
Thanks
Rishi Abrol
Hi Rishi,
Thanks for your quick reply.
I already checked that SAP note before, and the parameters to be checked in that SAP note are OK.
Actually, the lock table content is not lost in all failover scenarios.
For example:
If node 1, which holds the (ASCS and CI) services, fails over, the services move to node 2 without losing the lock table content; the same applies if node 2 fails.
If I relocate the ASCS, ERS, or DB services to the other node, the content is not lost either.
But if I relocate the CI service to the other node, the content of the lock table is lost.
Best Regards
Ahmed
"If node 1, which holds the (ASCS and CI) services, fails over, the services move to node 2 without losing the lock table content; the same applies if node 2 fails.
If I relocate the ASCS, ERS, or DB services to the other node, the content is not lost either."
It's OK. Please clarify: is your ERS service configured as a cluster resource?
"But if I relocate the CI service to the other node, the content of the lock table is lost."
In this case you do not move the ASCS instance. Where do you check that the content of the lock table is lost?
Hi Roman,
Thank you for your kind response.
"It's OK. Please clarify: is your ERS service configured as a cluster resource?"
Yes, the ERS service is configured as a cluster resource (IP, LVM, FS).
"In this case you do not move the ASCS instance. Where do you check that the content of the lock table is lost?"
By using the following command.
It shows there is only one locked record; when the number of entries becomes 0, it means the content of the lock table is lost.
This command checks the connection,
which is OK as well,
and this one checks the enqueue ID,
which is the same after the failover.
Thanks
Ahmed