on 08-28-2007 11:16 AM
Hi All
Our 4.6C system has developed a strange performance problem.
We are finding situations where multiple work processes are all in status Commit and remain so until they either time out or are cancelled. They are performing quite different workloads, don't appear to be waiting for any locks, and don't actually appear to be committing from an AS/400 view. Has anyone else encountered this? I've attached an SM50. We are also opening an OSS call, but I thought I'd tap the real knowledge as well.
No Ty. PID Status Reasn Start Err Sem CPU Time Report Cl. User Action Table
2 DIA 11886 Running Yes 1 SAPMV45A 410 GALLACL Sequential read VBAK
22 DIA 11907 Running Yes 65 ZSDRP022 410 PERCHAM Sequential read VBPA
23 DIA 11908 Running Yes 2 SAPLM61X 410 OBRIENC Sequential read MSSQ
6 BGD 11890 Running Yes 20600 ZMMRP007 410 BATCH Direct read SER03
1 DIA 11885 Running Yes 2298 SAPLOLEA 410 NAIRNPA Commit
3 BGD 11887 Running Yes 9737 SDRQCR21 410 BATCH Commit
7 BGD 11891 Running Yes 886 SAPLZCGF 410 CHECKER1 Commit
14 DIA 11899 Running Yes 1690 SAPMM07M 410 KILDEAV Commit
15 DIA 11900 Running Yes 1400 SAPLOLEA 410 BULLIOR Commit
19 DIA 11904 Running Yes 543 SAPLOLEA 410 STEVENA Commit
25 UPD 11910 Running Yes 1 RSM13000 410 MACINTK Commit
31 BGD 11917 Running Yes 834 SAPLZCGF 410 CHECKER1 Commit
32 BGD 11918 Running Yes 14505 SAPLZCGF 410 CHECKER1 Commit
0 DIA 11884 waiting Yes
4 BGD 11888 waiting Yes
5 BGD 11889 waiting Yes
8 BGD 11893 stopped CPIC Yes 3 SAPLRSAP 410 BW_REMOTE
9 DIA 11894 waiting Yes
10 DIA 11895 Running Yes 410 CAMPBEC
11 DIA 12218 Running Yes SAPLTHFB 410 JONESST
12 DIA 11897 Running Yes 3 SAPLRSAP 410 BW_REMOTE
13 DIA 11898 waiting Yes
16 DIA 12219 waiting Yes 1
17 DIA 11902 waiting Yes
18 DIA 11903 waiting Yes
20 DIA 11905 waiting Yes
21 DIA 11906 waiting Yes
24 UPD 11909 waiting Yes
26 ENQ 11911 waiting Yes
27 DIA 11912 waiting Yes
28 DIA 11914 waiting Yes
29 DIA 11915 waiting Yes
30 BGD 12230 waiting Yes 1
33 SPO 11919 waiting Yes
34 UP2 11920 Running Yes SAPLIEP2 410 CAMPBEC
Regards
Steve
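For anyone who wants to tally the stuck processes from a pasted SM50 listing like the one above, here is a minimal sketch. It assumes the listing is plain whitespace-separated text, that the first three fields of each row are No, Ty and PID, and that a committing process has "Commit" as its last field. The function name `commit_processes` is made up for this example, and only a few rows from the listing are included.

```python
from collections import Counter

# A few rows copied from the SM50 listing above (whitespace-separated text).
SM50_ROWS = """\
2 DIA 11886 Running Yes 1 SAPMV45A 410 GALLACL Sequential read VBAK
1 DIA 11885 Running Yes 2298 SAPLOLEA 410 NAIRNPA Commit
3 BGD 11887 Running Yes 9737 SDRQCR21 410 BATCH Commit
25 UPD 11910 Running Yes 1 RSM13000 410 MACINTK Commit
0 DIA 11884 waiting Yes
"""


def commit_processes(text):
    """Return (wp_no, wp_type, pid) for every row whose action is Commit."""
    found = []
    for line in text.splitlines():
        tokens = line.split()
        # Assumption: fields 1-3 are No, Ty and PID, and a committing
        # work process shows "Commit" as the final field of its row.
        if len(tokens) >= 4 and tokens[-1] == "Commit":
            found.append((int(tokens[0]), tokens[1], int(tokens[2])))
    return found


stuck = commit_processes(SM50_ROWS)
by_type = Counter(wp_type for _, wp_type, _ in stuck)
print(stuck)          # work processes currently in Commit
print(dict(by_type))  # how many of each work process type
```

Run against the full listing, this kind of count makes it easy to see that dialog, background and update processes are all affected at the same time.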
This looks like a problem with the database monitor (QQQDBLOG and QQQDBMWL in the call stack). You may want to report this problem to IBM (including the call stack information that you sent me). As a workaround, you can disable the monitor by setting profile parameter as4/dbmon/enable = 0. However, you won't see any useful data in ST04 then.
Kind regards,
Christian Bartels.
Hi Volker,
We are still on V5R3 at the moment, but I will certainly look into that option once we upgrade. As for the memory, we have already determined that we are a bit short there, and I am adding some more tonight. I've noticed today that paging is down considerably since switching off DBMON. It seems strange that this problem appears to have happened suddenly, rather than as a gradual worsening. I guess we must have reached some tipping point.
Thanks
Steve
Stephen,
Did you happen to apply DBFIX Pack 16? IBM's theory is that the PTF jump from SI25186 to SI26685 may have caused the problem. At least that has been our experience. We just ran into the same problem and disabled the dbmon and performance is back to normal.
Best Regards,
Philip Stracener
Hi Philip,
We have applied DB Fix Pack 16 quite recently. We were already experiencing some performance problems before applying it but not as extreme as it got. I've ordered 17 now. Was there any suggestion from IBM that 17 would fix the problem? My feeling at this stage is that I'll just leave DBMON off.
Thanks for the info.
Regards
Steve
Hi Stephen,
What you are seeing is a well-known problem, especially in RAM-constrained environments.
If you are on V5R4, you can (and should) use the OpsNav Plancache instead. I always use it for V5R4 customers, with really good results, and it comes at no extra cost: it is not slower, because the data is there anyway.
Regards
Volker Gueldenpfennig, consolut.gmbh
http://www.consolut.de - http://www.4soi.de - http://www.easymarketplace.de
Thanks Christian, the database monitor does appear to be our problem. I have disabled it, and not only have our stuck commits gone away, but general system performance is much better. Certain overnight batch jobs now run in 2,000s as opposed to 18,000s. I will report it to IBM, but I guess I'll have to switch it back on at some point for them to see it happening. Another side effect of the problem was the inability to cancel these hung commits from SM50, SM37 or SM04. Even from the AS/400, an endjob *immed on the work process took over an hour and a half to end the job!
Regards
Steve
My first thought would be that maybe a Save-While-Active operation is running. When you start a SAVLIB operation with SAVACT(*SYNCLIB), the system tries to reach a checkpoint where all processes reach a transaction boundary (Commit or Rollback).
It would be interesting to know what status the work processes are in from an i5/OS standpoint (WRKACTJOB). Are the processes shown as idle (-> SEMW), or are they in another status? What can be seen at the bottom of the call stack for the affected work processes? (WRKACTJOB ->Option 5 for one of the processes, then option 11).
That information may help to understand what is going on.
Kind regards,
Christian Bartels.
Hi Christian,
Whilst we do use SAVACT(*SYNCLIB), I take SAP offline to reach the sync point then bring it back online once reached. This happens between 03:00 & 03:45. Typically the problems start at about 08:00.
From the 400 the locked WPs are at RUN status, and the end of the call stack is:
Rqs Lvl  Program or Procedure  Library  Statement
< CommitEDRS QSYS 0000000217
QXDA_SQL QSYS 0000008031
QSQXCUTE QSYS 0000025220
QSQSTATS QSYS 0000004693
OUTPUT QSYS 0000004984
WRITE_LOG QSYS 0000005790
QQQDBLOG QSYS 0000002909
QQQDBMWL QSYS 0000000150
< YP7eachmon QSYS 0000000027
PurgeNode QSYS 0000000017
locksl2 QSYS 0000000015
I'll add that we are on V5R3, PTFs as per the latest APAR, ASCII Kernel patch 2307, lib_dbsl patch 2303.
The problem can affect dialog jobs that are just moving between screens, as well as background jobs. I suspect, but haven't confirmed, that the background jobs are using BDC sessions.
Thanks
Steve
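Since Christian's reply in this thread ties QQQDBLOG and QQQDBMWL in the call stack to the database monitor, a quick way to check other hung work processes is to scan their call stacks for those two names. The following is only a sketch: it assumes the call stack has been pasted as plain text with the procedure name as the first field (after an optional "<" marker), and the helper name `dbmon_frames` is invented for this example.

```python
# Procedure names that the reply in this thread associates with the
# database monitor. Extend the set only with names you have confirmed.
DBMON_PROCEDURES = {"QQQDBLOG", "QQQDBMWL"}

# Call stack copied from the post above.
CALL_STACK = """\
< CommitEDRS QSYS 0000000217
QXDA_SQL QSYS 0000008031
QSQXCUTE QSYS 0000025220
QSQSTATS QSYS 0000004693
OUTPUT QSYS 0000004984
WRITE_LOG QSYS 0000005790
QQQDBLOG QSYS 0000002909
QQQDBMWL QSYS 0000000150
< YP7eachmon QSYS 0000000027
PurgeNode QSYS 0000000017
locksl2 QSYS 0000000015
"""


def dbmon_frames(stack_text):
    """Return the database-monitor procedures found in a pasted call stack."""
    hits = []
    for line in stack_text.splitlines():
        # Drop the "<" marker some rows carry, then split on whitespace.
        tokens = line.replace("<", " ").split()
        if tokens and tokens[0] in DBMON_PROCEDURES:
            hits.append(tokens[0])
    return hits


frames = dbmon_frames(CALL_STACK)
if frames:
    print("Database monitor frames in the stack:", ", ".join(frames))
else:
    print("No database monitor frames found.")
```

A hit only suggests the job is inside the database monitor at that moment; it does not by itself prove the monitor is the cause of the hang.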
Hi Pat,
We're not using any HA or remote journalling. I've thought about the update processes too, but I rarely see them busy. Looking at it from a 400 view the jobs don't seem to be adding any journal entries, and if they are cancelled, they don't roll anything back. I'll keep you posted if anything turns up, and keep the suggestions coming.
Regards
Steve
Hi Steve,
My first reaction would be to ask if you have a high availability system that might be using a journal to mirror IFS objects, especially if you are on V5R4, which has some problems (they call them enhancements).
Another thing I noticed is that you only show one update 2 work process. Maybe someone started using a program that ties that up. We had 4 update and 4 update 2 WPs when we ran 4.6C.
You need to be looking at the WP logs (ST11 is handy).
Keep looking and let us know if SAP helps.
Pat