Work Processes stuck at action Commit

Former Member

Hi All

Our 4.6C system has developed a strange performance problem.

We are finding situations where multiple work processes all show action Commit and remain so until they either time out or are cancelled. They are performing quite different workloads, don't appear to be waiting for any locks, and don't actually appear to be committing from an AS/400 view. Has anyone else encountered this? I've attached an SM50 listing. We are also opening an OSS call, but I thought I'd tap the real knowledge as well.

No  Ty.  PID    Status   Reasn  Start  Err  Sem  CPU  Time   Report    Cl.  User       Action           Table
2   DIA  11886  Running         Yes                   1      SAPMV45A  410  GALLACL    Sequential read  VBAK
22  DIA  11907  Running         Yes                   65     ZSDRP022  410  PERCHAM    Sequential read  VBPA
23  DIA  11908  Running         Yes                   2      SAPLM61X  410  OBRIENC    Sequential read  MSSQ
6   BGD  11890  Running         Yes                   20600  ZMMRP007  410  BATCH      Direct read      SER03
1   DIA  11885  Running         Yes                   2298   SAPLOLEA  410  NAIRNPA    Commit
3   BGD  11887  Running         Yes                   9737   SDRQCR21  410  BATCH      Commit
7   BGD  11891  Running         Yes                   886    SAPLZCGF  410  CHECKER1   Commit
14  DIA  11899  Running         Yes                   1690   SAPMM07M  410  KILDEAV    Commit
15  DIA  11900  Running         Yes                   1400   SAPLOLEA  410  BULLIOR    Commit
19  DIA  11904  Running         Yes                   543    SAPLOLEA  410  STEVENA    Commit
25  UPD  11910  Running         Yes                   1      RSM13000  410  MACINTK    Commit
31  BGD  11917  Running         Yes                   834    SAPLZCGF  410  CHECKER1   Commit
32  BGD  11918  Running         Yes                   14505  SAPLZCGF  410  CHECKER1   Commit
0   DIA  11884  waiting         Yes
4   BGD  11888  waiting         Yes
5   BGD  11889  waiting         Yes
8   BGD  11893  stopped  CPIC   Yes                   3      SAPLRSAP  410  BW_REMOTE
9   DIA  11894  waiting         Yes
10  DIA  11895  Running         Yes                                    410  CAMPBEC
11  DIA  12218  Running         Yes                          SAPLTHFB  410  JONESST
12  DIA  11897  Running         Yes                   3      SAPLRSAP  410  BW_REMOTE
13  DIA  11898  waiting         Yes
16  DIA  12219  waiting         Yes    1
17  DIA  11902  waiting         Yes
18  DIA  11903  waiting         Yes
20  DIA  11905  waiting         Yes
21  DIA  11906  waiting         Yes
24  UPD  11909  waiting         Yes
26  ENQ  11911  waiting         Yes
27  DIA  11912  waiting         Yes
28  DIA  11914  waiting         Yes
29  DIA  11915  waiting         Yes
30  BGD  12230  waiting         Yes    1
33  SPO  11919  waiting         Yes
34  UP2  11920  Running         Yes                          SAPLIEP2  410  CAMPBEC

Regards

Steve
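
For anyone who wants to watch for this pattern without eyeballing SM50, here is a minimal Python sketch that picks out work processes that have been sitting in action Commit for a long time. It assumes the listing has been pasted as plain text with the empty columns collapsed, as in the post above (No, Ty., PID, Status, Start, Time, Report, Cl., User, Action). The name `long_commits` and the 300-second threshold are illustrative assumptions, not anything SAP provides.

```python
from typing import NamedTuple


class CommitRow(NamedTuple):
    wp_no: int
    wp_type: str
    pid: int
    seconds: int
    report: str
    user: str


def long_commits(sm50_text: str, min_seconds: int = 300) -> list[CommitRow]:
    """Return work processes in action Commit for at least min_seconds."""
    rows = []
    for line in sm50_text.splitlines():
        tokens = line.split()
        # Assumed collapsed layout (hypothetical, matches the pasted listing):
        # No Ty. PID Status Start Time Report Cl. User Commit
        if len(tokens) != 10 or tokens[-1] != "Commit":
            continue
        if not tokens[5].isdigit():
            continue
        seconds = int(tokens[5])
        if seconds >= min_seconds:
            rows.append(CommitRow(int(tokens[0]), tokens[1], int(tokens[2]),
                                  seconds, tokens[6], tokens[8]))
    # Longest-running commit first
    return sorted(rows, key=lambda r: r.seconds, reverse=True)


# A few rows copied from the listing above
SAMPLE = """\
2 DIA 11886 Running Yes 1 SAPMV45A 410 GALLACL Sequential read VBAK
1 DIA 11885 Running Yes 2298 SAPLOLEA 410 NAIRNPA Commit
25 UPD 11910 Running Yes 1 RSM13000 410 MACINTK Commit
32 BGD 11918 Running Yes 14505 SAPLZCGF 410 CHECKER1 Commit
0 DIA 11884 waiting Yes
"""

for row in long_commits(SAMPLE):
    print(row.wp_no, row.wp_type, row.seconds, row.report, row.user)
```

With these sample rows the sketch reports work processes 32 and 1, and skips the update process that has only been in Commit for 1 second.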

Accepted Solutions (1)


This looks like a problem with the database monitor (QQQDBLOG and QQQDBMWL in the call stack). You may want to report this problem to IBM (including the call stack information that you sent me). As a workaround, you can disable the monitor by setting the profile parameter as4/dbmon/enable = 0. However, you won't see any useful data in ST04 then.

Kind regards,

Christian Bartels.
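
To make the check described in this reply repeatable, here is a rough Python sketch that scans a call stack pasted from WRKACTJOB for the two database monitor procedures named above. The function name `dbmon_frames` and the one-frame-per-line text layout (procedure, library, statement, with an optional leading "<") are assumptions for illustration. It only does string matching on pasted text and does not talk to the system.

```python
DBMON_PROCEDURES = ("QQQDBLOG", "QQQDBMWL")


def dbmon_frames(call_stack_text: str) -> list[str]:
    """Return the database monitor procedures found in a pasted call stack."""
    found = []
    for line in call_stack_text.splitlines():
        # Some frames in the pasted stack carry a leading "<"; ignore it.
        tokens = line.replace("<", " ").split()
        if tokens and tokens[0] in DBMON_PROCEDURES:
            found.append(tokens[0])
    return found


# Frames copied from the call stack posted in this thread
STACK = """\
< CommitEDRS QSYS 0000000217
QXDA_SQL QSYS 0000008031
QSQXCUTE QSYS 0000025220
QSQSTATS QSYS 0000004693
OUTPUT QSYS 0000004984
WRITE_LOG QSYS 0000005790
QQQDBLOG QSYS 0000002909
QQQDBMWL QSYS 0000000150
< YP7eachmon QSYS 0000000027
PurgeNode QSYS 0000000017
locksl2 QSYS 0000000015
"""

hits = dbmon_frames(STACK)
if hits:
    print("Database monitor frames in stack:", ", ".join(hits))
else:
    print("No database monitor frames found")
```

A hit only suggests that the work process is inside the database monitor at that moment. It is a hint for where to look, not proof of the cause.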

Answers (6)

Former Member

Hi Volker,

We are still on V5R3 at the moment, but I will certainly look into that option once we upgrade. As for the memory, we have already determined that we are a bit short there, and I am adding some more tonight. I've noticed today that paging is down considerably since switching off DBMON. It seems strange that this problem appears to have happened suddenly, rather than as a gradual worsening. I guess we must have reached some tipping point.

Thanks

Steve

Former Member

Stephen,

Did you happen to apply DBFIX Pack 16? IBM's theory is that the PTF jump from SI25186 to SI26685 may have caused the problem. At least that has been our experience. We just ran into the same problem and disabled the dbmon and performance is back to normal.

Best Regards,

Philip Stracener

Former Member

Hi Philip,

We have applied DB Fix Pack 16 quite recently. We were already experiencing some performance problems before applying it but not as extreme as it got. I've ordered 17 now. Was there any suggestion from IBM that 17 would fix the problem? My feeling at this stage is that I'll just leave DBMON off.

Thanks for the info.

Regards

Steve

Former Member

Hi Stephen,

What you are seeing is a well-known problem, especially in RAM-constrained environments.

If you are on V5R4, you can (and should) use the OpsNav plan cache instead. I always use it for V5R4 customers with really good results, and it costs nothing extra: it is not slower, because the data is there anyway.

Regards

Volker Gueldenpfennig, consolut.gmbh

http://www.consolut.de - http://www.4soi.de - http://www.easymarketplace.de

Former Member

Thanks Christian, the database monitor does appear to be our problem. I have disabled it, and not only have our stuck commits gone away, general system performance is much better. Certain overnight batch jobs now run in 2,000 seconds as opposed to 18,000. I will report it to IBM, but I guess I'll have to switch it back on at some point for them to see it happening. Another side effect of the problem was the inability to cancel these hung commits from SM50, SM37 or SM04. Even from the AS/400, an ENDJOB *IMMED on the work process took over an hour and a half to end the job!

Regards

Steve

0 Kudos

My first thought would be that maybe a Save-While-Active operation is running. When you start a SAVLIB operation with SAVACT(*SYNCLIB), the system tries to reach a checkpoint where all processes reach a transaction boundary (Commit or Rollback).

It would be interesting to know what status the work processes are in from an i5/OS standpoint (WRKACTJOB). Are the processes shown as idle (-> SEMW), or are they in another status? What can be seen at the bottom of the call stack for the affected work processes? (WRKACTJOB ->Option 5 for one of the processes, then option 11).

That information may help to understand what is going on.

Kind regards,

Christian Bartels.

Former Member

Hi Christian,

Whilst we do use SAVACT(*SYNCLIB), I take SAP offline to reach the sync point then bring it back online once reached. This happens between 03:00 & 03:45. Typically the problems start at about 08:00.

From the 400 the locked WP's are at RUN status and the end of the call stack is

Rqs Lvl  Program or Procedure  Library  Statement
         < CommitEDRS          QSYS     0000000217
           QXDA_SQL            QSYS     0000008031
           QSQXCUTE            QSYS     0000025220
           QSQSTATS            QSYS     0000004693
           OUTPUT              QSYS     0000004984
           WRITE_LOG           QSYS     0000005790
           QQQDBLOG            QSYS     0000002909
           QQQDBMWL            QSYS     0000000150
         < YP7eachmon          QSYS     0000000027
           PurgeNode           QSYS     0000000017
           locksl2             QSYS     0000000015

I'll add that we are on V5R3, PTF's as latest APAR, ASCII Kernel patch 2307, lib_dbsl patch 2303.

The problem can affect dialog jobs just moving between screens, or background jobs. I suspect, but haven't confirmed, that the background jobs are using BDC sessions.

Thanks

Steve

Former Member

Hi Pat,

We're not using any HA or remote journalling. I've thought about the update processes too, but I rarely see them busy. Looking at it from a 400 view the jobs don't seem to be adding any journal entries, and if they are cancelled, they don't roll anything back. I'll keep you posted if anything turns up, and keep the suggestions coming.

Regards

Steve

Former Member

Hi Steve,

My first reaction would be to ask if you have a high availability system that might be using a journal to mirror IFS objects, especially if you are on V5R4, which has some problems (they call them enhancements).

Another thing I noticed is that you only show one update 2 work process. Maybe someone started using a program that ties that up. We had 4 update and 4 update 2 wp's when we ran 46c.

You need to be looking at the WP logs (ST11 is handy).

Keep looking and let us know if SAP helps.

Pat