Work Process hung and restarting every 30 seconds

ken_halvorsen2
Active Participant
0 Kudos

Hi All

I'm experiencing something strange and wonder if anyone has any suggestions.

This has happened twice now, once in the ECC 6 and once in the CRM 2007 production system (SQL Server 2005 on a cluster)

In the most recent case (happening now), I was running a statistics update with transaction ST04_MSS on several tables. On one table the update continued to run overnight and my user ID was logged off, but the work process (viewed from SM51) continued to run. Refreshing the screen shows that the work process keeps running but restarts with a new PID every 30 seconds (confirmed via the System Log).

I have tried to cancel the process with and without core, but it continues. I have even tried to end the process at the OS level, without success.

I actually have 2 Work processes running like this at the same time now. The dev_disp log shows "DpHdlDeadWp: restart wp (pid=9144) automatically" (of course the pid changes each time) every 30 seconds.
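
To confirm the restart interval from dev_disp rather than by refreshing SM51, something like the following could be used. It is only a rough sketch: it assumes the trace consists of timestamp lines (in the format shown in the excerpts in this thread) followed by the messages logged at that time, and that the day and month names are in English. The sample lines are illustrative, not copied from the real trace.

```python
import re
from datetime import datetime

# Assumed layout: a timestamp line, then the messages logged at that time.
TIMESTAMP = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$"
)
RESTART = re.compile(r"DpHdlDeadWp: restart wp \(pid=(\d+)\) automatically")


def find_restarts(lines):
    """Return (timestamp, pid) for every automatic work process restart."""
    current = None
    restarts = []
    for raw in lines:
        line = raw.strip()
        if TIMESTAMP.match(line):
            # Remember the most recent timestamp line.
            current = datetime.strptime(line, "%a %b %d %H:%M:%S %Y")
            continue
        match = RESTART.search(line)
        if match:
            restarts.append((current, int(match.group(1))))
    return restarts


def restart_intervals(restarts):
    """Seconds between consecutive restarts that carry a timestamp."""
    stamps = [ts for ts, _ in restarts if ts is not None]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Illustrative sample only; PIDs and times are placeholders.
sample = """\
Tue Jul 28 08:41:37 2009
DpHdlDeadWp: restart wp (pid=9144) automatically
Tue Jul 28 08:42:07 2009
DpHdlDeadWp: restart wp (pid=5884) automatically
""".splitlines()

restarts = find_restarts(sample)
print([pid for _, pid in restarts])
print(restart_intervals(restarts))
```

Run against the real file (for example `find_restarts(open("dev_disp"))`), a steady list of values near 30 would match what the System Log shows.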

The dev_w3 log shows, for example:

Tue Jul 28 08:41:37 2009

create_con (con_name=r/3)

Loading library DB library 'E:\usr\sap\<sid>\SYS\run\dbmssslib.dll' ...

Library 'E:\usr\sap\<sid>\SYS\run\dbmssslib.dll' loaded

New connection 0 created

sysno 00

sid R3P

systemid 562 (PC with Windows NT)

relno 7000

patchlevel 0

patchno 181

intno 20050900

make multithreaded, ASCII, 64 bit, optimized

pid 5884

kernel runs with dp version 241(ext=110) (@(#) DPLIB-INT-VERSION-241)

-

-

rdisp/queue_size_check_value : -> off

***ERROR => sapinit: no memory or improper size for ztta/short_area <= 0(-3) [sapinit.c 1107]

***ERROR => DpMemInit: sapinit (-3) [dpxxdisp.c 10057]

*** DP_FATAL_ERROR => DpSapEnvInit: DpMemInit

This is the first time this has happened in the ECC 6.0 system and it has only happened once in CRM 2007 system. The fix for the CRM system was to reboot the application. Of course I could do this in the ECC 6 system, but want to figure out why it's happening first.

There are no DBIF_RSQL_NO_MEMORY runtime error short dumps associated with this.

The ztta/short_area parameter is at the default 1,600,000, which hasn't been touched since upgrade 1 year ago.

The ztta/roll_area parameter is at the maximum value 10,000,000.

Does anyone have any suggestions about what may have caused this or how to cancel it without rebooting the system?

Accepted Solutions (1)


markus_doehr2
Active Contributor
0 Kudos

> This has happened twice now, once in the ECC 6 and once in the CRM 2007 production system (SQL Server 2005 on a cluster)

> rdisp/queue_size_check_value : -> off

> ***ERROR => sapinit: no memory or improper size for ztta/short_area <= 0(-3) [sapinit.c 1107]

> ***ERROR => DpMemInit: sapinit (-3) [dpxxdisp.c 10057]

> *** DP_FATAL_ERROR => DpSapEnvInit: DpMemInit

Do you use "es/implementation = flat" on both systems? You activate VMC on the CRM?

Markus

ken_halvorsen2
Active Participant
0 Kudos

Hi Bhupender

PHYS_MEMSIZE = 8192

Hi Markus

> Do you use "es/implementation = flat" on both systems?

No, OS is MS Windows 2003. I'm not sure what VMC is, but do not think we use it here.

markus_doehr2
Active Contributor
0 Kudos

> No, OS is MS Windows 2003. I'm not sure what VMC is, but do not think we use it here.

I asked because there is a known problem with memory management in Windows 2003, especially if VMC is activated (check in your CRM system whether parameter vmcj/enable = 1 is set).

Note 1316558 - System hang situations on Windows Server 2003

This problem seems to have become so big that SAP doesn't allow CRM 7.0 to be installed on Windows 2003 (Note 1357247 - CRM 7.0 on Windows Server 2003: individual release approval).

I think, though, that this is not your problem. I would rather assume that you ARE in fact short on memory.

What are the memory values in ST02 when this occurs?

Markus

ken_halvorsen2
Active Participant
0 Kudos

Thanks Markus

vmcj/enable is set to "off" on both systems.

I'm not sure about the memory, although I have been adjusting the buffer parameters within the last month, I don't seem to be having many Memory related short dumps. (yes a few due to large queries, etc.)

ECC 6 - Buffer parameters:

abap/buffersize 650000

abap/pxa shared

rsdb/cua/buffersize 6000

zcsa/presentation_buffer_area 20000000

sap/bufdir_entries 12000

zcsa/table_buffer_area 60000000

zcsa/db_max_buftab 20000

rtbb/buffer_length 30000

rtbb/max_tables 500

rsdb/obj/buffersize 25000

rsdb/obj/max_objects 25000

rsdb/obj/large_object_size 8192

rsdb/obj/mutex_n 0

rsdb/otr/buffersize_kb 4096

rsdb/otr/max_objects 2000

rsdb/otr/mutex_n 0

rsdb/esm/buffersize_kb 4096

rsdb/esm/max_objects 2000

rsdb/esm/large_objects_size 8192

rsdb/esm/mutex_n 0

rsdb/ntab/entrycount 60000

rsdb/ntab/ftabsize 60000

rsdb/ntab/irdbsize 3000

zcsa/calendar_area 500000

zcsa/calendar_ids 200

ztta/roll_area 10000000

ztta/roll_first 1

ztta/short_area 1600000

rdisp/ROLL_SHM 32768

rdisp/PG_SHM 16384

rdisp/PG_LOCAL 150

em/initial_size_MB 8192

em/blocksize_KB 1024

em/address_space_MB 1024

ztta/rollextension 2000000000

abap/heap_area_dia 6000000000

abap/heap_area_nondia 6000000000

abap/heap_area_total 30000000000

abap/heaplimit 40000000

abap/use_paging 0
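
For readers comparing these values with the physical memory of the hosts, a small conversion helper may be useful. This is only a sketch: it assumes the abap/heap_area_* values are byte counts (suggested by the absence of a _MB or _KB suffix) and that em/initial_size_MB is in megabytes, as its name says. The function names are made up for illustration, and only a few of the parameters from the list are included.

```python
def parse_profile(text):
    """Parse 'name value' lines into a dict of strings; other lines are skipped."""
    params = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            params[parts[0]] = parts[1]
    return params


def to_gib(byte_count):
    """Convert a byte count (string or int) to GiB."""
    return int(byte_count) / 2**30


# A few of the values from the list above.
profile = """\
ztta/roll_area 10000000
ztta/short_area 1600000
em/initial_size_MB 8192
abap/heap_area_dia 6000000000
abap/heap_area_nondia 6000000000
abap/heap_area_total 30000000000
"""

params = parse_profile(profile)

# Assumption: these three are byte counts.
for name in ("abap/heap_area_dia", "abap/heap_area_nondia", "abap/heap_area_total"):
    print(f"{name}: {to_gib(params[name]):.2f} GiB")

# em/initial_size_MB is already in MB, so divide by 1024 for GiB.
print(f"em/initial_size_MB: {int(params['em/initial_size_MB']) / 1024:.2f} GiB")
```

The helper only converts units. Whether a given total is appropriate for a host is a separate sizing question.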

The CRM has not had this happen since the reboot, but again only had it happen once.

markus_doehr2
Active Contributor
0 Kudos

> vmcj/enable is set to "off" on both systems.

Ok.

> I'm not sure about the memory, although I have been adjusting the buffer parameters within the last month, I don't seem to be having many Memory related short dumps. (yes a few due to large queries, etc.)

The parameters look all good.

What I wanted to have a look at is the actual memory consumption in ST02.

How much physical memory is available on those machines?

Are they virtualized or are they running on bare metal?

Markus

ken_halvorsen2
Active Participant
0 Kudos

The system is on 2 Dell PowerEdge R900 servers (x64-based PC, 8 EM64T Family 6 Model 15 CPUs at ~2926 MHz, 32 GB total physical memory) in a cluster, with the Ci and Db on separate servers. There are 3 virtual app servers attached, as well as 1 other "real" server for application servers. Users can log on to the Ci as well as the app servers.

Both times this has happened it happened on the Ci.

I don't think it's a memory issue, but rather that a user is logged off or terminated while running a long program in a dialog session. We don't have any auto-logoff parameters set for the system, but there is a timeout on the terminal servers we use to log in to the system.

The affected processes don't seem to be using any CPU or memory. I can stop the restart momentarily by running SM51, selecting the work process, going to Process -> Restart after error -> No, and then cancelling the process with or without core. The report doesn't disappear, and eventually the work process does restart.

It's as if the work process has lost its connection but doesn't know it should stop. ???? Very puzzling.

Ken

markus_doehr2
Active Contributor
0 Kudos

I must admit I have no clue what's happening.

Maybe you should open a call (BC-OP-NT) and provide them with the cluster details (Note 1348732 - Providing Windows Cluster Configuration Details) - probably there are some timeout values in the DB client to be adjusted.

Markus

ken_halvorsen2
Active Participant
0 Kudos

Thanks Markus

I spoke to another Basis admin, who said he recognized the ztta/short_area error as an old Windows error from older versions. I did note that the OSS Notes I could find were mostly from between 1995 and 2005. He didn't know why it would be rearing its ugly head again.

The trace record showed a couple of things:

Mon Jul 27 13:49:25 2009

SQLBREAK: DBSL_CMD_SQLBREAK: CbOnCancel was not set

Program canceled

Reason = soft cancel

Report = CL_SQL_RESULT_SET======CP

(By the way, I found 1 note on this message, # 834235, which indicates that because of its unpredictable behaviour, the SQL Break functionality is no longer supported by the MSSQL DBSL. As a result, each cancellation of a running operation (Stop transaction) will cause a work process restart. I'm now wondering if trying to cancel a transaction may have caused the wp to continually restart.)

Mon Jul 27 13:49:30 2009

Soft kill timeout, terminated process

Previous sql break failed, terminate without db cleanup and hooks

Mon Jul 27 13:49:31 2009

The work process was restarted but ended with:

*** ERROR => sapinit: no memory or improper size for ztta/short_area <= 0(-3) [sapinit.c 1107]

*** ERROR => DpMemInit: sapinit (-3) [dpxxdisp.c 10057]

*** DP_FATAL_ERROR => DpSapEnvInit
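
The sequence in the trace (SQL break, soft cancel, soft kill timeout, then the sapinit error on restart) can be pulled out of a dev_w* file with a small scanner. The sketch below is hedged: the patterns are taken only from the lines quoted in this thread, the event labels are invented for illustration, and real traces may word these messages differently.

```python
import re

# Patterns taken from the trace excerpts quoted in this thread.
# The labels on the left are illustrative names, not SAP terminology.
PATTERNS = [
    ("sqlbreak", re.compile(r"SQLBREAK:")),
    ("soft_cancel", re.compile(r"Reason\s*=\s*soft cancel")),
    ("soft_kill_timeout", re.compile(r"Soft kill timeout")),
    ("sql_break_failed", re.compile(r"Previous sql break failed")),
    ("short_area_error", re.compile(r"ERROR\s*=>\s*sapinit:.*ztta/short_area")),
    ("fatal", re.compile(r"DP_FATAL_ERROR")),
]


def classify(lines):
    """Return the ordered list of known events found in the trace lines."""
    events = []
    for line in lines:
        for label, pattern in PATTERNS:
            if pattern.search(line):
                events.append(label)
                break  # at most one label per line
    return events


def cancel_preceded_init_failure(events):
    """True if a soft kill timeout occurs before the first short_area error."""
    if "soft_kill_timeout" not in events or "short_area_error" not in events:
        return False
    return events.index("soft_kill_timeout") < events.index("short_area_error")


trace = [
    "SQLBREAK: DBSL_CMD_SQLBREAK: CbOnCancel was not set",
    "Program canceled",
    "Reason = soft cancel",
    "Soft kill timeout, terminated process",
    "Previous sql break failed, terminate without db cleanup and hooks",
    "*** ERROR => sapinit: no memory or improper size for ztta/short_area <= 0(-3) [sapinit.c 1107]",
    "*** DP_FATAL_ERROR => DpSapEnvInit",
]

events = classify(trace)
print(events)
print(cancel_preceded_init_failure(events))
```

If the helper returns True across several work process traces, that would be consistent with the suspicion that the cancel attempt came before the restart loop, though it does not prove the cause.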

This occurrence started to cascade and expand. By 16:00 there were several work processes (12+) observed in this state on the Ci, and each of the App servers had work processes in a similar state.

Thinking that it might be a memory-leak type of problem, we tried to stop the work process by terminating it at the OS level via the Task Manager. This was unsuccessful, and the work processes started to recycle themselves. Soon each work process had over 100 errors and only showed ** in that column.

Noting that there had been a similar occurrence in our CRM system the previous week, and that recycling the application fixed that situation, we tested recycling the application on one of the App servers.

After recycling the application there, the work process behavior could not be recreated, so we recycled the application completely, and after the restart the problem does not seem to be recurring.

Thank you very much for the help, Markus.

Former Member
0 Kudos

Hi Ken:

Did any unusual transports get into your systems recently?

Did you recently apply a SupPack in both ECC and CRM?

Do you have a process processing IDocs that, due to failures in your system configuration or maybe the SPOOL table becoming full, is trying to reprocess the IDocs over and over again?

I remember having a similar issue in the past, and basically the process was trying to restart every time because due to the issue it found, it could not complete the task, then tried to run it again. That I remember happened to me when I was applying some transports and they could not get through.

Unfortunately I am in vacation mode, hence I do not have the information at hand. That is the nice thing about being in a permanent job. You became a consultant, so ... go back to work ... slave!!!

Regards,

JC

ken_halvorsen2
Active Participant
0 Kudos

Hey JC

- Did any unusual transports get into your systems recently?

No, we're on another "GoLive" with a freeze on.

- Did you recently apply a SupPack in both ECC and CRM?

Nope.

- Do you have a process processing IDocs that, due to failures in your system configuration or maybe the SPOOL table becoming full, is trying to reprocess the IDocs over and over again?

This is a possibility, but nothing has changed in the system; IDocs are processed daily without incident. Both occurrences were initiated by an end user running a report. For example, the last one was after I was running a statistics update on a table, but the remote desktop timeout logged my session off (not an SAP timeout). The next day, the process was still running, yet nothing was being done. Trying to kill the process at the SAP level had no effect because the PID was changing every 30 seconds.

- I remember having a similar issue in the past, and basically the process was trying to restart every time because due to the issue it found, it could not complete the task, then tried to run it again.

Sounds similar, but we definitely had a memory issue here. Then it really got bad when I tried shutting down, via the Task Manager at the OS level, the disp+work processes that seemed to have a lot of memory attached. The process wouldn't restart after that. The error counter in SM51 went wild until it passed 99, then only showed **. As I mentioned earlier, the trace record had a bunch of ***ERROR => sapinit: no memory or improper size for ztta/short_area. Our Operate Manager, who's an old-time Basis geek, said that way back when (like on Windows 95) they would get this. It was some sort of Windows memory leak problem. Anyway, a reboot of the SAP application solved my problem.

Take it easy buddy.

Answers (2)


ken_halvorsen2
Active Participant
0 Kudos

Need the question to be unanswered to award points.

Former Member
0 Kudos

Hi Ken,

What is the value of parameter PHYS_MEMSIZE ?

Regards

Bhupender