on 07-28-2009 5:10 PM
Hi All
I'm experiencing something strange and wonder if anyone has any suggestions.
This has happened twice now, once in the ECC 6 and once in the CRM 2007 production system (SQL Server 2005 on a cluster).
In the most recent case (happening now), I was running a statistics update with transaction ST04_MSS on several tables. On one table the update continued to run overnight and my user ID was logged off, but the work process (viewed from SM51) continued to run. Refreshing the screen shows that the work process keeps running, but it restarts with a new PID every 30 seconds (confirmed via the system log).
I have tried to cancel the process with and without core, but it continues. I have even tried to end the process at the OS level, without success.
I actually have two work processes running like this at the same time now. Every 30 seconds the dev_disp log shows "DpHdlDeadWp: restart wp (pid=9144) automatically" (of course the PID changes each time).
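To confirm how often the dispatcher is respawning a work process, the restart messages can be pulled out of the dev_disp trace (in the instance's work directory). A minimal Python sketch, assuming the message wording is exactly as quoted above; other kernel releases may phrase it differently, and the sample PIDs other than 9144 are made up for illustration:

```python
import re

# Matches the dispatcher message quoted in this thread (wording is an assumption
# for other kernel releases).
RESTART_RE = re.compile(r"DpHdlDeadWp: restart wp \(pid=(\d+)\) automatically")


def restart_pids(log_text: str) -> list[int]:
    """Return the PIDs reported by automatic work-process restarts, in log order."""
    return [int(m.group(1)) for m in RESTART_RE.finditer(log_text)]


def summarize(log_text: str) -> tuple[int, int]:
    """Return (number of restart messages, number of distinct PIDs)."""
    pids = restart_pids(log_text)
    return len(pids), len(set(pids))


if __name__ == "__main__":
    # Hypothetical excerpt; in practice read the dev_disp file instead.
    sample = (
        "DpHdlDeadWp: restart wp (pid=9144) automatically\n"
        "DpHdlDeadWp: restart wp (pid=9152) automatically\n"
        "DpHdlDeadWp: restart wp (pid=9168) automatically\n"
    )
    print(summarize(sample))  # (3, 3)
```

A steadily growing count with a new PID each time matches the "restarts every 30 seconds" symptom rather than a single long-running process.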
The dev_w3 log shows, for example:
Tue Jul 28 08:41:37 2009
create_con (con_name=r/3)
Loading library DB library 'E:\usr\sap\<sid>\SYS\run\dbmssslib.dll' ...
Library 'E:\usr\sap\<sid>\SYS\run\dbmssslib.dll' loaded
New connection 0 created
sysno 00
sid R3P
systemid 562 (PC with Windows NT)
relno 7000
patchlevel 0
patchno 181
intno 20050900
make multithreaded, ASC11, 64 bit, optimized
pid 5884
kernel runs with dp version 241(ext=110) (@(#) DPLIB-INT-VERSION-241)
-
-
rdisp/queue_size_check_value : -> off
***ERROR => sapinit: no memory or improper size for ztta/short_area <= 0(-3) [sapinit.c 1107]
***ERROR => DpMemInit: sapinit (-3) [dpxxdisp.c 10057]
DP_FATAL_ERROR => DpSapEnvInit: DpMemInit
This is the first time this has happened in the ECC 6.0 system, and it has only happened once in the CRM 2007 system. The fix for the CRM system was to reboot the application. I could of course do the same in the ECC 6 system, but I want to figure out why it's happening first.
There are no DBIF_RSQL_NO_MEMORY runtime error short dumps associated with this.
The ztta/short_area parameter is at the default 1,600,000, which hasn't been touched since upgrade 1 year ago.
The ztta/roll_area parameter is at the maximum value 10,000,000.
Does anyone have any suggestions about what may have caused this, or how to cancel it without rebooting the system?
> This has happened twice now, once in the ECC 6 and once in the CRM 2007 production system (SQL Server 2005 on a cluster)
> rdisp/queue_size_check_value : -> off
> ***ERROR => sapinit: no memory or improper size for ztta/short_area <= 0(-3) [sapinit.c 1107]
> ***ERROR => DpMemInit: sapinit (-3) [dpxxdisp.c 10057]
> *** DP_FATAL_ERROR => DpSapEnvInit: DpMemInit
Do you use "es/implementation = flat" on both systems? Do you have VMC activated on the CRM system?
Markus
> No, OS is MS Windows 2003. I'm not sure what VMC is, but do not think we use it here.
I asked because there is a known problem with memory management in Windows 2003, especially if VMC is activated (check in your CRM system whether parameter vmcj/enable = 1 is set):
Note 1316558 - System hang situations on Windows Server 2003
This problem seems to have become so big that SAP no longer allows CRM 7.0 to be installed on Windows 2003 without individual release approval (Note 1357247 - CRM 7.0 on Windows Server 2003: individual release approval).
I think, though, that this is not your problem. I would rather assume that you ARE in fact short on memory.
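The profile check can also be scripted. A minimal Python sketch, assuming the usual SAP profile layout of `name = value` lines with `#` comments; it does not resolve `$(...)` references or kernel defaults, so a parameter missing from the text is simply reported as disabled here (RZ11 shows the effective value). The accepted "enabled" spellings (`on`, `true`, `1`) are an assumption based on the values mentioned in this thread:

```python
def parse_profile(text: str) -> dict[str, str]:
    """Parse 'name = value' lines of an SAP profile; later entries override earlier ones."""
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a parameter assignment
        params[name.strip()] = value.strip()
    return params


def vmc_enabled(params: dict[str, str]) -> bool:
    """Treat vmcj/enable values 'on', 'true' or '1' as enabled (assumption)."""
    return params.get("vmcj/enable", "off").strip().lower() in {"on", "true", "1"}


if __name__ == "__main__":
    # Hypothetical instance profile excerpt.
    sample = "# instance profile\nvmcj/enable = off\nztta/short_area = 1600000\n"
    p = parse_profile(sample)
    print(p["ztta/short_area"], vmc_enabled(p))  # 1600000 False
```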
What are the memory values in ST02 when this occurs?
Markus
Thanks Markus
vmcj/enable is set to "off" on both systems.
I'm not sure about the memory. Although I have been adjusting the buffer parameters within the last month, I don't seem to be getting many memory-related short dumps (yes, a few due to large queries, etc.).
ECC 6 - Buffer parameters:
abap/buffersize 650000
abap/pxa shared
rsdb/cua/buffersize 6000
zcsa/presentation_buffer_area 20000000
sap/bufdir_entries 12000
zcsa/table_buffer_area 60000000
zcsa/db_max_buftab 20000
rtbb/buffer_length 30000
rtbb/max_tables 500
rsdb/obj/buffersize 25000
rsdb/obj/max_objects 25000
rsdb/obj/large_object_size 8192
rsdb/obj/mutex_n 0
rsdb/otr/buffersize_kb 4096
rsdb/otr/max_objects 2000
rsdb/otr/mutex_n 0
rsdb/esm/buffersize_kb 4096
rsdb/esm/max_objects 2000
rsdb/esm/large_object_size 8192
rsdb/esm/mutex_n 0
rsdb/ntab/entrycount 60000
rsdb/ntab/ftabsize 60000
rsdb/ntab/irdbsize 3000
zcsa/calendar_area 500000
zcsa/calendar_ids 200
ztta/roll_area 10000000
ztta/roll_first 1
ztta/short_area 1600000
rdisp/ROLL_SHM 32768
rdisp/PG_SHM 16384
rdisp/PG_LOCAL 150
em/initial_size_MB 8192
em/blocksize_KB 1024
em/address_space_MB 1024
ztta/roll_extension 2000000000
abap/heap_area_dia 6000000000
abap/heap_area_nondia 6000000000
abap/heap_area_total 30000000000
abap/heaplimit 40000000
abap/use_paging 0
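To relate these settings to the 32 GB of physical memory, the values have to be put into one unit first. A rough Python sketch; the unit of each parameter (bytes, KB, MB or 8 KB blocks) is an assumption taken from common SAP parameter documentation and should be checked in RZ11. These are configured limits and initial allocations, not actual consumption, which is what ST02 shows:

```python
# Units are assumptions; verify each parameter's unit in RZ11.
BYTES, KB, MB, BLOCKS_8KB = 1, 1024, 1024 * 1024, 8 * 1024

PARAMS = {
    # name: (value from the profile above, size of one unit in bytes)
    "abap/buffersize": (650_000, KB),
    "zcsa/presentation_buffer_area": (20_000_000, BYTES),
    "zcsa/table_buffer_area": (60_000_000, BYTES),
    "rsdb/obj/buffersize": (25_000, KB),
    "rsdb/otr/buffersize_kb": (4_096, KB),
    "rsdb/esm/buffersize_kb": (4_096, KB),
    "rdisp/ROLL_SHM": (32_768, BLOCKS_8KB),
    "rdisp/PG_SHM": (16_384, BLOCKS_8KB),
    "em/initial_size_MB": (8_192, MB),
    "abap/heap_area_total": (30_000_000_000, BYTES),
}


def to_mib(value: int, unit: int) -> float:
    """Convert a parameter value to MiB given the size of its unit in bytes."""
    return value * unit / MB


if __name__ == "__main__":
    total = 0.0
    for name, (value, unit) in PARAMS.items():
        mib = to_mib(value, unit)
        total += mib
        print(f"{name:32} {mib:10.1f} MiB")
    print(f"{'sum of configured limits':32} {total:10.1f} MiB")
    # Treats the 32 GB of physical memory as 32 GiB.
    print(f"{'physical memory (32 GB)':32} {32 * 1024:10.1f} MiB")
```

The heap limit alone (about 28,600 MiB) plus the 8 GB of initial extended memory already exceeds the physical memory on paper; whether that matters depends on how much heap the work processes actually allocate.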
The CRM has not had this happen since the reboot, but again only had it happen once.
> vmcj/enable is set to "off" on both systems.
Ok.
> I'm not sure about the memory, although I have been adjusting the buffer parameters within the last month, I don't seem to be having many Memory related short dumps. (yes a few due to large queries, etc.)
The parameters all look good.
What I wanted to have a look at is the actual memory consumption in ST02.
How much physical memory is available on those machines?
Are they virtualized or are they running on bare metal?
Markus
The system runs on two Dell PowerEdge R900 servers (x64-based PC, 8 × EM64T Family 6 Model 15 ~2926 MHz, 32 GB total physical memory) in a cluster, with the Ci and DB on separate servers. There are 3 virtual app servers attached, as well as 1 other "real" application server. Users can log on to the Ci as well as to the app servers.
Both times this has happened it happened on the Ci.
I don't think it's a memory issue; it's more that a user gets logged off or terminated while running a long program in a dialog session. We don't have any auto-logoff parameters set for the system, but there is a timeout on the terminal servers we use to log on to the system.
The affected processes don't seem to be using any CPU or memory. I can stop the restart momentarily by running SM51, selecting the work process and choosing Process -> Restart after error -> No, then canceling the process with or without core. The report doesn't disappear, and eventually the work process restarts anyway.
It's as if the work process has lost its connection but doesn't know it should stop. ???? Very puzzling.
Ken
Thanks Markus
I spoke to another Basis admin, who said he recognized the ztta/short_area error as an old Windows error from older versions. I did note that most of the OSS Notes I could find dated from between 1995 and 2005. He didn't know why it would be poking its ugly head up again.
The trace record showed a couple of things:
Mon Jul 27 13:49:25 2009
SQLBREAK: DBSL_CMD_SQLBREAK: CbOnCancel was not set
Program canceled
Reason = soft cancel
Report = CL_SQL_RESULT_SET======CP
(By the way, I found one note on this message, #834235, which indicates that, because of its unpredictable behaviour, the SQL break functionality would no longer be supported by the MSSQL-DbSl. As a result, each cancellation of a running operation (Stop transaction) causes a work process restart. I'm now wondering whether trying to cancel a transaction may have caused the work process to restart continually.)
Mon Jul 27 13:49:30 2009
Soft kill timeout, terminated process
Previous sql break failed, terminate without db cleanup and hooks
Mon Jul 27 13:49:31 2009
The work process was restarted but ended with:
ERROR => sapinit: no memory or improper size for ztta/short_area <= 0(-3) [sapinit.c 1107]
ERROR => DpMemInit: sapinit (-3) [dpxxdisp.c 10057]
DP_FATAL_ERROR => DpSapEnvInit
This occurrence then started to cascade and spread. By 16:00 there were 12+ work processes in this state on the Ci, and each of the app servers had work processes in a similar state.
Thinking that it might be a memory-leak type of problem, we tried to stop the work processes by terminating them at the OS level via Task Manager. This was unsuccessful, and the work processes started to recycle themselves. Soon each work process had over 100 errors and only showed ** in the error column.
Noting that there had been a similar occurrence in our CRM system the previous week, and that recycling the application fixed that situation, we tested recycling the application on one of the app servers.
After recycling the application there, the work process behavior could not be recreated, so we recycled the application completely. Since the restart, the problem does not seem to be recurring.
Thank you very much for the help, Markus.
Hi Ken:
Did any unusual transports get into your systems recently?
Did you recently apply a SupPack in both ECC and CRM?
Do you have a process processing IDocs that, due to failures in your system configuration or maybe the spool table becoming full, is trying to reprocess the IDocs over and over again?
I remember having a similar issue in the past: basically the process kept restarting because, due to the issue it found, it could not complete the task, and then it tried to run it again. As I remember, that happened to me when I was applying some transports and they could not get through.
Unfortunately I am in vacation mode, so I do not have the information at hand. That is the nice thing about having a permanent job. You became a consultant, so ... go back to work ... slave!!!
Regards,
JC
Hey JC
- Did any unusual transports get into your systems recently?
No, we're on another "GoLive" with a freeze on.
- Did you recently apply a SupPack in both ECC and CRM?
Nope.
- Do you have a process processing IDocs that, due to failures in your system configuration or maybe the spool table becoming full, is trying to reprocess the IDocs over and over again?
This is a possibility, but nothing has changed in the system, and IDocs are processed daily without incident. Both occurrences were initiated by an end user running a report. For example, the last one came after I ran a statistics update on a table and the remote desktop timeout logged my session off (not an SAP timeout). The next day the process was still running, yet nothing was being done. Trying to kill the process at the SAP level had no effect because the PID was changing every 30 seconds.
- I remember having a similar issue in the past, and basically the process was trying to restart every time because due to the issue it found, it could not complete the task, then tried to run it again.
Sounds similar, but we definitely had a memory issue here. Then it got really bad when I tried shutting down, via Task Manager at the OS level, the disp+work processes that seemed to have a lot of memory attached. After that, the processes wouldn't restart. The error counter in SM51 went wild until it passed 99 and then only showed **. As I mentioned earlier, the trace file had a bunch of "***ERROR => sapinit: no memory or improper size for ztta/short_area" entries. Our Operations Manager, who's an old-time Basis geek, said that way back when (like on Windows 95) they would get this; it was some sort of Windows memory leak problem. Anyway, a reboot of the SAP application solved my problem.
Take it easy buddy.
Need the question to be unanswered to award points.
Hi Ken,
What is the value of parameter PHYS_MEMSIZE ?
Regards
Bhupender