Former Member

Strange Memory Leak (Huge!) with Data Services (Integrator)

We are preparing for a software release of a BI product that uses Data Services (Integrator) Release 3.

Database Platform - (standalone database, no other software installed)

HW - Dell 6850, with 4 quad-core CPUs, 4 GB memory, pagefile 4 GB

SW - Windows 2003 SP2, SQL Server 2005 SP2 (9.00.3054.00). Max memory set to 2 GB, minimum set to 500 MB.

Data Services Platform - identical to the database platform, standalone with just the BO software installed. 11 R3

The ETL job run takes over 11 hours. The pagefile fills up and physical memory pegs at nearly 4 GB. Task Manager, though, shows only 250 MB for all current processes; the BODI processes themselves show only about 120-140 MB of memory.

3.8 GB of physical RAM and 3.2 GB of pagefile are being used on the database server, yet the Task Manager Processes tab shows only 250 MB of RAM in use. Nothing else is running on the server except SQL Server, the OS processes and the Trend Micro OfficeScan client.



4 Answers

  • Best Answer
    Posted on Jul 22, 2008 at 11:49 AM

    What I forgot: the reason I consider the network the bottleneck is that 1 GBit Ethernet has a usable bandwidth of maybe 50-80 MByte/sec (125 MByte/sec in theory, less after protocol overhead). This is less than a modern single disk can deliver, not to mention an entire disk array. And if you both read and load over that one network card, only half the bandwidth is available for each direction. As a result, in all my tests with DI the network was the limiting factor.

    Obviously, add a lot of complex transformations, dozens of lookups etc. and sooner or later the engine will become the bottleneck. Then you can use the DegreeOfParallelism feature and maybe Partitioning to utilize the server better, and if that doesn't help, yes, then the network is no longer the bottleneck.

    It seems we come from different angles, you more from the complex transformation side whereas I do more - well - almost straight copies. And as I said, the network is not the bottleneck in your case, certainly not for a job that takes 11 hours.

    Edited by: Werner Daehn on Jul 22, 2008 1:53 PM


    • Thanks. May I ask what the dataflow in question is doing? In other words, what is causing that many sessions? My guess would be lookup_ext() calls with DoP, in no_cache mode?

      The reason I am asking is that maybe I can suggest some changes. And maybe they might make sense...

      If you do not have the time for that, that's just fine.

  • Posted on Jul 21, 2008 at 07:07 AM

    If it were a memory leak, the Windows Task Manager would show, under the "Processes" tab, the memory allocated by e.g. the al_engine (= Data Integrator job or dataflow). A memory leak is nothing else than a process allocating memory but not releasing it although it is no longer used.

    Couple of thoughts:

    • What are the available, total and system cache memory figures at that time? (Performance tab of Task Manager)

    • In the process list, go to View -> Select Columns -> Virtual Memory Size and add that column. Otherwise you only see how much physical memory each process takes; swapped-out memory is not listed there. Virtual Memory shows the entire memory the process has allocated.

    • Compare your hardware/software performance with others. I have set up a test job that customers run and we have already got quite a few numbers back to compare: https://boc.sdn.sap.com/node/5070

    You said you have two servers, one with DI and one with SQL Server. I assume this issue happens on the DI server only, does it?
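
    If it turns out the bulk of the consumption is on the database server rather than the DI machine, one quick check is how much memory SQL Server itself has committed - the buffer pool is easy to underestimate from Task Manager's per-process numbers alone. This is only a sketch against the SQL Server 2005 memory DMVs, nothing Data Services specific:

        -- Per memory clerk (buffer pool, plans, locks, ...); SQL Server 2005 column names.
        SELECT  type,
                SUM(single_pages_kb)             AS single_pages_kb,
                SUM(multi_pages_kb)              AS multi_pages_kb,
                SUM(virtual_memory_committed_kb) AS virtual_committed_kb,
                SUM(awe_allocated_kb)            AS awe_allocated_kb
        FROM    sys.dm_os_memory_clerks
        GROUP BY type
        ORDER BY SUM(single_pages_kb + multi_pages_kb
                     + virtual_memory_committed_kb + awe_allocated_kb) DESC;

        -- Coarser overall picture of the server's memory usage:
        DBCC MEMORYSTATUS;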

    And on a side note, I am not too excited about installing DI on a server of its own. That means you read over the network and write over the network, hence reducing the available network bandwidth by half - and that on the resource that is the limiting factor anyway, in most cases. I prefer installing everything on the target database machine. On the other hand, if the job takes 11 hours, either you are loading terabytes of data or there is something severely wrong anyway.


  • Former Member
    Posted on Jul 21, 2008 at 12:44 PM

    I appreciate your response and willingness to help, but do not appreciate your lecturing tone. I've worked in IT for nearly 25 years, have been an Oracle DBA for the last 10 years and fully understand the concept of a "memory leak". Unfortunately, my experience with BODI/Data Services extends back only several months.

    The VM column was added to Task Manager - yes, I've worked in IT long enough to know about adding columns in Task Manager. The Task Manager processes (including the al* processes) only accounted for about 250 MB of physical RAM and about 300-400 MB of VM. Physical memory and VM (pagefile) are pegged at about 90% of the 4 GB and 8 GB allocated, respectively.

    The network and the servers (cards installed) are at 1 Gbit. Network traffic is not the issue... there are no waits and network traffic is unimpeded. I would never place an app server on the same machine as a database, but would consider doing so if the network were antiquated, i.e. 10/100 Mbps. Our configuration MUST mimic the customer's hardware; this is not an IT shop but a development lab.

    The issue, I discovered, is caused by the poor locking mechanisms in SQL Server. There are no blocking locks, but the processes were waiting for grants on database- and table-level locks. Why row-level locks are not issued is a mystery to me, but they can be forced at the table level by changing a parameter. The ETL developer also discovered a SQL statement that was spinning CPU like crazy and has made corrections.
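
    For reference, a sketch of the kind of parameter involved (the table and index names here are made up, not our actual objects) - in SQL Server 2005 you can stop the engine from taking page locks on a table's indexes so it stays at row (or table) granularity:

        -- Legacy procedure; given a table name it applies to all indexes on the table:
        EXEC sp_indexoption 'dbo.FACT_SALES', 'AllowPageLocks', 'FALSE';

        -- Or per index, with the newer syntax:
        ALTER INDEX PK_FACT_SALES ON dbo.FACT_SALES
            SET (ALLOW_PAGE_LOCKS = OFF, ALLOW_ROW_LOCKS = ON);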


    • Former Member

      I'm an Oracle DBA (on Linux/Unix) and have little experience with SQL Server or Business Objects. I've made several suggestions to our developers, but we are constrained by the fact that we are creating a set of delivered software for a customer. Yes, it would be better to drop the indexes and constraints before loading the data, but that would unnecessarily complicate the ETL deliverable and invite customer tinkering with a shrink-wrapped product.

      The issue appeared to be a memory leak of some kind. All BO Data Services jobs/sessions would be killed or closed and the physical and virtual memory would not be released until reboot. This is not an uncommon problem in Windows environments.

  • Posted on Jul 22, 2008 at 11:41 AM

    No offense intended. I know neither you in person nor your background, so I thought I had better cover all possibilities, even the trivial ones. I am sure you have had a conversation similar to the one below with a customer of yours:

    "My computer is not working!" and after hours you figured the solution: "Turn on your monitor"

    The problem I was having was the inconsistent information, or at least how I interpreted it:

    Fact 1: you see swapping and huge memory consumption.

    Fact 2: the sum of virtual memory of all processes is less than the physical memory, and in any case does not account for the swapping.

    Fact 3: hence DI has to have a memory leak.

    That didn't make sense to me. And I am still not sure I got your point about the core reason for it and where my understanding was wrong. Was the al_engine still living as a zombie process within Windows? Or did SQL Server not release the session memory? I have no idea how to tell either one anyway...

    In regards to row-level locking in the SQL Server database, I can possibly give you some background that might help: SQL Server does row-level locking until it figures that to be inefficient because of the number of changes, and then automatically switches to page-level locking. If somebody else - or even your own process but a different session - touches any row inside that page, you get an error. This has caused us some trouble with DI, e.g. when you have a Table Comparison in row-by-row mode: a few rows pass through the TC transform, SQL Server switches to page-level locking, and suddenly the TC cannot read a row anymore. The concept of an insert/update prohibiting a read (!!) - a read of a row that was never even touched - is something we as Oracle DBAs would never initially dream of.
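
    If you ever want to watch that happening, here is a rough sketch (plain SQL Server 2005 DMVs and trace flags, nothing Data Services specific): check which lock granularity the loading sessions actually hold while the dataflow runs, and, if escalation itself is the culprit, you can switch it off for the whole instance - at the cost of many more fine-grained locks and therefore more lock memory:

        -- Lock granularity per session: resource_type shows KEY/RID vs PAGE vs OBJECT.
        SELECT  request_session_id,
                resource_type,
                request_mode,
                request_status,
                COUNT(*) AS lock_count
        FROM    sys.dm_tran_locks
        WHERE   resource_database_id = DB_ID()
        GROUP BY request_session_id, resource_type, request_mode, request_status
        ORDER BY request_session_id, lock_count DESC;

        -- Trace flag 1211 disables lock escalation instance-wide (1224 is the
        -- milder variant that still escalates under memory pressure).
        DBCC TRACEON (1211, -1);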

    Are you intending to go to the Business Objects User Conference this year? [http://www.myboc.org/?extcmp=salesflash_global_2008_2107]

    If yes, let me know - I'll buy you a beer. From Oracle DBA to Oracle DBA.

