
Data Services Job server starts job multiple times

Dear all,

We are experiencing a very weird issue:

We are on DS 4.2 SP10

Jobs are scheduled with the Data Services scheduler.

Sometimes (it can be different jobs) a job starts multiple times (2 times or more, sometimes within seconds, sometimes within minutes). It does not happen every day, and in the Task Scheduler there is only one task per schedule.

The duplicate runs sometimes fail and sometimes succeed. The problem is that if a job is successful and ran 3 times, we get tripled rows, which is causing a huge issue.

Here is a screenshot of a job that ran twice.

Has any one of you experienced something similar?

Would really appreciate some help, as this is causing a lot of pain.

Kind Regards

Chris

capture.jpg (23.7 kB)

  • Hi Sravanth,

    The job entries only exist once in the Task Scheduler.

    The issue does not occur every day; sometimes the job runs 2 times, sometimes 1 time, sometimes 3 or 4 times.

    I have deactivated, deleted, and recreated the jobs many times, and it never solved the issue.

    Kind Regards

    Chris

  • Hi Christopher,

    We have faced the same issue of batch jobs triggering twice, in PROD as well as in TEST.

    Reason:

    During PROD/TEST source system downtime, we unscheduled the jobs and rescheduled them for the next run. This created duplicate job entries in the DSP/DST server Task Scheduler.

    Tasks to perform:

    1. Delete duplicate job entries from the DSP/DST server task scheduler.

    2. After the downtime, make sure to activate the job instead of rescheduling.

    NOTE:

    If the jobs are still triggering in duplicate, we might need to follow SAP Note 1250124, where we delete the complete schedules and .bat files for the specific jobs and recreate them.


8 Answers

  • Posted on Apr 02, 2019 at 02:04 PM

    Good Morning,
    Thank you for providing the Data Services version and the screenshot of the multiple executions.
    Can you please review the following KBAs for assistance in resolving your multiple executions:
    https://launchpad.support.sap.com/#/notes/2132053

    https://launchpad.support.sap.com/#/notes/1250124

    https://launchpad.support.sap.com/#/notes/1572605


    • Hi Jessica,

      Thank you for those, but they did not help solve my issue. What I have found out from the log files is that whenever a job is multiplied, the log records this line:

      JobServer: Error <RWSecureSocketError: in RWSecureSocket::send: SYSTEM_ERROR> while processing request <2> from client <172.16.53.10>. (BODI-850260)

      Any ideas on this?

      KR Chris

  • Posted on Apr 12, 2019 at 09:01 PM

    Chris,
    Based on that error, the job server is failing to connect during that request handshake. If the job schedule fails to trigger, the AL_RWJoblauncher will attempt to launch the job again, which may explain the duplicate entries.

    If you would like further troubleshooting, please create a ticket and provide the following logs:
    - AL_RWJoblauncher.log found at <InstallDir>\logs\

    - Server_event.log found at <InstallDir>\logs\<job server>\server_eventlog_<date>.txt

    - ATL and Execution logs found at <InstallDir>\logs\<job server>\repository\


    If you are utilizing a server group, please review KBA 2477561 - How to troubleshoot SAP Data Services Server Group.


  • Posted on Mar 02, 2020 at 02:25 AM

    Hi Chris, Jessica,

    Thanks for creating this thread on the BODS scheduling issue. Is there any solution yet? I have faced the same issue but haven't found the root cause.

    1. When I deactivated/reactivated the jobs, the issue disappeared for a day, then started to happen at random times on random jobs. Mostly it happened between midnight and 5 am, when most of our daily jobs start and resource (CPU/memory) usage is higher.

    2. I confirmed there are no duplicate entries in the Windows Task Scheduler or in the .bat/.txt file location.

    If you have found a solution, could you please share it with me? Thank you very much!

    Email:wenjiewu@hotmail.ca

    Thanks

    Wenjie Wu


  • Posted on Mar 14, 2020 at 02:58 PM

    Production server is DS 4.2 SP13 on Windows Server 2016. This was a new install performed the prior week to move up from an existing production system that was on 4.2 SP08. The new environment uses the same source and target databases as well as the same repository database as the old environment. Besides the difference in DS versions, the only other difference is that the old environment used Windows Server 2012 R2. This is a single job server, it is not part of a server group.

    Any time a job is started from a schedule created in the Management Console, it runs twice. These jobs are triggered through the Windows Task Scheduler at 12:10 am.

    If I manually start the job through the Management Console it runs only once.

    If I manually start the job through the Windows Task Scheduler (using the existing task that was created by the Management Console) it runs only once.

    I used the Export Execution Command to create a .bat file and then scheduled execution of the .bat through the Windows Task Scheduler. The job runs twice.

    I've tried deleting the schedule and recreating it. That didn't help. I also verified that the entries in AL_SCHED_INFO and AL_MACHINE_INFO look good.

    The prior version (SP08) would have this problem once a year at most. The new install has the problem EVERY DAY.
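
    For anyone wanting to double-check the same thing from the repository side, here is a minimal script sketch. It only confirms that the duplicate start was recorded in AL_HISTORY; the datastore name 'Runtime_Repo' is a placeholder for a datastore pointing at the local repository, $LV_Job_Name and $L_Recent are job-level variables you would declare yourself, and the date arithmetic assumes an Oracle repository (adjust for other databases).

    # Sketch only: 'Runtime_Repo' is a placeholder datastore pointing at the local repository.
    # $LV_Job_Name (varchar) and $L_Recent (int) are assumed to be declared as job variables.
    $LV_Job_Name = job_name();

    # Count instances of this job recorded in AL_HISTORY in the last 60 seconds.
    # A value of 2 or more means the duplicate start reached the job server, not just the Task Scheduler.
    $L_Recent = sql('Runtime_Repo', 'SELECT COUNT(*) FROM AL_HISTORY WHERE SERVICE = {$LV_Job_Name} AND START_TIME >= (SYSDATE - (60 / 60 / 60 / 24))');
    print('Instances of this job started in the last minute: [$L_Recent]');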


  • Posted on Mar 23, 2020 at 09:48 PM

    I gave up on finding a solution within the Data Services configuration. There was an RWJobLaunch parameter in the DSConfig.txt file that I tried, and that didn't work either.

    The "solution" is that the job now checks to see if a prior instance is already running. I encapsulated the logic in a custom function. If the function returns a non-null value then another instance of the job is already running.

    Variable declarations:

    $LV_Run_ID - varchar(20)

    $LV_Job_Name - varchar(256) based on the size of the column in al_history

    $LV_Prior_Run_ID - varchar(20)

    This could be done with no variables but it helps to have them in case you need to debug.

    Create a Datastore that points to the local repository that the job will be running out of. Change the Datastore name used in the sql() call below to the name of your Datastore. Pasting the text into the SAP site resulted in mangled formatting. You'll have to adjust it yourself or get the file from the attachment. get-job-prior-instance.txt
    # #############################################################################
    # Function: Get_Job_Prior_Instance
    # Date    : 03/16/2020
    # Author  : Jim Egan (ProKarma, Inc.)
    # Purpose : Find the run ID of a running instance of the same job
    #         : that was started before this instance.
    #
    # Modifications
    # Date    :
    # Author  :
    # Purpose :
    # #############################################################################

    # Get metadata for this instance of the job
    $LV_Run_ID   = job_run_id();
    $LV_Job_Name = job_name();

    # Get the run ID of the same-named job that is running (status of S or SR) and was started within the past 30 seconds.
    # If there is no other job that started before this instance then the return value will be NULL.
    # Status S is for jobs started normally, SR is for jobs started with the "Enable Recovery" option.
    $LV_Prior_Run_ID = sql('Runtime_Repo', 'SELECT MAX(OBJECT_KEY) FROM AL_HISTORY WHERE OBJECT_KEY < [$LV_Run_ID] AND SERVICE = {$LV_Job_Name} AND STATUS IN (\'S\',\'SR\') AND START_TIME >= (SYSDATE - (30 / 60 / 60 / 24))');

    return $LV_Prior_Run_ID;

    I call the above function in a Conditional. If the return is null, the job continues with the normal flow located in the TRUE branch of the conditional. If the return is not null, the FALSE branch of the conditional is executed: I print a message to the log saying that another instance is already running, and the job ends without throwing an exception.
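
    For anyone rebuilding this, here is a rough sketch of the wiring; the object names are illustrative and Get_Job_Prior_Instance is the custom function above.

    # Conditional "If" expression - the TRUE branch holds the normal job flow:
    Get_Job_Prior_Instance() IS NULL

    # Script object inside the FALSE branch - log a message and let the job end cleanly:
    print('Another instance of job ' || job_name() || ' is already running; this run will exit without processing.');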



  • Posted on Apr 14, 2020 at 06:32 PM

    Hi Jim,

    I was at a similar version to yours: we were on 4.2 SP08 and are now on 4.2 SP12, which is where we saw the schedule duplication (jobs running twice). I am still talking with SAP about potential solutions. Based on my testing, if I create a brand new repository and migrate the jobs there, the issue has not happened so far (maybe it will come back later?).

    Another workaround to avoid the duplicate run is to add the code below into a script at the beginning of the job; you may also want to try it.

    *****************************

    $L_Count = sql('REPO_DATASTORE_NAME', 'select count(0) from ( select DENSE_RANK() OVER(ORDER BY START_TIME desc) as "rnk", STATUS from AL_HISTORY where service = \'JOB_NAME\' ) where "rnk" = 2 AND STATUS = \'S\' ');

    if ($L_Count <> 0)
    begin
        raise_exception('Earlier instance of this job is still executing');
    end

    *******************************

    Thanks

    Wenjie


  • Posted on Apr 14, 2020 at 06:40 PM

    Hi Jim,

    May I ask which "RWJobLaunch" configuration parameter you updated? Is it the one below?

    [int]

    ........

    RWJobLaunchSemaphoreNumber = 4

    Thanks

    Wenjie Wu


  • Posted on Jul 16, 2020 at 02:55 PM

    Wenjie, if you go to the properties of the job in Designer, you can select the single-instance option, which will only allow one instance to run. This will stop two instances from running at the same time, but it will not stop AL_RWJoblauncher from starting the job multiple times.

    I've recently run into issues where the history cleanup done by the EIM Adaptive Processing Service causes locking on the repository database on MS SQL Server, which ends up causing a job submission request to time out. Once the lock is released, the job actually starts running even though the submission request has timed out. If the job submission is retried after the first timeout, you will see two jobs running on the job server once the database lock is released.

    Try turning off history cleanup and see if your multiple submission issue goes away.

    Thx,

    Adam

