Data Integrator 11.7.3: Error <70602>

Former Member
0 Kudos

Hi,

I have scheduled my entire production environment in one job: many levels, many workflows, data flows, scripts, variables and so on. After an upgrade from 11.7.2 to 11.7.3 the job stops every night when initiating one particular workflow and data flow. It's a Sybase repo.

Here is the part of the log where the DF starts and fails. Has anybody had the same experience, or does anyone know what this Sybase error 70602 typically relates to in BODI?

23687 1 WORKFLOW 04.12.2008 06:29:17 Work flow <EDW_FAKTA_OPPDRAG_FEILMELDING_WF> is started.

4139 1 DATAFLOW 04.12.2008 06:29:23 Process to execute data flow <EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF> is started.

4139 1 DATAFLOW 04.12.2008 06:29:29 Data flow <EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF> is started.

4139 1 DATAFLOW 04.12.2008 06:29:29 Cache statistics for data flow <EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF> are not available to be used for optimization and need

4139 1 DATAFLOW 04.12.2008 06:29:29 to be collected before they can be used.

4139 1 DATAFLOW 04.12.2008 06:29:29 Data flow <EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF> using PAGEABLE Cache with <3443 MB> buffer pool.

4139 1 DATAFLOW 04.12.2008 06:31:47 Data flow <EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF> is terminated due to error <70602>.

4139 1 DATAFLOW 04.12.2008 06:31:47 Process to execute data flow <EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF> is completed.

23687 1 WORKFLOW 04.12.2008 06:31:48 Work flow <EDW_FAKTA_OPPDRAG_FEILMELDING_WF> is terminated due to an error <70602>.

23687 1 WORKFLOW 04.12.2008 06:31:48 Work flow <FAKTA_2_KJORING> is terminated due to an error <70602>.

23687 1 WORKFLOW 04.12.2008 06:31:48 Work flow <EDW_WF> is terminated due to an error <70602>.

23687 1 JOB 04.12.2008 06:31:48 Job <DW_Job> is terminated due to error <70602>.

regards,

-jem

Accepted Solutions (1)

werner_daehn
Active Contributor
0 Kudos

What does the error log say? The trace log you posted doesn't go into the details.

Former Member
0 Kudos

Sorry, I forgot to include this log in the previous post.

4139 1 DBS-070602 04.12.2008 06:31:47 |Dataflow EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF

4139 1 DBS-070602 04.12.2008 06:31:47 Sybase SQL Server <BODIREP> error message for operation <ct_results>: <Sybase CS Library message number <16843171> Severity

4139 1 DBS-070602 04.12.2008 06:31:47 <1>: Message Text is: ct_results(): user api layer: external error: This routine cannot be called until all fetchable results

4139 1 DBS-070602 04.12.2008 06:31:47 have been completely processed.>.

4139 1 DBS-070601 04.12.2008 06:31:47 |Dataflow EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF

4139 1 DBS-070601 04.12.2008 06:31:47 SQL submitted to Sybase SQL Server <BODIREP> resulted in error <Sybase CS Library message number <16843024> Severity <1>:

4139 1 DBS-070601 04.12.2008 06:31:47 Message Text is: ct_command(): user api layer: external error: This routine cannot be called while results are pending for a

4139 1 DBS-070601 04.12.2008 06:31:47 command that has been sent to the server.>. The SQL submitted is <select OBJECT_KEY, MACHINE, SERVER, TYPE, SERVICE, CONTAINER,

4139 1 DBS-070601 04.12.2008 06:31:47 INST_MACHINE, INST_SERVER, INST_SERVER_PORT, INST_REPO, INST_EXEC, INST_EXEC_KEY, RUN_SEQ, START_TIME, END_TIME,

4139 1 DBS-070601 04.12.2008 06:31:47 EXECUTION_TIME, STATUS, HAS_ERROR, IGNORE_ERROR, NORM_MACHINE, NORM_SERVER, NORM_TYPE, NORM_SERVICE, NORM_CONTAINER,

4139 1 DBS-070601 04.12.2008 06:31:47 NORM_INST_MACHINE, NORM_INST_SERVER, NORM_INST_REPO, NORM_INST_EXEC from AL_HISTORY where OBJECT_KEY = 139>.

4139 1 REP-100117 04.12.2008 06:31:47 |Dataflow EDW_FAKT_OPPDRAG_FEILMELDING_AKTIV_DF

4139 1 REP-100117 04.12.2008 06:31:47 Cannot retrieve <Job_runid> <select OBJECT_KEY, MACHINE, SERVER, TYPE, SERVICE, CONTAINER, INST_MACHINE, INST_SERVER,

4139 1 REP-100117 04.12.2008 06:31:47 INST_SERVER_PORT, INST_REPO, INST_EXEC, INST_EXEC_KEY, RUN_SEQ, START_TIME, END_TIME, EXECUTION_TIME, STATUS, HAS_ERROR,

4139 1 REP-100117 04.12.2008 06:31:47 IGNORE_ERROR, NORM_MACHINE, NORM_SERVER, NORM_TYPE, NORM_SERVICE, NORM_CONTAINER, NORM_INST_MACHINE, NORM_INST_SERVER,

4139 1 REP-100117 04.12.2008 06:31:47 NORM_INST_REPO, NORM_INST_EXEC from AL_HISTORY where OBJECT_KEY = 139> from the repository. Additional database information:

4139 1 REP-100117 04.12.2008 06:31:47 <SQL submitted to Sybase SQL Server <BODIREP> resulted in error <Sybase CS Library message number <16843024> Severity <1>:

4139 1 REP-100117 04.12.2008 06:31:47 Message Text is: ct_command(): user api layer: external error: This routine cannot be called while results are pending for a

4139 1 REP-100117 04.12.2008 06:31:47 command that has been sent to the server.>. The SQL submitted is <select OBJECT_KEY, MACHINE, SERVER, TYPE, SERVICE, CONTAINER,

4139 1 REP-100117 04.12.2008 06:31:47 INST_MACHINE, INST_SERVER, INST_SERVER_PORT, INST_REPO, INST_EXEC, INST_EXEC_KEY, RUN_SEQ, START_TIME, END_TIME,

4139 1 REP-100117 04.12.2008 06:31:47 EXECUTION_TIME, STATUS, HAS_ERROR, IGNORE_ERROR, NORM_MACHINE, NORM_SERVER, NORM_TYPE, NORM_SERVICE, NORM_CONTAINER,

4139 1 REP-100117 04.12.2008 06:31:47 NORM_INST_MACHINE, NORM_INST_SERVER, NORM_INST_REPO, NORM_INST_EXEC from AL_HISTORY where OBJECT_KEY = 139>.>.

Former Member
0 Kudos

What is the exact version of 11.7.3 you have? There are a few Sybase-related fixes that went into 11.7.3.3, 11.7.3.4 and 11.7.3.5.

What is the OS you are using for the Job Server? The hot fixes are not released for all platforms, so you may not find the fix pack for your platform.

From the error message this looks like a bug; please file a case with support if the issue doesn't get resolved with the latest fix pack.

Former Member
0 Kudos

Hi again,

We are running the Sybase repo on the Solaris platform, with BODI version 11.7.3.6 all the way: Job Server, engine and Designer. I cannot find any newer versions on ESD either.

regards,

-jem

Edited by: Jørn Erik Mathisen on Dec 5, 2008 10:46 AM

Former Member
0 Kudos

11.7.3.6 is the latest patch, and you are already using that

The error is happening on repository-related queries, so this is a bug: somewhere the results fetched by a previous SQL execute are not being consumed. During the execution of a job there are a few repository tables that are updated with the statistics and progress of the job, and something might be going wrong there.
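For reference, the two CS Library messages in the error log describe the standard Open Client (CT-Library) calling sequence: a new ct_command() may only be issued after every result set returned by ct_results() has been completely fetched (or discarded with ct_cancel()). The following is only a minimal sketch of that sequence to illustrate the condition the error refers to; the function and variable names are illustrative, connection setup is omitted, and this is not Data Integrator's actual code.

#include <stdio.h>
#include <ctpublic.h>

/* Send one SQL statement and drain everything it returns before the
   CS_COMMAND structure is reused for the next statement. */
CS_RETCODE run_statement(CS_COMMAND *cmd, CS_CHAR *sql)
{
    CS_RETCODE ret;
    CS_INT     result_type;
    CS_INT     rows_read;

    /* Send one language command to the server. */
    if (ct_command(cmd, CS_LANG_CMD, sql, CS_NULLTERM, CS_UNUSED) != CS_SUCCEED)
        return CS_FAIL;
    if (ct_send(cmd) != CS_SUCCEED)
        return CS_FAIL;

    /* Drain every result set; calling ct_command() again before this loop
       has finished is exactly what triggers the "results are pending" /
       "fetchable results have not been completely processed" messages. */
    while ((ret = ct_results(cmd, &result_type)) == CS_SUCCEED)
    {
        switch (result_type)
        {
        case CS_ROW_RESULT:
            /* Fetch (and here simply discard) every row; real code would
               bind columns with ct_bind() first. */
            while (ct_fetch(cmd, CS_UNUSED, CS_UNUSED, CS_UNUSED,
                            &rows_read) == CS_SUCCEED)
                ;
            break;
        case CS_CMD_SUCCEED:
        case CS_CMD_DONE:
            break;              /* nothing to fetch */
        case CS_CMD_FAIL:
            fprintf(stderr, "statement failed: %s\n", sql);
            break;
        default:
            /* Other result types (status, parameters) also need to be
               consumed; ct_cancel(NULL, cmd, CS_CANCEL_CURRENT) can be
               used to discard them. */
            break;
        }
    }

    /* Only after CS_END_RESULTS is it legal to issue the next ct_command(). */
    return (ret == CS_END_RESULTS) ? CS_SUCCEED : CS_FAIL;
}

If the engine issues the AL_HISTORY select on a connection that still has undrained results from an earlier statistics or progress update, that would produce exactly the ct_command()/ct_results() errors shown in the error log.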

Is it happening for all the jobs?

To isolate the problem I need more info, like the database type of the source and target tables used in the DF. Do you get any errors before this, or is this the first error? Try using In-Memory cache for the DF, since with pageable cache there are queries executed against the repo. Are you using enable recovery?

Former Member
0 Kudos

Hi Manoj, and thank you for taking the time to help.

All the ETL is gathered in one job. It starts by extracting from an Oracle 8.xx source and loading the data into a post area, then we stage data from post to stage, and finally the job loads the data into the EDW. Post, stage and EDW are separate schemas in the same Oracle 10g database.

The workflows loading post and stage work fine. It stops on an EDW data flow.

But it's funny you mentioned statistics, because we make extensive use of validation rules in the EDW loads, collecting both quality statistics and sample data. The customer wants this in order to have an overall measurement for the DWQ using the metadata reports. Maybe this traffic gives us an inconvenient side effect?

I shall take a closer look at the validation.

No use of recovery.

Thanks,

-jem

Edited by: Jørn Erik Mathisen on Dec 5, 2008 9:57 PM

Former Member
0 Kudos

Hi again,

We created an Oracle repo and a new Job Server in this environment. When the job reached the EDW part it stopped on a different data flow, with this error log:

12894 1 DBS-070301 06.12.2008 05:47:58 |Dataflow EDW_DIM_FORSKRIFT_AKTIV_DF

12894 1 DBS-070301 06.12.2008 05:47:58 Oracle <nsbtekbo> error message for operation <OCIStmtExecute>: <ORA-01480: trailing null missing from STR bind value

12894 1 DBS-070301 06.12.2008 05:47:58 >.

5171 1 RUN-050316 06.12.2008 05:48:01 |Session DW_Job|Workflow EDW_WF|Workflow DIM_3_KJORING|Workflow EDW_DIM_FORSKRIFT_WF

5171 1 RUN-050316 06.12.2008 05:48:01 EDW.DIM_FORSKRIFT

5171 1 RUN-050304 06.12.2008 05:48:01 |Session DW_Job|Workflow EDW_WF|Workflow DIM_3_KJORING|Workflow EDW_DIM_FORSKRIFT_WF

5171 1 RUN-050304 06.12.2008 05:48:01 Function call <raise_exception ( EDW.DIM_FORSKRIFT ) > failed, due to error <50316>: < EDW.DIM_FORSKRIFT>.

We then unchecked the validation statistics for that particular DF and the job went through. I remember error <50316> as an "out of memory" error, at the data flow level? Maybe that was the case with Sybase as well; I guess Oracle is just more compatible and robust.

But the Oracle error we got, could it be something to do with gathering statistics from an inappropriate port format (string format) in the validation transform, which then led to running out of memory? Because all the other validation transforms went fine, and I guess they use the same repo tables and column definitions.

thanks,

-jem

Former Member
0 Kudos

In both cases, with the Sybase repo and the Oracle repo, the error is coming from the database. It is most likely a data issue rather than a memory issue.

In the Sybase case, does the DF that's failing have a validation transform? And if you disable collecting data quality stats, does the job run successfully?

Are you collecting sample data as well? Did you try disabling only 'collect sample data' while enabling 'collect DQ stats'? Does the job fail in that case too?

If 'collect DQ stats' or 'collect sample data' is enabled, the information is stored in repository tables. In the Oracle case, did it print the SQL that is failing? A table name, anything?

The problem could be with the data, or it could be with saving the validation rule names etc. to the repository. Do you see the validation rules in the Management Console for this job?

Are you able to reproduce the issue by running only that DF? If yes, try running the DF with no data from the source and with 'print all trace messages' enabled; if it still fails we can isolate the repo query that is failing.

If the amount of data from the source table is small, run the DF with 'print all trace messages' enabled; this will print each row with its data, and you can see whether the data has a problem or not.

Former Member
0 Kudos

Hi Manoj,

It takes some time to test this out, because running DFs separately does not give the same results or an authentic setting, and the test environment is poor. We are therefore adjusting and watching the prod job, which runs every night on the Sybase repo. What we did:

- The first night we disabled both quality stats and sample data for all validation objects in the job and the job completed successfully.

- The second night we activated the quality stats for all validation objects in the job and the job completed successfully.

- Then we moved a DF out into a new, separate job for a stress test against the repo, running with 50 days of data and with only the quality stats still checked. It failed with this error:

13328 1 RUN-050406 13.12.2008 12:42:45 |Session New_Job2|Workflow EDW_FAKTA_ORDRE_KONTERING_WF|Dataflow EDW_FAKT_ORDRE_KONTERING_DF

13328 1 RUN-050406 13.12.2008 12:42:45 Data flow <EDW_FAKT_ORDRE_KONTERING_DF> received a bad system message. Message text from the child process is <blank message,

13328 1 RUN-050406 13.12.2008 12:42:45 possibly due to insufficient memory>. The process executing data flow <EDW_FAKT_ORDRE_KONTERING_DF> has died abnormally. For

13328 1 RUN-050406 13.12.2008 12:42:45 NT, please check errorlog.txt. For HPUX, please check stack_trace.txt. Please notify Customer Support.

13328 1 RUN-050409 13.12.2008 12:42:46 |Session New_Job2|Workflow EDW_FAKTA_ORDRE_KONTERING_WF

13328 1 RUN-050409 13.12.2008 12:42:46 The job process could not communicate with the data flow <EDW_FAKT_ORDRE_KONTERING_DF> process. For details, see previously

13328 1 RUN-050409 13.12.2008 12:42:46 logged error <50406>.

- We then unchecked the quality stats and the job completed successfully.

- Finally, we held the prod job back for 4 days to put more data/stress on the repo (still with only the quality stats checked), and the job completed successfully.

Tonight we will try to turn on sample data for the stage workflow.

But when the job failed in the beginning, it failed on DFs in the EDW workflow. That means it ran through the stage workflow with both quality stats and sample data. (As mentioned in an earlier post, stage and EDW are different schemas in the data warehouse database, Oracle 10g, and the stage jobs read from post and the EDW jobs from stage.)

One difference I can see is that the validations in stage are pretty simple, with "clean" ports, i.e. nice formats, and we are just checking for missing values (NULL) coming in from source/post. In the EDW DFs the validations are more complex and we often test on lookup returns (if-exists variants), and the format of the return values is not always nice; for instance, NUMBER values in Oracle (with large integer contents) are converted to decimal(28,7). We have not changed that. But again, if we run these DFs manually, not scheduled, they run through.

I forgot to mention that the job has 135 DFs, roughly 50 of them with validations.

I just wonder how robust this feature is, and how it really works. For example, how do the good rows impact the traffic against the repo? Not at all, or can a huge amount of good data slow down the stats counts/updates? And what about "bad formats"? Is converting port formats an issue, against the repo tables I mean?

Are there any parameters to adjust in general?

I guess Sybase is not the most robust platform either. What about DS 3.0, is it more robust?

I also adjusted the length and type of characters in the validation names, so they are pretty simple now. It's not easy to get any detailed traces in prod with a huge amount of data, but I will try.

Regards,

- jem

Former Member
0 Kudos

Are you able to reproduce the issue every time? Since you have a lot of DFs, I assume not all of them get executed every time; or does every data flow get executed every time and process some data?

The data quality data and stats for the validation transform are stored in tables whose names start with AL_QD*. The data is stored as strings, but if the problem were caused by conversion errors, Sybase would throw conversion errors, not this error.

Enabling trace for such a huge job will not give much info. The problem here is that the error message is reported for the next statement that gets executed, not for the statement that caused the issue.

Can you check whether something can be done at the database logging level that would give some idea of the statement that was executed just before this one? Check whether you still have database logs from when this error happened, and whether they contain anything.

Let me check whether trace can be set only for repo SQL.

Former Member
0 Kudos

Hi,

It's not easy to reproduce the issue exactly, because it stops on different DFs, but we have narrowed the error down to most likely having something to do with validations. And all the DFs run every night with data; it's a big company.

Last night the job failed again, on the stage DFs. I agree that the logged DF is not necessarily the problem child, because that DF was running in parallel with several other DFs with validation. The error message was:

18551 19 RUN-050406 17/12/2008 04:51:12 |Session DW_Job|Workflow STAGE_WF|Workflow STAGING_FASE2|Workflow STAGING_FASE2_LP2|Workflow STG_INDIVIDVEDLIKEHOLD_WF|Dataflow STG_INDIVIDVEDLIKEHOLD_Update_KM_DF

18551 19 RUN-050406 17/12/2008 04:51:12 Data flow <STG_INDIVIDVEDLIKEHOLD_Update_KM_DF> received a bad system message. Message text from the child process is <blank

18551 19 RUN-050406 17/12/2008 04:51:12 message, possibly due to insufficient memory>. The process executing data flow <STG_INDIVIDVEDLIKEHOLD_Update_KM_DF> has died

18551 19 RUN-050406 17/12/2008 04:51:12 abnormally. For NT, please check errorlog.txt. For HPUX, please check stack_trace.txt. Please notify Customer Support.

18551 19 RUN-050409 17/12/2008 04:51:13 |Session DW_Job|Workflow STAGE_WF|Workflow STAGING_FASE2|Workflow STAGING_FASE2_LP2|Workflow STG_INDIVIDVEDLIKEHOLD_WF

18551 19 RUN-050409 17/12/2008 04:51:13 The job process could not communicate with the data flow <STG_INDIVIDVEDLIKEHOLD_Update_KM_DF> process. For details, see

18551 19 RUN-050409 17/12/2008 04:51:13 previously logged error <50406>.

18551 19 RUN-050316 17/12/2008 04:55:41 |Session DW_Job|Workflow STAGE_WF|Workflow STAGING_FASE2|Workflow STAGING_FASE2_LP2|Workflow STG_INDIVIDVEDLIKEHOLD_WF

18551 19 RUN-050316 17/12/2008 04:55:41 STG_INDIVIDVEDLIKEHOLD FEILET

18551 19 RUN-050304 17/12/2008 04:55:41 |Session DW_Job|Workflow STAGE_WF|Workflow STAGING_FASE2|Workflow STAGING_FASE2_LP2|Workflow STG_INDIVIDVEDLIKEHOLD_WF

18551 19 RUN-050304 17/12/2008 04:55:41 Function call <raise_exception ( STG_INDIVIDVEDLIKEHOLD FEILET ) > failed, due to error <50316>: <STG_INDIVIDVEDLIKEHOLD

18551 19 RUN-050304 17/12/2008 04:55:41 FEILET>.

We put the DFs in serial and the job ran successfully. So I get the feeling that the validation feature is affected by the amount of data and, separately or in combination, by several DFs accessing the same AL_QD* tables at once; maybe a page lock or a timeout? Or maybe it's Sybase.

> AL_QD* the data is stored as strings, but if the problem is happening because of conversion errors Sybase would throw conversion errors, not this error

ok

The good news is that today we managed to "isolate" a DF that fails with the same error when it runs alone in a dummy job. I will run it tomorrow with the DBA next to me, watching the repo database log.

Thanks,

- jem

Former Member
0 Kudos

Hi,

It was not easy to get anything out of the huge logs I received from the IT department, but we isolated one DF that got the same error as the original job when we ran it with several days of data. This DF is pretty complex, with lots of lookups and so on, and with 2 validation objects; probably one of the DFs that often caused the errors for the daily job. It processes about 200,000 rows each day, and I made an attempt with one month of data, 4 million rows, with both stats and sample data checked. It went like this:

- I just ran it 3 times, and it failed after 200,000-400,000 rows.

- I then adjusted the DOP to 3 and set 3 big lookups to 'run as a separate process'. I then reached 700,000 to 1 million rows before it failed.

- I remembered you mentioned something about in-memory cache, so I tried that next. This time the whole DF ran successfully.

- I adjusted the original job, but it failed near another complex DF. This time I just set that DF to run in memory. It failed again, but when I set a couple of heavy operations to run as separate processes and adjusted the DOP, the whole job went through.

So this looks like the way to do it, but I am not 100% sure why. :) Could it be that this helps the pageable cache out and lets the OS do the job of swapping if necessary, and so on? The pageable cache has 3.2 GB.

I think we will go for an upgrade to 3.1 SP1.

Thanks for your patience!

- jem

Answers (0)