Skip to Content
0

ORA-01555 snapshot too old, but table not updated at the same time

Dec 18, 2017 at 11:37 AM

904

avatar image

Hi Oracle experts,

Can somebody explain why there's a short dump Oracle error ORA-01555 snapshot too old in the following scenario ?

I'm surprised because I thought that the only possibility of ORA-01555 was that the same table is updated and read at the same time. In the current scenario, this table is not updated.

Scenario : 17 parallel jobs of the same program are running at the same time, each one reading the table ZTABLE1 and inserting into ZTABLE2 (the 2 tables have the same structure). There is no process updating ZTABLE1 at the same time.

This scenario is simplified here, but it really reads one table and inserts data into another table, no other table is accessed.

data: mycursor TYPE cursor,
      itab TYPE TABLE OF ztable1.
Open cursor with hold mycursor on select * from ztable1
   where package = <package dedicated to each job> " 1 to 17
   order by primary key
Do.
  Fetch next mycursor into table ITAB package size 10000.
  if sy-subrc <> 0. EXIT. endif.
  Insert ZTABLE2 from table ITAB.
  call function 'DB_COMMIT'.
Enddo.
close cursor mycursor.

If I decrease the package size to 1000, there's no short dump any more.

I get the following short dump at the FETCH NEXT mycursor.

DBSQL_SQL_ERROR CX_SY_OPEN_SQL_DB SQL error "SQL code: 1555" occurred while accessing table "ZTABLE1".
Database error text: SQL message: ORA-01555: snapshot too old: rollback
segment number 39 with name "_SYSSMU39_3645373441$" too small
Return value of the database layer: "SQL dbsl rc: 99"

SAP system configuration :

ORACLE 11.2.0.3.0
Linux x86_64 PL 390
Kernel 742 Linux GNU SLES-11 x86_64 cc4.3.4 use-pr150925 PL 300

Oracle parameters :

SELECT RETENTION FROM DBA_TABLESPACES WHERE TABLESPACE_NAME = 'PSAPUNDO'
NOGUARANTEE

undo_management AUTO
undo_retention 43200

Thank you!

Sandra

10 |10000 characters needed characters left characters exceeded

Sorry, I can't attach a file with my current browsers. Here is the short dump:

Catégorie              Erreurs d'installation
Err. exécution         DBSQL_SQL_ERROR
Except.                CX_SY_OPEN_SQL_DB
Date et heure          15.12.2017 20:44:25
|Désignation
|    SQL error "SQL code: 1555" occurred while accessing table "ZTTDC_ARTICUL_J2".
|Causes
|    Database error text: "SQL message: ORA-01555: snapshot too old: rollback
|     segment number 39 with name "_SYSSMU39_3645373441$" too small"
|Solutions
|    Note which actions and entries caused the error to occur.
|
|    Consult your SAP administrator.
|
|    Using transaction ST22 for ABAP dump analysis, you can view, manage,
|    and retain termination messages for longer periods.
|Analyse des erreurs
|    An exception has occurred which is explained in more detail below. The
|    exception is assigned to class 'CX_SY_OPEN_SQL_DB' and was not caught in
|     procedure
|    "AGGREGATE_CHILD_TABLES_J" "(METHOD)", nor was it propagated by a RAISING
|     clause.
|    Since the caller of the procedure could not have anticipated this
|    exception, the current program was terminated.
|    The reason for the exception is:
|    Database error text: SQL message: ORA-01555: snapshot too old: rollback
|     segment number 39 with name "_SYSSMU39_3645373441$" too small
|    Return value of the database layer: "SQL dbsl rc: 99"
|Comment corriger l'erreur
|    The exception must either be prevented, caught in procedure
|     "AGGREGATE_CHILD_TABLES_J" "(METHOD)", or
|    the possibility of it occurring must be declared in the RAISING clause
|    in the procedure.
|    To prevent the exception, note the following:
|
|    Analyze the entries in the system log (transaction SM21).
|    If the error occurs in a non-modfied SAP program, you might be able to
|    find a solution in the SAP Notes system. If you have access to the SAP
|    Notes system, check there first using the following keywords:
|
|    "DBSQL_SQL_ERROR" CX_SY_OPEN_SQL_DB
|    "Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL" bzw. Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL
|    "AGGREGATE_CHILD_TABLES_J"
|    If you cannot solve the problem yourself, please send the following
|    information to SAP:
|
|    1. This description of the problem (short dump)
|    To do this, choose  System -> List -> Save -> Local File (unconverted)
|    on the screen you are in now.
|
|    2. A suitable system log
|    To do this, call the system log in transaction SM21. Restrict the time
|    interval to ten minutes before the short dump and five minutes after
|    it. In the display, choose System -> List -> Save -> Local File
|    (unconverted).
|
|    3. If these are programs of your own, or modified SAP programs: Source
|    code of these programs
|    To do this, choose  More Utilities -> Upload/Download -> Download in
|    the Editor.
|
|    4. Details regarding the conditions under which the error occurred or
|    which actions and input caused the error.
|Environnement système
|    SAP Release..... 740
|    SAP Basis level 0007
|
|    Application server... n1slsap120
|    Network address...... 172.33.112.198
|    Operating system... Linux
|    Release.............. 2.6.32-573.12.1.el6.
|    Hardware type....... x86_64
|    Character length..... 16 Bits
|    Pointer length........ 64 Bits
|    Work process number... 73
|    Shortdump setting. full
|
|    Database server... n1susap030
|    Database type..... ORACLE
|    Database name..... EP1
|    Database user ID SAPECC
|
|    Terminal..........
|
|    Character set C
|
|    SAP kernel....... 742
|    Created on....... Oct 30 2015 17:34:47
|    Created at....... Linux GNU SLES-11 x86_64 cc4.3.4 use-pr150925
|    Database version OCI_112, 11.2.0.3.0, V1, default
|    Patch level....... 300
|    Patch text.......
|
|    Database............. ORACLE 11.2.*.*.*, ORACLE 12.*.*.*.*
|    SAP database version. 742
|    Operating system... Linux 2.6, Linux 3, Linux
|
|    Memory consumption
|    Roll.... 0
|    EM...... 0
|    Heap.... 13621824
|    Page.... 24576
|    MM used. 3975280
|    MM free. 6788856
|Utilis. et transaction
|    Client................. 900
|    User.................. FF_FONCT6
|    Language key.......... F
|    Transaction.........
|    Transaction ID...... 5A33312B553269E9E1000000AC2170C6
|
|    EPP whole context ID...... 0050568E446F1ED7B8B6DDDFF2E7EA88
|    EPP connection ID........ 00000000000000000000000000000000
|    EPP call counter......... 0
|
|    Program.............. Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL
|    Screen.............. SAPMSSY0                                1000
|    Screen line.......... 6
|    Active debugger..... "none"
|Infos sur point d'interruption
|    The termination occurred in ABAP program "Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL", in
|     "AGGREGATE_CHILD_TABLES_J". The main program
|    was "Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL".
|
|    In the source code, the termination point is in line 361 of (Include)
|    program "Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL".
|    Program "Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL" was started as a background job.
|
|    Job name....... Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL:10
|    Job initiator.. FF_FONCT6
|    Job number...... 19474300
|    The termination is due to exception "CX_SY_OPEN_SQL_DB" occurring in procedure
|     "AGGREGATE_CHILD_TABLES_J" "(METHOD)".
|    This exception was not handled locally or declared in the RAISING
|    clause in the procedure's signature however.
|    The procedure is in program "Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL           ". The
|     source code begins in line 322 of
|    (include) program "Z_TDC_R_RE_ZTP_MAJ_TABLE_COLL           ".
|Extrait code source (Texte source modifié)
|Ligne|Source
|  331|          lt_articul_j             TYPE TABLE OF zttdc_articul_j,
|  332|          l_tabix                  TYPE i,
|  333|          l_next_group_first_tabix TYPE i.
|  334|    FIELD-SYMBOLS :
|  335|          <ls_articul_j2>           TYPE zttdc_articul_j2.
|  336|
|  337|    REFRESH lt_articul_j2.
|  338|
|  339|    IF p_test = abap_false.
|  340|      OPEN CURSOR WITH HOLD l_cursor FOR
|  341|            SELECT * FROM zttdc_articul_j2
|  342|            WHERE ( ztp_rule > p_frule AND ztp_rule < p_trule )
|  343|              OR ( ztp_rule = p_frule AND ztp_rule < p_trule AND fi_bukrs >= p_fbukrs )
|  344|              OR ( ztp_rule = p_trule AND ztp_rule > p_frule AND fi_bukrs <= p_tbukrs )
|  345|              OR ( ztp_rule = p_frule AND ztp_rule = p_trule AND fi_bukrs BETWEEN p_fbukrs A|
|  346|            ORDER BY PRIMARY KEY.
|  347|    ELSE.
|  348|      OPEN CURSOR WITH HOLD l_cursor FOR
|  349|            SELECT * FROM zttdctstarticuj2
|  350|            WHERE ( ztp_rule > p_frule AND ztp_rule < p_trule )
|  351|              OR ( ztp_rule = p_frule AND ztp_rule < p_trule AND fi_bukrs >= p_fbukrs )
|  352|              OR ( ztp_rule = p_trule AND ztp_rule > p_frule AND fi_bukrs <= p_tbukrs )
|  353|              OR ( ztp_rule = p_frule AND ztp_rule = p_trule AND fi_bukrs BETWEEN p_fbukrs A|
|  354|            ORDER BY PRIMARY KEY.
|  355|    ENDIF.
|  356|
|  357|
|  358|    DO.
|  359|*{   REPLACE        EP1K900321                                        2
|  360|*\      FETCH NEXT CURSOR l_cursor APPENDING TABLE lt_articul_j2 PACKAGE SIZE 200000.
|>>>>>|      FETCH NEXT CURSOR l_cursor APPENDING TABLE lt_articul_j2 PACKAGE SIZE 10000.
|  362|
|  363|*}   REPLACE
|  364|      IF sy-subrc <> 0.
|  365|        EXIT.
|  366|      ENDIF.
|  367|
|  368|      l_next_group_first_tabix = 1.
|  369|      REFRESH lt_articul_j2_group.
|  370|      LOOP AT lt_articul_j2 ASSIGNING <ls_articul_j2>.
|  371|        IF sy-tabix <> 1 AND NOT (
|  372|                  <ls_articul_j2>-ztp_rule = ls_articul_j2_previous-ztp_rule
|  373|              AND <ls_articul_j2>-fi_bukrs = ls_articul_j2_previous-fi_bukrs
|  374|              AND <ls_articul_j2>-fi_belnr = ls_articul_j2_previous-fi_belnr
|  375|              AND <ls_articul_j2>-fi_gjahr = ls_articul_j2_previous-fi_gjahr
|  376|              AND <ls_articul_j2>-fi_buzei = ls_articul_j2_previous-fi_buzei
|  377|              AND <ls_articul_j2>-ref_table = ls_articul_j2_previous-ref_table
|  378|              ).
|  379|          l_next_group_first_tabix = sy-tabix.
|  380|          " Traiter le groupe de lignes précédent (puis le vider).


0
* Please Login or Register to Answer, Follow or Comment.

1 Answer

Best Answer
Mike Pokraka Dec 19, 2017 at 09:12 PM
0

I’ve seen this a few times on a large SAP system. The scenario was in principle similar to yours.

Oracle has a roll area where in progress transaction deltas are stored, the purpose being to provide read consistency and to store changes in a temporary place in order to roll back if needed. On commit the data is transferred to the log and then lastly saved to the main DB. If you throw too much at the DB then it can’t cope. My memory is a bit hazy here, but as I remember it one or both of these acts similar to a ring buffer, Oracle just keeps filling it in sequence, and when it reaches it’s tail then you start to see this sort of error.

So when you decrease your block size, you reduce the amount it needs to hold in it’s rollback buffer and flushes it to the log segment more often. The simple answer is to use a smaller block size or to increase the rollback buffer. The good answer is more involved and can only be done on your system, tuning large parallel jobs is an art form in itself.

Just in case you haven’t already found it, there’s also a nice technical blog about it over here.

Show 6 Share
10 |10000 characters needed characters left characters exceeded

Thank you very much, sorry for the late answer, I was quite busy.

This is all about the committed read. If I understand well, you describe the classic reason of ORA-01555. In my case, I don't understand the reason of the Oracle error on ZTABLE1 table, because it was not updated at the same time. The only reasons I could find in the web is about both an update and a read at the same time (and it's also what I ever thought).

The general algorithm of the program is:

  • ZTABLE1 is truncated.
  • 17 parallel jobs of the same program ZPROG1 are running, to fill up ZTABLE1. There's a COMMIT after every bulk insert. This works well.
  • Wait for the end of the 17 parallel jobs.
  • 17 parallel jobs of the same program ZPROG2 are running, each one reading the table ZTABLE1 (cursor with hold, with given size of package) and bulk inserting into ZTABLE2, each insert followed by a database commit. Here the ORA-01555 occurs for some of the jobs during the FETCH NEXT CURSOR of ZTABLE1.
0

I cannot say for 100% certain as my Oracle days were many moons ago, but I'm pretty sure the same principle applies:

Read consistency means that Oracle will always provide the same data within a transaction, even if it is updated in the meantime. In order to do so it may clone read-only data into the transaction buffer as well, if it is being modified elsewhere.

It is possible that if your update of ZTABLE1 is incomplete at the point ZPROG2 starts, that Oracle decides that ZTABLE1 is dirty and thus gets copied. Or something else could update ZTABLE1 once ZPROG2 has started, which would also force Oracle to buffer it.

So in other words:

Process 1 starts reading table 1
Process 2 modifies table 1 and does a commit.
Process 1 will still continue to receive the results of the query as it was at the start of the query.

In order to do this, even readonly data must be copied into the buffer.

More gory details at https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cncpt/data-concurrency-and-consistency.html

0

I am sure that the update was finished and committed. There was no SELECT during the UPDATE.

I have seen that there were a few bugs in Oracle, but I didn't check whether the corrections were applied. But I'm surprised that there still could be some bugs on so simplistic scenario (maybe not so simplistic at Oracle side).

By the way, I didn't tell that the update was about 300 millions of rows inserted in ZTABLE1. Maybe there was a kind of delaying/buffering at Oracle side.

Anyway, it's difficult now to investigate more, as there was a workaround (smallest SELECT cursor package size), and my client is happy with the solution.

Thank you. I mark your answer as the right one, because it explains the classic ORA-01555 problem.

0

Thanks. I think the detail falls under what I meant by “can only be done on your system”.

I am speculating here, but if you’re running Oracle RAC or a shadow database, it’s possible that Oracle will hold on to a transaction delta in the buffer until synchronisation is complete, even though the ‘local database’ transaction is consistent and committed. But I’m just making a semi-educated guess so don’t take my word for it.

1
(I know that it's not Oracle RAC, but I don't know whether it's a shadow database or not - don't even know what it is - I will ask)
0

Shadow DB = mirrored / hot standby DB. Ideally geographically separate.

1