Skip to Content

Checkpoint not complete

Dear all,

After four Oracle 10 upgrades, the only remaining warning I have on db13 in all databases I upgraded is about "checkpoint not complete" . This only happens when there is a lot of activity in the database . It never happened before though, even when there was a lot of activity in the database.

I found note #79341 and #1068186 and I have set DISABLESELFTUNE_CHECKPOINTING

to FALSE, but this didn't particularly help, still get the warnings

My questions are:

The log buffer size is increased by the upgrade to 14M (instead of 1M that is used to be). I have 2 groups of redo log files, with 2 files each. Each file is 20M on size.

I was thinking of increasing them to 30M each. Do you think that this is a good thing to do?

Under /oracle/SID/saptrace/background, I see in the log writer trace the messages below. Do you know why I get these messages and are they relevant to the issue I have with the checkpoint?

  • SERVICE NAME:() 2007-08-09 09:56:38.059

  • SESSION ID:(316.1) 2007-08-09 09:56:38.059

Maximum redo generation record size = 197120 bytes

Maximum redo generation change vector size = 190204 bytes

tkcrrsarc: (WARN) Failed to find ARCH for message (message:0x10)

tkcrrpa: (WARN) Failed initial attempt to send ARCH message (message:0x10)

  • 2007-08-09 12:55:29.062

LGWR: Archivelog for thread 1 sequence 3662 will NOT be compressed

  • 2007-08-09 14:56:22.902

LGWR: Archivelog for thread 1 sequence 3663 will NOT be compressed

  • 2007-08-09 15:09:29.136

LGWR: Archivelog for thread 1 sequence 3664 will NOT be compressed

  • 2007-08-09 18:25:38.287

Many thanks

Andreas

Add comment
10|10000 characters needed characters exceeded

  • Get RSS Feed

12 Answers

  • Best Answer
    avatar image
    Former Member
    Sep 06, 2007 at 10:21 AM

    Support team,

    There are many different kind of checkpoints. The checkpoint that is issued with a logswitch is named "log switch checkpoint" in the alert log. If you issue manually an "alter system checkpoint" command, then you will see in the alert log that a "global checkpoint" is issued, which is executed immediately and turns the status of all ACTIVE redo logs to INACTIVE. These INACTIVE redo logs are no longer needed for recovery and can be dropped.

    Now Oracle 10 does not trigger a full (global) checkpoint on logswitch (I think Oracle 9 used to do that after 2 consecutive log switches). Oracle 10 issues a global checkpoint when it tries to switch to an already ACTIVE redo log. The log switch checkpoint that is triggered by a log switch appears to have low priority and the database writer doesn't pick it up immediately like it does for the global checkpoint. I have seen checkpoints like this taking 3-4 minutes to complete while the data to be saved is only 50M so I can't believe that it is because of slow db writer.

    Andreas

    Add comment
    10|10000 characters needed characters exceeded

  • Aug 29, 2007 at 10:06 AM

    Hello all, thank you for your replies so far, I awarded some points

    I have had 9 systems upgraded so far. I am playing with different combinations of redo size, redo groups numbers and DISABLESELFTUNE_CHECKPOINTING. Here are my findings

    Changing the DISABLESELFTUNE_CHECKPOINTING parameter hasn't made much difference anywhere. Whether set to true, false or reset (not sure what the default value is then as I can not see it in db03 or using show parameter from sqlplus), I have loads of checkpoint not completed messages in my busy systems

    I had 4 redo log groups initially consisting of 1 redo log each of 20 M

    Initially, I increased each redo log files size to 50M, the warnings were reduced (from 8-10 per day to 2-3). I then added 2 more groups of 1 redo log each (50M again). I now have none or 1 warning per day there

    Vinod, if you can copy and paste me the extract from the database upgrade guide that is talking about the size of the redologs I will be grateful. I tried to look for it both in the SAP Oracle 10 upgrade guide and the Oracle guide from Metalink but I couldn't find it

    Eric, or anybody else that understands the term "the database is fluctuating" from note 1068186, could you please explain it here?

    Many thanks

    Andreas

    Add comment
    10|10000 characters needed characters exceeded

    • if your system had big loads in certain periods, I would advise using 6 or 8 regolog groups of 200MB each. If this is not enough, come back to this topic.

      20MB was really not enough.

      50 MB is OK for small SAP systems.

      200MB is usually enough for most SAP systems.

  • Aug 22, 2007 at 09:13 AM

    Dear all,

    Thank you for your replies so far. In the meanwhile, I increased the size of my redologs from 20M to 50M. The checkpoint warnings still exist but they are significally reduced. I plan to put two more redo log groups (I only have two at the moment) and see what happens

    Anyone knows why this behaviour in Oracle 10? The systems I am working on haven't had any increase in the user load and never had checkpoint warnings. I found notes 79341 and 1068186 but it is not clear to me what exactly happens for Oracle not to issue checkpoints as before. As I said, I have this issue directly after the upgrade to Oracle 10 and without any additional load to the system.

    Many thanks

    Andreas

    Add comment
    10|10000 characters needed characters exceeded

    • Former Member

      <b>Anyone knows why this behaviour in Oracle 10? The systems I am working on haven't had any increase in the user load and never had checkpoint warnings. I found notes 79341 and 1068186 but it is not clear to me what exactly happens for Oracle not to issue checkpoints as before. As I said, I have this issue directly after the upgrade to Oracle 10 and without any additional load to the system.</b>

      We had exactly the same behaviour like you have when going from 9i to 10g. I have made the following observation:

      - 9i: dbwr immediately starts to work after the first log switch checkpoint

      - 10g: dbwr is kind of lazy and is not working a lot before a SECOND logswitch is pending

      You can easily see what is going on, with querying v$log:

      [code]SQL> select * from v$log

      GROUP# THREAD# SEQUENCE# BYTES MEMBERS ARC STATUS FIRST_CHANGE# FIRST_TIM

      -


      -


      -


      -


      -


      --- -


      -


      -


      15 1 54069 1048576000 1 YES INACTIVE 1.8710E+12 27-AUG-07

      16 1 54070 1048576000 1 YES ACTIVE 1.8710E+12 27-AUG-07

      17 1 54071 1048576000 1 NO CURRENT 1.8710E+12 27-AUG-07[/code]

      In this example there is one checkpoint - of group 16 - pending, therefore this redo is still active. When there is no inactive group, and the current is full, a checkpoint not complete occurs.

      I am not very convinced that the disableselftune_checkpointing will help much in your case (but i could be wrong, so be free to try). dbwr must write all changed blocks covered in the group before the log group can be reused. If under load the log gets full, there is no way to prevent a log switch checkpoint (there are of course checkpoint events other than log switches).

      Regards Michael

  • avatar image
    Former Member
    Aug 10, 2007 at 01:49 PM

    Hi

    OSS Note 79341 suggestions are the best.

    Regards

    Ganesh

    Add comment
    10|10000 characters needed characters exceeded

  • avatar image
    Former Member
    Aug 10, 2007 at 02:59 PM

    SAP Note 830576 has this for buffer size: LOG_BUFFER 1048576

    I had the same problem with checkpoints and reduced the log_buffer size to 1048576 and the issue went away.

    Add comment
    10|10000 characters needed characters exceeded

  • avatar image
    Former Member
    Aug 13, 2007 at 07:53 PM

    Hi,

    you should set DISABLESELFTUNE_CHECKPOINTING

    to TRUE not to FALSE as stated in OSS note 1068186 to reduce the number of checkpoints.

    You should also consider increasing the number of DB writers (parameter db_writer_processes)

    Thanks

    Add comment
    10|10000 characters needed characters exceeded

  • avatar image
    Former Member
    Aug 14, 2007 at 12:48 PM

    Dear Team,

    A Checkpoint is a database event which synchronizes the modified data blocks in memory with the datafiles on disk.

    Checkpoint not complete indicates that Oracle wants to reuse a redo log file, but

    the current checkpoint position is still in that log. In this case, Oracle must

    wait until the checkpoint position passes that log

    this situation may be encountered if DBWR writes too slowly, or if a log switch happens before the log is completely full, or if log file sizes are too small.

    I would even recommend sizing them a few hundred Mbytes, you can also consider adding an extra group because the checkpoint of the log about to be overwritten is not yet complete.

    Vinod

    Add comment
    10|10000 characters needed characters exceeded

  • Sep 03, 2007 at 02:41 PM

    Thank you for your answers, I awarded some more points

    I think I have my problem solved, 4 groups of 100M each has done the trick. I noticed a strange behaviour while resizing the redo logs, which might be the cause of the problem.

    I understand that a log switch is always triggering a checkpoint (and the opposite is not true). When I was dropping redo log groups, I got two different kind of errors. One was the the redo log group to be dropped can not be dropped because it is current. This makes sense and I easily issued an "alter system switch logfile" statement.

    SQL> ALTER DATABASE DROP LOGFILE GROUP 13;

    ALTER DATABASE DROP LOGFILE GROUP 13

    *

    ERROR at line 1:

    ORA-01623: log 13 is current log for instance BWT (thread 1) - cannot drop

    ORA-00312: online log 13 thread 1: '/oracle/BWT/origlogA/log_g13m1.dbf'

    ORA-00312: online log 13 thread 1: '/oracle/BWT/mirrlogA/log_g13m2.dbf'

    SQL> alter system switch logfile;

    System altered.

    Then, when I tried to drop the same redo log group (that was not current any more, as I had just switched to the next one) I got the message

    SQL> ALTER DATABASE DROP LOGFILE GROUP 13;

    ALTER DATABASE DROP LOGFILE GROUP 13

    *

    ERROR at line 1:

    ORA-01624: log 13 needed for crash recovery of instance BWT (thread 1)

    ORA-00312: online log 13 thread 1: '/oracle/BWT/origlogA/log_g13m1.dbf'

    ORA-00312: online log 13 thread 1: '/oracle/BWT/mirrlogA/log_g13m2.dbf'

    So despite the switch, a checkpoint has not been issued yet. When I executed,

    alter system checkpoint;

    I could then drop log froup 13

    I had similar issues with other log groups . What I do not understand is why I had to issue a manual checkpoint, I believed that the log switch should have issued a checkpoint itself.

    Many thanks

    Andreas

    Add comment
    10|10000 characters needed characters exceeded

    • Former Member

      <b>ORA-01624: log 13 needed for crash recovery of instance BWT (thread 1)</b>

      This means that this group is still active and its logswitch checkpoint is not done yet.

      Again a brief overview:

      - every change generates redo (written to the current redo group) and changed blocks in the buffer cache (memory)

      - everytime the redo log gets full a logswitch checkpoint is triggered, indicating DBWR to write all changed blocks to the data files, a new log group becomes current

      - to loggroup stays active until the checkpoint completes (meaning all changed blocks covered in the redo are now written to disk. If the instance crashes before the checkpoint completed, this very log group would be needed for the recovery. Therefore you are not allowed to drop either a current, or an active log group.

      So a checkpoint is mainly done by DBWR, the CKPT process only updates the file headers when a checkpoint completes.

      Regards Michael

  • avatar image
    Former Member
    Aug 10, 2007 at 02:14 PM

    Hi Andreas

    You can ignore the LGWR not compressed messages. If needed i can post the corresponding oracle metalink doc here, it is 284618.1. I don't hope you are on Oracle 10.1.0.X

    Checkpoint not complete mostly happens, when the DBWR cannot keep up writing dirty blocks from the buffer cache to the database files. Every logswitch (redo log full, switch to next redo group) triggers a log checkpoint. Here is a general recommondation, please apply careful as this is depending on your system:

    - if possible go to four redo log groups

    - 20mb per redo log is tiny. if possible extend to 50mb.

    I have seen systems with more than 20mb redo generated per second, they needed redo log sizes around 1gb. Redo log switches should occur once per minute under load. If log_checkpoints_to_alert is set to true you can see log switches in the alert log.

    >> Beginning log switch checkpoint up to RBA [0x13d2.2.10], SCN: 44944154

    With 4 groups and 50mb, there should be less pressure on the dbwr. Please do not hesitate to come back if you need further clarification.

    Regards

    Michael

    Add comment
    10|10000 characters needed characters exceeded

  • avatar image
    Former Member
    Sep 06, 2007 at 05:46 AM

    Hi All,

    Just to side track abit. Can i said that when "checkpoint not complete" occured, it acutally mean that oracle is not able to read the data due to checkpoint is logging the process. "Checkpoint not complete" is actually not an error targeting the checkpoint issue. Or i can rephase the error as "read error due to checkpoint is in progress" Kindly correct me.

    Regard

    Lauran

    Add comment
    10|10000 characters needed characters exceeded

    • Former Member

      Lauran,

      It means that the redo log file can not be used because it's contents haven't been saved on the datafiles yet. So, it is rather "write error due to checkpoint in progress"

      Andreas