Redo reorg in Oracle11g

benoit-schmid
Contributor
0 Kudos

Hello,

After migrating to 11g, we have noticed "Checkpoint not complete" messages during intensive batch jobs.

It basically means that Oracle 11g is hungrier for redo than 10g.

SAP Note 79341 specifies that, when the interval between log switches is less than one minute, we should increase the redo log size.

When the interval is greater than one minute, we should increase the number of redo log groups.

Could you explain to me why?

I would be tempted to say that a small redo size would force a checkpoint sooner.

Therefore, with fast redo switches, it would be better to have a small redo size.

Thanks in advance for your answer.

Accepted Solutions (1)

stefan_koehler
Active Contributor
0 Kudos

Hello Benoît,

do you really hit a "checkpoint is not complete" issue, or do you only see some "Private strand flush not complete" messages like this:


Tue Dec 06 14:30:18 2011
Thread 1 cannot allocate new log, sequence 61050
Private strand flush not complete
  Current log# 4 seq# 61049 mem# 0: /oracle/<SID>/origlogB/log_g14m1.dbf
  Current log# 4 seq# 61049 mem# 1: /oracle/<SID>/mirrlogB/log_g14m2.dbf
Beginning log switch checkpoint up to RBA [0xee7a.2.10], SCN: 4172998151
Thread 1 advanced to log sequence 61050 (LGWR switch)

If you only hit the "Private strand flush not complete" issue, it is normal behavior; for details please check MetaLink note #372557.1.

> Could you explain to me why?

Well, basically said: if you have a real "checkpoint is not complete" issue, both solutions (increasing the log file size or adding more redo log groups) have the same effect.

Your database system has more time to bring a redo log group to status INACTIVE before it is reused. If it is still ACTIVE when you want to reuse it, that is not possible, because the group would still be needed for crash recovery.
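For illustration, a minimal sketch of how to check the group states with a query against V$LOG (the interpretation in the comments is the point here):

SELECT group#, thread#, bytes/1024/1024 AS size_mb, archived, status
FROM   v$log
ORDER  BY group#;
-- STATUS = 'ACTIVE'   : still needed for crash recovery, cannot be overwritten yet
-- STATUS = 'INACTIVE' : (together with ARCHIVED = 'YES') may be reused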

Checkpoints are a complex beast, driven by several algorithms and settings such as FAST_START_MTTR_TARGET.

-> http://docs.oracle.com/cd/E11882_01/server.112/e17110/initparams087.htm#i1127412

Jonathan Lewis replied in detail on some OTN topics:

/thread/609453

Regards

Stefan

volker_borowski2
Active Contributor
0 Kudos

Hi,

a log can be overwritten when it is INACTIVE and ARCHIVED.

You might receive "archiving needed" waits on already inactive logfiles which are not archived yet.

So maybe also increase log_archive_max_processes if you have big logs.
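For illustration, a hedged sketch of that change (the value 8 is only an example; the same statement also appears in the alert log excerpt later in this thread):

-- raise the number of archiver processes dynamically
ALTER SYSTEM SET log_archive_max_processes = 8 SCOPE=BOTH;
-- check what the ARCn processes are doing
SELECT process, status, log_sequence, state FROM v$archive_processes;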

Volker

stefan_koehler
Active Contributor
0 Kudos

Hey Volker,

I guess you are working with some crazy systems to see such situations :-)

I have never seen a redo log group that was already inactive but not archived, especially with the load-dependent number of ARCn processes: http://docs.oracle.com/cd/E11882_01/server.112/e25494/archredo003.htm#ADMIN11337

Yes, this could happen, but then we are also not talking about a "checkpoint is not complete" issue.

Regards

Stefan

volker_borowski2
Active Contributor
0 Kudos

> Hey Volker,

> I guess you are working with some crazy systems to see such situations :-)

Indeed. I already mentioned it in another thread a while ago.

The old beast (~30 TB) is fading out, and the newborn child is already at 11 TB and growing at 6 TB a year.

I have 8 groups of 1200 MB online logs riding 4 filesystems (A/B/C/D).

I have switches every 12 to 14 seconds during peak processing.

I configured 8 ARC processes (4 is the default), because they are cheap and I wasn't in the mood for calculating.

And Oracle terminates these when it sees no need for them and restarts them as needed, so there is no need to be thrifty.

Archiving a log (from ...logA/B/C/D to oraarch) takes between 65 and ~80 seconds (so 20 sec gap in high peaks).

It might be possible to tweak this with some special filesystem cache settings on either the SAN or the OS,

but since the ARC gang gets the job done in time, I prefer to spend the memory elsewhere.

We are about to move to new metal soon, so I guess I'll need to do some new measurements then.

Volker

benoit-schmid
Contributor
0 Kudos

Hello Stefan,

> do you really hit a "checkpoint is not complete" issue or do you only see some "Private strand flush not complete" message

It seems that there are no "strand flush" messages, but "Checkpoint not complete" appears 27 times:

grep -i "strand flush" /oracle/PRD/saptrace/diag/rdbms/prd/PRD/trace/alert_PRD.log_2011-11-*

grep -i "strand flush" /oracle/PRD/saptrace/diag/rdbms/prd/PRD/trace/alert_PRD.log

grep -i "Checkpoint not complete" /oracle/PRD/saptrace/diag/rdbms/prd/PRD/trace/alert_PRD.log | wc -l

27

> Well, basically said: if you have a real "checkpoint is not complete" issue, both solutions (increasing the log file size or adding more redo log groups) have the same effect.

1. If that is the case, why does SAP specify the following?

A. Time between log switches > 1 min -> increase the number of redo groups

B. Time between log switches < 1 min -> increase the size.

2. For me it does not have the same effect, as the DBWR is started earlier with smaller redo logs.

Don't you agree?

Regards,

benoit-schmid
Contributor
0 Kudos

Hello Volker,

> Hi,

> a log can be overwritten when it is INACTIVE and ARCHIVED.

> You might receive "archiving needed" waits on already inactive logfiles which are not archived yet.

> So maybe also increase log_archive_max_processes if you have big logs.

> Volker

What is logged in the alert log in this case?

I guess that we are not facing this issue:

grep -i "archiv" /oracle/PRD/saptrace/diag/rdbms/prd/PRD/trace/alert_PRD.log | grep -v "Archived Log entry "

log_archive_dest_1 = "LOCATION=/oracle/PRD/oraarch/PRDarch"

log_archive_format = "%t_%s_%r.dbf"

ARC0: Archival started

ARC1: Archival started

ARC2: Archival started

ARC3: Archival started

Thanks for your answer.

benoit-schmid
Contributor
0 Kudos

Hello Volker,

> I have 8 groups of 1200MB onlinelogs riding 4 filesystems (A/B/C/D).

> I have switches every 12 to 14 seconds during peak processing.

We are living in the same world

I have 8 groups of 50 MB, switching between 0 s and 8 s when there are available redo logs.

It switches at 18 s when the checkpoint is not completing fast enough, because it has to wait for an inactive redo log.

The checkpoint takes a little less than one minute.

Regards,

stefan_koehler
Active Contributor
0 Kudos

Hello Benoît,

OK, you have a real "Checkpoint not complete", but to be honest you are far, far away from Volker's world. Your redo log group size of 50 MB is just too small for an SAP system. Increase the redo size to at least 700 MB or so.

I can only guess why SAP suggests two different things, but I think it is based on much larger redo log groups than 50 MB. Why?

If you have larger redo log groups (like Volker's, for example, or even bigger) and a longer log switch interval (> 1 min), you also need time to archive this data (archiving works as a one-to-one relationship between an ARC(n) process and a redo log group). So in this case having more redo log groups is the right approach to give the ARC(n) processes the needed time. If you increased the redo log group size in this case, the archiving would take even longer after a log switch.

If you have a short log switch interval (< 1 min), your redo log size is just too small. But in this case increasing the number of redo log groups could also solve the "Checkpoint not complete" issue, because you also give the database writer (DBWR) more time to perform a "full checkpoint" for all the corresponding data. The time for the archiver is mostly not an issue here, because of the small amount of data in each redo log group.

> 2. For me it does not have the same effect, as the DBWR is started earlier with smaller redo logs.

I am not quite sure if I get your question right, but if you set FAST_START_MTTR_TARGET to 1 second or so, your DBWR will write much more data between the log switches (no matter how long the log switch interval is), and it will not need so much time after a log switch to perform a "full checkpoint". So maybe your "Checkpoint not complete" is also gone after setting a tiny crash recovery window (details are explained in the links above), but this should not be the solution in your case.
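A minimal sketch of that setting, plus the view that shows how well the incremental checkpointing keeps up (the value 1 is only the example from the text above):

-- aggressive crash recovery target; DBWR keeps the checkpoint position very current
ALTER SYSTEM SET fast_start_mttr_target = 1 SCOPE=BOTH;
-- observe the effect
SELECT target_mttr, estimated_mttr, optimal_logfile_size
FROM   v$instance_recovery;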

Maybe I am missing some crazy cases here too :-) ... Volker, it is your part 😛

Regards

Stefan

benoit-schmid
Contributor
0 Kudos

Hello,

> Hello Benoît,

> OK, you have a real "Checkpoint not complete", but to be honest you are far, far away from Volker's world. Your redo log group size of 50 MB is just too small for an SAP system. Increase the redo size to at least 700 MB or so.

I guess you meant 70MB instead of 50MB?

1. The Oracle documentation specifies that there should be one log switch during normal operation.

Therefore I should not have too big a redo group, otherwise I may lose several hours of work for one lost redo log.

Do you agree?

2. From what I see, the redo log naming in SAP for group x is:

/oracle/PRD/origlogB/log_g1xm1.dbf

/oracle/PRD/mirrlogB/log_g1xm2.dbf

If I have more than 10 redo groups, can I use g2xm1 in the names, or will SAP programs not like it?

Thanks in advance for your answer.

stefan_koehler
Active Contributor
0 Kudos

Hello Benoît,

> I guess you meant 70 MB instead of 50 MB?

No, I mean 700 MB. You said: "I have 8 groups of 50 MB, switching between 0 s and 8 s when there are available redo logs." Calculation: 60 s / 8 s ≈ 7.5 switches per minute; 7.5 * 50 MB = 375 MB of redo per minute.

As a rule of thumb, the time between redo log switches should be between 1 and 5 minutes, so you would need roughly 700 MB for each log group at peak.
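A hedged sketch of how to measure the actual switch rate, e.g. switches per hour over the last day from V$LOG_HISTORY (the time format is only an example):

SELECT TO_CHAR(first_time, 'YYYY-MM-DD HH24') AS hour,
       COUNT(*)                               AS log_switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TO_CHAR(first_time, 'YYYY-MM-DD HH24')
ORDER  BY 1;
-- switches per interval * current group size gives the redo volume,
-- which leads to the MB-per-minute figure used above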

> Therefore I should not have too big a redo group, otherwise I may lose several hours of work for one lost redo log. Do you agree?

No. First of all, you can control the maximum loss of online redo data with the parameter ARCHIVE_LAG_TARGET ( http://docs.oracle.com/cd/E11882_01/server.112/e17110/initparams009.htm#CHDHFDGI ). But then you also have to transfer the archived log file immediately to tape or to a standby database to get a real benefit from it. If you have such requirements in real life, you usually have a SYNC standby database and are not relying on the archiver.
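For illustration, a minimal sketch (1800 seconds is only an example value):

-- force a log switch at least every 30 minutes, bounding the amount of unarchived redo
ALTER SYSTEM SET archive_lag_target = 1800 SCOPE=BOTH;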

The names of the redo log files don't matter.

Regards

Stefan

Former Member
0 Kudos

Hello Stefan, my 5 cents: in ADM506 our lecturer told us about one system with 40 redo log groups of 500 MB each, and that was not enough.

To Benoît Schmid: your 8 groups of 50 MB are really small for your system (as you can see from the logs).

With regards Sergo Beradze.

benoit-schmid
Contributor
0 Kudos

Hello,

> No - at first you can control the max loss of online redo data by parameter ARCHIVE_LAG_TARGET ( http://docs.oracle.com/cd/E11882_01/server.112/e17110/initparams009.htm#CHDHFDGI

FYI, this feature is deactivated by default in Oracle 11g.

Would you know why they do not activate this nice feature by default?

See you,

benoit-schmid
Contributor
0 Kudos

> Hello Stefan, my 5 cents: in ADM506 our lecturer told us about one system with 40 redo log groups of 500 MB each, and that was not enough.

> To Benoît Schmid: your 8 groups of 50 MB are really small for your system (as you can see from the logs).

> With regards Sergo Beradze.

Hello,

Which sizes and numbers do you have in your systems?

What is the time between log switches?

Regards,

Former Member
0 Kudos

> 40 redo log groups with 500 MB size

At some point you might consider tuning your disk I/O and changing your application.

> Which sizes and numbers do you have in your systems?

The largest has 16 groups of 1 GB each. I often have 8 groups (double the default) with sizes ranging from 100 to 500 MB, which works very well. Today's systems can generate virtually any redo rate; 10 years ago our largest systems generated 50 MB/s. So if you are running at full steam, 1 GB will be written in seconds, but as the rule of thumb says, try to have one switch per minute under load.

In your case I would have added 4 more groups and resized the logs to 100 or 200 MB to start with.
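For illustration, a hedged sketch of adding one group; the group number and file paths are only assumptions following the SAP naming convention quoted elsewhere in this thread (SAP systems usually do this via brspace instead):

-- add one new 200 MB group with two members (repeat for each additional group)
ALTER DATABASE ADD LOGFILE GROUP 9
  ('/oracle/PRD/origlogA/log_g19m1.dbf',
   '/oracle/PRD/mirrlogA/log_g19m2.dbf') SIZE 200M;
-- "resizing" means adding groups at the new size and dropping the old ones
-- once they have become INACTIVE:
ALTER DATABASE DROP LOGFILE GROUP 1;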

Besides that, a checkpoint not complete is NO problem unless you have online users on the system. So moving large batch jobs to night processing is VERY helpful.

Cheers Michael

volker_borowski2
Active Contributor
0 Kudos

Sorry I missed this interesting discussion throughout the week, but I was too busy elsewhere.

The basic tuning problem is that the checkpoint is not related to the logs directly, but only indirectly.

Tuning checkpoints is related to:

- the db_cache_size (possible amount of work to be done),

- the number of database writers (# of guys which do the work),

- and the amount of change activity in the cache (real work to be done).

So if you have a big db_cache, under high activity the work that needs to be done is quite high.

Checkpoint:

- get SCN of previous checkpoint

- record current SCN

- schedule a process that scans the entire db_cache to check whether a block has been modified after the SCN of the previous checkpoint; if yes, write the block to disk

- when complete, store the recorded SCN as the successful checkpoint

So if you have a big cache (which is e.g. ~80 GB in my big toy) of which 5% is dirty, we are talking about the need to scan 80 GB of RAM and do DBWR write I/O for 4 GB of db blocks. This will take some time.

The relation to the logs comes in with a log switch, because a switch triggers a checkpoint. So you need to get the checkpoint done before the logs hit their own tail.

If you have high change activity on only a couple of blocks of your db_cache (processing the same objects all the time), enlarging the logs helps, because the amount of work for the next checkpoint will not really change, but you trigger fewer checkpoints in the same time. So instead of checkpoint #1 AND checkpoint #2 both needing to write block 4711, only checkpoint #2 writes 4711.

If you have high change activity distributed linearly over the db_cache, merely enlarging the logs does not help at all. You trigger fewer checkpoints, but they get bigger and bigger the longer you wait, because more and more blocks get dirty. You can counter this with more log groups (buying time) or with the parameter Stefan mentioned (some cheerleading for the DBWRs so that they start writing earlier than the checkpoint requests it).

This is tuning the checkpoints; after this you need to deal with the log fallout:

- the logs are bigger, you need longer to copy them to the archive destination

- online logs are cio-written, archive logs on oraarch are written fs-buffered.

- online logs live on several filesystems while the archive destination is a single one

- since you have no distributed archive destinations, increasing ARCs only helps on the read side

- you need to do writes but also reads (brarchive) on the archive destination

Summing this all up, you probably end up with a setup like mine, which is not perfect at all.

I still get some incomplete checkpoints sometimes; I guess this especially happens when many different application types work concurrently, generating really big checkpoints (see above).

But 85% of the time I am happy.

Volker

volker_borowski2
Active Contributor
0 Kudos

> Hello Volker,

> What is logged in the alert log in this case?

> Thanks for your answer.

I can look that up on Tuesday, but I think it was

"... archiving needed ..."

Volker

stefan_koehler
Active Contributor
0 Kudos

Hey Volker,

> So if you have a big cache (which is e.g. ~80 GB in my big toy) of which 5% is dirty, we are talking about the need to scan 80 GB of RAM and do DBWR write I/O for 4 GB of db blocks. This will take some time.

> ....

> If you have high change activity on only a couple of blocks of your db_cache (processing the same objects all the time), enlarging the logs helps, because the amount of work for the next checkpoint will not really change, but you trigger fewer checkpoints in the same time.

Well, this behaviour dates from Oracle 8i times. Nowadays there is much smarter logic under the hood. The buffer cache is split into multiple working data sets, and each data set has a checkpoint queue. These checkpoint queues contain the dirty blocks and are written down continuously by the DBWR(n) processes - so-called incremental checkpoints.

So there is no need to scan the whole database cache or anything like that, and this logic avoids DBWR(n) write peaks at a log switch too :-)
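For illustration, a minimal sketch of the statistics that show the incremental checkpointing at work (statistic names as found in V$SYSSTAT):

SELECT name, value
FROM   v$sysstat
WHERE  name IN ('background checkpoints started',
                'background checkpoints completed',
                'DBWR checkpoint buffers written');
-- if "started" keeps running ahead of "completed" while the alert log shows
-- "Checkpoint not complete", DBWR is not keeping up with the log switches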

As I mentioned earlier, Jonathan Lewis described this in some detail in this OTN thread:

/thread/609453 [original link is broken]

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:19311485023372

Regards

Stefan

volker_borowski2
Active Contributor
0 Kudos

Hi Stefan,

thanks for that info.

Actually I was aware of "incremental checkpoints" as a (well, by now not all too new) "new" feature, but I never looked up how they really work in detail :-). Thanks again.

@Benoit: I looked up the messages, as we had some of these again recently (because someone "downsized" my eight archivers to four again...). This actually wasn't a peak situation, but we also had trouble with some SAN I/O, which is why the times are quite bad.


:
:
Tue Nov 29 12:37:38 2011
ALTER SYSTEM SET log_archive_max_processes=8 SCOPE=BOTH;       <<< yelling for help
Tue Nov 29 12:37:39 2011
Beginning log switch checkpoint up to RBA [0x169bb.2.10], SCN: 11863071730
Thread 1 advanced to log sequence 92603 (LGWR switch)
  Current log# 2 seq# 92603 mem# 0: /oracle/PRD/origlogB/log_g12m1.dbf
  Current log# 2 seq# 92603 mem# 1: /oracle/PRD/mirrlogB/log_g12m2.dbf
Tue Nov 29 12:39:51 2011

ORACLE Instance PRD - Can not allocate log, archival required    <<< message to grep for
Thread 1 cannot allocate new log, sequence 92604
All online logs needed archiving                                 <<< message to grep for

  Current log# 2 seq# 92603 mem# 0: /oracle/PRD/origlogB/log_g12m1.dbf
  Current log# 2 seq# 92603 mem# 1: /oracle/PRD/mirrlogB/log_g12m2.dbf
Tue Nov 29 12:40:08 2011
Completed checkpoint up to RBA [0x169b6.2.10], SCN: 11862476358
Tue Nov 29 12:41:09 2011
Archived Log entry 92237 added for thread 1 sequence 92596 ID 0x20a4cb56 dest 1:
ARC2: STARTING ARCH PROCESSES                 <<<<<<<<<< Reinforcements called
Tue Nov 29 12:41:11 2011
ARC4 started with pid=569, OS id=47186752     <<<<<<<<<< here they come 
Tue Nov 29 12:41:11 2011
ARC5 started with pid=628, OS id=12518526     <<<<<<<<<< and the next guy
ARC4: Archival started     <<<<<<<<<< first knight starting to fight
Tue Nov 29 12:41:12 2011
Beginning log switch checkpoint up to RBA [0x169bc.2.10], SCN: 11863191874
Thread 1 advanced to log sequence 92604 (LGWR switch)
  Current log# 1 seq# 92604 mem# 0: /oracle/PRD/origlogA/log_g11m1.dbf
  Current log# 1 seq# 92604 mem# 1: /oracle/PRD/mirrlogA/log_g11m2.dbf
ARC5: Archival started
:
:

Volker

Answers (4)

former_member204746
Active Contributor
0 Kudos

Ask SAP to do an EarlyWatch report for you.

They will recommend what to do with size and number of redo logs.

Last time I had an EarlyWatch report, they recommended 8 groups of 800 MB each... That was for a 1 TB database.

former_member204746
Active Contributor
0 Kudos

increase the size of redologs... also increase the number of redolog groups.

investigate usage of brspace with option moredo

refer to SAP Note 1259767 - Management of online redo log files ...

Bonne chance!

benoit-schmid
Contributor
0 Kudos

> increase the size of redologs... also increase the number of redolog groups.

> investigate usage of brspace with option moredo

> refer to SAP Note 1259767 - Management of online redo log files ...

> Bonne chance!

Hello Eric,

My problem is not performing the change, as I have already performed it once on my systems.

My question is about understanding the best redo size/number combination.

Regards,

sunny_pahuja2
Active Contributor
0 Kudos
benoit-schmid
Contributor
0 Kudos

> Hi,

> Check below thread, it will clear your doubt:

> http://help.sap.com/saphelp_nwpi71/helpdata/en/c4/3a728b505211d189550000e829fbbd/content.htm

> Thanks

> Sunny

Hello, your post does not clear my doubts, as in my case the problem is not the LGWR writing too slowly.

The problem, as stated in the first answer, is that the DBWR takes time to write the dirty blocks to disk.

The redo log cannot be reused until all its dirty blocks have been written.

Regards,

Former Member
0 Kudos

Checkpoint not complete happens when all redo logs are still active. To render them inactive, the DBWR process(es) have to write the data blocks changed by those transactions to the data files.

DBWR is triggered by checkpoints (e.g. log switch checkpoints). So if you have frequent log switch checkpoints (below 1 min) and DBWR still cannot keep up, increase the log size; otherwise add more groups.

many small log files -> early trigger of dbwr

large log files -> plenty of space for transactions, more time for dbwr to write changes to data files

In most cases the combination of both measures is the optimal solution. I just had a case with checkpoint not complete: there were 4 groups of 150 MB each; I added two more groups and increased the log size to 300 MB. It looks like this improved the situation considerably.

And last but not least, DBWR itself can also be tuned (very platform dependent):

- tune random I/O write performance on the data files

- configure dbwr_io_slaves or multiple DBWR processes (see the sketch below)
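For illustration, a hedged sketch of the second point; both parameters are static, so they take effect only after an instance restart, and the values are just examples (use one approach or the other, not both):

-- multiple database writer processes (the usual choice on platforms with async I/O)
ALTER SYSTEM SET db_writer_processes = 4 SCOPE=SPFILE;
-- or I/O slaves for platforms without async I/O
ALTER SYSTEM SET dbwr_io_slaves = 4 SCOPE=SPFILE;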

Cheers Michael

benoit-schmid
Contributor
0 Kudos

Hello Michael,

> large log files -> plenty of space for transactions, more time for dbwr to write changes to data files

Yes, but the DBWR will also take more time to write dirty blocks if there are more of them to write.

Therefore we could think that it takes twice the time to write twice as many dirty blocks.

Am I wrong?

Regards,

Former Member
0 Kudos

> Yes, but the DBWR will also take more time to write dirty blocks if there are more of them to write.

> Therefore we could think that it takes twice the time to write twice as many dirty blocks.

> Am I wrong?

Basically you are right, but I believe that often the same blocks are changed several times and only the latest change has to be written to the file. Thus larger log files (more changes to the same block within one checkpoint interval) can mean fewer writes for DBWR.

But it would be better if you asked someone with more profound Oracle knowledge than me...