
Backup performance: AIX 6.1 Oracle 11.2 filesystemio_options

Former Member
0 Kudos

Dear colleagues,

After switching filesystemio_options from ASYNCH to SETALL, the backup performance decreased by 20%.

In the past we had /oracle/<SID> and /oracle/<SID>/sapdata in one filesystem and thought this might prevent Oracle from benefiting from O_CIOR.

So we redesigned the layout and put /oracle/<SID>/sapdata in a separate filesystem. Unfortunately the backup performance is still decreased when using SETALL.

Could it be related to the fact that we don't use local disks but SAN-disks?

It is quite annoying that we don't get the best performance when following the SAP, Oracle and IBM recommendations.

Best regards, Henning

--

Additional information:

- The cio mount option is not set, as recommended.

- We don't suffer functional problems. "Only" the performance is decreased.
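For reference, the active setting can be checked from SQL*Plus; this is a generic console sketch, and the output will look roughly like the following:

```
SQL> show parameter filesystemio_options

NAME                  TYPE    VALUE
--------------------- ------- ------
filesystemio_options  string  SETALL
```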

Accepted Solutions (1)

Former Member
0 Kudos

One more piece of information:

We are backing up to TSM via backint. No direct attached tapes or disks for backup.

Former Member
0 Kudos

It is TDP for Oracle.

No RMAN.

We made the changes on Friday the 14th. The average runtime has significantly increased, as you can see in the attached backup overview.

Since the changes were made locally, I assume the backup infrastructure is not the bottleneck.

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

> It is TDP for Oracle. No RMAN.

This combination is not possible.

> Since the changes were done locally I assume the backup infrastructure is not the bottle neck.

I never mentioned this - I meant the different phases of an RMAN backup with "(reading, transferring or sending the data)".

>The average had significantly increased as you can see in the attached backup overview.

I don't see any improvement in your attached text file. The backup is even slower (e.g. compare 15.06. and 08.06.).

Once again, you first need to find the bottleneck. Guessing and looking into a crystal ball make no sense at all here.

Just to mention two common mistakes or issues (without any indication):

  • Setting BUFFSIZE too low, if a native flat-file backup with TDP for mySAP is used (this depends on the CIO stuff as well)
  • As you have mentioned, you have moved the data files to a new file system. Have you crosschecked the file fragmentation (AIX fileplace)?

Regards

Stefan    

Former Member
0 Kudos

Hi Stefan,

You are right about our client. We are using TDP for SAP via backint.

My wording about decrease/increase was a bit confusing:

We changed from ASYNCH to SETALL because it was recommended in the SAP notes. The backup time increased, so the performance got worse.

Rolling back the change would require a restart, so we would have to plan a maintenance window. Since the system is generally running fine, an emergency restart is not justified.

Thank you for the hint concerning filesystem fragmentation. I just scheduled a reorg for tonight.

My next question might sound pretty basic to you:

Where should I set the BUFFSIZE? On SAP-, Oracle-, AIX- or TSM-level?

Regards, Henning

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

> Thank you for the hint concerning filesystem fragmentation. I just scheduled a reorg for tonight.

A reorg of what? I am talking about JFS(2) file system fragmentation (checked with the fileplace tool, as mentioned), not database table or index fragmentation. A table or index reorg would not change anything anyway, as you are using the old-fashioned flat-file backup.
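To make the fileplace check concrete, a console sketch for AIX (the data file path is just an example):

```
# AIX: show the logical/physical placement and fragment count of one data file
fileplace -v /oracle/<SID>/sapdata1/system_1/system.data1
```

A heavily fragmented file shows up as a long list of small, non-contiguous extents.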

> Where should I set the BUFFSIZE? On SAP-, Oracle-, AIX- or TSM-level?

The parameter BUFFSIZE belongs to TDP for mySAP. It can increase the performance / throughput in several concurrent-I/O cases.

Official Documentation: http://publib.boulder.ibm.com/infocenter/tivihelp/v1r1/index.jsp?topic=%2Fcom.ibm.itsmerp.doc%2Ffbro...

BUFFSIZE n|131072


    This parameter specifies the block size (in bytes) for the buffers used for disk I/O. The size of the buffers passed to the Tivoli Storage Manager API functions is the value of BUFFSIZE increased by approximately 20 Bytes.

    The valid range is from 4096 (4 KB) to 32 MB. Inappropriate values will be adjusted automatically.

    If BUFFCOPY is set to PREVENT the value of BUFFSIZE must not exceed 896 KB.

    If not specified, the default value is 131072 (128 KB) for UNIX or Linux systems and 32768 (32 KB) for Windows systems.

From my experience with very large infrastructures: you can scale up to roughly 768 KB; beyond that the performance improvement is insignificant, but as always this depends on the environment.
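Purely as an illustration of the documented value handling above (this is a reading of the quoted documentation, not TDP source code), the clamping rules can be sketched as a small shell function:

```shell
# Sketch of the documented BUFFSIZE rules: valid range 4096 bytes to 32 MB,
# out-of-range values are adjusted automatically; with BUFFCOPY=PREVENT
# the upper cap drops to 896 KB.
normalize_buffsize() {
  req=$1
  lo=4096
  hi=$((32 * 1024 * 1024))
  if [ "$2" = "prevent" ]; then hi=$((896 * 1024)); fi
  v=$req
  if [ "$v" -lt "$lo" ]; then v=$lo; fi
  if [ "$v" -gt "$hi" ]; then v=$hi; fi
  echo "$v"
}

normalize_buffsize 1024      # prints 4096: too-small values are raised
normalize_buffsize 131072    # prints 131072: the UNIX default passes through
```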

But once again - find the bottleneck and its root cause first. You can adjust 1 million parameters and achieve nothing.

Regards

Stefan

Former Member
0 Kudos

> We changed from ASYNCH to SETALL because it was recommended in the SAP-notes. The time increased so the performance got worse.

With SETALL you bypass the filesystem cache; that is why the backup performance got worse.

Using the Administration Assistant you can view performance data to determine bottlenecks during backup/restore. If the only change you performed is the SETALL option, it is most likely disk-I/O-related. If that is true (monitor first to be sure), try to tune disk I/O. See the official TSM documentation for TDP for SAP.

Former Member
0 Kudos

> A reorg of what?

A reorg of the VG (volume group). According to our Unix team, that is the usual way to defragment AIX filesystems.

> BUFFSIZE is related to TDP for mySAP.

The current value is 131072 (which is the default value).

I will double it and monitor the backups over the next days.
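For later readers, the change is a one-line edit in the TDP for SAP profile (the file name and path vary per installation; this is only a sketch):

```
# init<SID>.utl -- TDP for SAP profile; exact path depends on the setup
BUFFSIZE 262144    # doubled from the 128 KB UNIX default for this test
```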

Unfortunately my next response here will take at least two days.

Former Member
0 Kudos

Since we had changed two systems on the same day, I could test both options independently.

Filesystem defragmentation significantly improved the performance almost to the pre-change level.

The question here is whether the defragmentation benefits are independent of the filesystemio_options parameter. To verify this I would have to restart the system with ASYNCH instead of SETALL, which I will do some time in the future.

Doubling the BUFFSIZE didn't show measurable improvements. One backup was a little slower, the second a little faster - but both were significantly slower than before the filesystemio_options change.

I will update this discussion in a week or so.

Former Member
0 Kudos

Hi Roman,

since the backup performance improved after the FS defragmentation, you were probably pointing in the right direction. It surprised me, because topas doesn't show obvious bottlenecks on disk I/O.

Why does SAP recommend only SETALL if it has disadvantages on SAN storage? Or are there significant advantages in other areas that should outweigh the disadvantages in backup performance?

The "Administration Assistant" looks interesting. I will get deeper into it.

Best regards, Henning

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

> The question here is if the defragmentation benefits are independent from the filesystemio_options parameter.

We cannot make any statement about this, as we don't know what exactly was done by your Unix team. However, you would know if you checked the file fragmentation with "fileplace" (as suggested a few times).

> Doubling the BUFFSIZE didn't show measurable improvements

Not a real surprise. It was just one out of a million possibilities / parameters, as previously mentioned.

> It surprised me because topas doesn't show obvious bottle necks on disk I/O.

That's reasonable if the problem is related to file (system) fragmentation. You cannot reach a disk throughput high enough to create a bottleneck while "block hopping".

> Why does SAP only recommend SETALL if it has disadvantages on SAN-storage?

It has no disadvantages on SAN storage (or locally attached storage) - it just bypasses the file system cache of the OS and avoids (exclusive) i-node locking. Accessing the database "outside of Oracle" (like an old-fashioned flat-file backup) is faster if you have enough available memory and the file system cache was able to "double cache" the data file blocks (think about the database buffer cache as well). However, nowadays there is *NO* reason in professional environments to use that kind of backup technique anymore.

> Or are there significant advantages in other areas ...

Yes there are. Check this IBM case study:

http://www-03.ibm.com/systems/resources/systems_p_os_aix_whitepapers_db_perf_aix.pdf

Regards

Stefan

Former Member
0 Kudos

Why does SAP recommend only SETALL if it has disadvantages on SAN storage? Or are there significant advantages in other areas that should outweigh the disadvantages in backup performance?

Stefan gave an excellent answer to your question. FILESYSTEMIO_OPTIONS=SETALL allows Oracle to use the CIO capabilities of the JFS2 file system in addition to async I/O:

- async I/O;

- bypassing the file system cache (Oracle already uses its buffer cache for caching, so the file system cache is only overhead in memory consumption and unnecessary CPU cycles to maintain it);

- avoiding i-node locking (multiple threads may simultaneously perform reads and writes on a shared file).

Disadvantages: the use of CIO disables any file system mechanisms that depend on the file system cache, e.g. sequential read-ahead (applications that perform a significant amount of sequential read I/O may experience degraded performance).



Former Member
0 Kudos

Hi Stefan,

I will follow up on the filesystem and fragmentation topic. That was definitely a good hint.

About BUFFSIZE: I didn't want to annoy you. I just wanted to let later readers know what had worked for me and what hadn't.

Maybe you can clarify one final thing that irritates me.

According to the IBM case study, direct I/O is most beneficial when files are read in large increments. The examples show a strong disadvantage for 1-byte increments, a moderate disadvantage for 4 KB, and a big advantage for 10 MB.

I would expect that a backup reads the files in large portions so that it should benefit from Direct I/O.

What is wrong in my thought?

Thank you for your strong statement on RMAN vs. backint. In upcoming discussions in my company I now have a good argument for changing our infrastructure.

Kind regards,

Henning

P.S.: Please allow me one comment on the case study. It refers to AIX 5.2 and Oracle 9i. According to SAP note 948294 there is a new option O_CIOR, different from O_CIO. It is used if Oracle 11.2.0.2 detects AIX 6.1.

So the case study explains the different techniques very clearly, but not all conclusions necessarily hold in our environment.

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

> I would expect that a backup reads the files in large portions so that it should benefit from Direct I/O. What is wrong in my thought?

First of all, you are using concurrent I/O and not direct I/O if you set filesystemio_options to "SETALL", but both behave the same way with regard to the file system cache. The fastest way to access a file system page is to do no physical I/O at all. With direct or concurrent I/O you generate physical I/O requests every time, because none of the file system pages are cached. With plain async I/O you use the file system cache, so possibly no I/O request to the disks is generated at all, because the page is already cached. That makes it pretty fast if you have a large file system cache (usually any memory that is not otherwise needed is assigned to it) - and that's the reason.

> Thank you for your strong statement for RMAN vs backint. In upcoming discussions in my company I have a good argument for changing the our infrastructure.

You can also use RMAN with backint. You just need to adjust a few parameters in init<SID>.sap and everything works fine. Not all RMAN features are available through BR*Tools / backint, but it is much better than the old-fashioned approach.
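A sketch of the relevant init<SID>.sap parameters (parameter names as documented for BR*Tools; verify the exact values for your release):

```
# init<SID>.sap -- switch BR*Tools from flat-file backint to RMAN via backint
backup_dev_type = rman_util           # RMAN backup through the backint interface
util_par_file   = init<SID>.utl       # the existing TDP for SAP profile is reused
```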

> So the case study explains the different techniques very clearly but not all conclusions must be valid in our environment.

No, not really, because O_CIOR matters in a different way (mainly for programs other than Oracle).

Note that the access to the Oracle datafiles by all other programs which do not support CIO directly will probably be significantly slowed down. This is because caching, read ahead etc. is no longer possible.

AIX 6.1 and AIX 7.1 combined with Oracle >= 11.2.0.2 introduced a new open flag O_CIOR, which is the same as O_CIO but allows subsequent open calls without CIO. The advantage of this enhancement is that other applications like cp, dd and cpio can access database files in read-only mode without having to open them with CIO. Starting with Oracle 11.2.0.2, when AIX 6.1 or AIX 7.1 is detected, Oracle will use the O_CIOR option to open a file on JFS2. Therefore you should no longer mount the filesystems with the mount option -o cio.

Source: "Oracle Architecture and Tuning on AIX"

This behavior can easily be simulated by mounting a file system with the "cio" option and trying to copy a file from it with "cp". You will notice a tremendous performance break-down when using "cio", as the third-party tool does not "support" it. Before this enhancement you could also run into issues (SAP note #948294) with external tools if the file system was not mounted with the cio option while the files were explicitly opened by Oracle with CIO.
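The simulation described above looks roughly like this on AIX (device, mount point and file name are placeholders):

```
# remount a test file system with concurrent I/O and compare a plain cp
mount -o cio /dev/testlv /mnt/ciotest
time cp /mnt/ciotest/bigfile /dev/null    # slow: cp cannot use the FS cache
umount /mnt/ciotest
mount /dev/testlv /mnt/ciotest            # remount without cio
time cp /mnt/ciotest/bigfile /dev/null    # faster: caching and read-ahead active
```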

Regards

Stefan

Former Member
0 Kudos

Hi Stefan,

Sorry for challenging your patience, but there is again something I don't understand.

We are using AIX 6.1 and Oracle 11.2.0.2. We haven't mounted the filesystems with the cio option.

According to your explanation we shouldn't experience a performance break-down, because the new O_CIOR flag combines the benefits of performant CIO access for Oracle with the ability of other programs to access the same files without a performance break-down.

Why do we experience a decreased performance when changing from ASYNCH to SETALL?

Probably I have a Gordian knot in my brain.

Best regards, Henning

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

you are mixing up several things.

> According to your explanation we shouldn't experience a performance break-down because this new O_CIOR allows to combine the benefits of performant access of Oracle using CIO and other programs can access the same files without performance break-down.

Yes, you do not experience a performance break-down due to CIO issues like missing read-ahead or an I/O error. But you will notice a performance break-down based on file system cache usage.

> Why do we experience a decreased performance when changing from ASYNCH to SETALL?

Because the file pages (such as the pages of the data files) are no longer cached by the file system cache once the database accesses the data files with CIO. So the file pages need to be read from disk when you access the data files with third-party tools (like backint with the old-fashioned backup). If you set the filesystemio_options parameter to ASYNCH, the file pages are placed in the OS file system cache (while being accessed by normal Oracle operations), so the backup can access them faster because they are already there. Otherwise (with SETALL) you need to read the pages from disk.

Regards

Stefan

Former Member
0 Kudos

Hi Stefan,

I finally got it. Thank you for your patience.

Best regards,

Henning

Former Member
0 Kudos

Dear Stefan,

one final question. If I got it right, the backup performance should improve if backint accessed the files without using the filesystem cache. (At least it would be worth a try.)

For tape and disk copies there are the parameters tape_copy_cmd and disk_copy_cmd.

Is there a parameter to make backint bypass the AIX filesystem cache?

Best regards, Henning

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

just think about your argumentation once again.

> If I got it right the backup performance should improve if backint would access the files without using filesystem cache.

How should this improve the performance? What would cause a performance improvement from bypassing the file system cache in such a case? The file system pages need to be read from disk anyway (and that is the slow part).

If the backup performance is so important for your company, get a consultant (on-site) to analyze the bottleneck and provide a proper solution (and get rid of the old backup procedure).

Regards

Stefan

Former Member
0 Kudos

Hi Stefan,

you linked to the IBM case study. On page four it says that direct I/O is about three times faster than cached I/O if data is read in large increments (table on top of page five). One reason is that when using a filesystem cache the data is first read from disk into the cache and then from the cache to the target ("double-copying"). The second reason is about read-ahead and page size, which I don't really understand at first glance.

Since backint reads files of several GB, I would expect (though I don't know for sure) that the files are read in large chunks. That's why I would expect direct I/O to improve the backup performance.

The backup performance of our systems is not yet critical. My initial point was that SAP recommends settings which measurably decrease the backup performance. The recommendation was not presented as a "maybe" with possible exceptions. You probably know that if one creates a message in the SAP-Net, often the first reaction is to check whether the recommendations are followed. When you are outside that range, there is no support.

Best regards, Henning

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

> When being out of that range, there is no support.

Even if you are in this range - you are lost from time to time

> On page four it says that direct I/O is about three times faster than cached I/O if data is read in large increments (table on top of page five)

OK, now I got it. But your previous conclusion that "backup performance should improve if backint would access the files without using filesystem cache" has nothing to do with the show case in the study, as your setup - running an Oracle database - is different. The case study runs its test with an "empty" file system cache and contiguous streaming I/O, but this is usually not the case if you run an Oracle database with ASYNCH / SETALL. So the questions in your environment are:

  • What is the file system cache configuration?
  • What is the available file system cache for Oracle data files?
  • Is it possible to read such large chunks in "one piece" (no hopping due to JFS2 fragmentation - tool "fileplace")?

AFAIK (but I am not exactly sure) you can specify that chunk size for backint with the previously mentioned BUFFSIZE parameter (crosscheck this with truss). Be aware that you are using no CIO (O_CIOR) for backint access in your current setup (so right now you are running a variant of the cached examples from the case study).
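The truss crosscheck might look like this on AIX (the PID is a placeholder):

```
# attach to the running backint process and watch the read sizes it issues
truss -t read -p <backint_pid>
# each line shows read(fd, buf, nbytes) -- nbytes should match BUFFSIZE
```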

Regards

Stefan

Former Member
0 Kudos

Hi Stefan,

I have tried the TSM parameter memoryefficientbackup=yes, which is originally intended to keep backups from loading the filesystem cache. Up to now I could only try one backup, and it didn't show any performance improvement. For better statistics I would have to wait for some more.

"fileplace" shows more then 300 fragments for some files. What can I do about it? "defragfs" didn't change anything on the filesystem.

Kind regards, Henning

------------------------------

Some cache settings:

vmo (VMM tunables):

    maxfree = 1088
    minfree = 960
    maxperm% = 90
    minperm% = 3
    maxclient% = 90
    strict_maxperm = 0
    lru_file_repage = 0
    strict_maxclient = 1
    lru_poll_interval = 10
    page_steal_method = 1

sys0 attributes and ioo:

    maxuproc = 16384   (maximum number of processes allowed per user)
    ncargs = 1024      (ARG/ENV list size in 4 KB blocks)
    maxpgahead = 8
    maxpout = 8193     (high-water mark for pending write I/Os per file)
    minpout = 4096     (low-water mark for pending write I/Os per file)

no (network tunables):

    ipqmaxlen = 100
    rfc1323 = 0
    sb_max = 1048576
    tcp_recvspace = 16384
    tcp_sendspace = 16384
    udp_recvspace = 42080
    udp_sendspace = 9216

Former Member
0 Kudos

Hi all,

to complete the information. The parameter memoryefficientbackup=yes didn't show any measurable performance improvement on two systems. Therefore I disabled it again.

Best regards, Henning

Answers (1)

stefan_koehler
Active Contributor
0 Kudos

Hi Henning,

> It is quite annoying that we don't get the best performance when following the SAP, Oracle and IBM recommendations.

All of these "best practices" are a good rule of thumb, but nothing more. Luckily every client environment is different, and that's how I earn my money: showing my clients how it really works under the hood and how to improve it.

> So we redesigned the filesystem and put /oracle/<SID>/sapdata in an extra filesystem.

That was the first right step.

> Unfortunately the backup performance is still decreased when using SETALL. Could it be related to the fact that we don't use local disks but SAN-disks?

Well, at first you need to know that setting filesystemio_options to "SETALL" does implicit CIO (for the Oracle processes) even if the file system is not mounted with the "cio" option. Check my blog for more details. SAN disks usually scale pretty well (even better than local ones), but it depends on the storage device used (like SVC, DS or XIV) and whether it is set up right (on the OS and behind it).

The second question is how you perform the backup. Do you use RMAN or an "old-fashioned flat-file" backup? What is the bottleneck (reading, transferring or sending the data)? Do you use a third-party product like TDP for mySAP or TDP for Oracle?

For example - there should be no CIO problem when using Oracle 11.2.0.2 (or higher) on AIX 6.1 with TDP for mySAP (and the "old-fashioned backup"); otherwise you may need to adjust the third-party buffers to get a pretty good throughput.

As always ... too little information, and it depends

Regards

Stefan