Anyone have problems with raw devices when sqm_cache_enable=share (new default)?

sladebe
Active Participant

I had some problems with a new Linux-based repserver (version 16.0 SP03 PL05): it got odd low-level OS errors when accessing Linux raw devices (used for the stable queue).

I think it's related to the new memory_control=smart and sqm_cache_enable=share default settings (where "smart" means automatically/dynamically tune memory usage within repserver modules, and "share" has something to do with sharing sqm memory when "smart" memory control is turned on).

Turning sqm_cache_enable to "on" seemed to solve the problem.

I think the problem might have been that the repserver was treating raw devices like a buffered file. When the repserver issued a flush-the-buffer command (aka osync or dsync), Linux responded with an I/O error (in rsaiolinux, in my case).

SAP tech support hadn't heard of anyone else having problems like this. Has anyone else seen this problem?

Thanks
Ben

Accepted Solutions (0)

Answers (3)

ivy_wang3
Explorer

Hello,
Changing 'sqm_cache_enable' to on is just a workaround.
Which OS version are you running? Some OS releases no longer support raw devices, for example SLES 15.4. I suspect the cause is that AIO does not work well with raw devices on that OS.

Actually, in modern SRS partition usage – a block device with filesystem is used in most cases, which is easy to manage because the partition is presented as a file and is easy to expand.

The default sqm_write_flush mode is 'on', in which data is written to a memory buffer and then flushed to the disk. That is more suitable for SRS, which is a data movement service that needs to write to and read from disk frequently.

If you want the SQM to behave as it would on a raw device and bypass the OS cache, you can set sqm_write_flush to dio.

sqm_write_flush

Specifies whether data written to memory buffers is flushed to the disk before the write operation completes. Values are:

  • on – data written to the memory buffer is flushed to the disk.
  • off – data written to the memory buffer is not flushed to the disk.
  • dio – enables direct I/O and allows Replication Server to read and write to the disk without file system buffering. Available only in Solaris (SPARC) and Linux.

Default: on
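
To make the setting concrete, here is a minimal sketch in Python of a helper that checks a value against the legal values quoted in this thread and prints the matching RCL statement. The helper name `build_configure_command` and the `LEGAL_VALUES` table are made up for illustration. The sketch assumes the usual `configure replication server set ... to '...'` form applies to these parameters on your version, so check the reference manual before running the output in isql.

```python
# Legal values as quoted in this thread; other SRS versions may differ.
LEGAL_VALUES = {
    "sqm_write_flush": ("on", "off", "dio"),
    "sqm_cache_enable": ("on", "off", "share"),
}


def build_configure_command(param: str, value: str) -> str:
    """Return an RCL statement for a server-wide parameter (hypothetical helper)."""
    legal = LEGAL_VALUES.get(param)
    if legal is None:
        raise ValueError(f"unknown parameter: {param}")
    if value not in legal:
        raise ValueError(
            f"{value!r} is not a legal value for {param}; expected one of {legal}"
        )
    # Assumed RCL form: configure replication server set <param> to '<value>'
    return f"configure replication server set {param} to '{value}'"


print(build_configure_command("sqm_write_flush", "dio"))
```

The helper only builds the text of the command; it does not connect to a Replication Server. Some parameters need a server restart before the new value takes effect.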

sladebe
Active Participant

Ivy, sorry, I missed your response.

Re: What is the running OS version?

RedHat Linux. Raw devices are still supported.

Re: Actually, in modern SRS partition usage – a block device with filesystem

Just to be specific, Linux "block devices" are different from files in a filesystem. My understanding is that block devices sit at a lower level in the OS and skip the filesystem layer of logic. Block devices are cached, while Linux raw character-oriented devices are not.
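
To check which of these three kinds a given partition path actually is, a small Python sketch like the one below can help. The function name `classify_path` is made up for illustration, and `/dev/null` is only a stand-in path that exists on every Linux box; substitute your own partition path.

```python
import os
import stat


def classify_path(path: str) -> str:
    """Report whether a path is a block device, character device, or regular file."""
    mode = os.stat(path).st_mode  # follows symlinks, like opening the path would
    if stat.S_ISBLK(mode):
        return "block device"
    if stat.S_ISCHR(mode):
        return "character device"
    if stat.S_ISREG(mode):
        return "regular file"
    return "other"


# /dev/null is a character device; replace it with a real partition path.
print(classify_path("/dev/null"))
```

A Linux raw device bound with the raw utility shows up as a character device, the underlying disk or LVM volume as a block device, and a filesystem-backed partition as a regular file.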

So it sounds like we should someday migrate repserver partitions from raw devices to either Linux block devices or to files. Thanks

sladebe
Active Participant

Note, this is still a problem with repserver 16.0 SP04 PL04:

[100] MYREPSERVER_RS.-11:16:52-1> admin config,sqm_cache_enable; -mvert
Configuration: sqm_cache_enable
Config Value:  share
Run Value:     share
Default Value: share
Legal Values:  list: on,off,share
Datatype:      string
Status:        Restart required

[101] TEST1604_1_RS.-11:17:03-1> admin version;
Replication Server/16.0/EBF 30655 SP04 PL04 rs160sp04pl04/Linux AMD64/Linux 4.12.14-95.114-default x86_64/2852/OPT64/Mon Feb 13 03:07:02 2023

From the errorlog file:

I. 2023/05/04 08:58:39. Replication Agent for MYSERVER.testdb1 connected in passthru mode.
I. 2023/05/04 08:58:39. Setting system upgrade locater for version 1100 to 000000000000000000000000000000000000000000000000000000000000000000000000 for database MYSERVER.testdb1.
E. 2023/05/04 08:58:39. ERROR #6120 dAIO( ) - d64/sqm/rsaiolinux.c(1019) SQM detected a failing status from an outstanding AIO (error_code = 22.).
E. 2023/05/04 08:58:39. ERROR #6026 SQM(102:1 TEST.testdb1) - /generic/sqm/sqmio.c(1384) Block write failed for queue '102:1', segment 0, block 0. OS dependent error is 'error_code = 22.'
I. 2023/05/04 08:58:39. SQM stopping due to an exception: 102:1 TEST.testdb1
E. 2023/05/04 08:58:39. ERROR #14023 REP AGENT(MYSERVER.testdb1) - neric/exec/execint.c(7373) SQM had an error writing to the inbound-queue.
I. 2023/05/04 08:59:39. Replication Agent for MYSERVER.testdb1 connected in passthru mode.
E. 2023/05/04 08:59:39. ERROR #14023 REP AGENT(MYSERVER.testdb1) - neric/exec/execint.c(7373) SQM had an error writing to the inbound-queue.
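
For anyone trying to read the "error_code = 22" in these messages, here is a minimal Python sketch that pulls the codes out of errorlog text and looks them up in the OS errno table. It assumes the repserver is reporting a plain Linux errno value, which nothing in this thread confirms. The function name `decode_error_codes` is made up for illustration.

```python
import errno
import os
import re

# Matches the "error_code = 22." fragments seen in the errorlog above.
ERROR_CODE_RE = re.compile(r"error_code\s*=\s*(\d+)")


def decode_error_codes(log_text: str) -> list[tuple[int, str, str]]:
    """Return each distinct (code, errno name, OS message) found in the log text."""
    seen = []
    for match in ERROR_CODE_RE.finditer(log_text):
        code = int(match.group(1))
        # Assumption: the code is a standard OS errno value.
        entry = (code, errno.errorcode.get(code, "UNKNOWN"), os.strerror(code))
        if entry not in seen:
            seen.append(entry)
    return seen


sample = (
    "E. 2023/05/04 08:58:39. ERROR #6120 dAIO( ) - d64/sqm/rsaiolinux.c(1019) "
    "SQM detected a failing status from an outstanding AIO (error_code = 22.)."
)
for code, name, text in decode_error_codes(sample):
    print(code, name, text)
```

If the assumption holds, 22 on Linux is EINVAL ("Invalid argument"). That only says the kernel rejected something about the I/O request; it does not by itself identify which part of the request was the problem.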

Changing sqm_cache_enable from share to "on" fixes the problem: