cancel
Showing results for 
Search instead for 
Did you mean: 

SQL Anywhere 11 databases are getting corrupt

Former Member
0 Kudos

Hi

We are having problems with databases that are getting corrupted.

We have a server that runs a SQL Anywhere 11 database. (the consolidated database)
Beside that, we have about 15 remote databases that run on the laptops of our mobile workforces.

To keep all databases in sync, we have setup a two-way server-to-laptop replication.

For the replication process we use “SQL Remote”.

Each laptop should replicate his data once in about 1 or 2 weeks.

For exchanging the messages we use the message type “file”.

This year we already experienced 6 times that a remote database got corrupt after replication. (after running dbremote.exe).

The log file of the dbremote utility does NOT show any errors, the replication seems to be finished without any problems.
But after this, it’s not possible anymore to connect to the database, because of the following assertion error:

    • [Sybase][ODBC Driver][SQL Anywhere]Internal database error *** ERROR *** Assertion failed: 201501 (11.0.1.2376)
      Page 0x0:0x7182d1 for requested record not a table page -- transaction rolled back
      Error code=-301
      SQL state=HY000

Can someone tell me more about this error?

What goes wrong and why?

Or does someone has some tips on how to prevent this or tips on how I can do some analyzing myself to see what exactly got broken in the database?

Any help is appreciated.

Thanks.

Accepted Solutions (1)

Accepted Solutions (1)

jeff_albion
Employee
Employee
0 Kudos

Hi Krenton,


Page 0x0:0x7182d1 for requested record not a table page -- transaction rolled back

Error code=-301

SQL state=HY000

Can someone tell me more about this error?

Specifically, the page at offset 0x7182d1 in database (So if you have a 4KB page size, this would be byte offset 7439057 x 4096 = 30470377472 into the database) in 0x0 (the main database running) has a page header that indicates that the data on the page is not table data. We don't know what it is, but the signature for table information connected to the table you're scanning is 'not as it should be'.


What goes wrong and why?

In the real world, lots of things can go wrong. I list some of the possible things on this question here on the SQL Anywhere forum. This list can include software bugs, but can also include instances where the media is not respecting write-through or write-ordering - see http://scn.sap.com/docs/DOC-34090 (SQL Anywhere I/O Requirements for Linux and Windows) for more details.

There are also known instances where Intel drivers are not set with the correct write-through settings. The details of this are in the documentation: DocCommentXchange (Improving Robustness on Intel storage drivers).


Technical Support should still provide assistance. If the problem is a newly-discovered bug in SQL Anywhere 11, their advice will be to apply a more recent EBF or upgrade to a new version. However, that is unlikely in this case.

Breck is correct - it would probably be best to take a look at the database file in technical support for more clues as to what's happening here. With the current information, it's hard to say more than "something was not correct at the time the page was looked at by the database server."

Regards,

Jeff Albion

SAP Active Global Support

Former Member
0 Kudos

Hi Jeff,

Thanks for you reply.

After your explanation, I'm starting to get confident that it has something to with the hardware of the laptops.

Like I said in an earlier reply to Breck, we also have 3 remote databases that run on Windows 2008 R2 servers with a permanent connection to the consolidated database and they never had any problem.

Most of our laptops have the database stored on an additional disk drive. The diskdrive is placed in a bay that replaces the cd-rom player of the laptop. The bay has (like the original cd-rom drive) a sata connector. Although I'm not 100% sure if all the databases that got corrupt are stored on such "additional" drive, do you think it can be a reason???

I also will try to contact SAP support again, to see if they can analyze a corrupt database.

Thanks again!

reimer_pods
Participant
0 Kudos

Those drives might be seen by the OS as removable drives and behave differently from internal drives, depending on the configuration ("fast" vs. "safe"). This might lead to database corruption, if the device is shutdown (sleep mode or hibernation) with disk writes cached but not written.

jeff_albion
Employee
Employee
0 Kudos

Hi Krenton,


Most of our laptops have the database stored on an additional disk drive. The diskdrive is placed in a bay that replaces the cd-rom player of the laptop. The bay has (like the original cd-rom drive) a sata connector. Although I'm not 100% sure if all the databases that got corrupt are stored on such "additional" drive, do you think it can be a reason???

I actually have a similar bay in my laptop, and I have to say that from personal experience, the drive has been prone to "wiggle"/"jump" out of its holder and will disconnect the drive unexpectedly (particularly if I'm carrying it around). I need to push the external casing back in and it will re-connect.

----

If you go into Device Manager, Disk Drives, and right-click "Properties" on the drive, what is the Removal Policy (Policies tab) set on this drive? It should be set for "quick removal" (and constantly flush operations to the disk).

Any other settings in this scenario could very much result in database corruption if the drive is unexpectedly "wiggled" out of its SATA socket...

Regards,

Jeff Albion

SAP Active Global Support

Former Member
0 Kudos

Hi Jeff,

Although we don't recognize the problem of the disk bays that wiggle/jump out of its holder, it sounds like a plausible cause of our problem. The more because the disk removal policy was set to the default "Better performance" instead of the "Quick removal" setting.

So my idea was to focus on the hard disk bays, were it not for the fact that (by coincidence) today I received another laptop from one of our field technicians with a corrupt database.

And just this laptop is one of the few that doesn't have an additional hard disk bay, but only 1 big SSD drive with 2 partitions.....

Also just for my understanding,

If a database got corrupt, because of for example the hard disk bay disconnects unexpectedly,
would it got corrupt on a random place or is it likely that a specific page get corrupt?
I would expect that it would be at a random position than...

Maybe I say something stupid now, but to me it looks like the database gets corrupt at (almost) the same position:

4 log files:

Page 0x0:0x8762d1

Page 0x0:0x8742f5

Page 0x0:0x8242d1

Page 0x0:0x7182d1

Can it be that the numbers are slightly different because of a growing database what makes the page number shift up a little bit?

This goes beyond my knowledge, so maybe it makes no sense at all what i'm saying, but it is interesting to learn more about it

Is it also possible to do a check based on the page number (f.e. 0x0:0x8242d1) if you open the database in a hex editor ???

Thanks for everything so far,

jeff_albion
Employee
Employee
0 Kudos

Hi Krenton,


Can it be that the numbers are slightly different because of a growing database what makes the page number shift up a little bit?

Possibly - we would have to actually look at the databases in support to better decide what might be happening.


Is it also possible to do a check based on the page number (f.e. 0x0:0x8242d1) if you open the database in a hex editor ???


Yes - this is essentially what we do here at SAP, but we have slightly more advanced tools internally. I provided the byte calculation above - it will be a combination of multiplying out the offset by the database page size, which will give you the byte offset of where we are looking in the database file.

Regards,

Jeff Albion

SAP Active Global Support

Answers (2)

Answers (2)

Former Member
0 Kudos

As an alternative you could try a commercial tool for SQL AnyWhere database recovery.

Stellar Phoenix Repair for SQL AnyWhere - Repairs and recovers data from corrupt SQL AnyWhere database files. It is compatible with Windows 8, 7, Vista, XP and supports SQL AnyWhere 9, 10, 11 & 12.

It also has got attention of Breck Carter.

SQL Anywhere: Stellar Phoenix Repair for SQL Anywhere

former_member329524
Active Participant
0 Kudos

1. Rebuild (unload/reload) the consolidated DB.

2. Check that there is enough disk space for log and temp files.

Assertion error is usually caused by either hardware problems (corrupted disk or low disk space) or by corrupted tables/indexes.

I suggest you start by rebuilding your DB and by checking the hard drive. If neither of this helps, then it is something more exotic, like circular logic in triggers on replicated tables or unresolved deadlocks.

Former Member
0 Kudos

Hi Arcady,

Thanks, for your answer.

Should I rebuild the consolidated DB, even when it are the remote databases that are getting corrupt?

I also don't think it has something to do with corrupt disks or low disk space. I has more than enough space left and it happened on multiple different remote databases on different laptops...

former_member188493
Contributor
0 Kudos

No, you should not do anything with your consolidated database... it is your remote databases that are having the problem.

Some random thoughts...

- Six occurrences of corrupt databases among 15 databases in one year is an ASTOUNDINGLY high rate of failure.

- It is highly unlikely that dbremote is having anything to do with the problem.

Questions...

Are the failures always the same? same assertion error?

Has there been any changes in the physical hardware? e.g., have solid state drives or flash drives been used to store the remote databases?

former_member329524
Active Participant
0 Kudos

Sorry, Krenton

I assumed it was the consolidated DB, which is getting corrupted, it made no sense otherwise

OK, so, we are in the exotic territory now...

1. Please, see if the problematic DB's were not created (backed up) from the same DB source, which also gets corrupted. You might want to rebuild at least one of the remote DBs and see if it helps.

2. Please, make sure that there are no other operations on the remote DBs at the same time or right after the db remote. For example, you can have a long transaction, together with the backup, which user cannot see. And since this is a laptop, the user can just close it and the computer goes to hibernation mode or shuts down altogether before the transaction is complete and the db gets corrupted.

former_member188493
Contributor
0 Kudos

>the user can just close it and the computer goes to hibernation mode or shuts down altogether before the transaction is complete and the db gets corrupted.

Not true. You can safely pull the plug out of the wall on a running SQL Anywhere database without fear that the database will be corrupted. Shutdown, hibernation, abnormal termination... none of these cause corruption... when the database is restarted, the automatic recovery process takes care of recovering the database to the most recent checkpoint plus subsequent committed transactions.

Database corruption has two causes: hardware problems and software bugs. Sadly, the former (e.g., memory errors) often masquerade as the latter (bad disk errors).

In this particular case, SAP technical support should be contacted.

former_member329524
Active Participant
0 Kudos

>Not true. You can safely pull the plug out of the wall on a running SQL Anywhere database without fear that the database will be corrupted. Shutdown, hibernation, abnormal termination... none of these cause corruption... when the database is restarted, the automatic recovery process takes care of recovering the database to the most recent checkpoint plus subsequent committed transactions.

That may well be true on the newer versions, but I have seen all kinds of similar (even if rare) cases on version 11 in my time, Besides, no matter what you do when programming the DB you cannot possibly prepare it for all kinds of obscene environment behavior...

BTW, Is version 11 even supported by SAP?

former_member188493
Contributor
0 Kudos

Whatever database corruption you might have seen was NOT caused by an abnormal shutdown... there must have been some other cause. The automatic recovery process has been part of SQL Anywhere forever and ever, and it works flawlessly... if it didn't, there would be hundreds of trouble reports every single day because database servers are abnormally terminating all the time.

This is an important point, because saying "automatic recovery does not work" is like saying "primary key integrity does not work"... automatic recovery is one of the main reasons people use databases instead of ordinary files.

Here is an excerpt from the SQL Anywhere 6 Help (it looks exactly the same in version 16):

=====

The automatic recovery process

When a database is shut down during normal operation, the database server carries out a checkpoint so that all the information in the database is held in the database file. This is a clean shutdown.

Each time you start a database, the database server checks whether the last shutdown was clean or the result of a system failure. If the database was not shut down cleanly, it automatically takes the following steps to recover from a system failure:

1 Recover to the most recent checkpoint

All pages are restored to their state at the most recent checkpoint, by copying the checkpoint log pages over the changes made since the checkpoint.

2 Apply changes made since the checkpoint

Changes made between the checkpoint and the system failure, which are held in the transaction log, are applied.

3 Rollback uncommitted transactions

Any uncommitted transactions are rolled back, using the rollback logs.

=====

former_member329524
Active Participant
0 Kudos

Be that as it may, can Krenton call SAP regarding an issue with version 11?

As far as I understand, it hit EOL a long time ago....

former_member188493
Contributor
0 Kudos

EOL means that Engineering will not make any more changes to the software; i.e., there will be no more EBFs or SPs.

Technical Support should still provide assistance. If the problem is a newly-discovered bug in SQL Anywhere 11, their advice will be to apply a more recent EBF or upgrade to a new version. However, that is unlikely in this case.

The challenge lies in getting past first-line SAP support to talk to someone in Waterloo, Ontario.

Former Member
0 Kudos

Hi Breck,

Thanks for explaining the automatic recovery process.

It is good to know that abnormal shutdown should not be the cause.

I also was starting to think that a laptop that goes into sleep or hibernate mode could be the reason, because besides the 15 laptop, we also have 3 remote databases on other servers that have a permanent connection to the consolidated database, and they run dbremote at least once a day. So these 3 remote databases has by far run the most synchronizations, but none of those 3 got ever corrupted.

(perhaps I should have mentioned that already in my original post).

So I should start to think about the hardware of the laptops than.

I will also try to contact SAP support again to see if they can analyze a corrupt database.

btw fyi:
the assertion error is always the same, but the page is different:

Internal database error *** ERROR *** Assertion failed: 201501 (11.0.1.2376)

Page 0x0:0x8762d1 for requested record not a table page -- transaction rolled back (-301)

Internal database error *** ERROR *** Assertion failed: 201501 (11.0.1.2376)

Page 0x0:0x8742f5 for requested record not a table page -- transaction rolled back (-301)

Internal database error *** ERROR *** Assertion failed: 201501 (11.0.1.2376)

Page 0x0:0x8242d1 for requested record not a table page -- transaction rolled back (-301)

Internal database error *** ERROR *** Assertion failed: 201501 (11.0.1.2376)

Page 0x0:0x7182d1 for requested record not a table page -- transaction rolled back (-301)

Thanks Again.