cancel
Showing results for 
Search instead for 
Did you mean: 

Error starting database / nameserver

Former Member
0 Kudos

Hi gurus,

After unexpected server shutdown HANA is not starting. Getting a little more deeper in the logs I´ve found the error listet bellow, it happens in nameserver startup process. Any hint?

hana001:/usr/sap/HDB/SYS/exe/hdb> ./hdbnameserver

service startup...

accepting requests at 127.0.0.1:30201; 127.0.0.2:30201

searching for master nameserver hana001:30201 ...

assign as master nameserver. assign to volume 1 started

checking for recovery request ...

loading topology ...

opening persistence ...

assign failed with persistence startup error. exception  1: no.3020046  (DataAccess/PageAccess/impl/PageImpl.cpp:399)

    Wrong savepoint version: Expected 8373 but found 8361.

exception throw location:

1: 0x00007fb55f11bbbc in PageAccess::Page::verifyHeader(PageAccess::SizeClass, DataAccess::SavepointVersion const&, bool) const+0x3d8 at PageImpl.cpp:399 (libhdbdataaccess.so)

2: 0x00007fb55ee80da7 in DataAccess::SavepointImpl::loadRestartPage(PageAccess::PageIO&, unsigned long, bool, bool)+0x693 at Page.hpp:277 (libhdbdataaccess.so)

3: 0x00007fb55ee0205f in DataAccess::PersistenceManagerImpl::prepareOpen(unsigned long, bool, bool)+0x4b at PersistenceManagerImpl.cpp:5385 (libhdbdataaccess.so)

4: 0x00007fb55ee0223c in DataAccess::PersistenceManager::open(ltt::refcounted_handle<DataAccess::PersistenceConfiguration> const&, bool)+0x88 at PersistenceManagerImpl.cpp:2461 (libhdbdataaccess.so)

5: 0x00007fb56c510771 in PersistenceLayer::PersistenceSystem::initialize(NameServer::ServiceStartInfo const&, bool, PersistenceLayer::PERSISTENCE_MODE)+0x4a0 at PersistenceSystem.cpp:392 (libhdbpersistence.so)

6: 0x00007fb56c5437bc in PersistenceLayer::PersistenceFactory::initPersistence(PersistenceLayer::PERSISTENCE_MODE, ltt::releasable_handle<DataAccess::LoggerFactory>&, DataAccess::TransactionCallback*, NameServer::ServiceStartInfo&, ltt::refcounted_handle<TransactionManager::TransactionControlBlockFactory>&, bool)+0x158 at PersistenceFactory.cpp:394 (libhdbpersistence.so)

7: 0x00007fb56c43375a in PersistenceController::startup(PersistenceLayer::PERSISTENCE_MODE, NameServer::ServiceStartInfo*, bool, DataAccess::TablePreloadWriteCallback*, DataAccess::TablePreloadReadCallback*, Backup::RecoverCbc_Federation*)+0x1256 at PersistenceController.cpp:559 (libhdblogger.so)

8: 0x00007fb5751eedd5 in NameServer::Topology::initPersistence(NameServer::ServiceStartInfo&, bool, bool, TREX_ERROR::TRexError*, bool, ltt_adp::basic_string<char, ltt::char_traits<char>, ltt::integral_constant<bool, true> >, NameServer::ServiceStartInfo::RequestAction)+0x421 at Topology.cpp:285 (libhdbns.so)

9: 0x00007fb5752fb0c4 in NameServer::TREXNameServer::loadTopology(NameServer::LoadTopologyMode, NameServer::ServiceStartInfo&, Backup::Backup_ExtendedRecoveryInformation*)+0x5b0 at TREXNameServer.cpp:18396 (libhdbns.so)

10: 0x00007fb575305c23 in NameServer::TREXNameServer::assign(NameServer::ServiceStartInfo&)+0x2790 at TREXNameServer.cpp:2123 (libhdbns.so)

11: 0x00007fb5762d4f17 in TRexAPI::TREXIndexServer::assign(NameServer::ServiceStartInfo&, bool, TREX_ERROR::TRexError&)+0x93 at TREXIndexServer.cpp:992 (hdbnameserver)

12: 0x00007fb57630c129 in TRexAPI::AssignThread::run(void*)+0x35 at TREXIndexServer.cpp:543 (hdbnameserver)

13: 0x00007fb56cd03db5 in TrexThreads::PoolThread::run()+0x831 at PoolThread.cpp:389 (libhdbbasement.so)

14: 0x00007fb56cd056f0 in TrexThreads::PoolThread::run(void*&)+0x10 at PoolThread.cpp:165 (libhdbbasement.so)

15: 0x00007fb55471a9f0 in Execution::Thread::staticMainImp(void**)+0x700 at Thread.cpp:461 (libhdbbasis.so)

16: 0x00007fb55471bfc8 in Execution::Thread::staticMain(void*)+0x34 at ThreadMain.cpp:26 (libhdbbasis.so)

stopping service...

persistence initialization failed -> stopping instance ...

cannot send signal. (2, No such file or directory)

prepare for shutting service down...

stop ClockMonitor thread...

stop MasterTokenLockWriter /usr/sap/HDB/SYS/global//hdb/nameserver.lck thread...

setInactive(nameserver@hana001:30201)

hana001:/usr/sap/HDB/SYS/exe/hdb>

Also bellow HDB start log:

hana001:/usr/sap/HDB/HDB02> ./HDB start

StartService

Impromptu CCC initialization by 'rscpCInit'.

  See SAP note 1266393.

OK

OK

Starting instance using: /usr/sap/HDB/SYS/exe/hdb/sapcontrol -prot NI_HTTP -nr 02 -function StartWait 2700 2

08.06.2016 03:44:24

Start

OK

08.06.2016 03:44:46

StartWait

FAIL: process hdbdaemon HDB Daemon not running

Final part from Daemon trace:


[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545057 i Daemon       TrexDaemon.cpp(03513) : stdout = /usr/sap/HDB/HDB02/hana001/trace/xsuaaserver.out
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545062 i Daemon       TrexDaemon.cpp(03513) : stderr =
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545066 i Daemon       TrexDaemon.cpp(03513) : maxstdfiles = 2
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545071 i Daemon       TrexDaemon.cpp(03513) : runlevel = 4
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545075 i Daemon       TrexDaemon.cpp(03513) : flags =
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545080 i Daemon       TrexDaemon.cpp(03513) : window =
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545084 i Daemon       TrexDaemon.cpp(03513) : isolation = 0
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545088 i Daemon       TrexDaemon.cpp(03513) : userId = 4294967295
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545093 i Daemon       TrexDaemon.cpp(03513) : groupId = 4294967295
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545104 i Daemon       Daemon.cpp(00666) : runlevel 0 completely started
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545115 i Daemon       Program.cpp(00173) : line up of program group mdcdispatcher to instances <none>.
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545126 i Daemon       Program.cpp(00173) : line up of program group nameserver to instances 0.
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.545460 i Daemon       TrexDaemon.cpp(02078) : starting program at '/usr/sap/HDB/HDB02/exe/hdbnameserver' with args ''
[17502]{-1}[-1/-1] 2016-06-08 03:44:28.569590 i Daemon       RunningInstance.cpp(00191) : start 'hdbnameserver' as process 17518
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.313937 i Daemon       SignalsUNIX.cpp(00647) : signo 3 SIGQUIT from user errno 0 code 0
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314034 i Daemon       SignalsUNIX.cpp(00647) : sender pid 17518 real user id 1000 executable '/hana/shared/HDB/exe/linuxx86_64/HDB_1.00.112.02.1459851171_2840326/hdbnameserver'
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314167 i Daemon       TrexDaemon.cpp(03558) : got shutdown event (stop)
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314228 i Daemon       Daemon.cpp(00752) : comment file contains: nameserver: persistence initialization failed
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.314836 i Daemon       RunningInstance.cpp(00214) : stop process hdbnameserver with pid 17518
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.315143 i Daemon       TrexDaemon.cpp(02598) : stopped child with pid 17518 (17518)
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.401872 i Daemon       TrexDaemon.cpp(03778) : process hdbnameserver with pid 17518 exited normally with status 1
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.401939 i Daemon       TrexDaemon.cpp(03850) : all instances in runlevel 1 stopped
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.401952 i Daemon       Program.cpp(00173) : line up of program group dummyserver to instances <none>.
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.402492 i Daemon       ProgramStarter.cpp(00229) : writing started programs file /usr/sap/HDB/HDB02/hana001/lock/started_programs.txt:
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.528876 i Daemon       TrexDaemon.cpp(01616) : cleaning all
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.528931 i Daemon       TrexDaemon.cpp(02681) : found 8 segments in use with maximum id 8
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529017 i Daemon       TrexDaemon.cpp(02705) : segment 4, key 0x9f08cf81, id 3506180, sequence 107, owner 1000/1001, size 117, atime 2016-06-08 03:44:32, dtime 2016-06-08 03:44:35, ctime 2016-06-08 03:18:33, created by pid 17077, last changed by pid 17077, nattach 0
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529056 i Daemon       TrexDaemon.cpp(02705) : segment 5, key 0x00000000, id 3538949, sequence 108, owner 1000/1001, size 1060911, atime 2016-06-08 03:44:35, dtime 2016-06-08 03:44:35, ctime 2016-06-08 03:18:33, created by pid 17077, last changed by pid 17077, nattach 1 (deleted)
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529069 i Daemon       TrexDaemon.cpp(02705) : segment 6, key 0x00000000, id 3571718, sequence 109, owner 1000/113, size 1024, atime 2016-06-08 03:44:26, dtime 2016-06-08 03:44:26, ctime 2016-06-08 03:44:25, created by pid 17494, last changed by pid 17494, nattach 1
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529157 i Daemon       Main.cpp(01530) : exit hdbdaemon
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.529171 i Daemon       Main.cpp(01541) : Success: /sapmnt/ld7272/a/HDB/jenkins_prod/workspace/FA_CO_LIN64GCC48HAPPY_rel_fa~newdb100_rel/sys/src/TrexDaemon/Main.cpp:1541 in "int main(int, char**)". Leaving the "main"
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.530121 i Network      NetworkListener.cpp(00805) : closing listen socket 7 bound to 127.0.0.1
[17502]{-1}[-1/-1] 2016-06-08 03:44:35.530283 i Network      NetworkListener.cpp(00805) : closing listen socket 8 bound to 127.0.0.2

Accepted Solutions (1)

Accepted Solutions (1)

n_vorstermans2
Explorer
0 Kudos

Hi,

Quickly looking at the log I experiences a similar issue on an IBM blade before, where after a HANA crash it couldn't find the right save point

Please have a look at this note: 2203059 - HANA persistence corruption because of GPFS rapid repair feature


Solution

  • Upgrade to at least GPFS 4.1.1-1 (GPFS 4.1.1-2 or later recommended). Please see SAP Note 1880960 for current recommended minimum GPFS versions
  • Restore HANA DB to the last possible point in time if you are already in the situation with the error.

I updated the GPFS which resolved the issue. As I cannot comment on what hardware you are running this, you would either need to check the GPFS or worst case do a system restore to the last point in time or full backup.

Regards,

Nick

Former Member
0 Kudos

Hi Nick,

Thanks for your reply. I´ve already saw the note, my hardware isn´t from IBM. This is happening in a sandbox server, I´ve tried to restore de database using HANA Studio, unfortunately during the restore process the nameserver is restarted/started and since this issue is happening when starting it I cannot proceed, hdbsql is not an option since the database is offline.

How am I supossed to restore the database in this situation?

n_vorstermans2
Explorer
0 Kudos

Are you trying to recover a Point-in-Time? If so are you trying the last snapshot? You could attempt one snapshot before if possible.

In case that doesn't work either:

On OS level could you do under <SID>adm a HDB stop and then check HDB info if anything is still running for the nameserver? Also check whether the lock files are released. If the lock files are not released, please delete them manually.

In case this still doesn't work, you can also backup and delete the nameserver.ini to ensure it uses the system default found under /sapmnt/<SID>/global/hdb/custom/config

Worst case you could restore your latest full backup.

Regards,

Nick

Former Member
0 Kudos

Yes, Poin-in-Time. No snapshot is displayed for selection on screen, I supossed this was due to nameserver error. Isn´t it?

I also tried to restore using full backup, unfortunately after starting the restore process the nameserver is restarted/restarted automatically and since it doesn´t start restore doesn´t start.

Bellow HDB info after HDB stop:

USER   PID  PPID %CPUVSZ   RSS COMMAND

hdbadm   25824 25823  0.0  13884  2804 -sh

hdbadm   25965 25824 17.3  12900  1720  \_ /bin/sh /usr/sap/HDB/HDB02/HDB info

hdbadm   25990 25965  0.0   4944   616  \_ ps fx -U hdbadm -o user,pid,ppid,pcpu,vsz,rss,args
hdbadm3155 1  0.0 171056 22000 /usr/sap/HDB/HDB02/exe/sapstartsrv pf=/usr/sap/HDB/SYS/profile/HDB_HDB02_poahnehp -D -u hdbadm

Where can I find the mentioned lock files?

Thanks!

Former Member
0 Kudos

Bellow the behavior when trying database restore from full backup:

Same savepoint error (Wrong savepoint version: Expected 8373 but found 8361.) in nameserver trace.

Answers (2)

Answers (2)

former_member183326
Active Contributor
0 Kudos

Do you know if the datavolume was being exchanged  by an older datavolume? You should see something similair to

[15067]{-1}[-1/-1] 2016-05-19 13:58:22.353111 i Logger           SavepointImpl.cpp(03893) : AnchorPage with SPV=37047 loaded.  Persistence created at <DATA -- TIME>.


I would also check the var/log/messages for any suspicious activity.


What is your file system, is it XFS? If so there is a known issue with this which is detailed in SAP Note 2246163

n_vorstermans2
Explorer
0 Kudos

Have you first checked if any PID's are still running? Looking at the screenshots I get the idea your system still has processes running in the background (especially on the last message). As this is not Production or is it live, you can run "HDB kill" on the OS to make sure everything is down before trying again.

The lock files was based on your earlier screenshot where it had an ID mismatch. They are generally found in /sapmnt/<SID>/HDB<Instance#>/<host>/lock

Former Member
0 Kudos

Nick,

I´ve moved /hana/shared/HDB/HDB02/backup/log to somewhere else, after this full backup restore worked ok. The environment is back. For my current understanding is weird, but ok, gonna move forward and check it later.

Thanks for your help!