
BO enterprise is down and auto recovering

Dec 31, 2017 at 03:05 PM


Hi All,

We are facing an issue where our BI launch pad, CMC, and OpenDocument links stop working. They just keep loading, and no errors are shown.

This happens exactly between 12:20 AM and 1:05 AM; after 1:05 AM it recovers on its own. It happens frequently. When we checked during the issue, the Tomcat and SIA services were running fine, but we noticed requests piling up in Tomcat Manager instead of being served.

We learned that the Infra team runs a deduplication process on the BO FRS at 12:00 AM. It lasts until 12:15 AM, and then at 12:20 AM replication of the FRS between data centers starts. Deduplication usually runs only once, while replication runs every 2 hours.

We are not 100% sure this is the root cause. How can we trace back, confirm that reads from the FRS are taking too long, and find the RCA?

Kindly Help!
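One way to check whether FRS access is slow during the 12:20 AM to 1:05 AM window is to probe the share directly from the BO server and log the latency. The sketch below is a generic probe, not a BO tool: it times creating, writing, fsyncing, reading back and deleting a small temporary file, logs one line per sample, and groups consecutive slow samples into windows. The scratch directory (assumed to sit on the same share/volume as the FRS, but outside the FRS tree), the interval, and the threshold are all hypothetical choices.

```python
import os
import tempfile
import time
from datetime import datetime


def probe_roundtrip(directory: str, size: int = 64 * 1024) -> float:
    """Time creating, writing, fsyncing, reading back and deleting a temp file.

    Returns elapsed wall-clock seconds. The file is deleted automatically
    when the `with` block exits (deletion is included in the timing).
    """
    payload = os.urandom(size)
    start = time.perf_counter()
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # force the write through to the underlying storage
        f.seek(0)
        data = f.read()
    elapsed = time.perf_counter() - start
    if data != payload:
        raise IOError("read-back mismatch on " + directory)
    return elapsed


def slow_windows(samples, threshold_s):
    """Group consecutive (timestamp, latency) samples above threshold_s into (first, last) windows."""
    windows, current = [], None
    for ts, latency in samples:
        if latency > threshold_s:
            current = [ts, ts] if current is None else [current[0], ts]
        elif current is not None:
            windows.append(tuple(current))
            current = None
    if current is not None:
        windows.append(tuple(current))
    return windows


def run_probe(directory, log_path, interval_s=60, count=120):
    """Probe `directory` every interval_s seconds, appending 'timestamp,latency' lines to log_path."""
    samples = []
    with open(log_path, "a") as log:
        for _ in range(count):
            ts = datetime.now()
            latency = probe_roundtrip(directory)
            samples.append((ts, latency))
            log.write(f"{ts.isoformat()},{latency:.3f}\n")
            log.flush()
            time.sleep(interval_s)
    return samples
```

For example, starting `run_probe(scratch_dir, "frs_latency.csv", interval_s=60, count=105)` at 11:45 PM covers the night until about 1:30 AM; `slow_windows(samples, 2.0)` then lists the periods where a round trip took longer than an (arbitrary) 2 seconds. If those windows line up with 12:20 AM to 1:05 AM, that is evidence that FRS I/O, rather than Tomcat itself, is the bottleneck. A probe that hangs for minutes is itself a strong signal.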


3 Answers

Best Answer
Denis Konovalov
Jan 03 at 12:59 PM

Platform Search, which runs during this time, is part of the problem but not its root cause.
The root cause is most likely inadequate sizing of your environment in the area of disk and network I/O.
The FRS replication process saturates the I/O, making everything else slow to respond.
It's time for your IT/network teams to analyze system performance and address it.

The way to determine whether that's the root cause, as Leslie mentioned, is to move the deduplication process to a different time and see whether the problem moves with it.

Joe Peters Jan 02 at 01:45 PM

Sounds like a pretty safe bet that this is the cause of the issue. The replication job is probably increasing latency on the share, which in turn causes the IFRS/OFRS to time out.

A "good" replication job would only copy new or modified files, which should be a small percentage of the total FRS. Is the job copying all files every time?
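To answer whether the job copies everything each time, one rough check is to count how many FRS files actually changed since the previous run and compare that with what the replication job reports transferring. The sketch below assumes modification times are a reliable change signal, which may not match how the Infra team's replication tool decides what to copy; the FRS root path and the last-run timestamp are inputs you would supply.

```python
import os
from typing import List


def files_changed_since(root: str, last_run_epoch: float) -> List[str]:
    """Return paths under root whose modification time is newer than last_run_epoch."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_run_epoch:
                    changed.append(path)
            except FileNotFoundError:
                # The file disappeared between listing and stat; skip it.
                continue
    return sorted(changed)


def changed_fraction(root: str, last_run_epoch: float) -> float:
    """Fraction of files under root that an incremental job would need to copy."""
    total = sum(len(filenames) for _, _, filenames in os.walk(root))
    if total == 0:
        return 0.0
    return len(files_changed_since(root, last_run_epoch)) / total
```

If `changed_fraction` over a 2-hour interval is small but the replication job moves close to the full size of the FRS every run, the job is probably not incremental.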


Thank you! I discovered that Platform Search is running during this time. This could be the main reason.

Leslie Mui
Jan 02 at 10:27 PM

Well, the easy way to determine whether it's the cause is to have Infra change the timing of the jobs a bit, for example delay them by 20 minutes, and see whether there is a corresponding shift in your downtime.
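The comparison in that experiment can be made explicit: for each night, record when the job started and when the outage started (defined the same way every night, e.g. the first minute requests pile up in Tomcat Manager), then check whether the lag between the two stays constant after the job is moved. This is a minimal sketch; the function names are hypothetical, and the timestamps in the usage line are illustrative only, apart from the 12:20 AM baseline taken from the question.

```python
from datetime import datetime
from typing import List, Tuple


def lag_minutes(job_start: datetime, outage_start: datetime) -> float:
    """Minutes from the job starting to the outage starting on the same night."""
    return (outage_start - job_start).total_seconds() / 60


def outage_tracks_job(nights: List[Tuple[datetime, datetime]], tolerance_min: float = 5.0) -> bool:
    """True if the job-to-outage lag stays within tolerance_min across all nights.

    Include nights on the original schedule and nights on the delayed schedule:
    if the outage moved with the job, the lag stays roughly constant.
    """
    if len(nights) < 2:
        raise ValueError("need at least two nights to compare")
    lags = [lag_minutes(job, outage) for job, outage in nights]
    return max(lags) - min(lags) <= tolerance_min
```

For example, a baseline night with replication and outage both starting at 12:20 AM, and a hypothetical night where replication is delayed to 12:40 AM and the outage starts at 12:41 AM, gives lags of 0 and 1 minute, so `outage_tracks_job` returns `True`. Running the test separately for the deduplication job and the replication job shows which of the two the outage follows.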

Fixing it would mean finding out whether your backup procedure has any way of lowering its impact on files while it runs, either by not locking files, or by replicating more slowly in exchange for better performance and accessibility while the process is running.
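The "slower replication" trade-off usually means capping throughput so the copy leaves I/O headroom for BO. The sketch below is a generic rate-limited file copy that illustrates the idea; it is not how the Infra team's replication tool works. Real tools expose their own settings for this (for example `rsync --bwlimit` or robocopy's `/IPG` inter-packet gap), and those are what you would actually tune.

```python
import time


def throttled_copy(src: str, dst: str, max_bytes_per_s: float, chunk_size: int = 256 * 1024) -> int:
    """Copy src to dst in chunks, sleeping so average throughput stays at or below max_bytes_per_s.

    Returns the number of bytes copied.
    """
    if max_bytes_per_s <= 0:
        raise ValueError("max_bytes_per_s must be positive")
    start = time.monotonic()
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # Time that should have elapsed by now at the target rate.
            target_elapsed = copied / max_bytes_per_s
            ahead_by = target_elapsed - (time.monotonic() - start)
            if ahead_by > 0:
                time.sleep(ahead_by)
    return copied
```

A lower `max_bytes_per_s` makes each replication run take longer but keeps more disk and network bandwidth free for the IFRS/OFRS during the run.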

Rajendran Devaraj

Thank you! I discovered that Platform Search is running during this time. This could be the main reason.
