BO Enterprise goes down and recovers automatically

Hi All,

We are facing an issue where our BO launch, CMC, and OpenDoc links stop working. The pages just keep loading, and no errors are shown.

This happens exactly between 12:20 AM and 1:05 AM. After 1 AM it recovers automatically. This is happening frequently. When we checked during the issue, the Tomcat and SIA services were running fine, but we noticed that requests were being served and piling up in Tomcat Manager.

We learned that at 12:00 AM the Infra team runs a deduplication process on the BO FRS. It lasts until 12:15 AM, and then at 12:20 AM replication of the FRS between data centers starts. Usually deduplication runs once and replication runs every 2 hours.

We are not 100% sure this is the root cause. How can we trace back and confirm that reads from the FRS are taking too long, and so establish the RCA?

Kindly Help!


3 Answers

  • Best Answer
    Jan 03, 2018 at 12:59 PM

    Platform search that runs during this time is part of the problem, but not the root cause of it.
    The root cause is most likely the inadequate sizing of your environment in the area of disk and network I/O.
    The FRS replication process overloads the I/O, making everything else slow to respond.
    Time for your IT/Network teams to analyze system performance and address it.

    The way to determine whether that's the root cause, as Leslie mentioned, is to move the deduplication process to a different time and see if the problem moves with it.


  • Jan 02, 2018 at 01:45 PM

    Sounds like a pretty safe bet that's the cause of the issue. The replication job is probably causing increased latency on the share, which is then causing the IFRS/OFRS to time out.
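
    One rough way to check for increased latency on the share is to time reads of a few sample files from the FRS location on a schedule, then compare the numbers inside and outside the 12:20 AM window. The sketch below is only an illustration: `time_read`, `probe`, the 2-second threshold, and whatever paths you pass in are assumptions of mine, not anything BO provides. Repeated reads of the same file may be served from the OS cache, so it is worth varying the sample files.

```python
import os
import time


def time_read(path, chunk_size=1024 * 1024):
    """Return the seconds taken to read the whole file at `path`."""
    start = time.perf_counter()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        while fh.read(chunk_size):
            pass
    return time.perf_counter() - start


def probe(paths, threshold_s=2.0):
    """Time one full read per path.

    Returns a list of (path, seconds_or_None, status) tuples, where status
    is "ok", "SLOW" (read took longer than threshold_s), or "error: ...".
    The threshold is an arbitrary starting point, not a BO-documented value.
    """
    results = []
    for p in paths:
        name = os.fspath(p)
        try:
            elapsed = time_read(name)
        except OSError as exc:
            # An unreachable or locked file is itself a useful data point.
            results.append((name, None, f"error: {exc}"))
            continue
        status = "SLOW" if elapsed > threshold_s else "ok"
        results.append((name, elapsed, status))
    return results
```

    Running `probe` every minute from a scheduler against a handful of files under the input and output FRS folders, and logging each tuple with a timestamp, should show whether read times climb when replication starts.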

    A "good" replication job would only copy new or modified files, which should be a small percentage of the total FRS. Is the job copying all files every time?
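
    To get a feel for how small that percentage should be, you could count the files modified since the previous replication run and compare that with the full FRS. This is a minimal sketch under my own assumptions: `changed_since` is a hypothetical helper, the FRS root path is whatever your share is mounted as, and file modification time is used as a stand-in for "new or modified", which your replication tool may define differently.

```python
import os
import time


def changed_since(root, cutoff_epoch):
    """Walk `root` and compare recently modified files with the total.

    Returns (changed_files, changed_bytes, total_files, total_bytes), where
    "changed" means the file's mtime is at or after `cutoff_epoch`.
    """
    changed_files = changed_bytes = total_files = total_bytes = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            total_files += 1
            total_bytes += st.st_size
            if st.st_mtime >= cutoff_epoch:
                changed_files += 1
                changed_bytes += st.st_size
    return changed_files, changed_bytes, total_files, total_bytes


def two_hours_ago():
    """Cutoff matching the 2-hour replication interval from the question."""
    return time.time() - 2 * 3600
```

    If the replication job moves far more data than `changed_bytes` suggests, that would hint that it is copying everything each time. Note that walking a large share adds I/O of its own, so run this outside the problem window.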


  • Jan 02, 2018 at 10:27 PM

    Well, the easy way to determine if it's the cause is to have infra change the timing of the jobs a bit: delay them 20 minutes and see if there is a corresponding shift in your downtime.
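
    The comparison can be as simple as checking whether the gap between job start and outage start stays roughly constant once the job is moved. The sketch below assumes you record both times per night yourself; `offsets_minutes`, `tracks_job`, the 5-minute tolerance, and the dates in the usage note are all made up for illustration.

```python
from datetime import datetime


def offsets_minutes(observations):
    """observations: list of (job_start, outage_start) datetime pairs.

    Returns outage_start minus job_start, in minutes, for each pair.
    """
    return [(outage - job).total_seconds() / 60 for job, outage in observations]


def tracks_job(observations, tolerance_min=5.0):
    """True if the outage start stays within tolerance_min of a fixed offset
    from the job start across all observations.

    This is only meaningful when the observations include nights with
    different job start times, i.e. before and after the schedule change.
    """
    offs = offsets_minutes(observations)
    return max(offs) - min(offs) <= tolerance_min
```

    For example, with invented dates: a job at 00:20 with the outage starting at 00:20, followed by a night where the job is moved to 00:40 and the outage starts at 00:41, gives offsets of 0 and 1 minute, so `tracks_job` returns `True`. If the outage had still started at 00:20 with the job at 00:40, the offsets would be 0 and -20 and it would return `False`, pointing away from the job as the cause.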

    Fixing it would mean determining whether your backup procedure has any way of lowering its impact on files while it runs, either by not locking files or by replicating more slowly in exchange for better performance and accessibility during the run.
