Issues with Solr LoadBalancing in Hybris 6.3.0.2
According to the documentation (https://help.hybris.com/6.3.0/hcd/8c5f54f386691014abb090e75e1cffb2.html), Hybris uses LBHttpSolrServer for Solr load balancing: https://wiki.apache.org/solr/LBHttpSolrServer
SAP note https://launchpad.support.sap.com/#/notes/2022983/E also states that LBHttpSolrServer is used in standalone mode (although that note dates from 2014).
Here is what we found. Hybris uses LBHttpSolrClient (the name LBHttpSolrServer has in newer SolrJ versions) to load balance the Solr requests from the storefront application nodes:
-LBHttpSolrClient does round robin across the Solr nodes.
-When LBHttpSolrClient gets a bad response from a Solr node, it puts that node in a dead pool and stops sending requests to it.
-Once Solr nodes are in the dead pool, the app nodes ping them for aliveness every 5 seconds. We found these pings in the logs, and their format matches what you'd expect from https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBHttpSolrClient.java
-The aliveness URLs that LBHttpSolrClient uses are invalid: they are missing the core name, so Solr responds with a 404. (Could this be a bug similar to https://jira.hybris.com/browse/ECP-1688 ?)
-Because the aliveness URLs return 404, the app nodes never take the Solr nodes out of the dead pool.
-Pretty much any invalid Solr response can put a Solr node into the dead pool.
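To make the failure mode concrete, here is a minimal Java sketch of the behaviour described in the list above. It is a simplified model, not SolrJ's or Hybris's actual code: the class name DeadPoolModel, its methods, and the node names slave1..slave3 are all made up for illustration. It only shows why a node can never come back if every aliveness ping fails.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Simplified, hypothetical model of the load balancer behaviour described
// above. This is NOT the real LBHttpSolrClient implementation.
class DeadPoolModel {
    private final List<String> alive = new ArrayList<>();
    private final Set<String> dead = new LinkedHashSet<>();
    private int counter = 0;

    DeadPoolModel(List<String> nodes) {
        alive.addAll(nodes);
    }

    // Round robin over the alive nodes. A node whose request fails is moved
    // to the dead pool and the next alive node is tried.
    String request(Predicate<String> requestSucceeds) {
        while (!alive.isEmpty()) {
            String node = alive.get(counter++ % alive.size());
            if (requestSucceeds.test(node)) {
                return node;
            }
            alive.remove(node);
            dead.add(node);
        }
        throw new IllegalStateException("no live Solr nodes left");
    }

    // Periodic aliveness check: only nodes whose ping succeeds are revived.
    void aliveCheck(Predicate<String> pingSucceeds) {
        for (String node : new ArrayList<>(dead)) {
            if (pingSucceeds.test(node)) {
                dead.remove(node);
                alive.add(node);
            }
        }
    }

    Set<String> deadPool() {
        return dead;
    }

    List<String> alivePool() {
        return alive;
    }

    public static void main(String[] args) {
        DeadPoolModel lb = new DeadPoolModel(List.of("slave1", "slave2", "slave3"));

        // slave2 answers one request badly and lands in the dead pool.
        lb.request(node -> true);
        lb.request(node -> !node.equals("slave2"));
        System.out.println("dead after bad response: " + lb.deadPool());

        // Every ping gets a 404 (modelled as "ping fails"), so nothing is revived.
        lb.aliveCheck(node -> false);
        System.out.println("dead after failing pings: " + lb.deadPool());
    }
}
```

In this model a node that fails once stays dead for as long as the ping predicate returns false, which matches what we see in production.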
One way forward is to figure out how to set correct aliveness urls.
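As a rough illustration of what "correct" would mean here, the sketch below builds a ping URL from a base URL with and without a core name. It assumes the ping is a zero-row match-all query sent to whatever base URL the client holds for the node. The helper pingUrl, the host name solr-slave1, the core name mycore, and the exact query string are all hypothetical; the real client sends more parameters.

```java
// Hypothetical sketch: how a missing core name in the base URL would change
// the aliveness URL. Host, core name, and query string are assumptions.
class AliveUrlSketch {
    // Appends an approximate zero-row match-all query to the given base URL.
    static String pingUrl(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return trimmed + "/select?q=*:*&rows=0";
    }

    public static void main(String[] args) {
        // Base URL without a core: there is no /select handler at this level,
        // which is consistent with the 404s we see.
        System.out.println(pingUrl("http://solr-slave1:8983/solr"));

        // Base URL including the (hypothetical) core name: the form Solr can answer.
        System.out.println(pingUrl("http://solr-slave1:8983/solr/mycore"));
    }
}
```

If that assumption holds, the question becomes where Hybris sets the base URLs it hands to the load balancer and whether the core name can be included there.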
This is happening in our production environment. We have 1 Solr master and 3 Solr slaves.
How can this be fixed? Is it a bug in 6.3.0.2?