Former Member
Jun 15, 2017 at 05:22 PM

Issues with Solr LoadBalancing in Hybris 6.3.0.2

According to the documentation at https://help.hybris.com/6.3.0/hcd/8c5f54f386691014abb090e75e1cffb2.html, Hybris uses LBHttpSolrServer for Solr load balancing: https://wiki.apache.org/solr/LBHttpSolrServer

SAP note https://launchpad.support.sap.com/#/notes/2022983/E also states that LBHttpSolrServer is used in standalone mode (although that note dates from 2014).

Hybris uses LBHttpSolrClient to load balance the Solr requests from the storefront application nodes. This is what we have observed:

-LBHttpSolrClient does round robin across the Solr nodes.

-When LBHttpSolrClient gets a bad response from a Solr node, it puts that node in a dead pool and stops sending requests to it.

-Once Solr nodes are in the dead pool, the app nodes ping them for aliveness every 5 seconds. We found these pings in the logs, and the format matches what you'd expect from https://github.com/apache/lucene-solr/blob/master/solr/solrj/src/java/org/apache/solr/client/solrj/impl/LBHttpSolrClient.java

-The exact aliveness URLs that LBHttpSolrClient uses are invalid (they are missing the core name), so Solr responds with a 404. (Could this be a bug similar to https://jira.hybris.com/browse/ECP-1688 ?)

-Because the aliveness URLs return 404, the app nodes never take the Solr nodes out of the dead pool.

-Pretty much any invalid Solr response can put a Solr node into the dead pool.
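
To make the behavior described above easier to discuss, here is a minimal plain-Java sketch of round robin with a dead pool. It is a simplified model, not the SolrJ source: the class name `DeadPoolSketch`, its methods, and the node names are hypothetical, and the real LBHttpSolrClient has more logic than this. The ping is passed in as a predicate so that the "aliveness check always returns 404" case can be simulated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Simplified model of round-robin load balancing with a dead pool (not the SolrJ code). */
public class DeadPoolSketch {
    private final List<String> alive = new ArrayList<>();
    private final List<String> dead = new ArrayList<>();
    private int counter = 0;

    public DeadPoolSketch(List<String> nodes) {
        alive.addAll(nodes);
    }

    /** Picks the next alive node in round-robin order, or null if none are alive. */
    public String next() {
        if (alive.isEmpty()) {
            return null;
        }
        String node = alive.get(counter % alive.size());
        counter++;
        return node;
    }

    /** Moves a node to the dead pool after a failed request. */
    public void markDead(String node) {
        if (alive.remove(node)) {
            dead.add(node);
        }
    }

    /** Periodic aliveness check: a dead node is revived only if its ping succeeds. */
    public void checkDeadNodes(Predicate<String> pingSucceeds) {
        List<String> revived = new ArrayList<>();
        for (String node : dead) {
            if (pingSucceeds.test(node)) {
                revived.add(node);
            }
        }
        dead.removeAll(revived);
        alive.addAll(revived);
    }

    public List<String> aliveNodes() {
        return new ArrayList<>(alive);
    }

    public List<String> deadNodes() {
        return new ArrayList<>(dead);
    }

    public static void main(String[] args) {
        // Hypothetical node names standing in for the three Solr slaves.
        DeadPoolSketch lb = new DeadPoolSketch(Arrays.asList("slave1", "slave2", "slave3"));
        lb.markDead("slave2");            // one bad response is enough
        lb.checkDeadNodes(node -> false); // ping always fails, e.g. with a 404
        System.out.println("alive=" + lb.aliveNodes() + " dead=" + lb.deadNodes());
    }
}
```

In this model, a node whose ping can never succeed stays in the dead pool indefinitely, which is the symptom we see.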

One way forward is to figure out how to set correct aliveness urls.
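
As a rough illustration of why the core name matters, the sketch below builds the URL that a zero-row select ping would hit for a given base URL. Everything here is an assumption for illustration: the host `solr-slave1`, the core name `my_core`, the helper `pingUrl`, and the exact query string are hypothetical and may not match what LBHttpSolrClient or Hybris actually sends.

```java
/** Illustrative only: shows how the base URL determines the aliveness URL. */
public class AlivenessUrlSketch {

    /** Appends a zero-row select query to the base URL (assumed ping format). */
    public static String pingUrl(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return trimmed + "/select?q=*:*&rows=0";
    }

    public static void main(String[] args) {
        // Hypothetical base URLs: without and with the core name.
        String withoutCore = "http://solr-slave1:8983/solr";
        String withCore = "http://solr-slave1:8983/solr/my_core";

        // The first URL has no core in its path, which would explain a 404.
        System.out.println(pingUrl(withoutCore));
        System.out.println(pingUrl(withCore));
    }
}
```

If the load balancer is given base URLs without the core, every aliveness ping would target a path that Solr does not serve. Whether Hybris exposes a way to change the base URLs it passes in is the open question.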

This is happening in our production environment. We have 1 Solr master and 3 Solr slaves.

How can this be fixed? Is it a bug in 6.3.0.2?