Former Member

JGroups UDP Multicast: Members don't join cluster

Hi everybody!

We're trying to get a JGroups UDP multicast cluster running here, but cannot get the nodes to join it. We've tried the Hybris "classic" UDP multicast solution, which worked like a charm.

When trying to use JGroups, we get these error messages:

 JGRP000032: hybrisnode-1: no physical address for 0e5064c6-0481-b51d-e89c-4d5b58d3be2e, dropping message
 JGRP000011: hybrisnode-1: message 3157621 from non-member 0e5064c6-0481-b51d-e89c-4d5b58d3be2e was discarded (view=MergeView::[hybrisnode-1|1] (2) [hybrisnode-1, hybrisnode-2], 2 subgroups: [hybrisnode-2|0] (1) [hybrisnode-2], [hybrisnode-1|0] (1) [hybrisnode-1]) (received 5 identical messages from 0e5064c6-0481-b51d-e89c-4d5b58d3be2e in the last 99075 ms)

(the second message was originally logged in German, "Discarded a received message from a non-member"; translated to English above)

When monitoring the multicast messages, we see that all messages are received by all cluster nodes, but the PING request doesn't get any response other than its own.

When using the Hybris classic method, this works without any problems.

Any idea what could cause this?

Kind regards

Dennis



3 Answers

  • Best Answer
    Former Member
    Posted on Apr 28, 2015 at 02:03 PM

    I'd like to answer my own question. The main problem really was the firewall, as some of you suggested (although I'm SURE I disabled it at some point during the tests).

    However, let me clarify:

    JGroups doesn't only use the UDP multicast group and port you can configure in local.properties; it also uses additional ports, which you have to configure and open up in the firewall. To do this, copy the JGroups UDP configuration file into the hybris config directory (recommended), customize it to your needs, and point the local.properties parameter cluster.broadcast.method.jgroups.configuration at that file (use "${HYBRIS_CONFIG_DIR}/" in the value to reference your configuration directory), as sketched below.
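
    For illustration, a minimal local.properties sketch (the file name jgroups-udp.xml is only an example; use whatever name you gave your copy):

     # point hybris at the customized JGroups configuration file
     cluster.broadcast.method.jgroups.configuration=${HYBRIS_CONFIG_DIR}/jgroups-udp.xml
     # multicast port; 45588 matches the mcast_port used elsewhere in this thread
     cluster.broadcast.method.jgroups.udp.mcast_port=45588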

    • For the PING protocol, the ping requests are sent to the defined UDP multicast group and port. The ANSWERS, however, are sent as unicast UDP packets from each cluster member back to the requesting member. This seems to differ from the "classic" hybris cluster (which is why that one was working), so you need to open that port as well. By default this port is chosen dynamically; you need to configure a fixed port using the "bind_port" attribute in the UDP tag.

    • Additionally, the FD_SOCK protocol uses a set of dynamic TCP ports (50 per default) for its client and server sides, which you need to open in the firewall as well. These port ranges can be configured in the FD_SOCK tag with the attributes "start_port" and "client_port" for the starting port numbers and "port_range" for the size of both ranges.

    After opening these three ports/port-ranges, the cluster works.
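
    For reference, a hypothetical iptables sketch of those openings; the ports are example values, so substitute the ones from your own configuration (and note that TCP port numbers only go up to 65535, so FD_SOCK ranges have to stay below that):

     # UDP multicast/bind port: PING requests and their unicast replies
     iptables -A INPUT -p udp --dport 45588 -j ACCEPT
     # FD_SOCK server ports: start_port to start_port + port_range
     iptables -A INPUT -p tcp --dport 57600:57650 -j ACCEPT
     # FD_SOCK client ports: client_port to client_port + port_range
     iptables -A INPUT -p tcp --dport 57700:57750 -j ACCEPT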

    Thanks everybody for the suggestions and hints.

    Here's an example configuration file:

     <!--
       Default stack using IP multicasting. It is similar to the "udp"
       stack in stacks.xml, but doesn't use streaming state transfer and flushing
       author: Bela Ban
     -->
     
     <config xmlns="urn:org:jgroups"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.1.xsd">
         <UDP
              mcast_port="${hybris.jgroups.mcast_port}"
              bind_port="45588"
              tos="8"
              ucast_recv_buf_size="20M"
              ucast_send_buf_size="640K"
              mcast_recv_buf_size="25M"
              mcast_send_buf_size="640K"
              loopback="true"
              max_bundle_size="64K"
              max_bundle_timeout="30"
              ip_ttl="${jgroups.udp.ip_ttl:8}"
              enable_bundling="true"
              enable_diagnostics="true"
              thread_naming_pattern="cl"
     
              timer_type="old"
              timer.min_threads="4"
              timer.max_threads="10"
              timer.keep_alive_time="3000"
              timer.queue_max_size="500"
     
              thread_pool.enabled="true"
              thread_pool.min_threads="2"
              thread_pool.max_threads="8"
              thread_pool.keep_alive_time="5000"
              thread_pool.queue_enabled="true"
              thread_pool.queue_max_size="10000"
              thread_pool.rejection_policy="discard"
     
              oob_thread_pool.enabled="true"
              oob_thread_pool.min_threads="1"
              oob_thread_pool.max_threads="8"
              oob_thread_pool.keep_alive_time="5000"
              oob_thread_pool.queue_enabled="false"
              oob_thread_pool.queue_max_size="100"
              oob_thread_pool.rejection_policy="Run"/>
     
         <PING timeout="2000"
                 num_initial_members="20"/>
         <MERGE2 max_interval="30000"
                 min_interval="10000"/>
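        <!-- note: TCP port numbers only go up to 65535, so the 67600/67700 values
             below are out of range; in a real setup choose start_port/client_port
             so that port + port_range stays below 65536 -->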
         <FD_SOCK start_port="67600" client_port="67700" port_range="50"/>
         <FD_ALL/>
         <VERIFY_SUSPECT timeout="1500"  />
         <BARRIER />
         <pbcast.NAKACK2 xmit_interval="1000"
                         xmit_table_num_rows="100"
                         xmit_table_msgs_per_row="2000"
                         xmit_table_max_compaction_time="30000"
                         max_msg_batch_size="500"
                         use_mcast_xmit="false"
                         discard_delivered_msgs="true"/>
         <UNICAST  xmit_interval="2000"
                   xmit_table_num_rows="100"
                   xmit_table_msgs_per_row="2000"
                   xmit_table_max_compaction_time="60000"
                   conn_expiry_timeout="60000"
                   max_msg_batch_size="500"/>
         <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                        max_bytes="4M"/>
         <pbcast.GMS print_local_addr="true" join_timeout="3000"
                     view_bundling="true"/>
         <UFC max_credits="2M"
              min_threshold="0.4"/>
         <MFC max_credits="2M"
              min_threshold="0.4"/>
         <FRAG2 frag_size="60K"  />
         <RSVP resend_interval="2000" timeout="10000"/>
         <pbcast.STATE_TRANSFER />
         <!-- pbcast.FLUSH  /-->
     </config>
     
    

  • Former Member
    Posted on Apr 16, 2015 at 05:03 PM

    It seems like JGroups is not properly binding to the MCAST address:port.

    I found some notes made on the wiki for hybris 5.1 that may help you verify the configured multicast address and port:

    [the port] defaults to 7600 by jgroups, but set by project.properties as cluster.broadcast.method.jgroups.udp.mcast_port=45588 .. [the address] defaults to 228.8.8.8 [configurable by] editing the jgroups-udp.xml and adding mcast_addr

    Once you know the address and port JGroups should be using, on each node you can verify that it is listening using the "netstat -an" command.
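
    For example (assuming the port 45588 mentioned above; "ss" is the modern equivalent on systems without netstat):

     netstat -an | grep 45588   # look for UDP sockets bound to :45588
     ss -anu | grep 45588       # same check using ss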

    You may find your nodes are binding only to IPv6; recent versions of Java and Linux default to IPv6 where possible. You should too, and in general all addresses that need to be configured should be resolvable by name via /etc/hosts, DNS, or similar. This decouples application configuration from network numbering and allows migration from IPv4 to IPv6 (or vice versa) by changing the hostname-to-address mappings.
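
    For illustration, a hypothetical /etc/hosts sketch (the addresses are placeholders; the point is that your configs reference names, not numbers):

     192.168.1.10   hybrisnode-1
     192.168.1.11   hybrisnode-2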

    If you'd rather ignore that advice, you can try adding some command-line options to the Tanuki wrapper configuration to force Java to prefer IPv4:

     -Djava.net.preferIPv4Stack=true \
     -Djava.net.preferIPv6Addresses=false \
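
    In the Tanuki wrapper.conf, those flags would typically be added as numbered wrapper.java.additional entries; a sketch with illustrative indices (continue from the highest index already in your file):

     wrapper.java.additional.20=-Djava.net.preferIPv4Stack=true
     wrapper.java.additional.21=-Djava.net.preferIPv6Addresses=false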
     
    

    You may also want to try "setenforce permissive" if you're running a recent version of Linux that has SELinux set to enforcing by default; SELinux can prevent processes from binding network ports.
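
    For example, a quick sketch (setenforce only lasts until the next reboot; edit /etc/selinux/config to make the change permanent):

     getenforce                  # prints Enforcing, Permissive or Disabled
     sudo setenforce permissive  # keep logging denials, but stop blocking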


    • Former Member Cristian Popa

      We've even disabled our firewall for testing purposes, without success. Like I said, I even SEE the mcast "ping" requests on all other hosts, but they're not responding.

      When using the old Hybris UDP multicast cluster solution, everything works fine: I see the ping requests and all hosts respond.

      About the hypervisors: we have two nodes on one hypervisor, and the problem exists there, too.

      So I don't think it is host- or network-related.

      About the binding: I've tried adding "bind_addr" to the "UDP" tag in a custom XML configuration, but that didn't work. Was that right?

  • Posted on Apr 28, 2015 at 01:19 PM

    There are two things that could possibly be at play:

    1) When you disabled the firewall, did you restart hybris? I had that problem before and it was really confusing. The firewall needs to be disabled at startup, otherwise hybris cannot connect.

    2) The bind address. Based on my discussion with Jeremy, there may be two ways to set this. One, which I knew of, is the JGroups configuration: in jgroups-udp.xml you set the bind address to the local IP address of the interface you want to bind to. Of course you can define your own custom jgroups-udp.xml; make sure it is configured correctly. Alternatively, you can force the binding at OS level through some networking setup. I am not sure exactly how that last one is done, but it may be preferred so you do not hardcode IP addresses in the config files. A sketch of the first option follows below.
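
    A hypothetical fragment of the UDP tag (192.168.1.10 is a placeholder for the node's interface address; the ${jgroups.bind_addr:...} form lets you override it at startup with -Djgroups.bind_addr=...):

     <UDP bind_addr="${jgroups.bind_addr:192.168.1.10}"
          mcast_port="${hybris.jgroups.mcast_port}"
          bind_port="45588" />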

    You may also want to use the JGroups out-of-the-box multicast receiver and multicast sender diagnostic tools, without starting hybris, to rule out the possibility that hybris introduces any issues.
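
    A sketch of how to run them (class names from the org.jgroups.tests package of JGroups 3.x; verify against your version, and adjust the jar path, address and port):

     # on each receiving node
     java -cp jgroups.jar org.jgroups.tests.McastReceiverTest -mcast_addr 228.8.8.8 -port 45588
     # on the sending node: type a line and check it shows up on the receivers
     java -cp jgroups.jar org.jgroups.tests.McastSenderTest -mcast_addr 228.8.8.8 -port 45588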

