Salt communication fails between Node 0 and Node X.
The Precheck tool fails with the error “list of nodes that are not reachable” if communication between Node 0 and any Node X fails.
1. Check the salt communication for the failed service:
a. Verify the status of salt-master and salt-minion on Node 0 using the below commands:
systemctl status salt-master.service
systemctl status salt-minion.service
b. Verify the status of salt-minion on Node X using the below command:
systemctl status salt-minion.service
2. If any of the services are down, you must restart the services using the below command:
systemctl restart <service_name>
3. If the failed Node ID is not associated with any of the appliances (Node X) then remove the Node X using the remove-key command:
Important - Make sure you remove the invalid Node X only.
orchestration-cli-client --remove-key <UUID of the invalid Node X>
Note - If the failed Node ID is associated with any of the appliances (Node X), check whether the system is powered on.
4. (Optional) After a Node X is removed, clean up the RabbitMQ entries of the removed Node X by performing the following:
a. List the RabbitMQ parameters using the below command:
rabbitmqctl list_parameters -p /rsa/system
Sample Output:
The following output lists the active and stale host IPs:
[root@SA ~]# rabbitmqctl list_parameters -p /rsa/system
Listing runtime parameters
federation-upstream carlos-upstream-14523a06-48db-45ec-b1e3-509b0a38b755 {"uri":"amqps://10.63.21.146:5671?auth_mechanism=external","expires":3600000}
federation-upstream carlos-upstream-91992551-1965-4463-8878-73abb48f85fc {"uri":"amqps://10.63.21.103:5671?auth_mechanism=external","expires":3600000}
federation-upstream carlos-upstream-a25d7e56-9f44-4eb2-bbc3-89989b348b54 {"uri":"amqps://10.63.21.115:5671?auth_mechanism=external","expires":3600000}
federation-upstream carlos-upstream-cb937111-0282-407d-aa4c-3453f07d120f {"uri":"amqps://10.63.21.202:5671?auth_mechanism=external","expires":3600000}
federation-upstream carlos-upstream-cdd2fdf1-7519-46e0-a8bd-76ca8e000f8f {"uri":"amqps://10.63.21.113:5671?auth_mechanism=external","expires":3600000}
For example, if the stale host IP "10.x.x.x" is not used or listed under Admin > Hosts view in the NetWitness Platform UI or in the output of orchestration-cli-client -l, you must get the corresponding mapped carlos upstream ID (for example, carlos-upstream-14523a06-48db-45ec-b1e3-509b0a38b755) from the above (Step 4.a) output to remove the stale entry.
b. Remove the carlos-upstream-xxxx entry that is mapped to the stale IP using the below command:
rabbitmqctl -q clear_parameter -p /rsa/system federation-upstream carlos-upstream-<Carlos Upstream ID>
For example:
rabbitmqctl clear_parameter -p /rsa/system federation-upstream carlos-upstream-14523a06-48db-45ec-b1e3-509b0a38b755
c. Restart the services using the below commands:
service rabbitmq-server restart
service collectd restart
service rsa-sms restart
d. To verify that the decommissioned node no longer exists, run the below command:
rabbitmqctl list_parameters -p /rsa/system
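The lookup described in Step 4 (matching a stale host IP to its carlos upstream entry) can be sketched as a small shell helper. This is only an illustrative sketch: the function name find_upstream_for_ip is hypothetical and is not part of NetWitness or RabbitMQ, and it assumes the list_parameters output has the same layout as the sample output above (component, name, then a JSON value containing the amqps URI).

```shell
#!/bin/sh
# Hypothetical helper (not a NetWitness or RabbitMQ command).
# Reads the output of "rabbitmqctl list_parameters -p /rsa/system" on stdin
# and prints the federation-upstream name whose URI points at the given IP.
# Assumption: each entry looks like the sample output in Step 4.a, that is
#   federation-upstream <name> {"uri":"amqps://<ip>:<port>?...", ...}
find_upstream_for_ip() {
    stale_ip="$1"
    awk -v ip="$stale_ip" '
        # Match only federation-upstream rows; the trailing ":" after the IP
        # prevents 10.63.21.1 from matching 10.63.21.146.
        $1 == "federation-upstream" && index($0, "amqps://" ip ":") { print $2 }
    '
}
```

For example, piping the Step 4.a command into the helper, as in rabbitmqctl list_parameters -p /rsa/system | find_upstream_for_ip 10.63.21.146, would print the name to pass to clear_parameter in Step 4.b. Confirm that the IP is really stale (Admin > Hosts view or orchestration-cli-client -l) before removing anything.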