NetWitness already has the Health & Wellness service, which provides a full overview of the health of all NetWitness services and hosts. I also created a health-check script that performs a quick analysis of disk usage, memory utilization, the existence of core files, and whether any services have failed on any NetWitness host.
It also lists all your hosts with their Salt minion IDs, hostnames, and IPs, and provides a Salt reachability check.
The procedure consists of two scripts:
health-check.sh: This script runs on the SA and performs a simple health check of your environment. It copies health-check-host.sh to all hosts, makes it executable, runs it on each host, and provides the output and recommendations. It also lists the UUIDs (Salt minion IDs), hostnames, and IPs of all your hosts and performs a reachability test.
health-check-host.sh: This script is copied to all NetWitness hosts when you run health-check.sh on the SA. It analyzes each host's disk usage, memory utilization, the existence of core files, and whether any services have failed on that host.
You do not run health-check-host.sh manually; it runs automatically when you run health-check.sh on the SA.
All of the steps below are performed in an SSH session to the NetWitness Admin Server (SA).
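To give a rough idea of the kind of checks health-check-host.sh performs, here is a minimal sketch. It is not the attached script: the function names, the 80% disk threshold, and the core.<PID> file name pattern are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the attached health-check-host.sh.
# Function names, threshold, and core-file pattern are assumptions.

DISK_LIMIT=80   # assumed warning threshold (percent used)

# Read `df -P` style output on stdin and print "<mount> <used>%"
# for every filesystem whose usage is above the given limit.
disk_over_limit() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 > limit) print $6 " " use "%"
    }'
}

# Read /proc/meminfo style text on stdin and print the used-memory percentage.
mem_used_percent() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Run all four checks on the local host.
run_checks() {
    echo "== Filesystems above ${DISK_LIMIT}% =="
    df -P | disk_over_limit "$DISK_LIMIT"
    echo "== Memory used =="
    echo "$(mem_used_percent < /proc/meminfo)%"
    echo "== Core files (assumed name pattern core.<PID>) =="
    find / -xdev -type f -name 'core.[0-9]*' 2>/dev/null
    echo "== Failed services =="
    systemctl --failed --no-legend 2>/dev/null
}
```

Calling run_checks on a host would print the four sections in order. The parsing helpers read from standard input so they can be tried against saved df or meminfo output without touching a live system.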
1) Under /root on the SA, create the file:
#vi health-check.sh
2) Copy the contents of health-check.sh (attached) into the file you created in step 1.
3) Under /root on the SA, create the file:
#vi health-check-host.sh
4) Copy the contents of health-check-host.sh (attached) into the file you created in step 3.
5) Make only health-check.sh executable (not health-check-host.sh):
#chmod +x health-check.sh
6) Run health-check.sh:
#./health-check.sh
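When health-check.sh runs in step 6, it is described as copying health-check-host.sh to every host, making it executable, and running it. A rough sketch of that sequence with the standard Salt commands (salt-cp, cmd.run, grains.item, test.ping) could look like the following. This is not the attached script: the function names and the SALT_RUN dry-run switch are hypothetical and only there for illustration.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the attached health-check.sh.
# Assumes the Salt CLI tools (salt, salt-cp) are available on the SA.

HOST_SCRIPT=/root/health-check-host.sh   # location used in steps 3 and 4
SALT_RUN=${SALT_RUN:-}                   # hypothetical switch: set to "echo" to print commands only

# Copy the host script to the targeted minions, make it executable, run it.
push_and_run() {
    target=${1:-*}
    $SALT_RUN salt-cp "$target" "$HOST_SCRIPT" "$HOST_SCRIPT"
    $SALT_RUN salt "$target" cmd.run "chmod +x $HOST_SCRIPT"
    $SALT_RUN salt "$target" cmd.run "$HOST_SCRIPT"
}

# Show hostname and IPv4 grains per minion ID, then test reachability.
list_and_ping() {
    target=${1:-*}
    $SALT_RUN salt "$target" grains.item host ipv4
    $SALT_RUN salt "$target" test.ping
}
```

With SALT_RUN=echo set beforehand, push_and_run only prints the three Salt commands it would issue, which is a safe way to review them before running anything against all hosts.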
Sample Run: (output not reproduced here)
"Minion did not return. [Not connected]" or "No Response" could point to one of the reasons below:
1) If there is network slowness and the Salt master (SA) cannot reach the Salt minions (hosts) within a specific time limit while fetching their IPs and hostnames, the first part of the script output may show "No Response". Don't panic. This does not mean that the SA is totally unable to reach the host(s); it only failed to reach them within the time limit, so Salt temporarily reports "No Response".
If you run the script again when the network is not slow, it should produce the expected output.
2) If a host has no free memory left and has used all of its swap, its Salt minion may not reply to the Salt master's requests for IP, hostname, and reachability, which results in "Minion did not return. [Not connected]". If you run the script again, it should show a normal result. If it does not, please check the memory utilization of this host, once you have ruled out network slowness (point 1) and a retired or powered-off host (point 3).
3) "Minion did not return. [Not connected]" could also point to a retired host (one that was removed from the environment but whose Salt minion UUID was not deleted from the Salt master) or to a host that is currently powered off.
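To narrow down which of the three reasons applies, it can help to pull out just the minion IDs that did not answer and retry them. The helper below is a small sketch with a hypothetical function name; it assumes Salt's default text output, where each minion ID appears on its own line ending in a colon, followed by an indented result line.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the function name is hypothetical.

# Read `salt ... test.ping` style output on stdin and print the IDs of
# minions whose result line contains "Minion did not return".
unreachable_minions() {
    awk '
        /^[^[:space:]].*:$/     { id = $0; sub(/:$/, "", id); next }
        /Minion did not return/ { print id }
    '
}

# Possible usage on the SA (commented out so the sketch has no side effects):
#   salt -t 60 '*' test.ping | unreachable_minions   # retry with a longer timeout (point 1)
#   salt-key -L                                      # list known keys to spot retired hosts (point 3)
#   salt-key -d <minion-id>                          # delete the key of a host confirmed as retired
```

A minion that answers with the longer timeout points to slowness (point 1) or memory pressure (point 2), while one that never answers is more likely retired or powered off (point 3).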
Please feel free to provide feedback, bug reports, etc.