2014-08-28 02:33 PM
Is it possible? I am looking to integrate this into my restart script so I can ensure aggregation is stopped prior to restarting the concentrator or decoder.
Thanks!
2014-08-28 06:08 PM
Related question for actual RSA personnel:
If you stop the service cleanly at the OS level using normal start/stop mechanisms, does it quiesce and shut down aggregation cleanly in the process, or just slam the door and shut down regardless of aggregation state, perhaps truncating/corrupting some aggregating data in the process?
In other words, is there even a reason to check aggregation status, or does the app shut down cleanly when it's stopped in a controlled manner?
2014-08-28 06:24 PM
Would the following suffice as a starting point?
[root@NWAPPLIANCE22290 ~]# NwConsole -c login localhost:50005 <user> <password> -c concentrator/devices ls depth=3 |grep -i consuming --color
457:0x2000000000200200 /concentrator/devices/192.168.183.123:50002/stats/status (Status) = consuming
Or if SSL is enabled:
[root@NWAPPLIANCE22290 ~]# NwConsole -c login localhost:50005:ssl <user> <password> -c concentrator/devices ls depth=3 |grep -i consuming --color
457:0x2000000000200200 /concentrator/devices/192.168.183.123:50002/stats/status (Status) = consuming
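Along those lines, here's a minimal sketch of how that check could be wrapped into a restart script. Host, user, and password are placeholders, and the wait loop is only defined here, not run:

```shell
#!/bin/bash
# Placeholders -- adjust for your environment.
NWHOST="localhost:50005"
NWUSER="monitor"
NWPASS="secret"

# Succeed (exit 0) when the NwConsole device listing on stdin still
# shows any subordinate device in the "consuming" state.
aggregation_active() {
  grep -qi 'consuming'
}

# Poll every 5 seconds until no subordinate device is consuming.
wait_for_stop() {
  while NwConsole -c login "${NWHOST}" "${NWUSER}" "${NWPASS}" \
        -c "concentrator/devices ls depth=3" | aggregation_active; do
    sleep 5
  done
}
```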
2014-08-29 06:30 AM
I have been told by multiple RSA personnel and RSA resellers that doing a hard restart via "restart nwconcentrator" can potentially harm the device. I had never seen this happen for months on end, until the other day it took almost an hour and a half to come back online because it did not get to finish the last index thread, forcing it to repair the index.
I have no idea if that was because of the aggregation not being stopped.
Would also love an answer for that
2014-08-29 08:03 AM
For anyone who would also like to use the script I am using, I have attached it. I find that restarting the services via the web UI is actually less successful than using the command line, which is why I wanted to create this.
You will need to set three variables in this script: user, password, and log location. I personally created a very restricted user for each concentrator so it could run the status query through NwConsole. I granted it the services.manage permission in the web UI.
I know this script can likely be refined; it's the first bash script I have ever created that is more than just running a command or two.
2014-08-29 11:01 AM
My last post is being "moderated" (I suspect because of the link) so I think it's probably not showing up.
Pulling "concentrator/devices" reports on the consumption status of each individual subordinate device, but it's not really reporting on the aggregation status of the device you're getting ready to shut down.
In NW you can pull /${device}/stats/status where ${device} is broker/concentrator/decoder to get a single value that reports the status of aggregation for the device. (I'm assuming you can do the same in SA.)
You can script this through NwConsole, but a REST call might be easier. Quick and dirty could be something like:
curl -u "${NWUSER}:${NWPASS}" "hxxps://nwbroker:50103/broker/stats/status?msg=get&force-content-type=text/plain"
Checking this value, there's no output to grep; you just get "stopped" back and can compare against it directly in your script (note the value has to be re-fetched inside the loop, and the URL has to be quoted so the shell doesn't treat the & as a background operator):
STATUS=$(curl -u "${NWUSER}:${NWPASS}" "hxxps://nwbroker:50103/broker/stats/status?msg=get&force-content-type=text/plain")
while [ "${STATUS}" != "stopped" ]; do
sleep 5
# twiddle thumbs, then check again
STATUS=$(curl -u "${NWUSER}:${NWPASS}" "hxxps://nwbroker:50103/broker/stats/status?msg=get&force-content-type=text/plain")
done
(Check the verbiage against your environment's particulars, this is from memory... maybe it says "stopping"?)
Also, check out case statements in bash - they'll make your comparisons a bit cleaner... you can run the same code block for multiple values:
case $answer in
y|Y|yes|ja|si|da) run yes code ;;
n|N|no|nein|nyet) run no code ;;
esac
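Applied to the status check here, a hedged sketch (the exact set of status strings is assumed from this thread, not confirmed against the docs):

```shell
#!/bin/bash
# Map an aggregation status string to an action message. The status
# values ("started", "stopping", "stopped") are assumptions based on
# this thread.
handle_status() {
  case "$1" in
    started|consuming) echo "still aggregating" ;;
    stopping)          echo "shutting down" ;;
    stopped)           echo "safe to restart" ;;
    *)                 echo "unknown status: $1" ;;
  esac
}
```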
2014-08-29 12:58 PM
So when I run that without the script I get correct results and I like it. But when I run it in the script, it pulls back extra stuff.
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 9 0 9 0 0 2296 0 --:--:-- --:--:-- --:--:-- 3000
started
2014-08-29 01:12 PM
Figured it out, need to add a -s for curl to be silent.
Thanks! I like that method better for cleaner output.
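Putting the pieces together, a sketch of the whole wait loop with the -s fix. The hostname, port, and credentials are placeholders; the "hxxps" in the earlier posts is a defanged https, and the URL is quoted so the shell does not treat the & as a background operator. The polling loop is defined but not run here:

```shell
#!/bin/bash
# Placeholders -- adjust for your environment.
NWUSER="monitor"
NWPASS="secret"
URL="https://nwbroker:50103/broker/stats/status?msg=get&force-content-type=text/plain"

# Fetch the current aggregation status; -s suppresses curl's progress meter.
get_status() {
  curl -s -u "${NWUSER}:${NWPASS}" "${URL}"
}

is_stopped() {
  [ "$1" = "stopped" ]
}

# Poll until aggregation reports "stopped".
wait_until_stopped() {
  until is_stopped "$(get_status)"; do
    sleep 5
  done
}
```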
2014-08-30 11:52 AM
In earlier releases, the startup scripts for SA Core services did not specify a kill timeout, so Linux used its default timeout of 2 seconds. This is typically not enough time for a Concentrator to shut down cleanly, so yes, it's possible that on restart it had to fix itself, which can take some time.
Since 10.3.3, the restart scripts have been rewritten to allow up to 60 seconds for each SA Core service to shut down, which should be plenty of time when "stop nwconcentrator" is issued. Of course, if you hard reboot the appliance or pull the power, this does not apply, and all services will have to fix their internal databases on restart.
On CentOS 6, the startup scripts are in /etc/init/nwconcentrator.conf (for example). Cat the file and look for "kill timeout 60". If you don't see that line, add it somewhere in the middle of the file. Here's an example file:
start on runlevel [35] and stopped rc
stop on runlevel [!35]
respawn
respawn limit 10 300
console none
kill timeout 60
chdir /var/netwitness/concentrator/metadb
limit core unlimited unlimited
limit nofile 65536 65536
exec /usr/sbin/NwConcentrator --stopwhenready
expect stop
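If you'd rather script that check than edit by hand, a small sketch (idempotent append; the path is the example from above, and backing up the file first is probably wise):

```shell
#!/bin/bash
# Ensure an upstart config contains a "kill timeout 60" line,
# appending one only if no kill timeout is already set.
ensure_kill_timeout() {
  local conf="$1"
  if ! grep -q '^kill timeout' "$conf"; then
    echo "kill timeout 60" >> "$conf"
  fi
}

# Example (path taken from the post above):
# ensure_kill_timeout /etc/init/nwconcentrator.conf
```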
2014-09-02 09:56 AM
Regardless of version, the easiest way to safely restart a Decoder, Concentrator, etc. is to just do a "killall NwConcentrator" or "killall NwDecoder". This invokes the SIGTERM handler on the service, which performs a graceful shutdown, including waiting indefinitely until aggregation or capture has stopped. After the process exits, upstart will automatically restart it.
If you use the upstart utilities, such as "initctl", "stop", "start", or "restart", then upstart imposes the kill timeout limits. This means that upstart sends SIGTERM, then waits for up to N seconds, and then sends it SIGKILL, which forces an immediate process stop. If you use initctl-style commands to manage the process, it's a good idea to increase the kill timeout.
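The killall approach can be sketched as a small helper (the service name is an example; the indefinite wait mirrors the SIGTERM handler's behavior described above):

```shell
#!/bin/bash
# Gracefully stop a Core service via SIGTERM and wait for it to exit;
# under upstart with "respawn", the service is restarted automatically.
graceful_restart() {
  local svc="$1"
  killall "$svc" 2>/dev/null || return 0   # not running: nothing to do
  # Wait indefinitely, matching the service's own graceful shutdown.
  while pgrep -x "$svc" >/dev/null; do
    sleep 2
  done
}

# Example:
# graceful_restart NwConcentrator
```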