While installing or upgrading a host to version 11.2 or above, the following error can occur. It is logged in /var/log/netwitness/config-management/chef-solo.log:
.......
[2019-04-16T20:55:32+00:00] ERROR: Running exception handlers
[2019-04-16T20:55:32+00:00] ERROR: Exception handlers complete
[2019-04-16T20:55:32+00:00] FATAL: Stacktrace dumped to /var/lib/netwitness/config-management/cache/chef-stacktrace.out
[2019-04-16T20:55:32+00:00] FATAL: Please provide the contents of the stacktrace.out file if you file a bug report
[2019-04-16T20:55:32+00:00] ERROR: ruby_block[resolve ips] (nw-dns-client::config line 69) had an error: Resolv::ResolvError: no address for 889e5752-6ae3-4286-a944-c18233f4ccbc
[2019-04-16T20:55:32+00:00] FATAL: Chef::Exceptions::ChildConvergeError: Chef run process exited unsuccessfully (exit code 1)
Thus, the install/upgrade fails.
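To find out which name failed to resolve without reading the whole log, a small helper such as the one below may be useful. It is only a sketch: the function name unresolved_names is made up for this example, and it assumes the error line has the same format as in the excerpt above.

```shell
#!/usr/bin/env bash
# Print each distinct name that Chef reported as unresolvable.
# Assumption: the log uses the "Resolv::ResolvError: no address for <name>"
# wording shown in the excerpt above.
unresolved_names() {
    local log_file="$1"
    grep -o 'Resolv::ResolvError: no address for [^[:space:]]*' "$log_file" \
        | awk '{ print $NF }' \
        | sort -u
}

# Example (log path taken from this article):
# unresolved_names /var/log/netwitness/config-management/chef-solo.log
```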
A likely cause is that the target host cannot communicate with the Admin Server on port 53. The target host uses the dnsmasq service on the Admin Server to resolve, in this case, 889e5752-6ae3-4286-a944-c18233f4ccbc, which is the salt minion id of the Admin Server. To confirm this, run "cat /etc/salt/minion" on the Admin Server and compare the id value with the name in the error.
Example output:
[root@S5-NWAdmin ~]# cat /etc/salt/minion
master: localhost
hash_type: sha256
log_level: info
id: 889e5752-6ae3-4286-a944-c18233f4ccbc
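The comparison can also be scripted. The sketch below assumes the minion file has the simple "key: value" layout shown in the example output; the function names minion_id and is_admin_minion_id are hypothetical.

```shell
#!/usr/bin/env bash
# Print the value of the "id:" key from a salt minion config file.
# Assumption: the file uses the flat "key: value" layout shown above.
minion_id() {
    local minion_file="${1:-/etc/salt/minion}"
    awk '$1 == "id:" { print $2; exit }' "$minion_file"
}

# Succeed only if the name from the Chef error equals the minion id.
is_admin_minion_id() {
    local name="$1" minion_file="${2:-/etc/salt/minion}"
    [ "$(minion_id "$minion_file")" = "$name" ]
}

# Example, run on the Admin Server:
# is_admin_minion_id 889e5752-6ae3-4286-a944-c18233f4ccbc && echo "match"
```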
If possible, configure any firewalls between the target host and the Admin Server to allow communication on port 53.
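A quick way to probe the port from the target host is sketched below. It relies on bash's built-in /dev/tcp redirection and the coreutils timeout command, and the function name tcp_port_open is made up for this example. Note that it only tests TCP. DNS queries normally go over UDP, so the result here does not prove that UDP port 53 behaves the same way.

```shell
#!/usr/bin/env bash
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# This checks TCP only; it says nothing about UDP traffic on the same port.
tcp_port_open() {
    local host="$1" port="${2:-53}" seconds="${3:-3}"
    timeout "$seconds" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Example, using the Admin Server address from the sample /etc/hosts in this article:
# tcp_port_open 192.168.2.102 53 && echo "TCP 53 reachable" || echo "TCP 53 blocked"
```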
If this is not possible, the workaround is to add the minion id to the /etc/hosts file on the component hosts and, starting with the 11.4 release, to modify the chef recipe so that it does not overwrite this workaround.
Take the example /etc/hosts file from an Endpoint Hybrid host.
[root@S5-ENDPOINTHYB ~]# cat /etc/hosts
127.0.0.1 S5-ENDPOINTHYB localhost localhost.localdomain localhost4 localhost4.localdomain4 500081a7-f678-45ef-8def-0d416a10e415
::1 S5-ENDPOINTHYB localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.102 nw-node-zero
Edit /etc/hosts and add the minion id, exactly as it appears in the error, to the nw-node-zero line if that line exists. If it does not, add a new line of the form <IP address of Admin Server> <UUID of Admin Server>. With an existing nw-node-zero line, the result looks like this:
[root@S5-ENDPOINTHYB ~]# cat /etc/hosts
127.0.0.1 S5-ENDPOINTHYB localhost localhost.localdomain localhost4 localhost4.localdomain4 500081a7-f678-45ef-8def-0d416a10e415
::1 S5-ENDPOINTHYB localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.102 nw-node-zero 889e5752-6ae3-4286-a944-c18233f4ccbc
Or add the entire line if there is no existing line:
[root@S5-ENDPOINTHYB ~]# cat /etc/hosts
127.0.0.1 S5-ENDPOINTHYB localhost localhost.localdomain localhost4 localhost4.localdomain4 500081a7-f678-45ef-8def-0d416a10e415
::1 S5-ENDPOINTHYB localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.2.102 889e5752-6ae3-4286-a944-c18233f4ccbc
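If the same edit has to be made on several component hosts, it can be scripted. The following is a minimal sketch, not a supported tool: the function name add_admin_mapping is hypothetical, it keeps a .bak copy of the file, and it does nothing if the id is already present.

```shell
#!/usr/bin/env bash
# Add the Admin Server minion id to a hosts file if it is not already there.
#   $1 = hosts file, $2 = Admin Server IP, $3 = Admin Server minion id (UUID)
add_admin_mapping() {
    local hosts_file="$1" admin_ip="$2" admin_uuid="$3" tmp

    # Nothing to do if the id is already present as a whole word.
    if grep -qwF -- "$admin_uuid" "$hosts_file"; then
        return 0
    fi

    cp -p "$hosts_file" "${hosts_file}.bak"   # keep a copy before editing
    tmp=$(mktemp)

    # Append the id to any non-comment line that lists nw-node-zero as a name.
    # awk exits with status 1 when no such line was found.
    if awk -v id="$admin_uuid" '
            {
                hit = 0
                if ($1 !~ /^#/)
                    for (i = 2; i <= NF; i++)
                        if ($i == "nw-node-zero") hit = 1
                if (hit) { print $0 " " id; found = 1 } else print
            }
            END { exit !found }
        ' "$hosts_file" > "$tmp"; then
        cat "$tmp" > "$hosts_file"            # overwrite contents, keep permissions
    else
        # No nw-node-zero line: add "<IP> <UUID>" as a new line.
        printf '%s %s\n' "$admin_ip" "$admin_uuid" >> "$hosts_file"
    fi
    rm -f "$tmp"
}

# Example, with the values used in this article:
# add_admin_mapping /etc/hosts 192.168.2.102 889e5752-6ae3-4286-a944-c18233f4ccbc
```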
Next, run a command that resolves the name, both to confirm that the change took effect and to save the value to the resolver cache on the host in question. Ping will work, but note that ICMP is blocked by default, so the command will normally appear unsuccessful. You can also use curl against a port that should be accessible. Whether the connection succeeds does not matter for our purposes; what matters is that the name resolves to the correct IP address.
ping 889e5752-6ae3-4286-a944-c18233f4ccbc
or
curl -v https://889e5752-6ae3-4286-a944-c18233f4ccbc:5671
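Because only the name-to-address mapping matters, the entry can also be checked without sending any traffic. The sketch below reads a hosts file directly; the function names hosts_file_ip and maps_to are made up for this example. It only inspects the file, so it does not replace the ping or curl step. To exercise the system resolver itself, "getent hosts <name>" is one option.

```shell
#!/usr/bin/env bash
# Print the IP address that a hosts file maps a given name to (first match).
hosts_file_ip() {
    local hosts_file="$1" name="$2"
    awk -v name="$name" '
        $1 !~ /^#/ {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$hosts_file"
}

# Succeed only if the name maps to the expected address in that file.
#   $1 = hosts file, $2 = name, $3 = expected IP
maps_to() {
    [ "$(hosts_file_ip "$1" "$2")" = "$3" ]
}

# Example, with the values used in this article:
# maps_to /etc/hosts 889e5752-6ae3-4286-a944-c18233f4ccbc 192.168.2.102 && echo "mapping OK"
```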
If you are on version 11.4+: An additional step must be performed if you are on 11.4 or above, because the chef recipes attempt to overwrite the workaround from above. To prevent this, we must modify the chef recipe. First, edit this file:
vi /var/netwitness/config-management/cookbooks/third-party/nw-dns-client/recipes/config.rb
Locate the following section of code and comment it out so that it does not remove the workaround.
# removes node-zero ip mapping in /etc/hosts
delete_lines 'delete nw-node-zero entry' do
  path '/etc/hosts'
  pattern NWDNSClient::Helper.node_zero_ip(false).to_s
  not_if { node['nw_host']['type'].eql?('node-zero') }
  only_if do
    node['global']['nw-hosts'] &&
      node['global']['nw-hosts'].any? { |host| !host['ipv4'].empty? }
  end
end
To do this, add '#' to the beginning of each line:
# removes node-zero ip mapping in /etc/hosts
#delete_lines 'delete nw-node-zero entry' do
#  path '/etc/hosts'
#  pattern NWDNSClient::Helper.node_zero_ip(false).to_s
#  not_if { node['nw_host']['type'].eql?('node-zero') }
#  only_if do
#    node['global']['nw-hosts'] &&
#      node['global']['nw-hosts'].any? { |host| !host['ipv4'].empty? }
#  end
#end
Please note that an upgrade may update this file. Each time it does, you must reapply these changes.
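Since the change may need to be reapplied, it can help to script it. The sketch below is one possible approach and rests on an assumption: in the actual recipe file, the lines inside the delete_lines block are indented, so the first "end" at the start of a line closes the block. Check the file before relying on it. The function name comment_out_delete_block is hypothetical, and sed -i assumes GNU sed.

```shell
#!/usr/bin/env bash
# Comment out the delete_lines block that removes the nw-node-zero entry.
# Assumption: nested lines are indented, so the first line that starts with
# "end" (no leading spaces) closes the block.
comment_out_delete_block() {
    local recipe="$1"
    local start="^delete_lines 'delete nw-node-zero entry' do"

    # If the block is already commented out (or absent), leave the file alone.
    grep -q "$start" "$recipe" || return 0

    cp -p "$recipe" "${recipe}.orig"      # keep the unmodified version
    sed -i "/${start}/,/^end/ s/^/#/" "$recipe"
}

# Example (path taken from this article):
# comment_out_delete_block /var/netwitness/config-management/cookbooks/third-party/nw-dns-client/recipes/config.rb
```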
After completing the above steps, attempt the upgrade once more while tailing /var/log/netwitness/config-management/chef-solo.log to confirm that the error no longer occurs.
A Very Special Note about resolv.conf: In 11.3 and above, the Chef process makes /etc/resolv.conf an immutable file. If you are unable to reach the Admin Server on port 53, or your component host uses a different DNS server from your Admin Server, you must edit the local resolv.conf on the component host. Before you can edit the file to change which DNS servers you query, you must remove the immutable attribute:
chattr -i /etc/resolv.conf
Once this is done, you can restore your DNS server settings by editing the file with vi. If you are unsure what they were prior to your upgrade, you can check the backup files that chef creates during its upgrade run. They are date-stamped in the file name.
[root@S5-ENDPOINTHYB ~]# locate resolv.conf
/etc/resolv.conf
/var/netwitness/config-management/cache/cookbooks/nw-dns-client/templates/default/resolv.conf.erb
/var/netwitness/config-management/cookbooks/nw-dns-client/templates/default/resolv.conf.erb
/var/netwitness/config-management/local-mode-cache/backup/etc/resolv.conf.chef-20181016174034.809106
/var/netwitness/config-management/local-mode-cache/backup/etc/resolv.conf.chef-20190415152127.680013
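To pick the most recent backup from that list, something like the following sketch could be used. The function name latest_resolv_backup is made up for this example, and the default directory is simply the backup path from the locate output above. It relies on the date stamp in the file name sorting chronologically.

```shell
#!/usr/bin/env bash
# Print the newest Chef backup of resolv.conf found in a backup directory.
latest_resolv_backup() {
    local backup_dir="${1:-/var/netwitness/config-management/local-mode-cache/backup/etc}"
    local f latest=""
    # The suffix is a date stamp, so the lexically greatest name is the newest.
    for f in "$backup_dir"/resolv.conf.chef-*; do
        [ -e "$f" ] || continue            # skip the unexpanded pattern when nothing matches
        if [[ "$f" > "$latest" ]]; then
            latest="$f"
        fi
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# Example: review the newest backup before editing /etc/resolv.conf by hand.
# lsattr /etc/resolv.conf              # an "i" flag means the file is still immutable
# cat "$(latest_resolv_backup)"
```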
Also, note that the Admin Server is different. There, the options in /etc/resolv.conf are overwritten by what is defined in /etc/netwitness/platform/resolv.dnsmasq. If you want to change the Admin Server's DNS servers, you must modify that file instead.
If this solution does not work for you and you are still experiencing issues after following the steps in this KB article, please open a case with RSA Technical Support quoting this KB article.