2016-04-02 06:17 PM
The network connection is established, and the firewall is turned off for testing. When I enable the Netwitness agent, the Security Analytics Master / Puppet Master logs the following in /var/log/messages:
puppet-master: Compiled catalog for <key> in environment production 0.27 seconds
python: Adding {'node': '<certificate name> , classes " {[base': ''} to ENC database
python: Adding <certificate name> user to /rsa/system
python: signing puppet Cert
python: Pinging host <certificate name> with a 40 second timeout
python: Error with mco ping. Please check configuration.
When I run $mco ping <IP Address> I get a return.
When I run $mco ping <cert name> it returns the other certificate names in the inventory.txt, but not the one I'm trying to add.
I followed the instructions from this site to establish a new Puppet certificate. I am also having an issue with RabbitMQ connecting to the Puppet master.
Thank you for your time!
2016-04-05 01:09 PM
Can you run 'puppet agent -t' on your head unit and post a screenshot or paste the output here?
2016-04-07 09:20 AM
Hi Joseph,
Sounds like you can MCO ping other devices.
1. Make sure that the /etc/puppet/csr_attributes.yaml file has the correct IP and hostname of the device you are trying to add.
2. Make sure that the time is correct. The device's time should be the same as or close to the SA Server's, but not ahead of it.
3. The device knows how to reach SA head because the device knows the Puppet master in the /etc/hosts, but does the SA head know the end device?
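A minimal sketch of the first check in the list above. It only confirms that the expected hostname and IP appear somewhere in the file as literal strings; it does not parse the YAML or assume anything about its layout. The function name `csr_mentions` and the example hostname and IP are placeholders, not anything from the product.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if FILE contains both the expected
# hostname and the expected IP as literal substrings (no YAML parsing, so
# 192.0.2.10 would also match inside 192.0.2.100).
csr_mentions() {
    file=$1
    name=$2
    ip=$3
    grep -qF -- "$name" "$file" && grep -qF -- "$ip" "$file"
}

# Example call on the device being added (placeholder name and IP):
# csr_mentions /etc/puppet/csr_attributes.yaml broken-node.example.com 192.0.2.10 \
#     && echo "name and IP present" \
#     || echo "mismatch: correct the file before requesting a new cert"
```

For the third item, running `getent hosts <device name>` on the SA head shows whether the head can resolve the device at all.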
Thank you
David
2016-04-07 04:56 PM
I concur with David. When I see these issues, they are usually caused by NTP problems where the time is not synced. In addition, as long as you can mco ping other systems, mcollective isn't dead (which I've run into before), so RabbitMQ isn't overloaded or anything strange like that.
The csr_attributes.yaml file is critical as well. The name and IP need to match its contents, or the cert that Puppet generates will conflict with what the Puppet Master sees.
2016-04-07 05:27 PM
Adam,
I have run "puppet agent -t" quite a bit. I first thought it was a cert issue, so I cleared the old cert and attempted to get a new one. That is where this problem led, and that is how I received the original message in /var/log/messages on the puppet-master.
2016-04-07 05:30 PM
NTP is good on both the puppet master and the node I am trying to add.
Update: mcollective was not installed on the node I am trying to add. I installed it, and now mcollective is working. The only thing that isn't working is RabbitMQ. I receive the error:
$service rabbitmq-server start
Status of node sa@localhost ...
DIAGNOSTICS
==========
attempted to contact [sa@localhost]
sa@localhost:
* unable to connect to epmd (port 4369) on localhost: nxdomain (non-existing domain)
Current node details:
- node name: 'rabbitmqctl-22331@shkm'
- home dir: /var/lib/rabbitmq
- cookie hash: 2*************** (An actual hash)
I have verified that the port can work, and tested it with netcat and netstat.
The hostname has been configured in /etc/hosts, and the configuration in /etc/rabbitmq and /var/lib/rabbitmq looks exactly like it does in the clusters that are working. I'm convinced that this may be coming down to improper erlang configuration, even though the erlang looks identical to the working cluster nodes.
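The `nxdomain` in the output above is a name-resolution failure, which is a separate question from whether port 4369 is reachable. A rough sketch for checking what a hosts file actually maps a name to is below. The function name `hosts_lookup` is made up for illustration, and it only reads the file you point it at; it does not tell you how the Erlang runtime itself resolves names.

```shell
#!/bin/sh
# Print the address(es) a hosts-format file maps NAME to; exit non-zero if
# NAME does not appear as a hostname or alias. Comments are ignored.
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v n="$name" '
        { sub(/#.*/, "") }    # strip comments before matching
        { for (i = 2; i <= NF; i++) if ($i == n) { print $1; found = 1 } }
        END { exit (found ? 0 : 1) }
    ' "$file"
}

# Names taken from the diagnostics above: the node host "localhost" and the
# machine name "shkm".
# hosts_lookup localhost || echo "localhost is missing from /etc/hosts"
# hosts_lookup shkm      || echo "shkm is missing from /etc/hosts"
# getent hosts localhost   # what the system resolver returns
```

Running the same checks on a working node gives a baseline to compare against.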
Thank you all again for your time and effort!
2016-04-07 05:32 PM
And you've verified that time is accurate within a few seconds between the appliance and puppet-master?
Run the date command and check that /etc/ntp.conf shows the appropriate server for each system:
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
server.mydomain.com
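A small sketch for putting a number on the time difference, assuming you can collect `date +%s` from both systems. The function names are made up, and the 5-second default tolerance is only an assumption standing in for "within a few seconds"; it is not a documented limit.

```shell
#!/bin/sh
# Print how many seconds the node is ahead of (+) or behind (-) the SA
# server, given two Unix epoch timestamps as printed by `date +%s`.
clock_skew() {
    sa_epoch=$1
    node_epoch=$2
    echo $(( $node_epoch - $sa_epoch ))
}

# Classify a skew value. The tolerance (default 5 s) is an assumption.
skew_verdict() {
    skew=$1
    tolerance=${2:-5}
    if [ "$skew" -gt "$tolerance" ]; then
        echo "node ahead"
    elif [ "$skew" -lt "-$tolerance" ]; then
        echo "node behind"
    else
        echo "in sync"
    fi
}

# Example (placeholder hostnames; assumes SSH access to both systems):
# sa=$(ssh sa-server date +%s); node=$(ssh broken-node date +%s)
# skew_verdict "$(clock_skew "$sa" "$node")"
```

The two SSH calls are not simultaneous, so treat a result of a second or two as noise.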
2016-04-07 05:34 PM
Hmmm,
I'd do a ps -ef | grep erlang on SA head unit, kill any erlang processes (kill -9 <PID>) and then do service rabbitmq-server restart
Check the status of the node afterwards, and maybe tail the RabbitMQ log files:
tail -f /var/log/rabbitmq/sa@localhost.log
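A sketch of the "find the erlang processes" step, so the PIDs can be reviewed before anything is killed. It assumes the usual `ps -ef` column layout (PID in column 2, command starting in column 8); `erlang_pids` is a made-up name.

```shell
#!/bin/sh
# Read `ps -ef` output on stdin and print the PID of every process whose
# command (column 8) contains "erlang". Matching on the command column
# keeps the grep/awk doing the search out of the result.
erlang_pids() {
    awk '$8 ~ /erlang/ { print $2 }'
}

# Review first, then kill and restart as described above:
# ps -ef | erlang_pids
# for pid in $(ps -ef | erlang_pids); do kill -9 "$pid"; done
# service rabbitmq-server restart
# tail -f /var/log/rabbitmq/sa@localhost.log
```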
2016-04-07 05:57 PM
Kevin,
Thank you for your input!
This is on a WAN, so the SA and the clustered nodes are on two different NTP servers. (I will call the nodes working-node and broken-node to tell them apart.)
SA is on one NTP server
Broken-node and working-node are on the same NTP server. The time difference from SA is about 40 seconds, but working-node still has no issues maintaining its certificates, mco, or rabbit messaging.
I spent about 20 hours last week going through all the .conf files and ensuring broken-node and working-node were the same. Today, I found one difference, shown in the ps output below.
One thing to note: /var/log/rabbitmq/startup_{log, err} and sa@localhost are empty.
$ps -ef |grep erlang
Working Node:
/usr/lib64/erlang/erts-5.10.4/bin/epmd -daemon
/usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W -w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.3.4/sbin../ebin -noshell -noinput -s rabbit boot -sname sa@localhost -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -sasl errorlog_type error -sasl sasl_error_logger false -rabbit error_logger {file, "/var/log/rabbitmq/sa@localhost.log"} -rabbit enabled_plugins_file "/etc/rabbitmq/rsa_enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/lib/rabbitmq_server-3.3.4/sbin../plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/sa@localhost-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/sa@localhost" -kernel inet_dist_listen_min 25672 -kernel inet_dist_listen_max 25672
Broken Node:
/usr/lib64/erlang/erts-5.10.4/bin/epmd -daemon
/usr/lib64/erlang/erts-5.10.4/bin/beam.smp -W -w -K true -A30 -P 1048576 -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.3.4/sbin../ebin -noshell -noinput -hidden -boot start_clean -sasl errlog_type error -mnesia dir "/var/lib/rabbitmq/mnesia/sa@localhost" -s rabbit_control_main -nodename sa@localhost -extra wait /var/run/rabbitmq/pid
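With command lines this long, the differences are easier to see as a list of option names than by eye. In the two lines above, for example, the working node has `-s rabbit boot -sname sa@localhost` while the broken node has `-s rabbit_control_main -nodename sa@localhost -extra wait /var/run/rabbitmq/pid`. A rough sketch that lists the options present in one command line but not the other is below; the function names are made up, and it compares option names only, not their values.

```shell
#!/bin/sh
# Print the distinct option names (tokens starting with "-" plus a letter)
# from a command line given on stdin, one per line, sorted.
cmd_flags() {
    tr -s ' ' '\n' | grep -E '^-[A-Za-z]' | sort -u
}

# Print the options that appear in the first command line but not the second.
flags_only_in_first() {
    a=$(mktemp)
    b=$(mktemp)
    printf '%s\n' "$1" | cmd_flags > "$a"
    printf '%s\n' "$2" | cmd_flags > "$b"
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}

# Example: paste each beam.smp line from ps into a variable, then
# flags_only_in_first "$working" "$broken"   # options only on the working node
# flags_only_in_first "$broken" "$working"   # options only on the broken node
```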
2016-04-07 06:14 PM
Is the node's time ahead of or behind the SA server's? Ahead is not correct.