2016-04-02 06:17 PM
The network connection is established, and the firewall is turned off for testing. When I enable the Netwitness agent, the Security Analytics Master / Puppet Master logs the following in /var/log/messages:
puppet-master: Compiled catalog for <key> in environment production 0.27 seconds
python: Adding {'node': '<certificate name> , classes " {[base': ''} to ENC database
python: Adding <certificate name> user to /rsa/system
python: signing puppet Cert
python: Pinging host <certificate name> with a 40 second timeout
python: Error with mco ping. Please check configuration.
When I run $mco ping <IP Address> I get a return.
When I run $mco ping <cert name> it returns the other certificate names in the inventory.txt, but not the one I'm trying to add.
The instructions from this site for establishing a new puppet certificate have been followed. I am also having an issue with RabbitMQ connecting to the puppet master.
Thank you for your time!
2016-04-07 06:17 PM
The clocks on the clustered nodes are running about 40 seconds before the SA / puppet-master clock.
I have three that work with this type of configuration, and one that doesn't.
I'll see what I can do to get the time better in sync. Thank you for your info David!
2016-04-07 06:49 PM
More info from /var/log/rabbitmq/sa@localhost
Working Node:
=INFO REPORT======
accepting AMQP connection <0.28935.132 > (127.0.0.1:34453 -> 127.0.0.1:5672)
Bad Node:
=INFO REPORT======
Disabling disk free space monitoring on unsupported platform:
{{'EXIT',{badarg,[{erlang,list_to_integer,["systems"],[]},
{rabbit_disk_monitor,parse_free_unix,1,[]},
{rabbit_disk_monitor,init,1,[]},
{gen_server,init_it,6,[{file,"gen_server.erl"},{line,304}]},
{proc_lib,init_p_do_apply,3,
[{file,"proc_lib.erl"},{line,239}]}]}},
2016-04-07 07:59 PM
Thank you for all of your help.
--Update: erlang and rabbitmq were completely uninstalled and reinstalled. RabbitMQ is now running.
On the broken node, I ran puppet agent -t, and the puppet master appeared to respond, but it also logged this error in /var/log/messages:
python: Error: Could not find certificate request for <broke host>
python: Pinging host <broke host> with a 30 second timeout.
So even now that I know rabbitmq is working, I'm getting the same results. I will continue down the NTP route and try to get the clocks in sync.
2016-04-07 08:17 PM
Update -- The NTP server was updated, and the SA and cluster nodes are now within one second of each other.
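For anyone comparing clocks by hand, the check can be sketched in Python roughly as follows. The function names, the timestamp format, and the one-second tolerance are illustrative assumptions, not part of Security Analytics or Puppet. The timestamps would be gathered by running something like `date -u +%Y-%m-%dT%H:%M:%S` on each host at about the same moment.

```python
from datetime import datetime

# Assumed timestamp format, e.g. the output of `date -u +%Y-%m-%dT%H:%M:%S`.
FMT = "%Y-%m-%dT%H:%M:%S"


def clock_skew_seconds(master_time: str, node_time: str) -> float:
    """Return node clock minus master clock in seconds (positive = node is ahead)."""
    master = datetime.strptime(master_time, FMT)
    node = datetime.strptime(node_time, FMT)
    return (node - master).total_seconds()


def in_sync(master_time: str, node_time: str, tolerance: float = 1.0) -> bool:
    """True when the two clocks differ by at most `tolerance` seconds."""
    return abs(clock_skew_seconds(master_time, node_time)) <= tolerance


if __name__ == "__main__":
    # Hypothetical readings taken from the SA and one cluster node.
    print(clock_skew_seconds("2016-04-07T18:17:00", "2016-04-07T18:16:20"))
    print(in_sync("2016-04-07T18:17:00", "2016-04-07T18:16:20"))
```

Because the readings are taken by hand, a difference of a second or two may just be the delay between running the two commands.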
2016-04-08 07:55 AM
Hi Joseph,
Nice work!
David
2016-04-08 08:48 AM
Hi Joseph,
From the SA head, can you perform an mco ping and see the node_id of the device in question?
The node_id can be found in /var/lib/puppet/node_id
The system makes its first connection once the certificate is signed, and then the SA server performs the mco ping. Once that ping succeeds, you are in business.
David
2016-04-08 06:19 PM
Thanks for the help David!
Actually, the node_id does not come up. That's what prompted my original post. I can do an mco ping <ip>, but when I do the mco ping <node_id> the broken node does not return; only the other working nodes do.
I went through the /var/log/mcollective.log on the puppet-master, and it shows the message in the original post. I also uninstalled and reinstalled mcollective on the broken node, and configured it to point to the SA.
At the end of the day yesterday, I found some guidance on ServerFault for manually configuring the puppet-master to look for that specific node by adding it to /etc/puppet/csr_attributes.yaml on the puppet master.
I still believe there is a misconfiguration on the broken node because the /etc/mcollective/facts.yaml on the broken node remains blank, while it has auto-populated on the working nodes.
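The blank-file check above can be scripted so it is easy to repeat on each node. This is only a minimal sketch: the helper name `is_blank` is hypothetical, and it treats a missing file the same as an empty one, which is an assumption rather than anything mcollective defines.

```python
import os


def is_blank(path: str) -> bool:
    """True if the file is missing, empty, or contains only whitespace."""
    if not os.path.isfile(path):
        return True
    with open(path, "r", encoding="utf-8", errors="replace") as fh:
        return fh.read().strip() == ""


if __name__ == "__main__":
    # Path taken from the post; run this on both a working and the broken node.
    path = "/etc/mcollective/facts.yaml"
    print(path, "is blank" if is_blank(path) else "has content")
```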
Links to Research:
Config Files: csr_attributes.yaml — Documentation — Puppet
Troubleshooting connections between components — Documentation — Puppet
activemq - How to run Puppet on multiple nodes at once using MCollective? - Server Fault
Using MCollective Command Line Applications — Documentation — Puppet
2016-04-08 06:21 PM
David,
I should follow up and say that the root cause was that the certificate wouldn't sign. The broken node doesn't seem to recognize the SA, and the SA isn't receiving a certificate request from the broken node, although it does receive everything else.
2016-04-08 08:18 PM
Update,
Upon testing, I found that when I try to restart rabbitmq-server, or add the broken node to the SA, the symbolic links key.pem and cert.pem in /etc/rabbitmq/ssl/server/ break. They point to:
key.pem -> /var/lib/puppet/ssl/private_keys/<node_id>.pem
cert.pem -> /var/lib/puppet/ssl/certs/<node_id>.pem
To fix this, I just make new links, but they break again as soon as I try to connect. This could be the cause of the failure.
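To confirm whether the links are dangling after each restart or connection attempt, a small check along these lines may help. It is a sketch only: the function name `broken_symlinks` is hypothetical, and the directory is the one named in this post.

```python
import os


def broken_symlinks(directory: str) -> dict:
    """Map each dangling symlink in `directory` to the target it points at."""
    broken = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() is true for any symlink; exists() follows the link,
        # so it is false when the link's target is missing.
        if os.path.islink(path) and not os.path.exists(path):
            broken[name] = os.readlink(path)
    return broken


if __name__ == "__main__":
    ssl_dir = "/etc/rabbitmq/ssl/server"  # directory from the post
    if os.path.isdir(ssl_dir):
        for name, target in broken_symlinks(ssl_dir).items():
            print(f"{name} -> {target} (target missing)")
```

Running it before and after restarting rabbitmq-server should show whether the link itself is being rewritten or whether the target file under /var/lib/puppet/ssl is disappearing.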
2016-04-08 08:34 PM
Regarding the issue above, I found out that puppet will create a broken symlink if the configuration contains incorrect syntax. Troubleshooting is ongoing.