2016-07-27 12:13 AM
Hi All,
I'm getting the following aggregation errors on my packet concentrators:
NwConcentrator[8736]: [Aggregation] [failure] Unable to retrieve page from meta pool, please increase aggregate.meta.perpage and/or aggregate.meta.page.factor and restart
NwConcentrator[8736]: [Aggregation] [info] Aggregation is stopping
NwConcentrator[8736]: [Aggregation] [info] Aggregation threads are being shutdown
When I restart the aggregation service on the packet concentrator, I get this error:
"Aggregation failed to start.TransportException: Message start was not recognized by concentrator."
I cross-checked with the RSA SA community KB, which states that a concentrator takes some time before aggregation actually starts because it is "initializing databases" and "initializing the SDK"; once everything is initialized properly, aggregation starts.
After the DB and SDK initialize successfully, my concentrator comes up in the consuming state, but then it drops back to the offline state.
What could be the reason for these fluctuations?
Thanks in advance!
Regards
Pranav
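
For anyone triaging the same symptoms, the log lines quoted above can be filtered programmatically. This is only a minimal sketch: the pattern is inferred from the three lines in this post, real log lines may carry extra fields (timestamp, hostname) in front, and the names `LOG_RE`, `parse_line`, and `failures` are hypothetical helpers, not part of any RSA tooling.

```python
import re

# Pattern inferred from the three lines quoted in the post (an assumption):
#   <process>[<pid>]: [<module>] [<level>] <message>
LOG_RE = re.compile(
    r"(?P<process>\w+)\[(?P<pid>\d+)\]:\s+"
    r"\[(?P<module>[^\]]+)\]\s+\[(?P<level>[^\]]+)\]\s+(?P<message>.*)"
)


def parse_line(line):
    """Return a dict of fields for a matching log line, or None."""
    match = LOG_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    record["pid"] = int(record["pid"])
    return record


def failures(lines):
    """Keep only the records whose level is 'failure'."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r["level"] == "failure"]


# The three log lines from the post, copied verbatim.
SAMPLE = [
    "NwConcentrator[8736]: [Aggregation] [failure] Unable to retrieve page "
    "from meta pool, please increase aggregate.meta.perpage and/or "
    "aggregate.meta.page.factor and restart",
    "NwConcentrator[8736]: [Aggregation] [info] Aggregation is stopping",
    "NwConcentrator[8736]: [Aggregation] [info] Aggregation threads are being shutdown",
]

if __name__ == "__main__":
    for rec in failures(SAMPLE):
        print(rec["process"], rec["pid"], rec["module"], rec["message"])
```

Run against a full log file, the same filter would show how often the meta pool failure precedes each aggregation shutdown.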
2016-08-04 11:24 AM
Hi Khaled
We have had lots of concentrators re-index, and I don't think we had memory issues, because I checked the other CPs and found one with 50 GB of resident memory.
So my concern is why these fluctuations happen on this one.
2016-08-04 11:31 AM
Hi Pranav,
Even if a concentrator can cope with the nwconcentrator process using 50 GB, it can still run into a lot of trouble with virtual memory. You can try freeing up the cache and see what happens.
Regarding virtual memory, do the other concentrators also show very high virtual memory usage?
Best regards
Khaled
2016-08-04 11:41 AM
Khaled,
In our setup we have 5 CPs, and 2 of them show 50 GB of resident memory. For your reference, here is the top output from another CP:
3214 root 20 0 371g 50g 20g S 17.7 53.5 4913:11 NwConcentrator
I can see that this one has more virtual memory and does not re-index frequently, so I don't think we have a memory issue.
I need your advice on how to proceed.
2016-08-04 11:52 AM
What is the total memory utilization on this concentrator?
I am asking because the crash could simply be caused by memory spikes. The other concentrator had less than 2 GB of free memory, so any spike will cause it to crash.
Best regards
Khaled.
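
The "less than 2 GB free" condition mentioned above could be watched for with a small check against /proc/meminfo. This is a rough sketch, not an RSA-provided tool: the 2048 MB threshold simply mirrors the figure in this reply, the function names are hypothetical, and the sample text is illustrative (its values are the CP01 numbers from this thread converted to kB, not a real capture).

```python
def meminfo_mb(text):
    """Parse /proc/meminfo-style text into {field: megabytes}.

    Only lines with a 'kB' unit are converted; unit-less counters
    (for example HugePages_Total) are skipped.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and len(fields) == 2 and fields[1] == "kB" and fields[0].isdigit():
            values[name.strip()] = int(fields[0]) // 1024
    return values


def free_below(text, threshold_mb=2048):
    """True when MemFree is under the threshold (2 GB by assumption)."""
    return meminfo_mb(text)["MemFree"] < threshold_mb


# Illustrative sample only: 96730 MB total and 1482 MB free, expressed in kB.
SAMPLE = "MemTotal:       99051520 kB\nMemFree:         1517568 kB\nHugePages_Total:       0\n"

if __name__ == "__main__":
    # On a Linux host, read the live values instead of the sample.
    try:
        with open("/proc/meminfo") as handle:
            text = handle.read()
    except OSError:
        text = SAMPLE
    print("MemFree below 2 GB:", free_below(text))
```

Note that MemFree alone ignores reclaimable page cache, so a low value here is a prompt to look closer rather than proof of memory pressure.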
2016-08-05 06:19 AM
Hi Khaled,
Below are snapshots from the two CPs:
[CP5 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         96697      95108       1589          0          4      57536
-/+ buffers/cache:      37566      59130
Swap:        20479        290      20189

[CP01 ~]# free -m ---> re-indexing problem
             total       used       free     shared    buffers     cached
Mem:         96730      95247       1482          0          5      72203
-/+ buffers/cache:      23038      73691
Swap:        20479       6104      14375
Thanks in advance!
Regards
Pranav
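
The two free -m snapshots above can be compared side by side with a short script. This is a minimal sketch that assumes the older procps layout shown here (with the "-/+ buffers/cache" row); `parse_free` and the dictionary keys are hypothetical names chosen for illustration.

```python
def parse_free(text):
    """Parse old-style `free -m` output into a dict of integers (MB)."""
    stats = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] == "Mem:":
            stats["total"], stats["used"], stats["free"] = map(int, parts[1:4])
            stats["buffers"], stats["cached"] = int(parts[5]), int(parts[6])
        elif parts[0] == "-/+":
            # Row layout: "-/+ buffers/cache: <used> <free>"
            stats["app_used"], stats["app_free"] = int(parts[2]), int(parts[3])
        elif parts[0] == "Swap:":
            stats["swap_total"], stats["swap_used"], stats["swap_free"] = map(
                int, parts[1:4]
            )
    return stats


# Output copied from the post above.
CP5 = """
total used free shared buffers cached
Mem: 96697 95108 1589 0 4 57536
-/+ buffers/cache: 37566 59130
Swap: 20479 290 20189
"""

CP01 = """
total used free shared buffers cached
Mem: 96730 95247 1482 0 5 72203
-/+ buffers/cache: 23038 73691
Swap: 20479 6104 14375
"""

if __name__ == "__main__":
    for name, raw in (("CP5", CP5), ("CP01", CP01)):
        s = parse_free(raw)
        app_pct = 100.0 * s["app_used"] / s["total"]
        swap_pct = 100.0 * s["swap_used"] / s["swap_total"]
        print(f"{name}: used excluding cache {app_pct:.1f}%, swap used {swap_pct:.1f}%")
```

In these snapshots the "free" column is about 1.5 GB on both hosts and most of the "used" figure is page cache, while swap usage differs (290 MB on CP5 versus 6104 MB on CP01).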
2016-08-07 07:11 AM
Hi Pranav,
From the output above, memory utilization looks almost identical on both concentrators. Does the other one crash at all?
My recommendation is to set up the cron jobs on the one that crashes frequently and see whether that solves the problem. That way you can confirm whether or not it is a memory issue.
You could also send me the /var/log/messages files after it crashes again, and I will check whether memory is the issue. You can send the logs to my email, "khaled.gamal@rsa.com".
Best regards
Khaled