2018-03-16 12:23 PM
Just came across another issue that should have been fixed years ago, but RSA engineering didn't think it would be useful to do. There is a Jira for this open since around 2015 (which was probably closed, as it was likely not understood).
To anyone with a basic NoSQL understanding it would be clear that the database will not always release disk space after records are deleted. For the space to be reclaimed, the collection needs to be re-indexed.
When IM retention runs, the records are deleted but SA/NW/IM never instructs the database to re-index the collection. This is the perfect recipe for disaster, as the system keeps running slower and slower while growing in size. If retention weren't set at all, IM wouldn't even be able to load the table in the GUI.
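If you want to see how bad it is on your own system before touching anything, a quick check is to compare the collection's logical data size with what it actually occupies on disk. A minimal sketch, assuming the IM Mongo instance still uses the default im/im credentials:
# Print alert collection stats scaled to GB; 'size' is the logical data,
# 'storageSize' is what the files on disk are actually holding on to
echo 'printjson(db.alert.stats(1024*1024*1024))' | mongo im -u im -p im
After heavy retention deletes, storageSize stays far above size until the collection is rebuilt.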
The sad thing is that, when raising a ticket with engineering (and given that there was already a Jira about this issue), one would expect them to come back and say: this is a known issue because we are not re-indexing the data, and here is the workaround.
But no, weeks and weeks on they were still going in circles, up until I figured out part of the problem myself. First they tried to blame me, then they tried to blame it on a bug in Mongo.
How about the fact that you have a bad implementation of MongoDB?
All I had to do was run this inside the IM database:
db.alert.reIndex()
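For anyone who has never poked at the IM Mongo directly, reaching that shell is roughly the following; a sketch only, assuming the default im/im credentials and that you stop the service first so nothing writes to the collection while it rebuilds:
service rsa-esa stop        # stop writes to the IM database
mongo im -u im -p im        # open the IM Mongo shell
> db.alert.reIndex()
> exit
service rsa-esa start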
Before:
> show collections
system.indexes 3.48KB (uncompressed), 32.00KB (compressed)
system.users NaNundefined (uncompressed), NaNundefined (compressed)
categories 16.61KB (uncompressed), 32.00KB (compressed)
aggregation_rule 45.20KB (uncompressed), 64.00KB (compressed)
alert 163.11GB (uncompressed), 40.93GB (compressed)
incident 681.45KB (uncompressed), 2.47MB (compressed)
remediation_task 0.00B (uncompressed), 72.00KB (compressed)
tracking_id_sequence 285.00B (uncompressed), 32.00KB (compressed)
fs.files 0.00B (uncompressed), 48.50KB (compressed)
fs.chunks 0.00B (uncompressed), 48.50KB (compressed)
After:
> show collections
system.indexes 3.48KB (uncompressed), 32.00KB (compressed)
system.users NaNundefined (uncompressed), NaNundefined (compressed)
categories 16.61KB (uncompressed), 32.00KB (compressed)
aggregation_rule 45.20KB (uncompressed), 64.00KB (compressed)
alert 26.98GB (uncompressed), 12.41GB (compressed)
incident 681.45KB (uncompressed), 2.47MB (compressed)
remediation_task 0.00B (uncompressed), 72.00KB (compressed)
tracking_id_sequence 285.00B (uncompressed), 32.00KB (compressed)
fs.files 0.00B (uncompressed), 48.50KB (compressed)
fs.chunks 0.00B (uncompressed), 48.50KB (compressed)
My issue is still not completely gone, as the IM alert collection holding the same amount of ESA alerts is still about 8 times bigger than the equivalent alerts table in ESA, but it's a good start.
Suggestions to management:
-Make sure your staff are trained enough to deal with your software.
-Engineering should have a basic understanding of the platform (i.e. which appliance the service runs on) and basic knowledge of how to log in to a service, for example. The default passwords have been the same since 10.0.
-Take Jira tickets raised by people who actually work with customers seriously and implement them properly, instead of just testing things in an "empty" virtual machine and signing them off.
As always, do this at your own risk. If you are not comfortable or unsure, raise a Support ticket with RSA.
2018-03-23 06:19 AM
Hello Marinos,
A quick workaround for NW 10.6.x is to add the following cron entries.
Open the ESA command line and type:
crontab -u root -e
#FOR ESA db
12 3 20 * * service puppet stop
15 3 20 * * service rsa-esa stop && echo 'db.alert.reIndex()' | mongo esa -u esa -p esa && service rsa-esa start && service puppet restart
#FOR IM db
12 3 21 * * service puppet stop
15 3 21 * * service rsa-esa stop && echo 'db.alert.reIndex()' | mongo im -u im -p im && echo 'db.incident.reIndex()' | mongo im -u im -p im && service rsa-esa start && service puppet restart
Save and quit as you normally would in vi.
For IM it is usually not necessary to re-index once per month, because not much data should be changing, but adjust it according to your rollover period.
*During the re-indexing of the IM database, rsa-im is not affected, since ESA is down and won't be forwarding events to the IM service.
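If you go this route, it is worth double-checking that the entries actually landed in root's crontab and that the jobs fired on the scheduled day; something along these lines, assuming a standard CentOS-style appliance where cron logs to /var/log/cron:
# List root's crontab to confirm the entries are in place
crontab -l -u root
# Check that the jobs actually ran
tail -50 /var/log/cron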
With regards, Emmanuele
2018-04-09 07:34 AM
Thanks Emmanuele,
I would have preferred it if Engineering knew about this requirement and included it in the code, OR at least mentioned it during the weeks of troubleshooting and saved both their time and mine.
I will stick with my manual ways, as I don't trust RSA's cron jobs, or puppet scripts deleting cron jobs, or anything in between. One less thing to worry about.
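For the record, the manual run boils down to the same commands your cron entries fire, just executed by hand whenever I choose; a sketch, assuming the same default credentials:
# Same order as the cron entries: puppet first, then the ESA service
service puppet stop
service rsa-esa stop
# Re-index the two collections that retention churns through
echo 'db.alert.reIndex()' | mongo im -u im -p im
echo 'db.incident.reIndex()' | mongo im -u im -p im
# Bring everything back up
service rsa-esa start
service puppet start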
I think your assumption about IM not having that much data could be wrong. We are forwarding everything from ESA with the same retention period, so we expect to have the same amount of data.
Thanks anyway!