The concept of multi-valued meta keys - keys that can appear multiple times within a single session - is not a new one, but it has become more important and relevant in recent releases due to how other parts of the RSA NetWitness Platform handle them.
The most notable of these other parts is the Correlation Server service, previously known as the ESA service. In order to enable complex, efficient, and accurate event correlation and alerting, it is necessary for us to tell the Correlation Server service exactly which meta keys it should expect to be multi-valued.
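For context, that configuration boils down to a list of meta key names. An illustrative fragment is below; the key names shown are examples only, not a recommended set, and the exact format may vary by version:

```text
alias.host, alias.ip, alias.ipv6, action, email
```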
The release notes PDF for each RSA NetWitness Platform version contains instructions for updating or modifying these keys to tune the platform to your organization's environment. But the question I have each time I read those instructions is this: how do I identify ALL of the multi-valued keys in my RSA NetWitness Platform instance?
After all, my lab environment is a fraction of the size of any organization's production environment, and if it's an impossible task for me to manually identify all, or even most, of these keys, then it's downright laughable to expect any organization to even attempt the same.
Enter....automation and scripting to the rescue!
The script attached to this blog attempts to meet that need. I want to stress "attempts to" here for two reasons:
Not every meta key identified by this script should necessarily be added to the Correlation Server's multi-valued configuration. This will depend on your environment and any tuning or customizations you've made to parsers, feeds, and/or app rules.
For example, this script identified 'user.dst' in my environment.
However, I don't want that key to be multi-valued, so I'm not going to add it to the configuration. That leaves me with the choice of either leaving it as-is or undoing the parser, feed, and/or app rule change I made that caused it to become multi-valued in the first place.
In order to be as complete in our identification of multi-valued metas as we can, we need a sample of sessions and metas large enough to be representative of most, if not all, of an organization's data. And that means we need sample sizes in the hundreds of thousands to millions range.
But therein lies the rub. Processing data at that scale requires us to first query the RSA NetWitness Platform databases for all of that data, pull it back, and then process it...all without flooding the RSA NetWitness Platform with thousands or millions of queries (after all, the analysts still need to do their responding and hunting), without consuming so many resources that the script freezes or crashes the system, and while still producing an accurate result. Because otherwise, what's the point?
I made a number of changes to the initial version of this script in order to limit its potential impact. The result is that the script processes sessions and their metas in batches of 10,000. In my lab environment, testing with this batch size resulted in roughly 60 seconds between each processing iteration.
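The batching itself is straightforward; a minimal sketch of how the script might split the full session ID list into fixed-size batches (the `chunked` helper is an illustration, not the script's actual code):

```python
from typing import Iterator, List

BATCH_SIZE = 10000  # sessions per query, matching the batch size discussed above

def chunked(session_ids: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield the session ID list in fixed-size batches; the last may be smaller."""
    for start in range(0, len(session_ids), size):
        yield session_ids[start:start + size]

# Example: 25,000 session IDs split into batches of 10,000, 10,000, and 5,000.
batches = list(chunked(list(range(25000))))
```

Each yielded batch would then back one query to the Broker or Concentrator, rather than one query per session.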
The overall workflow within the script is:
1. Query the RSA NetWitness Platform for a time range and grab all the resulting session IDs.
2. Query the RSA NetWitness Platform for 10,000 of those sessions and all of their metas at a time.
3. Receive the results of the query.
4. Process all the metas to identify those that are multi-valued.
5. Store the results of step 4 for later.
6. Repeat steps 2-5 until all sessions within the time range have been processed.
7. Evaluate and deduplicate all the metas stored in step 5 (our end result).
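The heart of steps 4-7 is counting how often each key appears within a single session: any key seen more than once in the same session is multi-valued. A simplified sketch of that logic follows; modeling a session's metas as a list of (key, value) pairs is my assumption here, not the actual shape of the SDK's query results:

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# A session's metas modeled as (key, value) pairs -- an illustrative shape only.
SessionMetas = List[Tuple[str, str]]

def multivalued_keys(metas: SessionMetas) -> Set[str]:
    """Step 4: any key appearing more than once within one session is multi-valued."""
    counts = Counter(key for key, _ in metas)
    return {key for key, n in counts.items() if n > 1}

def collect_multivalued(batches: List[Dict[int, SessionMetas]]) -> List[str]:
    """Steps 5-7: accumulate per-session findings across batches, then deduplicate."""
    found: Set[str] = set()
    for batch in batches:
        for metas in batch.values():
            found |= multivalued_keys(metas)
    return sorted(found)

# Example: 'alias.host' appears twice in session 1, so it is flagged;
# every other key appears at most once per session and is ignored.
result = collect_multivalued([{
    1: [("alias.host", "a"), ("alias.host", "b"), ("ip.src", "10.0.0.1")],
    2: [("action", "get")],
}])
```

Because the per-session results are folded into a set as each batch completes, only the deduplicated key names need to be held in memory between batches, not the raw metas themselves.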
This is the best middle ground I could find among the various competing factors:
A 10,000-session batch size will still result in potentially hundreds or thousands of queries to your RSA NetWitness Platform environment.
The actual time your RSA NetWitness Platform service (Broker or Concentrator) spends responding to each of these queries should be no more than ~10-15 seconds.
The time required for the script to process each batch of results will end up spacing out each new batch request to about 60 seconds in between.
I saw this time drop to as low as 30 seconds during periods of minimal overall activity and utilization on my admin server.
The maximum memory I saw the script use in my lab never exceeded 2,500 MB.
The maximum CPU I saw the script use in my lab was 100% of a single CPU.
The absolute maximum number of sessions the script will ever process in a single run is 1,677,721. This is a hardcoded limit in the RSA NetWitness SDK API, and I'm not inclined to try to work around it.
The output of the script is formatted so that you can copy/paste it directly from the terminal into the Correlation Server's multi-valued configuration. Now, with all that out of the way, some usage screenshots:
If you have any comments, questions, concerns, or issues with the script, please don't hesitate to comment here or reach out.