2013-08-30 09:48 AM
Hi all,
I've created the attached script, which I believe might be useful in certain circumstances. Most of what it does can also be done with the Reporting and Alerting capabilities in SA/NextGen.
However, because it writes CSV to standard output, it can easily be used to integrate with other tools in an automated way if necessary. It also has a key feature that (in my view) is missing from the reporting engine: it can count, and sort by count, reports with more than one key, for example the top source-destination IP pairs.
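To illustrate what counting and sorting on more than one key means, here is a minimal sketch. It is not taken from the attached script: the function names and the sample sessions are made up for illustration, and it only shows the general idea of treating each combination of key values as one bucket.

```python
import csv
import sys
from collections import Counter


def count_by_keys(sessions, keys):
    """Count how often each combination of `keys` occurs in `sessions`."""
    counts = Counter(tuple(s.get(k, "") for k in keys) for s in sessions)
    # most_common() returns (combination, count) pairs, highest count first
    return counts.most_common()


def write_csv(rows, keys, out=None):
    """Write the counted combinations as CSV, with a trailing count column."""
    writer = csv.writer(out or sys.stdout)
    writer.writerow(list(keys) + ["count"])
    for combo, count in rows:
        writer.writerow(list(combo) + [count])


# Made-up sample sessions, for illustration only
sessions = [
    {"ip.src": "10.0.0.1", "ip.dst": "10.0.0.9"},
    {"ip.src": "10.0.0.1", "ip.dst": "10.0.0.9"},
    {"ip.src": "10.0.0.2", "ip.dst": "10.0.0.9"},
]
rows = count_by_keys(sessions, ["ip.src", "ip.dst"])
write_csv(rows, ["ip.src", "ip.dst"])
```

The source-destination pair that appears twice comes out first, followed by the pair that appears once.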
The script requires Python 2.7.x and has been tested on *nix and Cygwin. It displays the following help/usage message when executed with -h or without parameters.
Usage: nwsdk_csv.py [options]

This script will connect via the NW REST API. It will use the 'keys' and
'where' parameters to build the necessary call. It will output the results to
STDOUT as a CSV file.

Options:
  -h, --help            show this help message and exit
  -c CONNECT, --connect=CONNECT
                        [REQUIRED] NextGen REST device URL (e.g:
                        http://nwbroker:50103/ or https://nwcon:50105/)
  -w CLAUSE, --where=CLAUSE
                        Query's 'where' clause
  -t TIME, --time=TIME  Time window (in seconds from now(), if not used time
                        defaults to 'All time')
  -k FIELDS, --keys=FIELDS
                        Meta Keys to extract
  -u USERNAME, --user=USERNAME
                        Username for REST endpoint
  -p PASSWORD, --pass=PASSWORD
                        Username for REST endpoint
  --no-count            Do not display aggregation count
  --no-header           Do not add header line to output
  --dns                 Resolve IP addresses via DNS
  --top=TOP             Filter on only Top <TOP> values for first key
  -f FILENAME, --file=FILENAME
                        Filename tracking the latest completed METAID
  --gmtime              Convert Time from Epoch to GMT

Here are some basic examples of what it can be used for.
Show a count of all communications from the top 3 source addresses in the last 15 minutes, listing the source IP address, service, destination IP address and the respective count.
# python ./nwsdk_csv.py -c https://broker:50103/ -t 900 --top 3
2013-Aug-29 11:07:02 - INFO: Using SSL, applying TLSv1 fix
2013-Aug-29 11:07:02 - INFO: Getting top 3 values for ip.src
2013-Aug-29 11:07:11 More data to process Completed ID:5763124851 Meta ID:5763124861 Last ID:5763125005
2013-Aug-29 11:07:12 All done Completed ID:5763124861 Meta ID:5763124861 Last ID:5763125005
ip.src,service,ip.dst,count
192.168.14.14,0,192.168.14.26,2872
192.168.14.13,0,192.168.14.27,2851
192.168.14.16,0,192.168.14.26,2699
192.168.14.14,0,192.168.14.14,1826
192.168.14.13,0,192.168.14.14,1820
192.168.14.16,0,192.168.14.14,1771
192.168.14.16,53,192.168.14.4,1112
192.168.14.14,80,192.168.14.11,112
192.168.14.13,80,192.168.14.11,110
192.168.14.16,80,192.168.14.11,103
192.168.14.16,0,192.168.14.11,35
192.168.14.14,0,192.168.14.11,15
192.168.14.13,0,192.168.14.11,12
192.168.14.16,0,192.168.14.4,2
In the above example, the script first queries for the top 3 source IP addresses and then uses those values as part of the 'where' clause on the following request.
The following example shows a query recently shared by another community member, used to extract information on all the relevant sessions.
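As a rough sketch of that second step, the top values can be folded into a 'where' clause like this. The helper name is hypothetical and the exact clause the script builds internally may differ; the comma-separated value list and the `&&` operator follow the query syntax used in the examples in this thread.

```python
def where_for_top_values(key, values, extra_where=None):
    """Build a 'where' clause restricting `key` to the given top values.

    Hypothetical helper: the attached script may build its clause differently.
    """
    clause = "%s=%s" % (key, ",".join(values))
    if extra_where:
        # Keep any user-supplied clause and AND it with the top-value filter
        clause = "(%s) && %s" % (extra_where, clause)
    return clause


top_sources = ["192.168.14.14", "192.168.14.13", "192.168.14.16"]
print(where_for_top_values("ip.src", top_sources))
print(where_for_top_values("ip.src", top_sources, "service = 80"))
```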
# python ./nwsdk_csv.py -c https://broker:50103/ -k "sessionid,time,ip.src,ip.dst,service,tcp.dstport,alias.host,client,server,directory,filename,risk.info" -w "alias.host begins update,report && filename='<none>' && directory='/' && query exists && query length 100-u" --gmtime
2013-Aug-30 09:33:38 - INFO: Using SSL, applying TLSv1 fix
2013-Aug-30 09:33:43 All done Completed ID:2147212752 Meta ID:2147212752 Last ID:5842214536
sessionid,time,ip.src,ip.dst,service,tcp.dstport,alias.host,client,server,directory,filename,risk.info,count
44918863,"2013-May-05 20:47:21 GMT",192.168.14.14,213.133.99.140,80,80,updateserver.zillya.com,VPNGuardService,nginx,"/",<none>,"flags_syn|flags_rst|flags_psh|flags_ack|http1.1 without referer header|nginx http server",1
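For anyone post-processing the raw output without --gmtime, the epoch-to-GMT conversion can be sketched as below. This is an assumption about how such a conversion could be done, not the script's own code; the format string was chosen to match the time column shown in the output above.

```python
import time


def epoch_to_gmt(epoch_seconds):
    """Format seconds since the Unix epoch as e.g. '2013-May-05 20:47:21 GMT'."""
    # %b gives the abbreviated month name and depends on the active locale
    return time.strftime("%Y-%b-%d %H:%M:%S GMT", time.gmtime(epoch_seconds))


print(epoch_to_gmt(0))
```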
Here are a few more interesting examples; unfortunately I can't provide the output for these.
./nwsdk_csv.py -c https://broker:50103/ -k 'sessionid,time,ip.src,ip.dst,alias.host,service,tcp.dstport,udp.dstport,ip.proto,client,alert,risk.info,risk.warning,risk.suspicious' -w 'alert = my_ioc && alert != ips_of_interest' --no-count --gmtime -f track_new_myioc.lastid > track_new_myioc.output_`date +"%Y-%m-%d_%H%M"`.csv
The above example extracts several meta keys based on alert criteria that are part of existing content on the NextGen Decoders, and saves the output to a date-coded file name. It also keeps track of the last position in the NWDB that it queried up to, in the file "track_new_myioc.lastid". On subsequent runs it uses this same file (as long as it is passed as a parameter) to fetch only new data from that position forward. This is similar to the process used by the Alerting Engine.
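The general idea of a tracking file can be sketched as follows. The function names and the on-disk format (a single integer on one line) are assumptions for illustration; the file the script actually writes may look different.

```python
import os


def read_last_id(path, default=0):
    """Return the last completed meta ID stored in `path`, or `default`."""
    if not os.path.exists(path):
        return default
    with open(path) as fh:
        text = fh.read().strip()
    return int(text) if text else default


def write_last_id(path, meta_id):
    """Store `meta_id` in `path`, replacing the file in one step."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        fh.write("%d\n" % meta_id)
    # Rename over the old file so a crash never leaves a half-written ID
    os.replace(tmp, path)
```

A run would call read_last_id() before querying, process only data after that ID, and call write_last_id() once the output has been written.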
# for d in `cat bad_domains`; do f="alias_host_$d.csv" ; d="'$d'"; python nwsdk_csv.py -c https://broker:50103/ -k alias.host,ip.src,ip.dst,service -w "alias.host contains $d" > $f; done
The above example iterates through a list of domains contained in the "bad_domains" file and produces one output file per domain with the source IP, destination IP and service for each. This can be handy for data that was collected before certain feeds were deployed.
I'm sure you will find several other examples, and please share them if you do! Please feel free to provide any feedback or ask any questions.
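On systems without a Bourne-style shell, the same loop can be sketched in Python. This is only an illustration: the helper names are hypothetical, the broker URL is the placeholder from the examples above, and only the -c, -k and -w options shown in this post are used.

```python
import subprocess


def commands_for_domains(domains, connect="https://broker:50103/"):
    """Yield (output filename, argument list) for each domain."""
    keys = "alias.host,ip.src,ip.dst,service"
    for domain in domains:
        where = "alias.host contains '%s'" % domain
        argv = ["python", "nwsdk_csv.py", "-c", connect, "-k", keys, "-w", where]
        yield "alias_host_%s.csv" % domain, argv


def run_all(domain_file):
    """Run the script once per non-empty line in `domain_file`."""
    with open(domain_file) as fh:
        domains = [line.strip() for line in fh if line.strip()]
    for outfile, argv in commands_for_domains(domains):
        with open(outfile, "w") as out:
            # The CSV goes to STDOUT, so redirect it into the per-domain file
            subprocess.call(argv, stdout=out)
```

Passing the arguments as a list avoids the shell quoting differences between operating systems.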
Thank you,
Rui
PS: I don't claim any programmer skills so feel free to re-use or modify this code.
PPS: It's been brought to my attention that if you upgrade to SA 10.6.2.2, which uses TLS 1.2, running this script with the native OS Python 2.6.6 will no longer work. Please reach out to me directly if you really need a "hack" around it.
PPPS: If you are trying to run this on NW11 appliances, FIPS hardening may mean you need to prefix the execution with OWB_FORCE_FIPS_MODE_OFF=1 python nwsdk_csv.py ....
2015-09-10 03:50 PM
Hi,
Thank you for the feedback!
Try:
python nwsdk_csv.py -c https://x.x.x:50105 -t 3600 -u xxxx -p ******* -k ip.src -w "ip.src=10.200.20.34, 10.200.21.56, 10.200.25.7"
The second line it prints to STDERR should show you what the REST call was. Based on my tests, without the "s it will just use the first IP. In certain OSes you may need to replace the "s with 's. Hope that helps!
Regards,
Rui
2015-09-11 05:10 PM
Thanks Rui!
Your answer worked perfectly.
Thanks again for a very useful chunk of code.
Don
2015-09-12 03:49 AM
Glad to hear! And thank you again for the feedback. I have a few changes in progress and may publish a new version soon, time permitting.
2015-10-08 09:03 AM
Removed older versions and messages to avoid confusion. Here's the latest version, with some updated features.
The --top option can now be used to exclude the top values from the subsequent query, or be executed with a different query/where clause.
# == CHANGELOG ==
# - Version 0.9.16: (08 Sep 2015)
# > Changed code to output results immediately if sessionid is selected as entries will be unique, also store last_mid after each output
# - Version 0.9.15: (14 Aug 2015)
# > New features to use a separate TOP query when using multiple keys and also to exclude values returned by that query
# > Added options --top-exclude and --top-where
# - Version 0.9.14: (28 Jul 2015)
# > FIX: 0.9.13 fix broke code when using single key NwValue calls
# - Version 0.9.13: (14 Jul 2015)
# > Check for Python >= 2.7.9 new SSL option for cert validation and ignore it by default - Added to fix_ssl_version() code
# > Prompt for password if -p - is used as an option (For interactive use)
# > FIX: Exit cleanly when no data is available to process
2016-04-13 06:21 AM
A quick fix for older SA versions that seem to have problems with an id1 of 0: since we now do a summary call anyway, the script will use the returned mid1 value instead. I also updated the error handling to return the REST API response on an HTTP error instead of just the standard error traceback.
# == CHANGELOG ==
# - Version 0.9.18: (11 Apr 2016)
# > Better error handling if query fails with an HTTPError, present returned results from REST call to user
# - Version 0.9.17: (08 Apr 2016)
# > If tracking file doesn't exist start at summary call mid1 instead of 0 as we are doing the call anyway and it causes problems with some older versions
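As a general illustration of this kind of error handling (not the script's actual code), the sketch below reports the response body for an HTTP error and the failure reason otherwise. One detail worth knowing is that HTTPError is a subclass of URLError, and only HTTPError carries a response body that can be read. The function name is hypothetical.

```python
try:
    from urllib.error import HTTPError, URLError  # Python 3
except ImportError:
    from urllib2 import HTTPError, URLError  # Python 2.7


def describe_failure(exc):
    """Turn a urllib error into a one-line message for the user."""
    # Check HTTPError first: it is a subclass of URLError
    if isinstance(exc, HTTPError):
        body = exc.read()  # the server replied, so there is a body to show
        if isinstance(body, bytes):
            body = body.decode("utf-8", "replace")
        return "HTTP %d: %s" % (exc.code, body)
    if isinstance(exc, URLError):
        # No response at all (DNS failure, refused connection, SSL problem)
        return "Connection problem: %s" % (exc.reason,)
    raise exc
```

A caller would wrap the request in try/except URLError and pass the caught exception to describe_failure().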
2016-11-18 08:11 AM
Rui,
Thanks for the script, it is very useful. I'm wondering if you could make a minor modification so that the help output documents querying meta by date range, e.g. time1="2016-11-08 00:00:00" time2="2016-11-09 00:00:00"? It works, but it isn't part of the current help output.
Guy
2016-11-21 05:46 AM
Hi Guy,
Apologies for the delay getting back to you, I was out of the office!
You can pass time as part of the main query for something like that, you don't need to use the time1 and time2 options. You can use something like the example below:
(service = 80) && time="2016-11-08 00:00:00"-"2016-11-09 00:00:00"
Just make sure you use ' around the -w option so there's no conflict with the "s on the time values.
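If the dates are generated by another tool, a clause in that form can be built with a few lines of Python. This is just a sketch with a hypothetical helper name; it only reproduces the time range syntax shown in the example above.

```python
from datetime import datetime


def time_range_clause(start, end, extra_where=None):
    """Build a time="start"-"end" clause, optionally ANDed with another clause."""
    fmt = "%Y-%m-%d %H:%M:%S"
    clause = 'time="%s"-"%s"' % (start.strftime(fmt), end.strftime(fmt))
    if extra_where:
        clause = "(%s) && %s" % (extra_where, clause)
    return clause


print(time_range_clause(datetime(2016, 11, 8), datetime(2016, 11, 9), "service = 80"))
```

The resulting string is what would be passed to the -w option.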
As you probably know, the -t option also provides a built-in relative time window, similar to what you get in SA (Last X time), in this case expressed in seconds.
It may also be worth considering the -f <tracking file> option if all you need is to continue from where the last invocation of the script ended.
Given all these options, please let me know if you still need time1 and time2 options. If so, I would also appreciate it if you could let me know why the above aren't suitable.
Cheers,
Rui
2016-11-21 08:10 AM
Hi Rui,
Thanks for pointing out the additional options available.
Guy
2016-11-21 08:19 AM
Hopefully they meet your needs, if not please just let me know.
Thank you,
Rui
2016-11-24 08:39 AM
Hi... I was wondering if you could help with an issue. I've just started to run the script and am running into this problem:
python nwsdk_csv.py -c https://XXXXXXX:50103 -u XXXXXXX -p XXXXXXX -k ip.dst -w "ip.src=XXXXX"
2016-Nov-24 08:34:50 - INFO: Using SSL, applying TLSv1 fix
2016-Nov-24 08:34:50 - ERROR: Reason="'URLError' object has no attribute 'read'" Class="<type 'exceptions.AttributeError'>"
2016-Nov-24 08:34:50 Traceback (most recent call last):
File "nwsdk_csv.py", line 456, in <module>
(mid1, mid2, msize, mmax, pid1, pid2, psize, pmax, time1, time2, ptime1, ptime2, sid1, sid2, ssize, smax, stotalsize, isize, memt, memu, memp, hostname, version) = get_summary(PROTOCOL + "://" + SERVER + ":" + PORT)
File "nwsdk_csv.py", line 335, in get_summary
contents = e.read()
AttributeError: 'URLError' object has no attribute 'read'
The filename, directory name, or volume label syntax is incorrect.
Any ideas?
Thanks,
Philip