Query Aggregates
This section explains the supported aggregate functions.
Supported Aggregate Functions
The following table lists the supported Aggregate Functions.
Aggregate Function | Description | Input data types | Output data types |
---|---|---|---|
count | Returns the count of meta values, which includes duplicate values as well. | Numeric | Numeric |
countdistinct | Returns the total number of distinct or unique values. | Numeric | Numeric |
distinct | Returns all the unique values. | Any | Any |
first | Returns the first occurrence of the meta value. | Any | Same as input |
last | Returns the last occurrence of the meta value. | Any | Same as input |
sum | Returns a sum of all non-NULL values of metaKey in a group. | Numeric | Numeric |
avg (Average) | Returns the average value of all non-NULL values of the metaKey within a group. | Numeric | Numeric |
min (Minimum) | Returns the minimum for all values of metaKey in each group. This value is based on order by field. | Any | Any |
max (Maximum) | Returns the maximum for all values of metaKey in each group. The maximum value is the value that is returned by order by field. | Any | Any |
length | Returns the length of the values of metakey. This is called a "scalar function" in SQL. | Any | Numeric |
Examples of Queries and Results per Function
Count
This function returns the number of values for a specified meta key, that exclude null values but include duplicate ones. .
Example
The following figure shows a sample query for count function used for the destination IP and the respective source IP.
The following figure shows the result for the above query.
Here, for each unique ip.src (source IP), the page returns the total number or count of ip.dst (destination IP) values, which include the duplicate values as well.
Note: If your NetWitness is currently on 10.5 or newer version and any of the NetWitness Core devices are on 10.3 or 10.4 versions, then some of the aggregate functions may display unexpected errors. However, aggregate functions such as sum() and count() are supported in 10.4 version.
Countdistinct
The countdistinct function returns the count of unique or distinct values for the metakey. In other words, countdistinct function can be used to retrieve a number of distinct values for the specified metakey.
The following figure shows a sample query where the countdistinct function is used along with IP source (ip.src) and data size(size).
Example
The following figure shows the result for the above query.
Here, the page displays the data size along with the total number or count of distinct filenames from the respective IP source. Unlike the count function, the countdistinct excludes the duplicate values from the result.
Distinct
This function returns all the unique or distinct values of the metakey.
Example
The following figure shows a sample query for distinct function used to retrieve e-mails, between various source and destination IP (ip.dst).
The following figure shows the result for the above query.
Here, the page displays the list of unique e-mails that were exchanged between the respective IP source and destination.
First
This function is used to retrieve the first value from an ordered sequence of values for a specified metakey.
Example
The following figure shows a sample query for first function used to retrieve the first destination city name.
The following figure shows the result for the above query.
Here, the page displays the the first destination city for the corresponding source and destination IP. You can use the first function to isolate a particular value from a search result.
Last
This function is used to retrieve the last value from an ordered sequence of values for a specified metakey.
Example
The following figure shows a sample query for last function used to retrieve the most recent user name.
The following figure shows the result for the above query.
Here, the page displays the list of most recent or last usernames in full, that were exchanged between the source and destination IP.
Sum
This function returns the total of the non-NULL values of the metaKey within a group.
Example
The following figure shows the query for the Sum function used for packets.
The following figure shows the result of the above query.
Here the page displays the total or sum of the packets along with the size of the data for the respective destination country.
Avg
The average function returns the average of non-NULL values of the meta within a group.
Example
The following figure shows a sample query for average data size transmitted between a source and destination IP.
The following figure shows the result for the above query.
Here, the page displays the average size of data exchanged between source and destination IP:
Max and Min
Max and Min functions provide the maximum and minimum for given values of a meta respectively.
The following figure shows a sample query for max and min functions for various data sizes, for source IP and destination country.
Example
The following figure shows the result for the above query.
Here, the page displays the max(size) and min(size) columns, along with the list of source IP and destination country. The max(size) column lists the maximum data sizes exchanged while the min(size) column lists the minimum data sizes that were exchanged.
Filter aggregate meta results with Max_threshold
You can further filter the results of any function by using the threshold rule action.
Example
Following is a sample query for max_threshold used along with the Max function in the Then field is:
max_threshold(5000,max(size))
The following figure shows the Build Rule screen for the above query.
Here the max_threshold is applied for data size with an upper limit of 5000. The following figure shows the result.
Here, the result page displays the max(size) column, that lists the data sizes lesser than 5000 as this is the maximum threshold in the query, along with the corresponding IP source and the respective directory.
Filter aggregate meta results withMin_threshold
Similarly, min_threshold is used to filter the results for any function. A similar scenario as max_threshold is considered to explain this.
Example
Query for min_threshold used along with the Max function in the Then field is:
min_threshold(5000,max(size))
The following figure shows the Build Rule screen for the above query.
Here the min_threshold is applied for data size with a lower limit of 5000. The following figure shows the result.
Here, the result page displays the max(size) column, that lists the data sizes greater than 5000 as this is the minimum threshold in the query, along with the corresponding IP source and the respective directory.
Note: Max_threshold and Min_threshold rule actions are common across all the functions, and can be used along with the other queries in the Then field to retrieve the respective output.
Length
This function returns the length of a meta value. In other words, Length function returns the number of bytes used to store the actual value.
For instance, for the value "Analytics" it returns the length as 9. Similarly, for an IPv4 ip.src, it returns 4 (representing 4 bytes).
Example
The following figure shows a sample query for the length function used for usernames.
The following figure shows the result for the above query.
Here, the page displays the length of the usernames associated with the user account and their respective source IP.
Additional Information
When you query for aggregates (E.g. sum(size)) with Group By on a meta which has multiple values in a session, then the session with multiple values is accounted for aggregate calculation for each value of that meta.
Example
When you query for the Count aggregate function with Group By on Alias.host and if the column has multiple values in a session, then the session is counted for each occurrence, including the duplicate values.
Consider the following table.
SessionID | Alias.host | Ip.src | Size |
1 |
host-a, host-b, host-a |
a |
10 |
2 |
host-b, host-c, host-a, host-c |
c |
20 |
3 |
host-b, host-c, host-d |
b |
30 |
4 |
host-c, host-a |
a |
40 |
In the above table, alias.host for host-a and host-c has duplicate values listed for a single session. Let us consider the following query:
Select : alias.host, count(ip.src), sum(size)
Group By : alias.host
Here, host-a and host-c are present in 3 sessions and they are duplicated for two different sessions. However, the output is as shown below.
Alias.host | count(Ip.src) | Sum (size) |
host-a |
4 |
80 |
host-b |
3 |
60 |
host-c |
4 |
110 |
host-d |
1 |
30 |
Output table shows that the count for host-a and host-c is 4. This is because for each alias.host value, the entire session is considered. Similarly to calculate sum (size), the same sessions are considered for each alias.host value.
In the report output if the number of rows has reached NWDB maximum aggregate rows defined in RE configuration, then a message Max Aggregate Row Limit Reached is displayed to indicate that there is more information to be displayed. The default limit is 1000, and you can change this value as per your requirement, in the Reporting Engine Configuration page .