Queries
This topic covers the database query syntax. There are three main mechanisms for performing queries in the database: the query, values, and msearch calls on the /sdk folder on each Core service.
The query call returns meta items from the meta database, possibly using the index for fast retrieval.
The values call returns groups of unique meta values sorted by some criteria. It is optimized to return a subset of the unique values sorted by an aggregate function such as count.
The msearch call takes text search terms as its input and returns the sessions that match them. It can search within indexes, meta, raw packets, or raw logs.
query Syntax
The query message has the following syntax:
query-params = size-param, space, query-param, {space, start-meta-param}, {space, end-meta-param}, {space, search-param} ;
size-param = "size=", ? integer between 0 and 1,677,721 ? ;
query-param = "query=", query-string ;
start-meta-param = "id1=", metaid ;
end-meta-param = "id2=", metaid ;
search-param = "search=", search-string ;
metaid = ? any meta ID from the meta database ? ;
The id1, id2, and size parameters form a paging mechanism for returning a large number of results from the database. Their usage mostly benefits developers who are writing applications directly against the NetWitness Core database. Normally, results are returned in order from oldest to newest data (higher meta IDs are always more recent). To return results from most recent to oldest, reverse the IDs so that id1 is larger than id2. This has a slight performance penalty, because the where clause must be completely evaluated before processing in reverse order can begin.
When size is left off or set to zero, the system streams back all results without paging. For the RESTful interface, the full response is returned with chunked encoding. The native protocol returns the results over multiple messages.
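The paging parameters can be assembled programmatically. The following is a minimal Python sketch, not part of any NetWitness SDK: the function name build_query_params and its newest_first argument are hypothetical, and wrapping the query string in escaped double quotes is an assumption modeled on the parameter strings shown elsewhere in this topic.

```python
from typing import Optional


def build_query_params(query: str, size: int = 0,
                       id1: Optional[int] = None,
                       id2: Optional[int] = None,
                       newest_first: bool = False) -> str:
    """Assemble a parameter string in the order size, query, id1, id2."""
    if not 0 <= size <= 1_677_721:
        raise ValueError("size must be between 0 and 1,677,721")
    if id1 is not None and id2 is not None:
        low, high = sorted((id1, id2))
        # Reversing the IDs (id1 larger than id2) requests newest-to-oldest order.
        id1, id2 = (high, low) if newest_first else (low, high)
    # Assumption: the query value is double-quoted, with backslashes and
    # quotes escaped by a backslash.
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    parts = [f"size={size}", f'query="{escaped}"']
    if id1 is not None:
        parts.append(f"id1={id1}")
    if id2 is not None:
        parts.append(f"id2={id2}")
    return " ".join(parts)


print(build_query_params("select *", size=10, id1=1, id2=100, newest_first=True))
```

With size=0 (the default in this sketch) the string asks for unpaged, streamed results.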
The query parameter is a query command string with its own NetWitness-specific syntax:
query-string = select-clause {, where-clause} {, group-by-clause {, order-by-clause } } ;
select-clause = "select ", ( "*" | meta-or-aggregate {, meta-or-aggregate} ) ;
where-clause = " where ", { where-criteria } ;
meta-or-entity = (meta_key | entity) ;
meta-or-aggregate = meta-or-entity | aggregate_func, "(", meta-or-entity, ")" ;
aggregate_func = "sum" | "count" | "min" | "max" | "avg" | "distinct" | "first" | "last" | "len" | "avglen" | "countdistinct" ;
group-by-clause = " group by ", meta-key-list ;
meta-key-list = meta-or-entity {, meta-key-list} ;
order-by-clause = " order by ", order-by-column ;
order-by-column = meta-or-aggregate { "asc" | "desc" } {, order-by-column} ;
The select clause allows you to specify either * to return all the meta in all the sessions that match the where clause, or a set of meta field names and aggregate functions to select a subset of the meta with each session.
The select clause may contain entity names in the place of meta key names. If an entity name is in the select clause, meta items returned by the query will have their key name set to the entity name, rather than their actual meta key name stored in the session. Thus, the names of the meta items returned in the query will match the names of the metas in the select clause. For example, if there is an entity ip that consists of ip.dst and ip.src, then a query containing select ip will only return ip fields, with nothing to distinguish ip.dst meta items from ip.src meta items in the result set.
The select clause may contain renamed meta key names. Any fields appearing in the result set as a result of a renamed key in the select clause will be returned with the meta key name matching the name used in the select clause. For example, if the key port_src is used to rename tcp.srcport, then a query containing select port_src will only return port_src fields, even if the underlying meta had type tcp.srcport.
Note: Usage of renamed meta key pairs in the select clause cannot be combined with fixed-size result paging for a query. Doing so causes discrepancies in the results returned to Brokers. The reason for the discrepancies is that Concentrators cannot return only one of the key values of a renamed meta key pair and still preserve the correctness of the result set for the requested size. Hence, the Concentrator omits renamed meta key pair results to preserve the correctness of the result set, which causes the Broker to pull the result from the next Concentrator and advance the IDs that are returned.
Example: select ip.proto,ipv6.proto cannot be combined with size=10 (a paging query)
size=10 flags=0 threshold=0 query="select time,ip.src,ip.dst, ip.proto,ipv6.proto,eth.type,size,payload,lifetime,client,did
The aggregate functions have the following effect on the query result set.
Function | Result |
---|---|
sum | Add all meta values together; only works on numbers |
count | The total number of meta fields that would have been returned |
min | The minimum value seen |
max | The maximum value seen |
avg | The average value for the number |
distinct | Returns a list of all unique values seen |
countdistinct | Returns the number of unique values seen. countdistinct is equivalent to the number of metas that would have been returned by the distinct function. |
first | Returns the first value seen |
last | Returns the last value seen |
len | Converts all field values to a UInt32 length instead of returning the actual value. This length is the number of bytes to store the actual value, not the length of the structure stored in the meta database. For example, the word "NetWitness" returns a length of 10. All IPv4 fields, like ip.src, return 4 bytes. |
avglen | Returns a single value which is the average value returned from the len function. The result is always a float64 value. |
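To make the table concrete, the following Python sketch emulates the aggregate functions over an in-memory list of values. It only illustrates the semantics described above and is not how the database computes them; the names byte_len and aggregate are hypothetical, and the sketch assumes text is measured as UTF-8 bytes and handles only text and IP address values for len.

```python
import ipaddress


def byte_len(value) -> int:
    """Number of bytes needed to store a value, as described for len."""
    if isinstance(value, (ipaddress.IPv4Address, ipaddress.IPv6Address)):
        return len(value.packed)  # 4 bytes for IPv4, 16 for IPv6
    if isinstance(value, str):
        return len(value.encode("utf-8"))  # assumption: UTF-8 byte count
    raise TypeError("this sketch only measures text and IP address values")


def aggregate(func: str, values: list):
    """Apply one aggregate function to a list of meta values."""
    if func in ("len", "avglen"):
        lengths = [byte_len(v) for v in values]
        # avglen is always a floating-point result.
        return lengths if func == "len" else sum(lengths) / len(lengths)
    simple = {
        "sum": sum,
        "count": len,
        "min": min,
        "max": max,
        "avg": lambda vs: sum(vs) / len(vs),
        "distinct": lambda vs: list(dict.fromkeys(vs)),  # keeps first-seen order
        "countdistinct": lambda vs: len(set(vs)),
        "first": lambda vs: vs[0],
        "last": lambda vs: vs[-1],
    }
    if func not in simple:
        raise ValueError(f"unknown aggregate function: {func}")
    return simple[func](values)


ports = [80, 443, 80]
print(aggregate("distinct", ports), aggregate("countdistinct", ports))
print(byte_len("NetWitness"), byte_len(ipaddress.ip_address("10.0.0.1")))
```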
where Clauses
The where clause is a filter specification that allows you to select sessions out of the collection by using the index.
Syntax:
where-criteria = criteria-or-group, { space, logical-op, space, criteria-or-group } ;
criteria-or-group = criteria | group ;
criteria = (meta-key | entity), ( unary-op | binary-op meta-value-ranges ) ;
group = ["~"], "(" where-clause ")" ;
logical-op = "&&" | "||" ;
unary-op = "exists" | "!exists" ;
binary-op = "=" | "!=" | "<" | ">" | ">=" | "<=" | "begins" | "contains" | "ends" | "regex" ;
meta-value-ranges = meta-value-range, { ",", meta-value-range } ;
meta-value-range = (meta-value | "l" ), [ "-", ( meta-value | "u" ) ] ;
meta-value = number | quoted-value | ip-address | mac-address | relative-time ;
number = ? any numeric value ? | ( '"' text '"' )
quoted-value = ( '"' text '"' ) | ( '"' date-time '"' ) ;
relative-time = "rtp(" , time-boundary , "," , positive-integer , time-unit, ")" ;
time-boundary = "earliest" | "latest" | "now" ;
positive-integer = ? any non-negative integral number ?
time-unit = "s" | "m" | "h" ;
When specifying rule criteria, the meta-value part of the clause is expected to match the type of the meta specified by the meta-key. For example, if the key is ip.src, the meta-value should be an IPv4 address. Entity names are allowed in any location where a meta-key name is required.
Queries using a meta-key name will match meta items corresponding both to the meta-key name and to the names of any "renames" specified for the key. See "Key Renaming" under the Index Customization topic for details on key renaming.
Query Operators
The following table describes the function of each operator.
Operator | Function |
---|---|
= | Match sessions containing the meta value exactly. If a range of values is specified, any of the values is considered a match. |
!= | Matches all sessions that would not match the same clause as if it were written with the = operator. |
< | For numeric values, matches sessions containing meta with the numeric value less than the right side. If the right side is a range, the first value in the range is considered. If multiple ranges are specified, the behavior is undefined. For text metas, a lexicographical comparison is performed. |
<= | Same behavior as <, but sessions containing meta that equals the value exactly are also considered matches. |
> | Similar to the < operator, but matches sessions where the numeric value is greater than the right side. If the right side is a range, the last value in the range is considered for the comparison. |
>= | Same behavior as >, but sessions containing meta that equals the value exactly are also considered matches. |
begins | Matches sessions that contain text meta value that starts with the same characters as the right side. |
ends | Matches sessions that contain text meta that ends with the same characters as the right side. |
contains | Matches sessions that contain text meta that contains the substring given on the right side. |
regex | Matches sessions that contain text meta that matches the regex given on the right side. The regex parsing is handled by boost::regex. |
exists | Matches sessions that contain any meta value with the given meta key. |
!exists | Matches sessions that do not contain any meta value with the given meta key. |
length | Matches sessions that contain text meta values of a certain length. The expression on the right side must be a non-negative number. |
Text Values
The system expects quoted text values. Unless it can be parsed as a time (see below), a quoted value is interpreted as text.
It is also important to quote any text value that may contain - so that it is not interpreted as a range.
For text values, the backslash character \ is used as an escape value. This character is used when you need to search for a value containing quote characters. If you need to search for a backslash character, then the backslash itself must be escaped, as \\. Note that if you are wrapping the query parameters within another language, such as the parameter fields of the REST interface, you may need to add additional escape levels as required by whatever API or interface you are using to interact with the Core service.
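The escaping rules for text values can be applied with a small helper. This is a sketch in Python; the function names quote_text and where_equals are hypothetical, and the helper covers only the first escape level (any extra level required by a wrapping interface is not handled here).

```python
def quote_text(value: str) -> str:
    """Quote a text value, escaping backslashes first and then double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def where_equals(meta_key: str, text: str) -> str:
    """Build a simple equality criterion with a safely quoted text value."""
    return f"{meta_key} = {quote_text(text)}"


# A hyphenated value is quoted so it is not read as a range.
print(where_equals("user.dst", "svc-backup"))
# A value containing a quote character and a backslash.
print(where_equals("user.dst", 'DOMAIN\\"bob"'))
```

Escaping the backslash before the quote matters: doing it in the other order would double the backslashes that were just added for the quotes.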
IP Addresses
IP addresses can be expressed using standard text representations for IPv4 and IPv6 addresses. In addition, the query can use CIDR notation to express a range of addresses. If CIDR notation is used, it is expanded to the equivalent value range.
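The expansion of CIDR notation into a value range can be reproduced with Python's standard ipaddress module. This sketch only illustrates the equivalence; the function names are hypothetical, and it is not the Core service's own parser.

```python
import ipaddress
from typing import Tuple


def cidr_to_range(cidr: str) -> Tuple[str, str]:
    """Return the first and last address covered by a CIDR block."""
    network = ipaddress.ip_network(cidr, strict=False)
    return str(network.network_address), str(network.broadcast_address)


def cidr_to_range_expr(cidr: str) -> str:
    """Write the block as an explicit low-high range expression."""
    low, high = cidr_to_range(cidr)
    return f"{low}-{high}"


print(cidr_to_range_expr("192.168.0.0/16"))
```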
MAC Addresses
A MAC address can be specified using standard MAC address notation: aa:bb:cc:dd:ee:ff
Numeric Values
In a where clause, you can specify numeric search values. Numbers should not be surrounded by quotes.
Bucketed Numeric Indexes
Meta keys indexed with bucketing can be used like any other numeric search value. Under most situations such searches will return sessions that have a meta value that exactly matches the requested search criteria.
Special behavior is invoked for queries that select only sessionid, for example a query of the form select sessionid where size = 2048. Selecting sessionid explicitly bypasses all meta database read operations, and only returns index information. If selecting sessionid only, and if the numeric value specified is exactly equal to one of the bucket values, then the system will return all sessions that match somewhere in the bucket, rather than an exact match. For example, the search term size = 2048 will match all sessions in the 2 KB bucket, which is the range from 2048 to 3171 bytes. However, if the search value does not match a bucket value, then the system will return only matches for the exact byte value. For example, the search term size = 2049 will only match sessions with a size meta value of exactly 2049. In this mode of operation, specifying a non-bucket value in a where clause is slower than searching within a bucket value. The where clause parameter to the values API also invokes this optimization.
Using bucketed values in other forms of query does not invoke special behavior. The same is true for the msearch API. For those APIs, the use of a bucketed index in the where clause is evaluated accurately, without special meaning applied to bucket values. To search within an entire bucket using these APIs, specify the bucket range explicitly, for example size=2048-3171.
For more information on how to tell whether an index uses bucketing, see the Index Customization topic.
Numeric Value Aliases
For numeric values, aliases specified in the index can be used in a query as a quoted string in place of where a literal numeric value would be used; e.g.,
select * where service = "NFS"
Numeric value aliases can be used anywhere a numeric literal might be used: as a single value, as the beginning or end of a range, or in a comma-delimited list of values (and/or ranges).
Refer to the topic Index Customization for details of how value aliases can be specified in the index.
Date and Time Expressions
In NetWitness Platform, dates are represented using Unix epoch time, which is the number of seconds since Jan 1, 1970 UTC. In queries, you can express the time as this number of seconds, or you can use the string representation. The string representation for the date and time is "YYYY-mmm-DD HH:MM:SS". A three-letter abbreviation represents the month. You can also express the month as a two-digit number, 01-12.
Time values must be quoted.
All times specified in queries are expected to be in UTC.
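The two time representations can be converted into each other. The following Python sketch formats an epoch value as a quoted UTC string and parses it back; the function names are hypothetical, and treating month abbreviations as case-insensitive is an assumption based on the mixed-case examples in this topic.

```python
from datetime import datetime, timezone

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def to_query_time(epoch_seconds: int) -> str:
    """Format epoch seconds as a quoted UTC time string."""
    moment = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    month = MONTHS[moment.month - 1]
    return f'"{moment.year:04d}-{month}-{moment.day:02d} {moment:%H:%M:%S}"'


def from_query_time(text: str) -> int:
    """Parse a quoted time string (abbreviated or numeric month) to epoch seconds."""
    date_part, time_part = text.strip('"').split(" ")
    year, month, day = date_part.split("-")
    if month.isdigit():
        month_number = int(month)
    else:
        month_number = [m.lower() for m in MONTHS].index(month.lower()) + 1
    hour, minute, second = (int(part) for part in time_part.split(":"))
    moment = datetime(int(year), month_number, int(day),
                      hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp())


print(to_query_time(0))
print(from_query_time('"2014-may-20 11:57:00"'))
```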
Relative Time Points
Relative time points allow a where clause to reference a value at some fixed offset relative to the earliest or latest time metas seen in the collection. They can also be used to reference a point in time relative to the current time.
A relative time point expression has the syntax rtp(boundary, duration).
The boundary is either earliest, latest, or now.
The duration is an expression of hours, minutes, or seconds, for example 24h, 60m, or 60s. When the boundary is earliest, the duration represents the amount of time after the earliest time present in the collection. If the boundary is latest, the duration represents the amount of time before the latest time present in the collection. If the boundary is now, the duration represents the amount of time before the current time.
When the boundary is now, the system clock of the Core service host is used to determine what time it is.
The duration can be 0 seconds if you wish to specify the relative time point with no offset from the boundary. This is most useful in the case of the now boundary, since it is possible that the latest time observed in the collection is much earlier than the current time.
Relative time points can only be used in SDK operations, where there is a collection from which to get the boundaries for earliest and latest time metas.
Relative time points only work on indexed meta types. The default indexed meta types are time and event.time.
Examples:
Last 90m of collection time:
time = rtp(latest, 90m) - u
First 2 days of event time:
event.time = l - rtp(earliest, 48h)
Events added in the last hour:
time = rtp(now, 60m) - rtp(now,0s)
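To show how a relative time point maps to a concrete time, the following Python sketch resolves an rtp expression to epoch seconds. It is only an illustration: the function resolve_rtp is hypothetical, the caller supplies the earliest, latest, and current times, and the sketch assumes that earliest offsets count forward while latest and now offsets count backward.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
RTP_PATTERN = re.compile(
    r"rtp\(\s*(earliest|latest|now)\s*,\s*(\d+)\s*([smh])\s*\)")


def resolve_rtp(expr: str, earliest: int, latest: int, now: int) -> int:
    """Resolve an rtp(boundary, duration) expression to epoch seconds."""
    match = RTP_PATTERN.fullmatch(expr.strip())
    if match is None:
        raise ValueError(f"not a relative time point: {expr!r}")
    boundary, amount, unit = match.groups()
    offset = int(amount) * UNIT_SECONDS[unit]
    if boundary == "earliest":
        return earliest + offset  # after the earliest time in the collection
    if boundary == "latest":
        return latest - offset    # before the latest time in the collection
    return now - offset           # before the current time


# Collection spans epoch 1000 to 100000; the current time is 200000.
print(resolve_rtp("rtp(latest, 90m)", 1000, 100000, 200000))
print(resolve_rtp("rtp(now,0s)", 1000, 100000, 200000))
```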
Special Range Values
Ranges are normally expressed with the syntax *smallest* - *largest*, but there are some special placeholder values you can use in range expressions. You can use the letter l to represent the lower bound of all meta values as the start of the range, and u to represent the upper bound. The bounds are determined by looking at the smallest or largest meta value found in the index out of all the meta values that have already entered the index.
If you use the l or u tag, it should be unquoted.
For example, the expression time = "2014-may-20 11:57:00" - u would match all times from 2014-may-20 11:57:00 to the most recent time found in the collection.
Notice that it is easy to confuse a range expression with a text string. Make sure that text values that contain - are quoted, and that hyphens within range expressions are not within quoted text.
group by Clause (since 10.5)
The query API has the ability to generate aggregate groups from the results of a query call. This is done using a group by clause on the query. When group by is specified, the result set for the query is subdivided into groups. Each group of results is uniquely identified by the meta values indicated in the group by clause.
For example, consider the query select count(ip.dst). This query returns a count of all ip.dst metas in the database. However, if you add a group by clause, like this: select count(ip.dst) group by ip.src, the query returns a count of the ip.dst metas found for each unique ip.src.
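The effect of adding the group by clause can be emulated in memory. In this Python sketch each session is a list of (meta key, value) pairs; the data and the function count_grouped are made up for illustration, and the sketch assumes at most one group key value per session. It does not reflect how the database executes the query.

```python
from collections import defaultdict

# Hypothetical sessions, each a list of (meta key, value) pairs.
SESSIONS = [
    [("ip.src", "10.0.0.1"), ("ip.dst", "10.0.0.9")],
    [("ip.src", "10.0.0.1"), ("ip.dst", "10.0.0.8")],
    [("ip.src", "10.0.0.2"), ("ip.dst", "10.0.0.9")],
]


def count_all(sessions, counted_key):
    """Emulates: select count(<counted_key>)"""
    return sum(1 for session in sessions
               for key, _ in session if key == counted_key)


def count_grouped(sessions, counted_key, group_key):
    """Emulates: select count(<counted_key>) group by <group_key>"""
    counts = defaultdict(int)
    for session in sessions:
        group_values = [value for key, value in session if key == group_key]
        # A session without the group key lands in a group keyed by None.
        group_value = group_values[0] if group_values else None
        counts[group_value] += sum(1 for key, _ in session if key == counted_key)
    return dict(counts)


print(count_all(SESSIONS, "ip.dst"))
print(count_grouped(SESSIONS, "ip.dst", "ip.src"))
```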
As of version 10.5, you can utilize up to 6 meta fields in a group by clause.
The group by clause shares some of the same functionality as the values call, but it offers significantly more advanced groups at the expense of longer query times. Producing the results of a grouped query involves reading the meta from the meta database for all sessions that match the where clause, while a values call can produce its aggregates by reading the index only.
The contents of each group returned by the query are defined by the select clause. The select clause can contain any of the aggregate functions or meta fields selected. If multiple aggregates are selected, the result of the aggregate function is defined for each group. If nonaggregate fields are selected, the meta fields are returned in batches for each group.
The result set of a group by query is encoded with the following rules:
- All meta items associated with a group are delivered with the same group number.
- The first meta items returned for the group identify the group key. For example, if the group by clause specifies group by ip.src, then the first meta item of each group will be an ip.src.
- The normal, nonaggregate meta items are returned after the group key, but they all will have the same group number as the group key metas.
- The aggregate result meta fields for each group are returned next.
- All fields within a group are returned together. Different group results will not be interleaved.
If one of the group by meta items is missing from one of the sessions matched by the where clause, that meta field is treated as a NULL for the purposes of that group. When the results for that group are returned, the NULL-valued parts of the group key will be omitted from the group's results, since the database has no concept of NULL.
The semantics of a group by query differ from a SQL-like database in terms of what meta fields are returned. SQL databases require you to select the group by columns explicitly in the select clause if you want them to be returned in the result set. The NetWitness Core database always implicitly returns the group columns first.
A query with a group by clause honors the result set size parameter if one is provided. However, due to the nature of the grouping, it puts an additional burden on the caller to page and reform groups if a fixed-size result set is requested. For this reason, you should not specify an explicit result size when making a group by call. By not specifying an explicit size, the entire result set will be delivered as partial results.
group by clauses allow results to be grouped by an entity definition.
The database honors configuration parameters that limit the I/O or memory impact of a group by query. The following table describes them.
Parameter | Function |
---|---|
/sdk/config/max.query.groups | This is the limit on how many groups can be held in memory to calculate aggregates. This parameter allows you to limit the overall memory usage of the query. |
/sdk/config/max.where.clause.sessions | This is the limit on how many sessions from the where clause can be processed in a query. This parameter allows you to set a limit on the number of sessions that have to be read from the meta and session databases to resolve a query. |
order by Clause (since 10.5)
An order by clause can be added to a query that contains a group by clause. The order by clause causes the set of grouped results to be returned in sorted order.
An order by clause consists of a set of items to sort by in ascending or descending order. Sorting can be performed on any data field that will be returned in the result set. This includes meta specified by the select clause, aggregate function results specified by the select clause, or group by meta fields.
The order by clause can sort over many columns. There is no limit on the number of order by columns allowed in the query, but a practical limit exists in that each of the order by columns must refer to something returned by the select clause or group by clause. The multiple column sort is imposed lexicographically, meaning that if two groups have equal values for the first column, then they are sorted by the second column. If they are equal in the second column, they are sorted by the third column, and so on for however many order by columns are provided. Groups that do not contain any of the metas referenced by the order by clause are sorted first in the result set in the case of an ascending sort, and last in the case of a descending sort.
The NetWitness Core database is unique in that the groups of results returned by a query may each have many values for a selection. For example, it is possible to select all meta items that match a meta type and organize them into groups, and it is possible to use the distinct() function to return groups of distinct meta values. If an order by clause references one of the fields in the group that has multiple values, the sorting order is applied as follows:
- Within each group, the fields with multiple matching values are ordered by the ordering clause
- All the groups are sorted by comparing the first occurrence of the ordered field found within each group
The order by clause is only available in queries that have a group by clause, since groups are required to organize the meta fields into distinct records. If you wish to sort an arbitrary query as if there were no grouping applied, use group by sessionid. This ensures that results are returned in groups of distinct sessions or events.
The results of a group by clause are naturally returned in ascending group key order, but an order by clause can be used to return groups in a different order.
If an order by column does not specify asc or desc, the default ordering is ascending.
Examples:
select countdistinct(ip.dst) GROUP BY ip.src ORDER BY countdistinct(ip.dst)
select countdistinct(ip.dst) GROUP BY ip.src ORDER BY countdistinct(ip.dst) desc
select countdistinct(ip.dst),sum(size) GROUP BY ip.src ORDER BY sum(size) desc, countdistinct(ip.dst)
select sum(size) GROUP BY ip.src, ip.dst ORDER BY ip.dst desc
select user.dst,time GROUP BY sessionid ORDER BY user.dst
select * GROUP BY sessionid ORDER BY time
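The multiple-column ordering described in this section can be emulated on already-grouped results. The Python sketch below mirrors the third example (ORDER BY sum(size) desc, countdistinct(ip.dst)) over made-up groups; the function order_groups is hypothetical, and applying the missing-value placement column by column is an assumption of this sketch.

```python
# Hypothetical grouped results, one dict per group.
GROUPS = [
    {"ip.src": "10.0.0.1", "countdistinct(ip.dst)": 3, "sum(size)": 500},
    {"ip.src": "10.0.0.2", "countdistinct(ip.dst)": 1, "sum(size)": 900},
    {"ip.src": "10.0.0.3", "countdistinct(ip.dst)": 2, "sum(size)": 500},
]


def order_groups(groups, order_by):
    """Sort groups by a list of (column, "asc" or "desc") pairs."""
    ordered = list(groups)
    # Python's sort is stable, so sorting by the last column first and the
    # first column last produces a lexicographic multi-column ordering.
    for column, direction in reversed(order_by):
        ordered.sort(
            # Groups lacking the column sort first when ascending and,
            # because reverse flips the order, last when descending.
            key=lambda group: (column in group, group.get(column, 0)),
            reverse=(direction == "desc"),
        )
    return ordered


result = order_groups(
    GROUPS, [("sum(size)", "desc"), ("countdistinct(ip.dst)", "asc")])
print([group["ip.src"] for group in result])
```

The two groups with an equal sum(size) of 500 are separated by the second column, which is the tie-breaking behavior the lexicographic rule describes.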
search parameter
The query API supports a search parameter to perform free text searching. The syntax of the search parameter is identical to the search parameter utilized by the msearch API call. Refer to the msearch documentation for a description of the search field syntax.
The search parameter acts as an extension of the where clause in the query parameter. This means that the query and search parameters work together. Use the query parameter to specify the select clause, the group by clause, or the order by clause. Any where clause criteria specified in the query parameter are combined with the search filter as if they were joined with an AND operation.
Searches through the query API are always done against indexed meta, in case-insensitive mode. This is the same behavior as specifying the flags si,sm,ci to the msearch API.
values call
The index provides a low-level values function to access the unique meta values that have been stored in the index. This function allows developers to perform more advanced operations on groups of unique meta values.
The values call parameter syntax:
values-params = field-name-param, space, where-param, space, size-param, {space, flags-param}, {space, start-meta-param}, {space, end-meta-param}, {space, threshold-param}, {space, aggregate-func-param}, {space, aggregate-field-param}, {space, min-param}, {space, max-param}, {space, search-param} ;
field-name-param = "fieldName=", (meta-key | entity) ;
where-param = "where=", where-clause ;
size-param = "size=", ? integer between 1 and 1,677,721 ? ;
start-meta-param = ? same as query message ? ;
end-meta-param = ? same as query message ? ;
flags-param = "flags=", {values-flag, {"," values-flag} } ;
values-flag = "sessions" | "size" | "packets" | "sort-total" | "sort-value" | "order-ascending" | "order-descending" ;
threshold-param = "threshold=", ? non-negative integer ? ;
aggregate-func-param = "aggregateFunction=", { aggregate-func-flag } ;
aggregate-func-flag = "count" | "sum" ;
aggregate-field-param = "aggregateFieldName=", ( meta-key | entity ) ;
min-param = "min=", meta-value ;
max-param = "max=", meta-value ;
search-param = "search=", search-string ;
The values call returns a set of unique meta values for a given meta key. For each unique value, the values call can provide an aggregate total count. The function used to generate the total is controlled by the flags parameter.
Parameters
The following table describes the function of each parameter.
Parameter | Function |
---|---|
fieldName | This is the meta key name for which you retrieve unique values. For example, if fieldName is ip.src, this function returns the unique source IP values in the collection. Entities can be used for the field name, in which case the result is defined as the combined set of field values for all the referenced meta keys. If the fieldName refers to a key with rename references, the result is defined as the combined set of field values for the given meta key name plus all of the references' meta keys. |
where | This is a where clause which filters the set of sessions for which the unique values are returned. For example, if the fieldName is ip.src, and the where clause is ip.src = 192.168.0.0/16, only values in the range of 192.168.0.0 to 192.168.255.255 are returned. For information on the where clause syntax, see Where Clauses. |
size | The size of the set of unique values to return. This function is optimized to return a small subset of the possible unique values in the database. |
id1, id2 | These optional parameters limit the scope of the search for unique values to a specific region of the meta database and the index. Setting the id1 and id2 parameters to a limited range of the meta database is very important to running searches quickly on large collections. |
flags | Flags control how the values are sorted and totaled. Flags are described in the following Values Flags section. |
threshold | Setting the threshold parameter allows the values call to short-cut collection of the total associated with each value once the threshold is reached. By providing a threshold, the caller can reduce the amount of index and meta items that must be retrieved from the database. If the threshold parameter is omitted or set to 0, this optimization is not used. |
aggregateFunction | Optional parameter used to change the default behavior from counting sessions, packets, or size to counting or summing the numeric field defined by aggregateFieldName. Both parameters must be specified when either is defined. Pass either sum or count to specify which behavior to perform. |
aggregateFieldName | The meta field on which to perform the aggregateFunction. Both aggregateFunction and aggregateFieldName parameters must be specified when the aggregate flag is set. Performing a values call using one of the aggregate functions can be significantly slower than a values call that collects totals of sessions, packets, or size. The reason for this is that each session that matches the where clause must be retrieved from the meta database. This scan causes a large portion of the query to be I/O bound on the meta DB volumes. The time taken to run an aggregate values call is linearly proportional to the number of sessions that match the where clause. |
min, max | The minimum and maximum value that should be returned from the call. These parameters are used to iterate (or page) over an extremely large number of values, typically more values than could be returned from a single call. Primarily used in conjunction with the flags sort-value,order-ascending such that the highest value returned would be used in a subsequent call as the min parameter value. The values are exclusive. If min="rsa" was specified and rsa was a valid value, rsa would not be returned; instead, the next highest value would be returned. |
search | Text search pattern to be used to further refine the where parameter |
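The paging pattern described for min can be sketched as a loop that feeds the highest value of each page back in as the next exclusive minimum. In this Python sketch, fetch_page is a hypothetical stand-in for a values call that returns values sorted by value in ascending order; make_fake_fetch simulates it over an in-memory list so the loop can be run without a Core service.

```python
from typing import Callable, List, Optional


def page_all_values(fetch_page: Callable[[Optional[str], int], List[str]],
                    size: int) -> List[str]:
    """Collect every value by paging with an exclusive minimum."""
    collected: List[str] = []
    minimum: Optional[str] = None
    while True:
        page = fetch_page(minimum, size)
        collected.extend(page)
        if len(page) < size:
            return collected  # a short page means nothing is left
        minimum = page[-1]    # exclusive: the next page starts after it


def make_fake_fetch(all_values: List[str]):
    """Simulate a value-sorted, ascending values call over a fixed list."""
    ordered = sorted(set(all_values))

    def fetch_page(minimum: Optional[str], size: int) -> List[str]:
        remaining = [v for v in ordered if minimum is None or v > minimum]
        return remaining[:size]

    return fetch_page


fetch = make_fake_fetch(["rsa", "abc", "xyz", "def", "rsa"])
print(fetch("rsa", 5))            # "rsa" itself is excluded
print(page_all_values(fetch, 2))
```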
values Flags
The flags parameter controls how the values call operates. There are three groups of flags that correspond to the different modes of operation as shown in the following table.
Flag | Description |
---|---|
sessions, size, packets | The values call allows you to specify one of these flags to determine how the total for each value is calculated. If the flag is sessions, the values call returns a count of sessions that contain each value. If the flag is size, the values call totals the size of all sessions that contain each unique value, and reports the total size for each unique value. If the flag is packets, the values call totals the number of packets in all sessions that contain each unique value, and then reports that total for each unique value. |
sort-total, sort-value | These flags control how results are sorted. If the flag is sort-total, the result set is sorted in order of the totals collected. If the flag is sort-value, the results are returned in the sort order of the values. |
order-ascending, order-descending | These flags control the sort order of the result set. For example, if sorting by total in descending order, the values with the greatest total are returned first. |
suggest | Enables suggestion mode for the values API. All other flags are ignored if this flag is set. |
database-scan | Makes the values call bypass the index and instead collect unique values as if it were traversing the meta database. This mode is slow in most cases, but it can be fast if the where clause matches very few sessions. |
ignore-cache, clear-cache | These flags control result set caching on the set of values returned by this call. Normally these should not be used. |
values Call Example
The values call is used extensively by the Navigation view in NetWitness Platform. The default view generates calls that look like this:
/sdk/values id1=198564099173 id2=1542925695937 size=20 flags=sessions,sort-total,order-descending threshold=100000 fieldName=ip.src where="time=\"2014-May-20 13:12:00\"-\"2014-May-21 13:11:59\""
In this example, the Navigation view requests unique values for ip.src in the time range given. It asks for the count of sessions that match each ip.src, and the results are the top 20 ip.src values when sorted by total session count in descending order. In addition, the Navigation view supplies a meta ID range in order to provide an optimization hint to the query engine.
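A call string like the Navigation view example can be assembled from its parts. The Python sketch below is illustrative only: build_values_call and quote_param are hypothetical helpers, and the parameter order simply follows the example above rather than any documented requirement.

```python
from typing import List, Optional


def quote_param(value: str) -> str:
    """Wrap a parameter value in quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_values_call(field_name: str, where: str, size: int,
                      flags: List[str],
                      id1: Optional[int] = None,
                      id2: Optional[int] = None,
                      threshold: Optional[int] = None) -> str:
    """Assemble a /sdk/values call string in the order used by the example."""
    parts = ["/sdk/values"]
    if id1 is not None:
        parts.append(f"id1={id1}")
    if id2 is not None:
        parts.append(f"id2={id2}")
    parts.append(f"size={size}")
    parts.append("flags=" + ",".join(flags))
    if threshold is not None:
        parts.append(f"threshold={threshold}")
    parts.append(f"fieldName={field_name}")
    # The where clause contains quoted times, so its quotes get escaped.
    parts.append(f"where={quote_param(where)}")
    return " ".join(parts)


print(build_values_call(
    "ip.src",
    'time="2014-May-20 13:12:00"-"2014-May-21 13:11:59"',
    size=20,
    flags=["sessions", "sort-total", "order-descending"],
    id1=198564099173, id2=1542925695937, threshold=100000))
```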
Values call and bucketing mode
When a values call is executed with a fieldName parameter that specifies a bucketed indexed meta, the system will only return the bucket values present within the rest of the criteria. This has the side effect of producing counts and totals that represent all sessions within each returned bucket. This is useful because it summarizes size meta into groups that represent human-readable ranges like 1 MB, 2 MB, and so on.
When the number of sessions scanned by the values call drops below 1000 sessions, the values call operates in meta-scanning mode, and at that point it returns exact values for numeric value indexes, regardless of the bucketing setting on the index.
Suggestion Mode
The
values
call has an additional execution mode that is used to provide suggested search values. In this mode of operation, the
values
call only identifies unique values stored with the given meta key name. It provides these results within milliseconds. To achieve this, it does not provide any session counts and does not use any sort flags. Suggestion mode does use the
where
parameter to refine suggestions, but it only utilizes the time range clause if provided. Other portions of the
where
clause are not used to refine suggestions.
Suggestion mode is enabled by setting the
suggest
flag in the
flags
parameter.
Suggestion mode gives special meaning to the
min
parameter. The
min
parameter can contain the starting point for the suggested values. The return values of
suggest
mode will only include values that start with the text provided in the
min
parameter.
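The prefix behavior of suggestion mode can be pictured with a small in-memory model. This is only a sketch of the documented semantics, not the index implementation: the function name `suggest` is hypothetical, and the result is sorted here purely to make the output deterministic.

```python
def suggest(stored_values, prefix):
    """Return the unique stored values that start with the given prefix.

    Mirrors the documented role of the min parameter in suggestion mode.
    No session counts are produced, just as in the real mode.
    """
    return sorted({value for value in stored_values if value.startswith(prefix)})


# Hypothetical values stored under one meta key.
stored = ["admin", "administrator", "adam", "root", "admin"]
print(suggest(stored, "adm"))
```

In terms of the call itself, a request of this kind would set the suggest flag in the flags parameter and pass the prefix in the min parameter, together with the fieldName to draw suggestions from.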
search
parameter
The
values
API supports a
search
parameter to perform free text searching. The syntax of the search parameter is identical to the search parameter utilized by the
msearch
API call. Refer to the
msearch
documentation for a description of the search field syntax.
The
search
parameter acts as an extension of the
where
parameter. This means that the
where
and
search
parameters work together. Any
where
parameter specified is combined with the search filter as if they were joined with an
AND
operation.
Searches through the
values
API are always done against indexed meta, in case-insensitive mode.
The
values
API operates only on index entries for most requests, in order to provide fast totals over large numbers of events. When the search parameter operates over inexact indexes, such as N-Gram indexes, it may include sessions with near matches instead of narrowing the search to exact matches.
msearch
Call
The index provides a low-level
msearch
function to perform text searches against all meta types. This type of search does not require users to define their queries in terms of known meta types. Instead, it searches all parts of the database for matches.
msearch
is used by the Events view text search. See the "Filter and Search Results in the Events View" topic in the
Investigation and Malware Analysis Guide
for details on the accepted search forms and examples.
msearch
parameters:
msearch-params = search-param, {space, where-param}, {space, limit-param}, {space, size-param}, {space, flags-param};
search-param = "search=", ? free-form search string ? ;
where-param = "where=", ? optional where clause ? ;
limit-param = "limit=", ? optional session scan limit ? ;
size-param = "size=", ? optional result count limit ? ;
flags = "flags=", {msearch-flag, {"," msearch-flag} };
msearch-flag = "sp" | "sm" | "si" | "ci" | "regex" ;
The
msearch
algorithm works as follows:
- A set of sessions is identified from the index by finding the intersection of three sets:
  - (Set 1) All sessions in the database
  - (Set 2) Sessions that match the where clause parameter
  - (Set 3) If the si flag is specified, sessions that have indexed values that match the search string parameter
- If the search specifies the sm flag, all meta items from the set of sessions identified in step 1 are read and scanned to see if they match the search string parameter. The meta items will be read from the service nearest to the point where the search was executed. For example, if the search is performed on a Broker, the meta items may be read from the Concentrator nearest to the Broker, but if the search is performed on an Archiver, the meta items will be read from the Archiver itself.
- If the search specifies the sp flag, all raw packet or log entries from the set of sessions identified in step 1 are read and scanned to see if they match the search string parameter. The packets will be read from the service nearest to the point where the search was executed. For example, if the search is performed on a Concentrator, the packet data will be read from the Decoder, but if the search is performed on an Archiver, the packet data will be read from the Archiver itself.
- Matches from step 2 and step 3 are returned as they are found, up to the point where the limit parameter is reached or the size count is reached, whichever occurs first. The limit parameter specifies the maximum number of sessions for which meta and packet data will be scanned. If limit is not specified, the entire set of sessions determined in step 1 is scanned. The size parameter specifies the maximum number of results that will be returned. In practice, the size parameter acts more as a suggestion. It is possible that slightly more results than specified will be returned, but fewer results will never be returned. If the size parameter is not specified, all results matching the search will be returned.
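The four steps above can be sketched as a small in-memory simulation. Everything here is an assumption made for illustration: the `Session` class, the `msearch` function signature, the use of a Python callable for the where clause, exact matching for index lookups, and substring matching for meta and raw scans. The regex flag is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    sid: int
    meta: dict                                      # meta key -> value
    raw: str = ""                                   # raw packet or log payload
    indexed_keys: set = field(default_factory=set)  # keys assumed value-indexed


def msearch(sessions, search, where=None, flags=(), limit=None, size=None):
    flags = set(flags)
    fold = (lambda s: s.lower()) if "ci" in flags else (lambda s: s)
    needle = fold(search)

    # Step 1: intersect all sessions, where-clause matches, and (with si)
    # sessions whose indexed values match the search string.
    candidates = [s for s in sessions if where is None or where(s)]
    if "si" in flags:
        candidates = [
            s for s in candidates
            if any(fold(str(s.meta[k])) == needle
                   for k in s.indexed_keys if k in s.meta)
        ]

    results = []
    for scanned, s in enumerate(candidates):
        if limit is not None and scanned >= limit:
            break  # limit caps the number of sessions scanned
        # Step 2: scan meta items (only indexed keys when si is set).
        if "sm" in flags:
            keys = [k for k in s.meta if "si" not in flags or k in s.indexed_keys]
            for key in keys:
                if needle in fold(str(s.meta[key])):
                    results.append((s.sid, key, s.meta[key]))
        # Step 3: scan raw packet or log data.
        if "sp" in flags and needle in fold(s.raw):
            results.append((s.sid, "raw", s.raw))
        # Step 4: size is checked per session, so slightly more results
        # than requested can be returned.
        if size is not None and len(results) >= size:
            break
    return results


sessions = [
    Session(1, {"alias.host": "example.com", "action": "get"},
            raw="GET /index.html", indexed_keys={"alias.host"}),
    Session(2, {"alias.host": "test.local", "action": "GET"},
            raw="GET /login", indexed_keys={"alias.host"}),
    Session(3, {"alias.host": "example.com"},
            raw="POST /upload", indexed_keys={"alias.host"}),
]

hits = msearch(sessions, "get", flags=("sm", "sp", "ci"), limit=2)
index_hits = msearch(sessions, "example.com", flags=("si", "sm"))
```

In this model, `limit=2` stops the scan after the first two sessions, and the index search touches only sessions 1 and 3 because only their indexed values match.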
msearch
Flags
Flag | Description |
---|---|
sp | Scans raw packet data |
sm | Scans all meta data |
si | Does index lookups for all search parameters before scanning meta |
ci | Performs a case-insensitive search. Returned results are case-preserving. |
regex | Treats the search parameter as a regular expression. Only a single regular expression can be specified, but the regular expression may be arbitrarily complex. |
msearch
Index Search Mode
Using the index search mode, specified by using the
si
flag, causes results to be returned significantly faster than any other mode. The main limitation of this mode is that it only returns matches on text terms that match value-indexed meta values.
- The si flag must be combined with the sm flag. The si flag implies that the search only matches indexed meta.
- The si flag can be used with regex searches; however, only text indexed values will match. IP addresses and numbers will not match the regex.
Text Search Syntax
The search parameter given to
msearch
is composed of one or more words, separated by whitespace. For example, searching for
foo
returns sessions that contain the word foo.
If multiple terms are provided for the search, they are implicitly considered to be an AND operation. For example, searching for
foo bar
returns sessions that contain both foo AND bar. Sessions that contain only foo or only bar are filtered out. If you want to search for sessions containing any of two or more terms, you must explicitly separate the terms with the word OR. For example, searching for
foo OR bar
returns sessions that contain either foo or bar.
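The implicit AND and explicit OR behavior can be modeled with a few lines of Python. This sketch makes two assumptions that the text above does not state: that OR binds more loosely than the implicit AND, and that a session can be represented as a whitespace-separated string of words. The function name `matches` is hypothetical.

```python
def matches(search, session_text):
    """Return True if the session text satisfies the search terms."""
    words = set(session_text.split())
    alternatives, current = [], []
    for token in search.split():
        if token == "OR":
            # Close the current AND group and start a new alternative.
            alternatives.append(current)
            current = []
        else:
            current.append(token)
    alternatives.append(current)
    # A session matches if every term of at least one non-empty group is present.
    return any(bool(group) and all(term in words for term in group)
               for group in alternatives)


print(matches("foo bar", "foo baz"))        # only foo is present
print(matches("foo OR bar", "bar baz"))     # bar alone is enough
```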
Search Syntax And Index Modes
The searches given to the
msearch
command are interpreted according to the index level on all the indexes.
msearch
works on the value-indexed keys in the index. Search terms provided to
msearch
will find values that are an exact match to values that were indexed.
As of version 11.1, there are new index modes available that allow
msearch
to locate text that is not an exact match to the search input.
msearch
supports wildcard searches on the word meta index, if the word meta index has the ngram option enabled. For details on the ngram option, see the topic
Index Customization.
The wildcard search allows the use of the
*
and
?
characters as wildcards in search terms. The
*
can stand for 0 or more characters, while the
?
may stand for any single character. To search for those characters in an N-gram enabled index, you may escape them with a backslash character.
If the word index has the 'edge' N-gram option enabled, then it can be used to resolve search terms that end in a wildcard. This means it is only useful for finding text that begins with a known prefix.
If the word index has the 'all' N-gram option enabled, then wildcards may appear anywhere in the search term.
This table summarizes the relationship between the word index level and the types of searches that
msearch
will locate.
Search input | Non-indexed | IndexValues | IndexValues with Edge N-grams | IndexValues with All N-grams |
---|---|---|---|---|
"foo" | no match | "foo" | words starting with "foo" | words containing "foo" |
"foo*" | no match | literal "foo*" | words starting with "foo" | words starting with "foo" |
"*foo" | no match | literal "*foo" | no match | words ending with "foo" |
"*foo*" | no match | literal "*foo*" | words staring with foo | words containing "foo" |
"foo\*" | no match | literal "foo\*" | literal "foo*" | literal "foo*" |
msearch
Tips
- Always use the where clause to specify a time range for the search.
- To search for IP address ranges, specify them in the where clause.
- Use the limit parameter when not using the index search mode. Without it, the meta and packet databases will read an extremely large amount of data.
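These tips can be baked into a small helper that assembles an msearch parameter string. The helper below is hypothetical and not part of any NetWitness SDK. It assumes that parameter values are double-quoted with embedded quotes backslash-escaped (as in the values call example), that where clauses are joined with `&&`, and that any extra clause, such as an IP range, is passed in already formatted.

```python
def quote(value):
    """Wrap a value in double quotes, escaping embedded quotes (assumed rule)."""
    return '"' + value.replace('"', '\\"') + '"'


def build_msearch_params(search, time_range, flags=("sm",),
                         limit=None, size=None, extra_where=None):
    # Tip: without index search mode, always bound the scan with limit.
    if "si" not in flags and limit is None:
        raise ValueError("set limit when not using the si (index search) flag")
    # Tip: always restrict the search to a time range in the where clause.
    where = f"time={time_range}"
    if extra_where:
        # Tip: IP ranges and similar filters belong in the where clause.
        where += f" && {extra_where}"
    parts = [f"search={quote(search)}", f"where={quote(where)}"]
    if limit is not None:
        parts.append(f"limit={limit}")
    if size is not None:
        parts.append(f"size={size}")
    parts.append("flags=" + ",".join(flags))
    return " ".join(parts)


p = build_msearch_params(
    "foo",
    '"2014-May-20 13:12:00"-"2014-May-21 13:11:59"',
    flags=("sm", "ci"),
    limit=10000,
)
print(p)
```

The parameter order follows the msearch-params grammar: search, where, limit, size, flags.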
Stored Procedures
The
query
and
values
calls provide low-level search functionality. For more advanced use cases, server-side stored procedures exist.
Use of Quotes in Query Syntax
The query parser does not care whether you use single or double quotes within a query statement. A single- or double-quoted value is treated as text meta.
The query parser attempts to make sense of whatever you put in the statement. It is not very strict about what it will accept.
For example:
reference.id=4752
This clause identifies sessions that have a
reference.id
meta value that has a
numeric
value of 4752.
reference.id='4752'
or
reference.id="4752"
This clause identifies sessions that have a
reference.id
meta value that has a
string
value of
4752
.
However, the query engine implicitly compares numbers and strings that look like numbers as equal when the values are semantically the same. So it works with either syntax.
For most efficient performance, however, it is always a good idea to construct the queries such that the query syntax matches the data types generated by the parser.
For example, if the parser is creating
reference.id
as a numeric data type (such as
uint32
or
uint64
), then use the numeric syntax.
If the parser is creating
reference.id
as a text data type, then use the string syntax.
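When building where clauses programmatically, one way to follow this advice is to let the value's type in the calling code choose the syntax. The helper below is a hypothetical sketch: it emits numbers unquoted and text quoted, and it avoids making any claim about escape rules by refusing values that contain both quote characters.

```python
def where_equals(key, value):
    """Format key=value so the syntax matches the value's data type."""
    if isinstance(value, bool):
        raise TypeError("boolean values are not handled in this sketch")
    if isinstance(value, int):
        return f"{key}={value}"          # numeric syntax, no quotes
    text = str(value)
    if "'" not in text:
        return f"{key}='{text}'"         # single-quoted text
    if '"' not in text:
        return f'{key}="{text}"'         # fall back to double quotes
    raise ValueError("value contains both quote characters")


print(where_equals("reference.id", 4752))
print(where_equals("reference.id", "4752"))
```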
hierarch
Call
The
hierarch
call returns information about the hierarchy of devices attached to the collection represented in this database.
A hierarchy consists of this device, plus any devices that this device is connected to. For each device, the contents of the
/sys/stats
folder are returned. This information includes the device name, its UUID, and its version information.
The hierarchy command returns its information as a MessagePack object, which may be translated into different representations depending on what API you are using to access the Core service. For example, using the REST API it is translated to a JSON object.
For devices that connect to upstream devices, such as a Broker or Concentrator, the hierarchy message will contain a
devices
member. The
devices
member is an array that holds the contents of the
hierarch
message as executed on each upstream device. In this way, the
hierarch
message forms a hierarchical directory of all services that the device connects to, both directly and indirectly.
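Because each entry in the devices member has the same shape as the top-level message, the result can be traversed recursively. The sketch below walks a JSON-style structure of that form. The sample data is invented for illustration, and the `name` field is a hypothetical placeholder: the actual member names come from each device's /sys/stats contents and are not listed here. Only the `devices` member is taken from the description above.

```python
import json


def walk_hierarchy(node, depth=0):
    """Yield (depth, node) for a device and everything it connects to."""
    yield depth, node
    # Devices with no upstream connections are assumed to omit "devices".
    for child in node.get("devices", []):
        yield from walk_hierarchy(child, depth + 1)


# Invented sample; field names other than "devices" are placeholders.
sample = json.loads("""
{"name": "broker1", "devices": [
    {"name": "concentrator1", "devices": [
        {"name": "decoder1"}
    ]},
    {"name": "concentrator2"}
]}
""")

tree = [(depth, node["name"]) for depth, node in walk_hierarchy(sample)]
for depth, name in tree:
    print("  " * depth + name)
```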