Index Customization
This topic describes how to use the custom index file to customize the index. Each NetWitness NextGen service is installed with a default index configuration that is intended to cover the indexing needs of most users of the product. However, you can index new meta keys so that the index also covers custom content that generates custom meta.
Index Configuration File Locations
You customize the index by editing the custom index file, located at /etc/netwitness/ng/index-<servicename>-custom.xml, where <servicename> is the name of the product that you are customizing. For example, the Concentrator custom index file is /etc/netwitness/ng/index-concentrator-custom.xml.
Concentrator products also include a file that describes the default index configuration: /etc/netwitness/ng/index-concentrator.xml. This file is useful as a template that shows how the custom index file is formatted.
If a customization in the custom index file conflicts with the default index configuration, the customization takes precedence.
You can make changes to the custom index file while the service is running. When the service receives an index save command, the changes to the custom index file are read and applied to the index.
Changes to the index apply only to new incoming data. Existing data cannot be reindexed retroactively with a new custom index configuration, except by Rebuilding the Index.
Index configuration entries
The custom index file is an XML document. Its root element is the language element, which contains a key element for each meta key to describe each custom index. Each key element of the custom index configuration looks like this:
<key name="did" description="Decoder Source" level="IndexValues" format="Text" valueMax="100" />
Definitions for each attribute in this element:
Attribute | Description |
---|---|
name | The name of the meta key that will be indexed |
description | A human-readable description for the meta type |
level | The type of index that will be created for this meta key |
valueMax | The maximum unique values that will be stored for this key per slice |
format | The format of the data held by all meta items with this meta key name |
bucket | Enable size bucketing |
ngrams | Enable ngram generation |
threshold | Threshold for approximate value merging on ngram indexes |
defaultAction | Default Navigate view action for this key: Open, Closed, Auto, Hidden |
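To show how these attributes fit into a complete file, the following Python sketch uses the standard-library xml.etree.ElementTree module to write a custom index file with a language root element and read the key attributes back. The helper names build_custom_index and read_keys are hypothetical; they are not part of NetWitness, and the sketch does not validate attribute values.

```python
import xml.etree.ElementTree as ET


def build_custom_index(keys):
    """Return the XML text of a custom index file with one <key> per dict.

    Each dict maps attribute names (name, description, level, format,
    valueMax, ...) to values; every value is written as a string.
    """
    root = ET.Element("language")
    for attrs in keys:
        ET.SubElement(root, "key", {k: str(v) for k, v in attrs.items()})
    return ET.tostring(root, encoding="unicode")


def read_keys(xml_text):
    """Parse custom index XML and return {meta key name: attribute dict}."""
    root = ET.fromstring(xml_text)
    return {key.get("name"): dict(key.attrib) for key in root.findall("key")}


xml_text = build_custom_index([
    {"name": "did", "description": "Decoder Source", "level": "IndexValues",
     "format": "Text", "valueMax": 100},
])
print(xml_text)
print(read_keys(xml_text)["did"]["valueMax"])  # 100
```

Generating the file from structured data keeps attribute quoting and escaping correct, which is easy to get wrong when the XML is edited by hand.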
The next few sections examine these parameters in greater detail.
Meta names
The meta name used by the index is the meta key name present in every meta item in the meta database. Decoders generate these meta names when parsing, and parsers can generate meta with any meta key name. The custom index therefore lets you choose which of the meta items generated by the Decoder are indexed.
Meta key names can be up to 16 characters long and can contain only letters and the '.' character.
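A quick way to screen candidate names before adding them to the custom index file is a regular expression. The Python sketch below is a literal reading of the rule above; treating "letters" as ASCII letters is an assumption, and the pattern is an illustration rather than the check NetWitness itself performs.

```python
import re

# Literal reading of the stated rule: 1 to 16 ASCII letters or '.' characters.
# Illustration only; not the validation NetWitness performs.
KEY_NAME = re.compile(r"[A-Za-z.]{1,16}")


def is_valid_key_name(name: str) -> bool:
    """Return True if name follows the length and character rule."""
    return KEY_NAME.fullmatch(name) is not None


for candidate in ("ip.src", "alias.host", "ip src", "a.very.long.key.name"):
    print(f"{candidate!r}: {is_valid_key_name(candidate)}")
```

Using fullmatch rather than match ensures the whole name is checked, not just a valid prefix.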
Data Types
When the Decoder generates meta items, it assigns each one a data type. Each parser can choose the data type of the meta it generates; however, there are recommended, standard data types for each of the default meta keys. For example, ip.src and ip.dst are stored as the IPv4 meta type, and alias.host is stored as the Text meta type. All parsers on a Decoder must agree on the data type of each meta key they generate.
When adding a custom index to the Concentrator, the data type of the custom index must match the format of the data generated by the Decoder. If the types do not match, the Concentrator attempts to convert the meta generated into the type specified for the custom index. However, these conversions sometimes fail, and the resulting index can produce undefined results.
Likewise, when many Decoders and Concentrators work together as part of a NetWitness installation, they must all agree on the types for each meta key. Conflicts of meta types between NetWitness NextGen services can lead to undefined behavior.
The following table shows the metadata types supported by the NetWitness NextGen services.
Type | Size in bytes | Description |
---|---|---|
Int8 | 1 | Signed 8-bit integer |
UInt8 | 1 | Unsigned 8-bit integer |
Int16 | 2 | Signed 16-bit integer |
UInt16 | 2 | Unsigned 16-bit integer |
Int32 | 4 | Signed 32-bit integer |
UInt32 | 4 | Unsigned 32-bit integer |
Int64 | 8 | Signed 64-bit integer |
UInt64 | 8 | Unsigned 64-bit integer |
UInt128 | 16 | Unsigned 128-bit integer |
Float32 | 4 | 32-bit floating point number, single precision |
Float64 | 8 | 64-bit floating point number, double precision |
TimeT | 8 | Unix epoch timestamp |
Binary | 1-255 | Arbitrary binary data |
Text | 1-255 | UTF-8 encoded text data |
IPv4 | 4 | IPv4 address bytes |
IPv6 | 16 | IPv6 address bytes |
MAC | 6 | MAC Address bytes |
When defining a custom index, use the data type that best fits the meta. For example, never store IP addresses as Text: the dotted text form of an IPv4 address can take up to 15 bytes, while the IPv4 type always takes 4.
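The size difference can be checked with a short Python snippet that measures one address in both representations from the table above. It counts only the raw value bytes; any per-item overhead in the meta database is not modeled.

```python
import ipaddress

address = "192.168.100.200"

# Text format: one byte per character of the dotted-quad string.
text_size = len(address.encode("utf-8"))

# IPv4 format: the four raw address bytes.
ipv4_size = len(ipaddress.IPv4Address(address).packed)

print(f"Text: {text_size} bytes, IPv4: {ipv4_size} bytes")  # Text: 15 bytes, IPv4: 4 bytes
```

Across millions of meta items, that per-value difference adds up in both the meta database and the index.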
Index Levels
There are three levels, or types, of indexing: IndexNone, IndexKeys, and IndexValues.
IndexNone
An IndexNone entry is not really an index at all. Custom index entries with the IndexNone level exist only to define and document the meta key. In custom Decoder indices, IndexNone entries can enforce a specific data type for a meta key across all the parsers on a Decoder.
IndexKeys
A key-level index tracks only which sessions contain meta items with this meta key name. It does not index any of the key's unique values in the meta database.
Key-level indices take much less storage space, memory, and CPU time to manage, but they require a lot more work from the query engine when you perform query or values operations using them.
If used in a where clause, a meta key indexed at the key level can only be used to resolve operations such as exists or !exists.
IndexValues
A value-level index tracks, for each unique value of the meta key, the sessions that contain that value. This type of index is also known as a "full index".
This type of index is needed for efficient processing of most where clauses, and for use of this meta key as the fieldName parameter of a values call.
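As an illustration only, the following Python sketch models the two index levels with in-memory dictionaries; it is not how NetWitness stores its index, and the sample sessions are made up. It shows why exists can be answered from a key-level entry alone, while an equality test without a value-level index has to fall back to inspecting each candidate session's meta.

```python
# Toy data: session id -> {meta key: set of values}. Illustration only.
sessions = {
    1: {"alias.host": {"example.com"}},
    2: {"alias.host": {"example.org"}},
    3: {"service": {80}},
}

key_index = {}    # IndexKeys: meta key -> sessions that contain the key
value_index = {}  # IndexValues: (meta key, value) -> sessions with that value

for sid, meta in sessions.items():
    for key, values in meta.items():
        key_index.setdefault(key, set()).add(sid)
        for value in values:
            value_index.setdefault((key, value), set()).add(sid)


def exists(key):
    """'key exists' is answered from the key-level entry alone."""
    return key_index.get(key, set())


def equals_with_value_index(key, value):
    """'key = value' is a single lookup in a value-level index."""
    return value_index.get((key, value), set())


def equals_with_key_index_only(key, value):
    """Without a value-level index, every session that has the key must be
    checked against its stored meta, which is far more work."""
    return {sid for sid in exists(key) if value in sessions[sid].get(key, set())}


print(exists("alias.host"))                                     # {1, 2}
print(equals_with_value_index("alias.host", "example.org"))     # {2}
print(equals_with_key_index_only("alias.host", "example.org"))  # {2}
```

Both equality functions return the same sessions; the difference is that the key-only version grows with the number of sessions containing the key.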
Value Max
Value max (the valueMax attribute) can have a very significant impact on the accuracy and performance of a value-level index.
As a Decoder parses packets or logs, it can create meta of any type with any value. Usually, these meta items are created from data copied directly out of the packet or log, so anyone who can send traffic or logs to the Decoder can cause it to create unique meta values in response to nearly any event.
The performance of the index depends directly on the number of unique values it has found for each meta key. As the number of unique values increases, the rate at which new meta is indexed can decrease, and queries take longer to complete. Because anyone can influence the creation of unique meta values, anyone can potentially affect the performance of the index.
The value max parameter limits the number of unique values that can enter the index. This prevents a malicious user from degrading the NetWitness system by flooding it with a large number of unique values.
It is important to set a value max on any meta key that may have its value influenced directly by incoming packets or logs.
The value max applies only to values added since the last index save operation.
How high value max can be set depends on the version and on the amount of RAM available to the NetWitness NextGen service. As of 10.3, the recommended ceiling for value max is 5,000,000 for any meta key. If there are many custom indices, the value max may have to be lower.
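The Python sketch below models a value-level index slice with a valueMax limit to show how the cap contains a flood of unique values. The class name ValueSlice and its behavior once the cap is reached (new unique values are skipped, existing values keep collecting sessions) are assumptions for illustration, not documented NetWitness internals; valueMax=0 is treated as unlimited, as stated later in this topic.

```python
class ValueSlice:
    """Toy value-level index slice that enforces a valueMax limit.

    Assumed behavior (not taken from NetWitness internals): once the limit is
    reached, new unique values are not indexed, while values already in the
    slice keep collecting sessions. valueMax=0 means unlimited.
    """

    def __init__(self, value_max):
        self.value_max = value_max
        self.values = {}   # value -> set of session ids
        self.dropped = 0   # meta items skipped because the slice was full

    def add(self, value, session_id):
        if value not in self.values:
            if self.value_max and len(self.values) >= self.value_max:
                self.dropped += 1
                return
            self.values[value] = set()
        self.values[value].add(session_id)

    def save(self):
        """Close this slice and start a new one, so the limit counts only
        values added since the last save. Returns the closed slice's values."""
        closed, self.values, self.dropped = self.values, {}, 0
        return closed


# Simulated flood of attacker-controlled unique values.
slice_ = ValueSlice(value_max=1000)
for i in range(100_000):
    slice_.add(f"random-{i}", i)
print(len(slice_.values), slice_.dropped)  # 1000 99000
```

However many unique values arrive, the slice never tracks more than valueMax of them, which is what keeps indexing and query cost bounded.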
maxLength
The max length parameter is used exclusively on the word meta type. The meaning of the maxLength parameter depends on whether the index is storing N-grams, as indicated by the ngrams parameter. The default and recommended value for maxLength is 5.
Max Length without N-Grams
If N-Gram support is turned off, the maxLength parameter indicates that search terms must be truncated so that they match the truncated values in the index and meta database. In this case, maxLength must be less than or equal to the corresponding setting for /decoder/parsers/config/token.max.length on the Log Decoder service that generates word token metas. The index uses maxLength to interpret search terms passed to the msearch SDK function correctly.
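The truncation rule can be sketched in a few lines of Python, assuming the index holds word values cut to maxLength characters. The function names truncate_term and check_max_length and the token.max.length value of 16 are hypothetical, and counting characters rather than bytes is also an assumption.

```python
def truncate_term(term: str, max_length: int) -> str:
    """Cut a search term to max_length characters so that it compares equal
    to a token truncated to the same length when it was indexed.
    Assumption: truncation counts characters; the product may count bytes."""
    return term[:max_length]


def check_max_length(max_length: int, token_max_length: int) -> None:
    """Enforce maxLength <= token.max.length when N-grams are off."""
    if max_length > token_max_length:
        raise ValueError(
            f"maxLength ({max_length}) exceeds token.max.length ({token_max_length})"
        )


MAX_LENGTH = 5          # the documented default for maxLength
TOKEN_MAX_LENGTH = 16   # hypothetical Log Decoder token.max.length
check_max_length(MAX_LENGTH, TOKEN_MAX_LENGTH)

stored = truncate_term("basketball", MAX_LENGTH)   # value as held in the index
query = truncate_term("basketballs", MAX_LENGTH)   # user's search term
print(stored, query, stored == query)  # baske baske True
```

In this model, any words that share their first maxLength characters compare equal, so a short maxLength trades precision for a smaller index.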
Max Length with N-Grams
If N-Gram support is turned on, by setting ngrams="Edge" or ngrams="All", the maxLength parameter controls the maximum length of N-Grams extracted from the meta item. In this scenario, maxLength does not have to match the length of word meta items generated on the Log Decoder.
minLength
The minimum length parameter is used exclusively on the word meta type, and it has an effect only when N-grams are generated. It specifies the shortest N-gram that is extracted from word meta items. The default and recommended minimum N-Gram length is 3, which means that searches against the word index must have at least 3 characters.
ngrams
The ngrams parameter is used exclusively on the word meta type. N-gram indexes store information that allows fast lookup of searches that match only part of a word. For example, they make it possible to find 'ball' inside the word 'basketball'. If ngrams is set to all, the index creates entries for all N-grams within the word meta values. The minimum value of N is specified by minLength, and the maximum value of N is specified by maxLength.
The ngrams parameter also supports the value edge, which indicates that the index stores only N-grams that appear at the beginning of a word. Edge N-grams are useful for type-ahead search matching and take less space than storing all N-grams. However, they cannot locate matches inside a word or at its end.
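The difference between all and edge N-grams can be sketched in plain Python, assuming N-grams are contiguous character substrings. The function names are hypothetical, and the sketch does not reproduce NetWitness internals such as case handling or on-disk storage.

```python
def all_ngrams(word: str, min_len: int, max_len: int) -> set:
    """Every substring of length min_len..max_len (the ngrams="all" idea)."""
    return {
        word[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(word) - n + 1)
    }


def edge_ngrams(word: str, min_len: int, max_len: int) -> set:
    """Only the prefixes of length min_len..max_len (the ngrams="edge" idea)."""
    return {word[:n] for n in range(min_len, min(max_len, len(word)) + 1)}


def is_candidate(word: str, term: str, min_len: int, max_len: int) -> bool:
    """True if every N-gram of the search term also occurs in the word.

    This only narrows the candidates: a word can contain all of a term's
    N-grams without containing the term itself, so matches still need to be
    confirmed against the stored value.
    """
    if len(term) < min_len:
        raise ValueError("search terms must have at least minLength characters")
    return all_ngrams(term, min_len, max_len) <= all_ngrams(word, min_len, max_len)


print(sorted(edge_ngrams("basketball", 3, 5)))   # ['bas', 'bask', 'baske']
print(is_candidate("basketball", "ball", 3, 5))  # True
print(is_candidate("basketball", "bolt", 3, 5))  # False
```

Edge N-grams for "basketball" contain only prefixes, so they support "bask" as a type-ahead match but cannot find "ball"; the all mode can, at the cost of many more entries per word.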
For Text format meta keys, the ngrams parameter also supports the value `allvalue`. With this value, the index for a meta key stores `all` N-grams within the meta values and also an `IndexValues` index limited by valueMax.
This index type improves search on values that overflow the index because of valueMax limits: the N-gram index makes any meta value searchable, while the value index can still return the top N available values.
The `minLength` parameter specifies the minimum value of N, the `maxLength` parameter specifies the maximum value of N, and the `valueMax` parameter specifies the maximum number of unique values. Follow these guidelines when using these parameters:
- Set minLength=3 and maxLength=3 for compact N-gram storage, and use valueMax to limit value index storage. Compared to Text format keys indexed with IndexValues and valueMax=0 (unlimited), this N-gram configuration provides better search functionality with more compact index storage and memory usage.
- The `contains` operation in queries runs faster on meta keys indexed with this N-gram index type than on keys indexed with IndexValues.
- Because this index type keeps both N-grams and IndexValues for the same meta key, it increases index memory and storage usage for that key and eventually reduces index retention. Therefore, choose this index type only for the meta keys that need it.
- If you switch a key to this N-gram type and need the new behavior across the whole index, you must re-index.
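The combination described above can be modeled in Python to show why overflowed values remain searchable. The class AllValueIndex and its methods are hypothetical, and the assumption that a value past the cap is skipped only by the value index (not by the N-gram index) is this sketch's reading of the description, not documented NetWitness internals.

```python
class AllValueIndex:
    """Toy model of ngrams="allvalue": an N-gram index over every value plus
    a value index capped at valueMax (0 = unlimited).

    Assumed behavior: a value past the cap is skipped by the value index only,
    so it stays searchable through its N-grams.
    """

    def __init__(self, min_length=3, max_length=3, value_max=0):
        self.min_length = min_length
        self.max_length = max_length
        self.value_max = value_max
        self.grams = {}    # N-gram -> set of session ids
        self.values = {}   # value -> set of session ids (capped)

    def _ngrams(self, text):
        return {
            text[i:i + n]
            for n in range(self.min_length, self.max_length + 1)
            for i in range(len(text) - n + 1)
        }

    def add(self, value, session_id):
        for gram in self._ngrams(value):
            self.grams.setdefault(gram, set()).add(session_id)
        room = not self.value_max or len(self.values) < self.value_max
        if value in self.values or room:
            self.values.setdefault(value, set()).add(session_id)

    def contains(self, term):
        """Candidate sessions for a contains search: intersect the session
        sets of the term's N-grams (matches still need confirming)."""
        grams = self._ngrams(term)
        if not grams:
            raise ValueError("search term is shorter than minLength")
        return set.intersection(*(self.grams.get(g, set()) for g in grams))

    def top_values(self, n):
        """Up to n values with the most sessions, from the capped value index."""
        return sorted(self.values, key=lambda v: len(self.values[v]), reverse=True)[:n]


index = AllValueIndex(min_length=3, max_length=3, value_max=1)
index.add("alpha.example.com", 1)
index.add("beta.example.net", 2)   # overflows the value index (valueMax=1)
print(index.top_values(5))         # ['alpha.example.com']
print(index.contains("beta"))      # {2}
```

The overflowed value "beta.example.net" is missing from the values result but is still found by the contains search, which is the trade-off the guidelines above describe.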
N-gram indexing has a major impact on the functionality and size of text indexes. With the ngrams setting 'all' or 'allvalue' and maxLength 3, a meta key index consumes approximately twice as much space as it would without N-grams.
Note: When you switch a key from ngrams 'allvalue' or 'all' to IndexValues, consider re-indexing: index slices created before the configuration change remain N-gram indexes, and a values call against them returns N-grams.
In the default index configuration, only the word meta key has N-gram indexing enabled. This meta key is used to index text tokens extracted from unparsed logs on the Log Decoder.
The N-gram index mode supports a 'threshold' tunable parameter that controls the precision of the index. The threshold is used to merge similar index values together depending on how closely the set of indexed sessions matches. Values greater than 0 and less than or equal to 1.0 are accepted. A value of 1.0 means that the index will only merge values if they were found in the same set of sessions. Higher values mean that the index will merge fewer values, at the expense of requiring more time to create the index during aggregation. Lower values mean the index will merge more values together, at the expense of longer search execution time due to more database access. The threshold parameter does not affect search accuracy.
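This topic does not say how the similarity of two session sets is measured, so the Python sketch below assumes Jaccard similarity and a simple greedy grouping purely to illustrate the trade-off. The function names are hypothetical. It shows that a threshold of 1.0 merges only entries with identical session sets, that lower thresholds merge more, and why merging costs search time rather than accuracy.

```python
def jaccard(a: set, b: set) -> float:
    """Share of sessions two entries have in common: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def merge_similar(entries: dict, threshold: float) -> list:
    """Greedily merge N-gram entries whose session sets are at least
    `threshold` similar to the first entry of a group.

    A merged group keeps the union of its sessions, so a lookup can return
    extra candidate sessions (checked later against the database) but never
    loses a matching one, which is why accuracy is unaffected.
    """
    if not 0 < threshold <= 1.0:
        raise ValueError("threshold must be greater than 0 and at most 1.0")
    groups = []  # [grams, sessions of the group's first entry, union of sessions]
    for gram, sessions in entries.items():
        for group in groups:
            if jaccard(group[1], sessions) >= threshold:
                group[0].append(gram)
                group[2] |= sessions
                break
        else:
            groups.append([[gram], set(sessions), set(sessions)])
    return [(grams, merged) for grams, _, merged in groups]


entries = {"bal": {1, 2}, "all": {1, 2}, "ask": {1}}
print(merge_similar(entries, 1.0))  # [(['bal', 'all'], {1, 2}), (['ask'], {1})]
print(merge_similar(entries, 0.5))  # [(['bal', 'all', 'ask'], {1, 2})]
```

At 0.5, a search for "ask" now returns session 2 as a candidate as well, which must be ruled out by reading the database: more merging means a smaller index but more work at search time.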
Numeric Bucketing
Indexes on meta formats that are unsigned integers, specifically UInt32 and UInt64, can make use of size bucketing to improve performance.
Size bucketing rounds size values in the index down to the nearest whole multiple of a traditional byte unit (KB, MB, or GB). Enabling this option on a numeric index reduces the number of unique values to track in the index, which improves aggregation and query performance.
The bucketing option is enabled by the boolean parameter bucket on the key element. bucket may have the value 0, for off, or 1, for on. The default is 0.
Examples of bucket number values:
Raw Value | Value Stored in Index | Explanation |
---|---|---|
0 - 1,023 | 0 - 1,023 | Values 0-1023 are stored unmodified |
1,024 - 1,048,575 | 1 KB, 2 KB, 3 KB ... 1,023 KB | Values under 1 MB are stored in 1 KB buckets |
1,048,576 - 1,073,741,823 | 1 MB, 2 MB, 3 MB ... 1,023 MB | Values under 1 GB are stored in 1 MB buckets |
1,073,741,824 - 1,099,511,627,775 | 1 GB, 2 GB, 3 GB ... 1,023 GB | Values under 1 TB are stored in 1 GB buckets |
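The rounding in the table can be expressed as a small Python function. Representing the bucketed value as a byte count (for example, 1 KB stored as 1024) is an assumption, as is continuing to use 1 GB buckets for values of 1 TB and above, because the table does not cover them.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def bucket(value: int) -> int:
    """Round an unsigned size down to a whole number of GB, MB, or KB.

    Values below 1 KB are returned unchanged. Assumptions: the bucketed value
    is represented as a byte count, and values of 1 TB or more keep using
    1 GB buckets because the table stops at 1 TB.
    """
    if value < 0:
        raise ValueError("size bucketing applies to unsigned values")
    for unit in (GB, MB, KB):
        if value >= unit:
            return value // unit * unit
    return value


for raw in (1_023, 1_500, 2_000_000, 3 * GB + 5):
    print(raw, "->", bucket(raw))
```

With bucketing on, every size between 1,048,576 and 2,097,151 collapses to a single index value, which is where the reduction in unique values comes from.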
Key Value Aliases
Value aliases can be specified for keys. Value aliases are text representations that correspond to specific values for a key. These text representations may be easier to remember and more convenient to display. Aliases can be used in the rule/query language (see Queries) and are accessible via the SDK.
Value aliases are specified using the aliases and alias elements:
<key description="Service Type" format="UInt32" level="IndexValues" name="service" valueMax="75" defaultAction="Open">
<aliases>
<alias format="$alias" value="0">OTHER</alias>
<alias format="$alias" value="20">FTPD</alias>
<alias format="$alias" value="21">FTP</alias>
<alias format="$alias" value="22">SSH</alias>
<alias format="$alias" value="23">TELNET</alias>
<alias format="$alias" value="25">SMTP</alias>
⋮
</aliases>
</key>
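An alias table is effectively a two-way mapping between alias text and stored value. The Python sketch below parses a trimmed copy of the service example (only a few aliases) with xml.etree.ElementTree and builds both directions. The function name load_aliases is hypothetical, converting values with int() assumes a numeric key such as this UInt32 one, and the sketch does not show how NetWitness itself resolves aliases in queries.

```python
import xml.etree.ElementTree as ET

KEY_XML = """
<key description="Service Type" format="UInt32" level="IndexValues" name="service" valueMax="75" defaultAction="Open">
  <aliases>
    <alias format="$alias" value="0">OTHER</alias>
    <alias format="$alias" value="21">FTP</alias>
    <alias format="$alias" value="22">SSH</alias>
    <alias format="$alias" value="25">SMTP</alias>
  </aliases>
</key>
"""


def load_aliases(key_xml: str) -> dict:
    """Return {alias text: numeric value} for one <key> element."""
    key = ET.fromstring(key_xml.strip())
    return {alias.text: int(alias.get("value")) for alias in key.iter("alias")}


aliases = load_aliases(KEY_XML)
value_to_alias = {value: name for name, value in aliases.items()}
print(aliases["SSH"], value_to_alias[25])  # 22 SMTP
```

The forward map serves queries written with alias names; the reverse map serves display of stored values.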
Key Renaming
The index language supports key renaming. This feature provides backward compatibility when new key names deprecate and replace old key names. To rename a key, add rename elements to the new key: the parent key renames each key named in its rename elements. For example, the key definition below defines a new key named port_src that renames the key tcp.srcport.
<key name="port_src" description="Source Port" format="UInt16" level="IndexValues">
<rename name="tcp.srcport"/>
</key>
The rename element indicates to the database that uses of the parent key, in this case port_src, include both meta items with type port_src and meta items with type tcp.srcport. Thus, new meta items can be added to the database and queried using port_src, and such queries also return information that was previously stored in tcp.srcport.
The rename element accepts a single attribute, name, that refers to a previously defined key.
Keys referred by rename elements must have the same type as the parent key.
Keys referred by rename elements must have the same index level as the parent key.
If a key is redefined in a custom index file, and the redefined key contains rename elements, then those rename elements replace any previously defined rename elements.
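The renaming rules can be checked mechanically. The Python sketch below reads key elements in document order, applies the three rules above, and reports which stored meta key names a query on each key covers. The function name load_renames is hypothetical, and unlike a real deployment, where the renamed key may come from the default language file, the sketch expects both keys in one document.

```python
import xml.etree.ElementTree as ET

LANGUAGE_XML = """
<language>
  <key name="tcp.srcport" description="Source Port" format="UInt16" level="IndexValues"/>
  <key name="port_src" description="Source Port" format="UInt16" level="IndexValues">
    <rename name="tcp.srcport"/>
  </key>
</language>
"""


def load_renames(language_xml):
    """Return {key name: set of stored meta key names a query on it covers}.

    Checks the documented rules: a renamed key must be defined earlier and
    must share the parent key's format and index level. A redefined key with
    rename elements replaces its earlier rename list.
    """
    root = ET.fromstring(language_xml.strip())
    defined = {}   # key name -> most recent <key> element
    covers = {}
    for key in root.findall("key"):
        name = key.get("name")
        renames = key.findall("rename")
        if renames or name not in covers:
            covers[name] = {name}
        for rename in renames:
            old_name = rename.get("name")
            old = defined.get(old_name)
            if old is None:
                raise ValueError(f"{name}: {old_name} is not defined earlier")
            for attr in ("format", "level"):
                if old.get(attr) != key.get(attr):
                    raise ValueError(f"{name}: {attr} differs from {old_name}")
            covers[name].add(old_name)
        defined[name] = key
    return covers


covers = load_renames(LANGUAGE_XML)
print(sorted(covers["port_src"]))  # ['port_src', 'tcp.srcport']
```

Running a check like this before saving the index catches type or level mismatches that would otherwise make the rename invalid.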
Note: Renamed meta key pairs in the select clause cannot be combined with fixed-size result paging for a query. For more information, see the Queries topic.
Entities
You can also use the index configuration to define entities. Entities provide a convenient way to work with several meta keys at the same time. An entity definition is an alias that groups together the results from other meta keys, and you can use it anywhere you would use a normal meta name. The primary use for entities is to organize similar meta types into a single, easier-to-use meta type. For example, the default NextGen database language includes distinct meta types for IP source and IP destination. You could define an entity that represents the combined set of all IP sources and destinations using an entity element:
<entity name="ip.all" description="any ip entity">
<keyref name="ip.src"/>
<keyref name="ip.dst"/>
</entity>
The entity element accepts the following attributes:
Name | Description |
---|---|
name | (Required) The name of the entity |
description | (Optional) A description of the entity |
defaultAction | (Optional) Navigate view action for this entity: Open, Closed, Auto, Hidden |
Entity definitions create new entries in the NextGen service language. Because they are returned by the SDK language call, they can be used by older client applications that are not directly aware of the concept of entities.
Each entity definition must contain one or more keyref elements. The keyref element allows only a single name attribute, which must refer to a real meta key element defined elsewhere in the device's language. A keyref can also refer to meta types defined in the default language.
Entities can be used in application rules, but they are not supported in network rules because the meta available to network rules is too limited.
Entity Definition Rules
- All the keys referenced by an entity must have the same data type
- All the keys referenced by an entity must have the same index level
- An entity name cannot conflict with any existing meta type
- Keyrefs must refer to meta key names that are defined earlier in the index configuration
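The four rules above lend themselves to a pre-flight check. The Python sketch below walks a language document in order, enforces each rule, and shows how an entity name expands to its member keys. The function names check_entities and expand are hypothetical, the sample key descriptions and levels are made up for the example, and the sketch expects all referenced keys in the same document even though a real keyref may point into the default language.

```python
import xml.etree.ElementTree as ET

LANGUAGE_XML = """
<language>
  <key name="ip.src" description="Source IP" format="IPv4" level="IndexValues"/>
  <key name="ip.dst" description="Destination IP" format="IPv4" level="IndexValues"/>
  <entity name="ip.all" description="any ip entity">
    <keyref name="ip.src"/>
    <keyref name="ip.dst"/>
  </entity>
</language>
"""


def check_entities(language_xml):
    """Apply the entity definition rules; return {entity: [key names]}."""
    root = ET.fromstring(language_xml.strip())
    keys = {}       # key name -> (format, level), filled in document order
    entities = {}
    for elem in root:
        name = elem.get("name")
        if elem.tag == "key":
            keys[name] = (elem.get("format"), elem.get("level"))
        elif elem.tag == "entity":
            # An entity name must not clash with an existing meta type.
            if name in keys or name in entities:
                raise ValueError(f"entity {name} conflicts with an existing meta type")
            refs = [ref.get("name") for ref in elem.findall("keyref")]
            if not refs:
                raise ValueError(f"entity {name} has no keyref elements")
            # Keyrefs must point to keys defined earlier in the document.
            missing = [r for r in refs if r not in keys]
            if missing:
                raise ValueError(f"entity {name}: keys not defined earlier: {missing}")
            # All referenced keys must share one data type and index level.
            if len({keys[r] for r in refs}) > 1:
                raise ValueError(f"entity {name}: keys differ in format or level")
            entities[name] = refs
    return entities


def expand(entities, meta_name):
    """Return the meta keys a name stands for (itself if not an entity)."""
    return entities.get(meta_name, [meta_name])


entities = check_entities(LANGUAGE_XML)
print(expand(entities, "ip.all"))  # ['ip.src', 'ip.dst']
```

Processing elements in document order is what makes the "defined earlier" rule checkable in a single pass.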
Entities in Brokers
Brokers inherit entity definitions from upstream devices in the same way that they inherit meta key definitions. If the upstream devices attached to a Broker do not all define the same set of entities, the Broker logs a warning. All upstream devices should have the same entity configuration; a Broker operating with mismatched entity definitions may produce undefined behavior.
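Because the Broker only logs a warning, it can help to compare entity configurations before attaching services. The Python sketch below assumes you have already collected each upstream service's entity definitions into a dictionary (how to collect them is not shown); the function name entity_mismatches and the service names are hypothetical.

```python
def entity_mismatches(upstream: dict[str, dict[str, tuple[str, ...]]]) -> set[str]:
    """Return entity names that are missing or defined differently on at
    least one upstream service.

    upstream maps a service name to {entity name: tuple of keyref names}.
    """
    names = set().union(*(cfg.keys() for cfg in upstream.values()))
    mismatched = set()
    for name in names:
        # None stands for "not defined on this service".
        definitions = {cfg.get(name) for cfg in upstream.values()}
        if len(definitions) > 1:
            mismatched.add(name)
    return mismatched


upstream = {
    "concentrator-1": {"ip.all": ("ip.src", "ip.dst")},
    "concentrator-2": {"ip.all": ("ip.src", "ip.dst")},
    "concentrator-3": {},
}
print(entity_mismatches(upstream))  # {'ip.all'}
```

An empty result means every upstream service defines the same entities with the same keyrefs, which is the configuration this topic recommends.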