Index Customization
This topic describes how to use the custom index file to customize the index. Each NetWitness NextGen service is installed with a default index configuration that is intended to cover the indexing needs of most users. However, you can index additional meta keys so that the index works with custom content that generates custom meta.
Index Configuration File Locations
The index is customized by making changes to the custom index file. The location of this file is /etc/netwitness/ng/index-<servicename>-custom.xml, where <servicename> corresponds to the name of the product that you are customizing. For example, the Concentrator custom index file is /etc/netwitness/ng/index-concentrator-custom.xml.
Concentrator products also include a file that describes the default index configuration: /etc/netwitness/ng/index-concentrator.xml. This file is useful as a template to show how the custom index file is formatted.
If you make customizations to the index in the custom index file, those customizations take precedence over any conflicting entries in the default index configuration.
You can make changes to the custom index file while the service is running. When the service receives an index save command, the changes to the custom index file are read and applied to the index.
Changes to the index can only be applied to new incoming data. Data cannot be retroactively reindexed with a new custom index configuration, except by Rebuilding the Index.
Index configuration entries
The custom index file is an XML document. The root element of this document is the language element, which contains one key element for each meta key that has a custom index. Each key element of the custom index configuration looks like this:
<key name="did" description="Decoder Source" level="IndexValues" format="Text" valueMax="100" />
Definitions for each attribute in this element:
Attribute | Description |
---|---|
name | The name of the meta key that will be indexed |
description | A human-readable description for the meta type |
level | The type of index that will be created for this meta key |
valueMax | The maximum unique values that will be stored for this key per slice |
format | The format of the data held by all meta items with this meta key name |
bucket | Enable size bucketing |
ngrams | Enable ngram generation |
threshold | Threshold for approximate value merging on ngram indexes |
defaultAction | Default Navigate view action for this key: Open, Closed, Auto, Hidden |
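Putting the pieces together, a complete custom index file might look like the following minimal sketch. The key name custom.tag, its description, and the valueMax figure are hypothetical placeholders, and the shipped files may carry additional attributes on the root element, so compare against the default index file before using it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Minimal sketch of /etc/netwitness/ng/index-concentrator-custom.xml -->
<language>
  <!-- Hypothetical custom meta key; the Text format must match the type the Decoder generates -->
  <key name="custom.tag" description="Custom Tag" level="IndexValues" format="Text" valueMax="10000" />
</language>
```

Entries for keys that also appear in the default index configuration follow the same form and take precedence over the default entry.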
The next few sections examine these parameters in greater detail.
Meta names
The meta name used by the index refers to the meta key name present within every meta item in the meta database. These meta names are generated by the Decoders when parsing. Parsers can choose to generate meta with any meta key name. Therefore, the custom index allows you to choose which of the meta items generated by the Decoder are indexed.
Meta key names can be up to 16 characters long, and can contain only letters or the '.' character.
Data Types
When the Decoder generates meta items, it assigns a data type. Each parser can choose the data type of the meta it generates. However, there are recommended and standard data types for each of the default meta keys. For example, ip.src and ip.dst are stored as the IPv4 meta type, and alias.host is stored as the Text meta type. All parsers must agree on the data format for each meta key generated by the Decoder.
When adding a custom index to the Concentrator, the data type of the custom index must match the format of the data generated by the Decoder. If the types do not match, the Concentrator attempts to convert the meta generated into the type specified for the custom index. However, these conversions sometimes fail, and the resulting index can produce undefined results.
Likewise, when many Decoders and Concentrators work together as part of a NetWitness installation, they must all agree on the types for each meta key. Conflicts of meta types between NetWitness NextGen services can lead to undefined behavior.
The following table shows the metadata types supported by the NetWitness NextGen services.
Type | Size in bytes | Description |
---|---|---|
Int8 | 1 | Signed 8-bit integer |
UInt8 | 1 | Unsigned 8-bit integer |
Int16 | 2 | Signed 16-bit integer |
UInt16 | 2 | Unsigned 16-bit integer |
Int32 | 4 | Signed 32-bit integer |
UInt32 | 4 | Unsigned 32-bit integer |
Int64 | 8 | Signed 64-bit integer |
UInt64 | 8 | Unsigned 64-bit integer |
UInt128 | 16 | Unsigned 128-bit integer |
Float32 | 4 | 32-bit floating point number, single precision |
Float64 | 8 | 64-bit floating point number, double precision |
TimeT | 8 | Unix epoch timestamp |
Binary | 1-255 | Arbitrary binary data |
Text | 1-255 | UTF-8 Encoded text data |
IPv4 | 4 | IPv4 address bytes |
IPv6 | 16 | IPv6 address bytes |
MAC | 6 | MAC Address bytes |
When defining a custom index, it is important to use the best data type for the meta. For example, never store IP addresses as Text, since the Text representation takes more bytes than the IPv4 representation.
Index Levels
There are three levels, or types, of indexing: IndexNone, IndexKeys, and IndexValues.
IndexNone
This type of custom index is not really an index at all. Custom index entries with the IndexNone level exist only to define and document the meta key. IndexNone entries can be used in custom Decoder indices to enforce a specific data type for a meta key across all the parsers on a Decoder.
IndexKeys
This type of custom index indicates that the index only keeps track of sessions that contain meta items with this meta key name. However, it does not index any unique values in the meta database for the meta key.
Key-level indices take much less storage space, memory, and CPU time to manage, but they require a lot more work from the query engine when you perform query or values operations using them.
If used in a where clause, a meta key indexed at the key level can only be used to resolve operations such as exists or !exists.
IndexValues
This type of custom index keeps track of the sessions that contain each unique value for the meta key. This type of index is also known as a "full index".
This type of index is needed for efficient processing of most where clauses, and for use of this meta key as the fieldName parameter of a values call.
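The three levels can be compared side by side in a short sketch. The key names and descriptions below are hypothetical, and the entries are assumed to sit inside the language element of the custom index file.

```xml
<!-- IndexNone: defines and documents the key and its data type; nothing is indexed -->
<key name="custom.note" description="Custom Note" format="Text" level="IndexNone" />

<!-- IndexKeys: tracks which sessions contain the key; suited to exists / !exists -->
<key name="custom.flag" description="Custom Flag" format="Text" level="IndexKeys" />

<!-- IndexValues: full index of unique values, capped here by an illustrative valueMax -->
<key name="custom.tag" description="Custom Tag" format="Text" level="IndexValues" valueMax="10000" />
```

A where clause that compares custom.tag against a specific value can be resolved from the index, whereas the same comparison on custom.flag leaves more work to the query engine.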
Value Max
Value max is a parameter that can have a very significant impact on the accuracy and performance of a Value-level index.
As a Decoder parses packets or logs, it is allowed to create meta of any type with any value. Usually, these meta items are created from data copied directly out of the packet or log. Therefore, anyone can create unique meta values in response to nearly any event.
The performance of the index is directly dependent on the number of unique values it has found for each meta key. As the number of unique values increases, the rate at which new meta is indexed can decrease, and the speed with which queries are completed decreases. Since any person can influence the creation of unique meta values, it is possible for any person to affect the performance of the index.
The value max parameter limits the number of unique values that can enter the index. Therefore, a malicious user cannot flood the system with a large number of unique values in an attempt to degrade or disable the NetWitness system.
It is important to set a value max on any meta key that may have its value influenced directly by incoming packets or logs.
The value max applies only to values added since the last index save operation.
The limit for how high value max can be set depends on the version and on the amount of RAM available to the NetWitness NextGen service. The recommended ceiling for value max is 5,000,000 for any meta key. If there are a lot of custom indices, then the value max may have to be lower.
maxLength
The max length parameter is used exclusively on the word meta type. The meaning of the maxLength parameter depends on whether the index is storing N-grams, as indicated by the ngrams parameter. The default and recommended value for maxLength is 5.
Max Length without N-Grams
If N-Gram support is turned off, then the maxLength parameter indicates that search terms need to be truncated so that they will match truncated values in the index and meta database. If this is the case, the maxLength must be less than or equal to the corresponding setting for /decoder/parsers/config/token.max.length on the Log Decoder service that is generating word token metas. The index will use the maxLength to properly interpret search terms fed into the msearch SDK function.
Max Length with N-Grams
If N-Gram support is turned on, by setting ngrams="Edge" or ngrams="All", then the maxLength parameter controls the maximum length of N-Grams extracted from the meta item. In this scenario, the maxLength does not have to match the length of word meta items generated on the Log Decoder.
minLength
The minimum length parameter is used exclusively on the word meta type. It only has an effect when N-grams are generated. It indicates the smallest length N-gram that will be extracted from the word meta items. The default and recommended minimum N-Gram length is 3, which means that searches against the word index must have at least 3 characters.
ngrams
The ngrams parameter is used exclusively on the word meta type. N-gram indexes extract information that allows for fast lookup of searches that match only part of a word. For example, they allow for finding 'ball' inside the word 'basketball'. If set to the value all, then the index will create entries for all N-grams within the word meta values. The minimum value of N is specified by minLength, and the maximum value of N is specified by maxLength.
The ngrams parameter also supports the value edge, which indicates the index will only store N-grams that appear at the beginning of a word. Edge N-grams are useful for type-ahead search matching, and take less space than storing all N-grams. However, they are not useful to locate matches inside the word or at the end of the word.
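As a rough sketch of how ngrams, minLength, and maxLength combine, the fragment below configures the word key for all N-grams between 3 and 5 characters. The level and description shown are assumptions rather than a copy of the shipped entry, so treat the default index file as the authoritative reference.

```xml
<!-- Sketch only: level and description are assumed, not taken from the default index file -->
<key name="word" description="Text Token" format="Text" level="IndexValues"
     ngrams="All" minLength="3" maxLength="5" />
```

Under the description above, a value such as basketball would contribute N-grams like bas, ask, ball, and etbal, so a search for ball could be resolved from the index. With ngrams="Edge" only the leading N-grams bas, bask, and baske would be stored.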
The ngrams parameter supports the `allvalue` value for the text format meta keys. It means that the index for a meta key will store `all` N-grams within the meta values and also `IndexValues` limited by ValueMax.
This index type enhances the search capability on overflowed index values due to Value Max limits. The N-grams index provides the ability to search any meta value and the Values index provides the ability to retrieve top N available values.
The `minLength` parameter specifies the minimum value of N, the `maxLength` parameter specifies the maximum value of N, and the `ValueMax` parameter specifies maximum unique values. The following are some guidelines to follow while using these parameters:
- It is recommended to set minLength=3 and maxLength=3 for compact index storage of N-grams, and to use ValueMax to limit value index storage. Compared to Text format keys indexed by IndexValues with ValueMax=0 (unlimited), this N-gram index configuration provides better search functionality with compact index storage and memory usage.
- The `contains` operation in queries runs faster for meta keys indexed with this N-gram index type than for keys indexed with IndexValues.
- Because this index type uses both N-grams and IndexValues for the same meta key, it increases the index memory and index storage usage for the meta key and eventually reduces index retention. Hence it is recommended to choose this index type only for the meta keys that need it, taking storage and index retention into account.
- When you switch to this N-gram type and the new behavior is required on the whole index, you must perform a re-index.
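The guidelines above could translate into an entry like the following sketch. The key name custom.text, its description, and the valueMax figure are hypothetical, and pairing the allvalue setting with level="IndexValues" is an assumption drawn from the description of this index type rather than a confirmed requirement.

```xml
<!-- Hypothetical Text key combining a compact N-gram index with a capped values index -->
<key name="custom.text" description="Custom Text" format="Text" level="IndexValues"
     ngrams="allvalue" minLength="3" maxLength="3" valueMax="10000" />
```

With this configuration, the N-gram portion is intended to keep values searchable even after the valueMax cap is reached, while the values portion still supplies the top available values.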
N-gram indexing has a major impact on the functionality of the text indexes. With ngrams set to 'all' or 'allvalue' and a maxLength of 3, a meta key index consumes approximately 2 times more space than it would with N-grams disabled.
Note: When you switch from ngrams 'allvalue' or 'all' to IndexValues, consider performing a re-index. Index slices created before the configuration change are still N-gram indexes, so a values call against them would return N-grams.
In the default index configuration, only the word meta key has N-gram indexing enabled. This meta key is used to index text tokens extracted from unparsed logs on the Log Decoder.
The N-gram index mode supports a 'threshold' tunable parameter that controls the precision of the index. The threshold is used to merge similar index values together depending on how closely the set of indexed sessions matches. Values greater than 0 and less than or equal to 1.0 are accepted. A value of 1.0 means that the index will only merge values if they were found in the same set of sessions. Higher values mean that the index will merge fewer values, at the expense of requiring more time to create the index during aggregation. Lower values mean the index will merge more values together, at the expense of longer search execution time due to more database access. The threshold parameter does not affect search accuracy.
Numeric Bucketing
Indexes on meta formats that are unsigned integers, specifically UInt32 and UInt64, can make use of size bucketing to improve performance.
Size bucketing rounds down the size values in the index to their nearest traditional byte unit of information. Enabling this option on a numeric index reduces the number of unique values to track in the index, which improves aggregation and query performance.
The bucketing option is enabled by the boolean parameter bucket on the key element. bucket may have the value 0 for off or 1 for on. The default is 0.
Examples of bucket number values:
Raw Value | Value Stored in Index | Explanation |
---|---|---|
0 - 1,023 | 0 - 1,023 | Values 0-1023 are stored unmodified |
1,024 - 1,048,575 | 1 KB, 2 KB, 3 KB ... 1,023 KB | Values under 1 MB are stored in 1 KB buckets |
1,048,576 - 1,073,741,823 | 1 MB, 2 MB, 3 MB ... 1,023 MB | Values under 1 GB are stored in 1 MB buckets |
1,073,741,824 - 1,099,511,627,775 | 1 GB, 2 GB, 3 GB ... 1,023 GB | Values under 1 TB are stored in 1 GB buckets |
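A key with bucketing enabled might be declared as in the sketch below. The key name custom.size, its description, and the valueMax figure are hypothetical placeholders chosen for illustration.

```xml
<!-- Hypothetical UInt32 size key with size bucketing switched on -->
<key name="custom.size" description="Transfer Size" format="UInt32" level="IndexValues"
     bucket="1" valueMax="10000" />
```

Following the table above, a raw value of 1,500 would be stored in the index as 1 KB, and a raw value of 5,000,000 would be stored as 4 MB, since values are rounded down to the bucket boundary.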
Key Value Aliases
Value aliases can be specified for keys. Value aliases are text representations that correspond to specific values for a key. These text representations may be easier to remember and more convenient to display. Aliases can be used in the rule/query language (see Queries) and are accessible via the SDK.
Value aliases are specified using the aliases and alias elements:
<key description="Service Type" format="UInt32" level="IndexValues" name="service" valueMax="75" defaultAction="Open">
  <aliases>
    <alias format="$alias" value="0">OTHER</alias>
    <alias format="$alias" value="20">FTPD</alias>
    <alias format="$alias" value="21">FTP</alias>
    <alias format="$alias" value="22">SSH</alias>
    <alias format="$alias" value="23">TELNET</alias>
    <alias format="$alias" value="25">SMTP</alias>
    ⋮
  </aliases>
</key>
Key Renaming
The index language supports the concept of key renaming. This feature provides backward compatibility when a new key name deprecates and replaces an old key name. A renaming is achieved by adding rename elements to the key. Each rename element indicates that the parent key renames the key named in that element. For example, the key definition below defines a new key named port_src that renames the key tcp.srcport.
<key name="port_src" description="Source Port" format="UInt16" level="IndexValues">
  <rename name="tcp.srcport"/>
</key>
The rename element indicates to the database that uses of the parent key, in this case port_src, will include both meta items with type port_src and meta items with type tcp.srcport. Thus, new meta items can be added to the database and queried using port_src, and such queries will return information that was previously stored in tcp.srcport as well.
The rename element accepts a single attribute, name, that refers to a previously defined key.
Keys referred by rename elements must have the same type as the parent key.
Keys referred by rename elements must have the same index level as the parent key.
If a key is redefined in a custom index file, and the redefined key contains rename elements, then those rename elements replace any previously defined rename elements.
Note: Usage of renamed meta key pairs in the select clause cannot be combined with fixed-size result paging for a query. For more information, see the Queries topic.
Entities
The index configuration is used to define entities. Entities provide a convenient way to work with several meta keys at the same time. An entity definition is an alias that groups together the results from other meta keys. You can use an entity definition anywhere you would use a normal meta name. The primary use for entities is to organize similar meta types into a single, easier to use, meta type. For example, the default NextGen database language includes distinct meta types for IP source and IP destination. You could define an entity that represents the combined set of all IP sources and destinations using an entity element:
<entity name="ip.all" description="any ip entity">
  <keyref name="ip.src"/>
  <keyref name="ip.dst"/>
</entity>
The entity element accepts the following attributes:
Name | Description |
---|---|
name | (Required) The name of the entity |
description | (Optional) A description of the entity |
defaultAction | (Optional) Navigate view action for this entity: Open, Closed, Auto, Hidden |
Entity definitions create new entries in the NextGen service language. Since they are returned in the SDK language call, they can be used by older client applications that are not directly aware of the concept of entities.
Each entity definition must contain one or more keyref elements. The keyref element only allows a single name attribute that must refer to a real meta key element defined somewhere else in the device's language. The keyref is also allowed to refer to meta types defined in the default language.
Meta entities can be used in application rules, but are not supported in network rules because the meta available to network rules is too limited.
Entity Definition Rules
- All the keys referenced by an entity must have the same data type
- All the keys referenced by an entity must have the same index level
- An entity name cannot conflict with any existing meta type
- Keyrefs must refer to meta key names that are defined earlier in the index configuration
Entities in Brokers
Brokers inherit entity definitions from upstream devices, in the same way that meta key definitions are inherited. If the upstream devices attached to the Broker do not all have the same set of entities defined, the Broker logs a warning. All upstream devices should have the same entity configuration. A Broker operating with mismatched entity definitions may produce undefined behavior.
GENEVE Tunnel Options
The index configuration defines the GENEVE Tunnel Options. The GENEVE Option class definition parses GENEVE packets and generates meta corresponding to the option types. The vendor must provide the specifications for GENEVE Tunnel Options, which must be mapped to the NetWitness format. The vendor's payload format for each GENEVE option type must be mapped to NetWitness meta key format.
NetWitness meta key formats supported by GENEVE parser are as follows: Int8, UInt8, Int16, UInt16, Int32, UInt32, Int64, UInt64, UInt128, Float32, Float64, TimeT, Text, IPv4, IPv6, MAC.
For example, if the payload type is 'byte array' then the corresponding NetWitness meta key format will be 'Text'.
The GENEVE definition below defines a GENEVE Option class and types for Vendor ‘ABC’:
<geneve>
  <class id="0xFFCC" keyref="vendor" value="ABC">
    <type id="0x0C" description="User Name" keyref="user.id"/>
    <type id="0x0D" description="Site Id" keyref="loc.desc" disable="true"/>
    <type id="0x0E" description="Timestamp" keyref="event.time" units="milliseconds" disable="false"/>
    <type id="0x0B" description="Direction" direction="client" override="true"/>
  </class>
</geneve>
Each geneve definition can have one or more class elements defined. Each class element defines GENEVE options for a given class or vendor. The geneve class element accepts the following attributes:
Name | Description |
---|---|
id | (Required) The GENEVE Option Class Id |
keyref | (Optional) Referenced meta key that must be defined earlier in the index configuration. |
value | (Optional) GENEVE Vendor name |
The keyref attribute above refers to the meta key that receives the value provided by the value attribute. The referenced meta key should have the Text format. If keyref is defined but the value attribute is not, then the meta key referenced by keyref receives the value of id.
Each geneve class element can have zero or more type elements defined. The type element accepts the following attributes:
Name | Description |
---|---|
id | (Required) The GENEVE Option Class Type Id |
description | (Optional) Option Class Type description |
keyref | (Optional) Referenced meta key that must be defined earlier in the index configuration. |
units | (Optional) Packet level option type. Applicable for Time format type only: seconds, milliseconds |
direction | (Optional) Packet level option type. Applicable for types that provide information about the packet stream direction: client, server |
override | (Optional) Applicable for Packet level option type - direction |
disable | (Optional) Disables the Option Class Type |
If the GENEVE frame contains GENEVE Option type data for a specific type ID, meta is generated for the meta key referenced by the keyref attribute, in the format defined for that meta key. A few GENEVE Option types provide more contextual information about the packet, such as the timestamp at which the packet was captured or the direction of the packet (originating from the client or from the server). For timestamp option types, the units attribute indicates whether the timestamp is in seconds or milliseconds.
The direction attribute can be set to client or server for option types that provide packet direction. Setting the override attribute to true sets the packet stream direction for the session, overriding the NetWitness packet stream direction algorithm. Setting the disable attribute to true turns off meta generation for that GENEVE Option type.
Override Configuration in index-decoder-custom.xml File
To add a new GENEVE Option configuration or override an existing one, define it in the index-decoder-custom.xml file, in the same way that meta keys are defined. If the GENEVE option already exists in the index.xml file, then all GENEVE Options configuration for that class is overridden.
Example of GENEVE configuration for Netskope:
<geneve>
  <class id="0xFFCC" keyref="vendor" value="Netskope">
    <type id="0x0C" description="User Name" keyref="user"/>
    <type id="0x0D" description="Site Id" keyref="loc.desc" disable="false"/>
    <type id="0x0E" description="Timestamp" keyref="event.time" units="milliseconds"/>
    <type id="0x0B" description="Direction" direction="client" override="true"/>
  </class>
</geneve>
The index XML files must define the meta keys referenced above, or an error occurs. The format of the meta key referenced by the class node should be Text.
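For illustration, the sketch below places a definition for the vendor key ahead of a GENEVE class that references it. Only the Text format comes from the requirement stated above. The description, level, and valueMax are placeholders, the placement of geneve inside the language element is an assumption, and the key definition may be unnecessary if the default index file already defines vendor.

```xml
<language>
  <!-- Assumed definition: Text format per the requirement above; other attributes are placeholders -->
  <key name="vendor" description="GENEVE Vendor" format="Text" level="IndexValues" valueMax="1000" />

  <geneve>
    <!-- Overriding a class replaces all of its options, so a real file would list every type needed -->
    <class id="0xFFCC" keyref="vendor" value="Netskope">
      <type id="0x0B" description="Direction" direction="client" override="true"/>
    </class>
  </geneve>
</language>
```

Any meta key named in a type element's keyref attribute would need a matching key definition in the same way.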
If the definition of the GENEVE Option is updated in the index XML files, a capture restart is required after performing index save and parser reload for the changes to take effect.
Save Decoder Index
To save the index, do the following:
- Go to (Admin) > Services.
- Select the Decoder and click > View > Explore.
- On the left panel, select index and right-click to select properties.
- In the Properties for Decoder (DECODER) /index pane, select save from the drop-down list and click Send to save the index to the disk.
Decoder Parser Reload
To issue a parser reload, do the following: