Index Customization

This topic describes how to use the custom index file to customize the index. Each NetWitness NextGen service is installed with a default index configuration that is intended to cover the index needs of most users of the product. However, you can index new meta keys in order to use the index with custom content that generates custom meta.

Index Configuration File Locations

Index customization is accomplished by making changes to the custom index file. The location of this file is /etc/netwitness/ng/index-<servicename>-custom.xml, where <servicename> corresponds to the name of the product that you are customizing. For example, the Concentrator custom index file is /etc/netwitness/ng/index-concentrator-custom.xml.

Concentrator products also include a file that describes the default index configuration: /etc/netwitness/ng/index-concentrator.xml. This file is useful as a template to show how the custom index file is formatted.

If you make customizations to the index in the custom index file, those customizations override any conflicting settings in the default index configuration.

You can make changes to the custom index file while the service is running. When the service receives an index save command, the changes to the custom index file are read and applied to the index.

Changes to the index can only be applied to new incoming data. Data cannot be retroactively reindexed with a new custom index configuration, except by Rebuilding the Index.

Index configuration entries

The custom index file is an XML document. The root element of this document is the language element, and inside there are elements for each meta key to describe each custom index. Each element of the custom index configuration looks like this:

				
<key name="did" description="Decoder Source" level="IndexValues" format="Text" valueMax="100" />
			

Definitions for each attribute in this element:

| Attribute | Description |
|---|---|
| name | The name of the meta key that will be indexed |
| description | A human-readable description for the meta type |
| level | The type of index that will be created for this meta key |
| valueMax | The maximum unique values that will be stored for this key per slice |
| format | The format of the data held by all meta items with this meta key name |
| bucket | Enable size bucketing |
| ngrams | Enable ngram generation |
| threshold | Threshold for approximate value merging on ngram indexes |
| defaultAction | Default Navigate view action for this key: Open, Closed, Auto, Hidden |
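
To tie these attributes together, here is a minimal sketch of what a complete custom index file might look like. The language root element and the key attributes come from this topic. The XML declaration, the root element shown without attributes, and the key name my.custom are assumptions made for illustration, so compare the result against the default index file on your own system before using it.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Sketch of /etc/netwitness/ng/index-concentrator-custom.xml. -->
<!-- Assumption: the root element is shown without attributes; copy the exact root element from the default index file. -->
<language>
	<!-- "my.custom" is a hypothetical meta key name, not a default key. -->
	<key name="my.custom" description="My Custom Meta" level="IndexValues" format="Text" valueMax="10000" defaultAction="Closed"/>
</language>
```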

The next few sections examine these parameters in greater detail.

Meta names

The meta name used by the index refers to the meta key name present within every meta item in the meta database. These meta names are generated by the Decoders when parsing. Parsers can choose to generate meta with any meta key name. Therefore, the custom index allows you to choose which of the meta items generated by the Decoder are indexed.

Meta key names can be up to 16 characters long and can contain only letters and the '.' character.

Data Types

When the Decoder generates meta items, it assigns a data type. Each parser can choose the data type of the meta it generates. However, there are recommended and standard data types for each of the default meta keys. For example, ip.src and ip.dst are stored as the IPv4 meta type, and alias.host is stored as the Text meta type. Each parser must agree on the data format for each meta key generated by the Decoder.

When adding a custom index to the Concentrator, the data type of the custom index must match the format of the data generated by the Decoder. If the types do not match, the Concentrator attempts to convert the meta generated into the type specified for the custom index. However, these conversions sometimes fail, and the resulting index can produce undefined results.

Likewise, when many Decoders and Concentrators work together as part of a NetWitness installation, they must all agree on the types for each meta key. Conflicts of meta types between NetWitness NextGen services can lead to undefined behavior.

The following table shows the metadata types supported by the NetWitness NextGen services.

| Type | Size in bytes | Description |
|---|---|---|
| Int8 | 1 | Signed 8-bit integer |
| UInt8 | 1 | Unsigned 8-bit integer |
| Int16 | 2 | Signed 16-bit integer |
| UInt16 | 2 | Unsigned 16-bit integer |
| Int32 | 4 | Signed 32-bit integer |
| UInt32 | 4 | Unsigned 32-bit integer |
| Int64 | 8 | Signed 64-bit integer |
| UInt64 | 8 | Unsigned 64-bit integer |
| UInt128 | 16 | Unsigned 128-bit integer |
| Float32 | 4 | 32-bit floating point number, single precision |
| Float64 | 8 | 64-bit floating point number, double precision |
| TimeT | 8 | Unix epoch timestamp |
| Binary | 1-255 | Arbitrary binary data |
| Text | 1-255 | UTF-8 encoded text data |
| IPv4 | 4 | IPv4 address bytes |
| IPv6 | 16 | IPv6 address bytes |
| MAC | 6 | MAC address bytes |

When defining a custom index, it is important to use the best data type for the meta. For example, never store IP addresses as Text, since the Text representation takes more bytes than the IPv4 representation.
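
As an illustration of matching the index format to the data type generated by the Decoder, the sketch below indexes a key that carries IPv4 addresses. The key name ip.orig, its description, and the valueMax figure are assumptions for this example only. The point is that format names the native type rather than Text.

```xml
<!-- Hypothetical key: assumes a parser emits "ip.orig" as IPv4 meta. -->
<!-- format="IPv4" stores 4 bytes per value; a Text rendering such as "192.168.100.200" would need 15. -->
<key name="ip.orig" description="Originating IP" level="IndexValues" format="IPv4" valueMax="100000"/>
```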

Index Levels

There are three levels, or types, of indexing: IndexNone, IndexKeys, and IndexValues.

IndexNone

This type of custom index is not really an index at all. Custom index entries with the IndexNone level exist only to define and document the meta key. IndexNone entries can be used in custom Decoder indices to enforce a specific data type for a meta key across all the parsers on a Decoder.

IndexKeys

This type of custom index indicates that the index only keeps track of sessions that contain meta items with this meta key name. However, it does not index any unique values in the meta database for the meta key.

Key-level indices take much less storage space, memory, and CPU time to manage, but they require a lot more work from the query engine when you perform query or values operations using them.

If used in a where clause, a meta key indexed at the key level can only be used to resolve operations such as exists or !exists.

IndexValues

This type of custom index keeps track of the sessions that contain each unique value for the meta key. This type of index is also known as a "full index".

This type of index is needed for efficient processing of most where clauses, and for use of this meta key as the fieldName parameter of a values call.
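
The three levels can be compared side by side in a short sketch. All three key names, descriptions, and the valueMax figure below are hypothetical. The entries would sit inside the language root element of the custom index file.

```xml
<!-- All three key names below are hypothetical. -->

<!-- IndexNone: documents the key and its data type; nothing is indexed. -->
<key name="app.note" description="Application Note" level="IndexNone" format="Text"/>

<!-- IndexKeys: records which sessions contain the key, which is enough for exists and !exists. -->
<key name="app.flag" description="Application Flag" level="IndexKeys" format="Text"/>

<!-- IndexValues: records the sessions for each unique value, up to valueMax. -->
<key name="app.user" description="Application User" level="IndexValues" format="Text" valueMax="50000"/>
```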

Value Max

Value max is a parameter that can have a very significant impact on the accuracy and performance of a Value-level index.

As a Decoder parses packets or logs, it is allowed to create meta of any type with any value. Usually, these meta items are created from data copied directly out of the packet or log. Therefore, anyone can create unique meta values in response to nearly any event.

The performance of the index is directly dependent on the number of unique values it has found for each meta key. As the number of unique values increases, the rate at which new meta is indexed can decrease, and the speed with which queries are completed decreases. Since any person can influence the creation of unique meta values, it is possible for any person to affect the performance of the index.

The value max parameter limits the number of unique values that can enter the index. Therefore, a malicious user cannot flood the system with a large number of unique values in an attempt to disable the NetWitness system.

It is important to set a value max on any meta key that may have its value influenced directly by incoming packets or logs.

The value max applies only to values added since the last index save operation.

The limit for how high value max can be set depends on the product version and on the amount of RAM available to the NetWitness NextGen service. As of 10.3, the recommended ceiling for value max is 5,000,000 for any meta key. If there are a lot of custom indices, then the value max may have to be lower.
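
A sketch of a key whose values are copied directly from incoming traffic, and which therefore sets a value max, might look like the following. The key name url.path and the figure 500000 are assumptions chosen for illustration. The figure is simply a value below the recommended ceiling, not a tuning recommendation.

```xml
<!-- Hypothetical key: assumes a parser emits "url.path" as Text meta taken from the packet or log. -->
<!-- valueMax caps the unique values accepted since the last index save. -->
<key name="url.path" description="URL Path" level="IndexValues" format="Text" valueMax="500000"/>
```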

maxLength

The max length parameter is used exclusively on the word meta type. The meaning of the maxLength parameter depends on whether the index is storing N-grams, as indicated by the ngrams parameter. The default and recommended value for maxLength is 5.

Max Length without N-Grams

If N-Gram support is turned off, then the maxLength parameter indicates that search terms need to be truncated so that they will match truncated values in the index and meta database. If this is the case, the maxLength must be less than or equal to the corresponding setting for /decoder/parsers/config/token.max.length on the Log Decoder service that is generating word token metas. The index will use the maxLength to properly interpret search terms fed into the msearch SDK function.

Max Length with N-Grams

If N-Gram support is turned on, by setting ngrams="Edge" or ngrams="All", then the maxLength parameter controls the maximum length of N-Grams extracted from the meta item. In this scenario, the maxLength does not have to match the length of word meta items generated on the Log Decoder.

minLength

The minimum length parameter is used exclusively on the word meta type. It only has an effect when N-grams are generated. It indicates the smallest length N-gram that will be extracted from the word meta items. The default and recommended minimum N-Gram length is 3, which means that searches against the word index must have at least 3 characters.

ngrams

The ngrams parameter is used exclusively on the word meta type. N-gram indexes extract information that allows for fast lookup of searches that match only part of a word. For example, they allow for finding 'ball' inside the word 'basketball'. If set to the value all, the index creates entries for all N-grams within the word meta values. The minimum value of N is specified by minLength, and the maximum value of N is specified by maxLength.

The ngrams parameter also supports the value edge, which indicates that the index stores only N-grams that appear at the beginning of a word. Edge N-grams are useful for type-ahead search matching, and take less space than storing all N-grams. However, they are not useful for locating matches inside the word or at the end of the word.
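
The following sketch shows how the three word-index parameters might appear together on a key element. It assumes that the word key uses format="Text" and that the attribute value is written as "Edge", as in the maxLength discussion above. The description text is illustrative.

```xml
<!-- Sketch only. Assumption: the word key uses format="Text". -->
<!-- Edge N-grams of length 3 to 5: by the description above, "basketball" would yield "bas", "bask", and "baske". -->
<key name="word" description="Text Token" level="IndexValues" format="Text" ngrams="Edge" minLength="3" maxLength="5"/>
```

With this setting a search for 'bas' could be answered from the index, while a search for 'ball' could not, because 'ball' does not start the word.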

The ngrams parameter supports the `allvalue` value for the text format meta keys. It means that the index for a meta key will store `all` N-grams within the meta values and also `IndexValues` limited by ValueMax.

This index type enhances the search capability on overflowed index values due to Value Max limits. The N-grams index provides the ability to search any meta value and the Values index provides the ability to retrieve top N available values.

The `minLength` parameter specifies the minimum value of N, the `maxLength` parameter specifies the maximum value of N, and the `ValueMax` parameter specifies maximum unique values. The following are some guidelines to follow while using these parameters:

  • It is recommended to set minLength=3 and maxLength=3 for compact index storage of N-grams and also use ValueMax to limit value index storage. When compared to Text format keys indexed by IndexValues and ValueMax=0 (unlimited) this N-gram index configuration provides better search functionality with compact index storage and memory usage.

  • The `contains` operation in queries runs faster for meta keys indexed with this N-gram index type than with IndexValues.

  • Because this index type uses both N-grams and IndexValues for the same meta key, it increases index memory and index storage usage for that key, which eventually reduces index retention. Choose this index type only for the meta keys that need it, and weigh the cost in storage and index retention.

  • When you switch to this N-gram type and need the new behavior across the whole index, you must perform a re-index.
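
A key that follows these guidelines could be sketched as below. The key name file.path, its description, and the valueMax figure are assumptions for illustration. Only the ngrams, minLength, and maxLength settings are taken from the guidelines.

```xml
<!-- Hypothetical key name; valueMax is an illustrative figure. -->
<!-- ngrams="allvalue" keeps 3-character N-grams for searching, plus a value index capped by valueMax. -->
<key name="file.path" description="File Path" level="IndexValues" format="Text" ngrams="allvalue" minLength="3" maxLength="3" valueMax="100000"/>
```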

N-gram indexing has a major impact on the functionality of the text indexes. Using the N-gram settings of 'all' or 'allvalue' N-grams with maxLength 3, a meta key index will consume approximately 2 times more space than if N-grams were not enabled.

Note: When you switch from ngrams 'allvalue' or 'all' to IndexValues, consider performing a re-index, because index slices created before the configuration change are still N-gram indexes and a values call against them returns N-grams.

In the default index configuration, only the word meta key has N-gram indexing enabled. This meta key is used to index text tokens extracted from unparsed logs on the Log Decoder.

The N-gram index mode supports a 'threshold' tunable parameter that controls the precision of the index. The threshold is used to merge similar index values together depending on how closely the set of indexed sessions matches. Values greater than 0 and less than or equal to 1.0 are accepted. A value of 1.0 means that the index will only merge values if they were found in the same set of sessions. Higher values mean that the index will merge fewer values, at the expense of requiring more time to create the index during aggregation. Lower values mean the index will merge more values together, at the expense of longer search execution time due to more database access. The threshold parameter does not affect search accuracy.
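
As a sketch, the threshold could be set on an N-gram key as shown below. The value 0.75 is an arbitrary illustration inside the accepted range (greater than 0, up to 1.0), not a recommended setting, and format="Text" on the word key is an assumption.

```xml
<!-- Sketch only: 0.75 is an arbitrary value in the accepted range, not a recommendation. -->
<!-- A value closer to 1.0 merges fewer index values; a lower value merges more. -->
<key name="word" description="Text Token" level="IndexValues" format="Text" ngrams="All" minLength="3" maxLength="5" threshold="0.75"/>
```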

Numeric Bucketing

Indexes on meta formats that are unsigned integers, specifically UInt32 and UInt64, can make use of size bucketing to improve performance.

Size bucketing rounds down the size values in the index to their nearest traditional byte unit of information. Enabling this option on a numeric index reduces the number of unique values to track in the index, which improves aggregation and query performance.

The bucketing option is enabled by the boolean parameter bucket on the key element. bucket may have the value 0 for off or 1 for on. The default is 0.

Examples of bucket number values:

| Raw Value | Value Stored in Index | Explanation |
|---|---|---|
| 0 - 1,023 | 0 - 1,023 | Values 0-1023 are stored unmodified |
| 1,024 - 1,048,575 | 1 KB, 2 KB, 3 KB ... 1,023 KB | Values under 1 MB are stored in 1 KB buckets |
| 1,048,576 - 1,073,741,823 | 1 MB, 2 MB, 3 MB ... 1,023 MB | Values under 1 GB are stored in 1 MB buckets |
| 1,073,741,824 - 1,099,511,627,775 | 1 GB, 2 GB, 3 GB ... 1,023 GB | Values under 1 TB are stored in 1 GB buckets |
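
A key with bucketing turned on might be sketched as follows. The key name xfer.size, its description, and the valueMax figure are assumptions for illustration.

```xml
<!-- Hypothetical key: assumes a parser emits "xfer.size" as a UInt32 byte count. -->
<!-- bucket="1" turns on size bucketing; by the table above, a raw value of 5,000 would be stored as 4 KB. -->
<key name="xfer.size" description="Transfer Size" level="IndexValues" format="UInt32" bucket="1" valueMax="10000"/>
```

Working through one more row of the table: a raw value of 3,000,000 falls between 1,048,576 and 1,073,741,823, so it is rounded down to a 1 MB bucket boundary and stored as 2 MB.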

Key Value Aliases

Value aliases can be specified for keys. Value aliases are text representations that correspond to specific values for a key. These text representations may be easier to remember and more convenient to display. Aliases can be used in the rule/query language (see Queries) and are accessible via the SDK.

Value aliases are specified using the aliases and alias elements:

				
<key description="Service Type" format="UInt32" level="IndexValues" name="service" valueMax="75" defaultAction="Open">
	<aliases>
		<alias format="$alias" value="0">OTHER</alias>
		<alias format="$alias" value="20">FTPD</alias>
		<alias format="$alias" value="21">FTP</alias>
		<alias format="$alias" value="22">SSH</alias>
		<alias format="$alias" value="23">TELNET</alias>
		<alias format="$alias" value="25">SMTP</alias>
		⋮
	</aliases>
</key>				
			

Key Renaming

The index language supports key renaming. This feature provides backward compatibility when a new key name deprecates and replaces an old key name. To rename a key, add rename elements to the definition of the new key. The parent key then takes over each key named in a rename element. For example, the key definition below defines a new key named port_src that renames the key tcp.srcport.

				
<key name="port_src" description="Source Port" format="UInt16" level="IndexValues">
	<rename name="tcp.srcport"/>
</key>				
			

The rename element indicates to the database that uses of the parent key, in this case port_src, will include both meta items with type port_src and meta items with type tcp.srcport. Thus, new meta items can be added to the database and queried using port_src, and such queries will return information that was previously stored in tcp.srcport as well.

The rename element accepts a single attribute, name, that refers to a previously defined key.

Keys referred to by rename elements must have the same type as the parent key.

Keys referred to by rename elements must have the same index level as the parent key.

If a key is redefined in a custom index file, and the redefined key contains rename elements, then those rename elements replace any previously defined rename elements.
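
As a sketch of this replacement behavior, suppose port_src was first defined as shown earlier and is then redefined in the custom index file. The second renamed key, udp.srcport, is hypothetical and is assumed to be already defined with the same type and index level.

```xml
<!-- Redefinition placed in the custom index file. -->
<!-- These rename elements replace, rather than extend, the rename list from the earlier definition of port_src. -->
<key name="port_src" description="Source Port" format="UInt16" level="IndexValues">
	<rename name="tcp.srcport"/>
	<!-- Hypothetical key; it must already be defined as UInt16 at IndexValues. -->
	<rename name="udp.srcport"/>
</key>
```

Because the new list replaces the old one, tcp.srcport is repeated here. Leaving it out would drop it from the renaming.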

Note: Usage of renamed meta key pairs in the select clause cannot be combined with fixed-size result paging for a query. For more information, see the Queries topic.

Entities

The index configuration is used to define entities. Entities provide a convenient way to work with several meta keys at the same time. An entity definition is an alias that groups together the results from other meta keys. You can use an entity definition anywhere you would use a normal meta name. The primary use for entities is to organize similar meta types into a single, easier to use, meta type. For example, the default NextGen database language includes distinct meta types for IP source and IP destination. You could define an entity that represents the combined set of all IP sources and destinations using an entity element:

				
<entity name="ip.all" description="any ip entity">
	<keyref name="ip.src"/>
	<keyref name="ip.dst"/>
</entity>				
			

The entity element accepts the following attributes:

| Name | Description |
|---|---|
| name | (Required) The name of the entity |
| description | (Optional) A description of the entity |
| defaultAction | (Optional) Navigate view action for this entity: Open, Closed, Auto, Hidden |

Entity definitions create new entries in the NextGen service language. Since they are returned in the SDK language call, they can be used by older client applications that are not directly aware of the concept of entities.

Each entity definition must contain one or more keyref elements. The keyref element only allows a single name attribute that must refer to a real meta key element defined somewhere else in the device's language. The keyref is also allowed to refer to meta types defined in the default language.

Meta entities can be used in application rules, but they are not supported in network rules because the meta available to network rules is too limited.

Entity Definition Rules

  • All the keys referenced by an entity must have the same data type
  • All the keys referenced by an entity must have the same index level
  • An entity name cannot conflict with any existing meta type
  • Keyrefs must refer to meta key names that are defined earlier in the index configuration
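
An entity that satisfies all four rules could be sketched as follows. The key names, entity name, descriptions, and valueMax figures are illustrative assumptions. The two keys are written out in full here only to make the matching format and level visible.

```xml
<!-- Both keys share format="Text" and level="IndexValues", and are defined before the entity. -->
<!-- Key and entity names are illustrative. -->
<key name="user.src" description="Source User" level="IndexValues" format="Text" valueMax="100000"/>
<key name="user.dst" description="Destination User" level="IndexValues" format="Text" valueMax="100000"/>

<entity name="user.all" description="Any user" defaultAction="Closed">
	<keyref name="user.src"/>
	<keyref name="user.dst"/>
</entity>
```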

Entities in Brokers

Brokers inherit entity definitions from upstream devices, in the same way that meta key definitions are inherited. If the upstream devices attached to the Broker do not all have the same set of entities defined, the Broker logs a warning. All upstream devices should have the same entity configuration. A Broker operating with mismatched entity definitions may produce undefined behavior.