Log Decoder Service Configuration Parameters

This topic lists and describes the configuration parameters available in the Log Decoder configuration settings.

Each entry below lists a Log Decoder setting, its configuration node, and where its fields are described.

Database (/database/config)

Refer to "Database Configuration Nodes" in the NetWitness Core Database Tuning Guide.

Decoder (/decoder/config)

Refer to Decoder Configuration Parameters.

Index (/index/config)

Refer to "Index Configuration Nodes" in the NetWitness Core Database Tuning Guide.

Logs (/logs/config)

Refer to Core Service Logging Configuration Parameters.

REST (/rest/config)

Refer to REST Interface Configuration Parameters.

SDK (/sdk/config)

Refer to "SDK Configuration Nodes" in the NetWitness Core Database Tuning Guide and to NetWitness Platform Core Service system.roles Modes.

System (/sys/config)

Refer to Core Service System Configuration Parameters.
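All of these configuration nodes can be read over the service's REST interface. The following sketch is illustrative only: the host and credentials are placeholders, and the REST port (50102 is commonly used for the Log Decoder) and the msg=ls/msg=get message syntax should be verified for your deployment.

    import requests

    LOG_DECODER = "https://logdecoder.example.com:50102"  # placeholder host; verify the REST port
    AUTH = ("admin", "netwitness")                        # placeholder credentials

    def read_node(path):
        """List or read a configuration node, e.g. /decoder/config."""
        resp = requests.get(
            LOG_DECODER + path,
            params={"msg": "ls"},   # 'ls' lists child nodes; 'get' reads a single value
            auth=AUTH,
            verify=False,           # core services often use self-signed certificates
        )
        resp.raise_for_status()
        return resp.text

    print(read_node("/decoder/config"))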

Log Tokenizer Configuration Settings

The Log Decoder has a set of configuration items that control how the automatic log tokenizer creates meta items from unparsed logs. The log tokenizer is implemented as a set of built-in parsers, each of which scans for a subset of recognizable tokens. The functionality of each of these native parsers is shown in the table that follows the example below. The word meta items they generate form a full-text index when they are fed to the indexing engine on the Concentrator and Archiver. You can control which log tokenizers are enabled by manipulating the parsers.disabled configuration entry.
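For example, the following hedged sketch disables two of the tokenizers by writing the parsers.disabled entry over the REST interface. The comma-separated value format and the msg=set syntax are assumptions to verify against your NetWitness version; the host, port, and credentials are placeholders.

    import requests

    # Illustrative only: disables the URLSCAN and DOMAINSCAN tokenizers by
    # writing the parsers.disabled configuration entry over REST.
    resp = requests.get(
        "https://logdecoder.example.com:50102/decoder/config/parsers.disabled",
        params={"msg": "set", "value": "URLSCAN,DOMAINSCAN"},
        auth=("admin", "netwitness"),   # placeholder credentials
        verify=False,                   # self-signed certificates are common
    )
    resp.raise_for_status()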

Each parser is listed below with its description and the configuration parameters that affect it.

Log Tokens
Scans for runs of consecutive characters to produce 'word' meta items.
Configuration parameters: token.device.types, token.char.classes, token.max.length, token.min.length, token.unicode

IPSCAN
Scans for text that appears to be an IPv4 address to produce ip.addr meta items.
Configuration parameters: token.device.types

IPV6SCAN
Scans for text that appears to be an IPv6 address to produce ipv6 meta items.
Configuration parameters: token.device.types

URLSCAN
Scans for text that appears to be a URL to produce alias.host, filename, username, and password meta items.
Configuration parameters: token.device.types

DOMAINSCAN
Scans for text that appears to be a domain name to produce alias.host, tld, cctld, and sld meta items.
Configuration parameters: token.device.types

EMAILSCAN
Scans for text that appears to be an email address to produce email and username meta items.
Configuration parameters: token.device.types

SYSLOGTIMESTAMPSCAN
Scans for text that appears to be a syslog-format timestamp, which lacks the year and time zone. When such text is located, it is normalized to UTC time to create event.time meta items.
Configuration parameters: token.device.types

INTERNETTIMESTAMPSCAN
Scans for text that appears to be an RFC 3339-format timestamp to create event.time meta items.
Configuration parameters: token.device.types
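To make the table concrete, here is a rough Python approximation, not the actual native parsers (which are built into the Log Decoder and coordinate with each other), of the kinds of meta items a few of these scanners would emit from one unparsed log line:

    import re

    log = "Oct 11 22:14:15 host1 sshd: failed login for admin@example.com from 10.1.2.3"

    meta = []
    # EMAILSCAN: email-like text produces email meta items
    meta += [("email", m) for m in re.findall(r"[\w.+-]+@[\w.-]+\.\w+", log)]
    # IPSCAN: IPv4-like text produces ip.addr meta items
    meta += [("ip.addr", m) for m in re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", log)]
    # Log Tokens: runs of alphabetic characters produce 'word' meta items
    # (the real parsers avoid re-tokenizing text already claimed by a scanner)
    meta += [("word", m) for m in re.findall(r"[A-Za-z]+", log)]

    for key, value in meta:
        print(key, "=", value)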

Log Tokenizer Configuration Parameters

Each Log Decoder parser setting is described below.
token.device.types

The set of device types that are scanned for raw text tokens. By default, this is set to unknown, meaning that only logs that were not parsed are scanned for raw text. You can add other device types here to enrich parsed logs with text token information.

If this field is empty, then log tokenization is disabled.
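For example, the following sketch adds a parsed device type so that its logs are also tokenized. The node path (assumed here to live under /decoder/parsers/config), the msg=set syntax, the ciscoasa device type, and the host and credentials are all placeholders or assumptions to confirm for your version.

    import requests

    # Illustrative only: keeps 'unknown' so unparsed logs are still scanned,
    # and adds a hypothetical parsed device type for enrichment.
    resp = requests.get(
        "https://logdecoder.example.com:50102/decoder/parsers/config/token.device.types",
        params={"msg": "set", "value": "unknown,ciscoasa"},
        auth=("admin", "netwitness"),   # placeholder credentials
        verify=False,
    )
    resp.raise_for_status()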

token.char.classes

This field controls the type of tokens that are generated. It can be any combination of the values alpha, digit, space, and punct. The default value is alpha.

  • alpha: Tokens may contain alphabetic characters
  • digit: Tokens may contain numbers
  • space: Tokens may contain spaces and tabs
  • punct: Tokens may contain punctuation marks
token.max.length

This field puts an upper limit on the length of the tokens. The default value is 5 characters. The maximum length setting allows the Log Decoder to limit the space needed to store the word metadata.
Longer tokens require more meta database space but may provide slightly faster raw text searches. Shorter tokens force the text query resolver to perform more reads from the raw logs during searches, but use much less space in the meta database and index.

token.min.length

This is the minimum length of a searchable text token. The minimum token length corresponds to the minimum number of characters a user must type into the search box to locate results. The recommended value is the default, 3.
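The interaction of token.char.classes, token.min.length, and token.max.length can be seen in the following minimal sketch. It only mimics the documented behavior; in particular, it assumes that runs longer than the maximum are truncated rather than discarded, which should be verified against your version.

    import string

    CLASS_SETS = {
        "alpha": set(string.ascii_letters),
        "digit": set(string.digits),
        "space": set(" \t"),
        "punct": set(string.punctuation),
    }

    def tokenize(text, classes=("alpha",), min_len=3, max_len=5):
        """Mimic the documented defaults: alpha tokens, 3 to 5 characters."""
        allowed = set().union(*(CLASS_SETS[c] for c in classes))
        tokens, run = [], ""
        for ch in text + "\0":              # sentinel flushes the final run
            if ch in allowed:
                run += ch
            elif run:
                if len(run) >= min_len:
                    tokens.append(run[:max_len])  # assumed truncation at max_len
                run = ""
        return tokens

    line = "error code 12345 from host-77"
    print(tokenize(line))                              # ['error', 'code', 'from', 'host']
    print(tokenize(line, classes=("alpha", "digit")))  # adds '12345'; '77' is too short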

token.unicode

This Boolean setting controls whether Unicode classification rules are applied when classifying characters according to the token.char.classes setting.
When this is set to true, each log is treated as a sequence of UTF-8 encoded code points, and classification is performed after UTF-8 decoding.
When this is set to false, each log is treated as ASCII characters and only ASCII character classification is done.
Unicode character classification requires more CPU resources on the Log Decoder. If you do not need non-English text indexing, you can disable this setting to reduce CPU utilization on the Log Decoder. The default is enabled (true).
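As a rough illustration of the difference, using Python's Unicode database as a stand-in for the Log Decoder's classifier: accented letters remain part of a token under Unicode rules but break the run under ASCII-only rules.

    import unicodedata

    raw = "señor café".encode("utf-8")

    # token.unicode = true: decode UTF-8 first, then classify code points
    unicode_tokens = "".join(
        ch if unicodedata.category(ch).startswith("L") else " "
        for ch in raw.decode("utf-8")
    ).split()

    # token.unicode = false: classify raw bytes as ASCII; non-ASCII bytes break runs
    ascii_tokens = "".join(
        chr(b) if b < 128 and chr(b).isalpha() else " "
        for b in raw
    ).split()

    print(unicode_tokens)   # ['señor', 'café']
    print(ascii_tokens)     # ['se', 'or', 'caf']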