Recommended Configurations

This topic describes the recommended data privacy implementation for NetWitness and several additional use cases for managing exposure of privacy-sensitive data in NetWitness. Administrators can set up the NetWitness hosts and services to meet data privacy requirements for their environment. This section provides recommended configurations for data privacy and data retention.

Recommended Data Privacy Configuration

The recommended configuration to obtain the best analytical value with data obfuscation enabled is to define privacy-sensitive metadata and keep both original and obfuscated (hash) values of privacy-sensitive data on disk for Decoders, Log Decoders, Concentrators, and Brokers.

The assumption is that only a handful of meta keys (approximately 10) will be classified as protected, and that a FIPS 140-compliant hashing algorithm will be used with a salt to make reverse engineering the original value difficult. The recommended solution is SHA-256 with a salt of at least 16 and no more than 60 characters.
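The following Python sketch illustrates salted SHA-256 obfuscation within the recommended salt length. It is not the NetWitness implementation: the helper names `make_salt` and `obfuscate`, the choice to prepend the salt to the UTF-8 value, and the hex output are all assumptions made for illustration.

```python
import hashlib
import secrets
import string

# Salt length bounds taken from the recommendation above (16 to 60 characters).
MIN_SALT_LEN = 16
MAX_SALT_LEN = 60


def make_salt(length: int = 32) -> str:
    """Generate a random alphanumeric salt within the recommended length range."""
    if not MIN_SALT_LEN <= length <= MAX_SALT_LEN:
        raise ValueError(f"salt must be {MIN_SALT_LEN}-{MAX_SALT_LEN} characters")
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def obfuscate(value: str, salt: str) -> str:
    """Return the hex SHA-256 digest of salt + value (assumed combination order)."""
    if not MIN_SALT_LEN <= len(salt) <= MAX_SALT_LEN:
        raise ValueError("salt length outside the recommended range")
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    salt = make_salt()
    # The same value and salt always produce the same 64-character hex digest.
    print(obfuscate("alice@example.com", salt))
```

Because the hash is deterministic for a given salt, equal original values map to equal obfuscated values, which is what lets investigations and correlation still work on the hashed meta.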

Note: By default, hash values are stored in binary format, which gives faster response times and requires less database storage space than string format. However, the recommended storage method is text/string.
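To illustrate the size difference, this sketch computes one salted SHA-256 digest and reports its size in raw binary form and as text. It assumes the string format is a hexadecimal encoding of the digest and that the salt is prepended to the value; the function name `digest_sizes` is illustrative only.

```python
from __future__ import annotations

import hashlib


def digest_sizes(value: str, salt: str) -> tuple[int, int]:
    """Return (binary size in bytes, text size in characters) of a salted SHA-256 digest."""
    # Assumed scheme: SHA-256 over the salt prepended to the UTF-8 value.
    digest = hashlib.sha256((salt + value).encode("utf-8")).digest()
    # Binary storage keeps the raw digest; hex text needs two characters per byte.
    return len(digest), len(digest.hex())


print(digest_sizes("alice@example.com", "0123456789abcdef"))  # (32, 64)
```

For any input, the raw digest is 32 bytes and its hex text is 64 characters, so under this assumption the string format roughly doubles the space each hash value needs before any database overhead.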

Brokers and Investigate may hold both original and obfuscated data in cache, because data privacy officers use Investigate during investigations to confirm which original value an obfuscated value maps to. Downstream services can also limit the use of original sensitive values to in-memory processing so that the data does not persist on disk in those downstream systems; this is the case for ESA and Malware Analysis.

The recommended way to delete data when it is no longer needed is the built-in, automatic data retention enforcement, which deletes data when a configured threshold is reached. You can use this method for the following components: Decoder, Log Decoder, Log Collector, Archiver, Malware Analysis, NetWitness Respond, and Reporting Engine. You can manually configure Event Stream Analysis to support similar automatic data retention enforcement.
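Threshold-based retention can be sketched as selecting the oldest stored files for deletion until usage drops to a limit. This Python example assumes a size-based threshold and a hypothetical `StoredFile` record; the actual thresholds and enforcement logic of each NetWitness component are configured in the product and are not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StoredFile:
    """Hypothetical record for a stored database file."""

    name: str
    created: float  # creation time, epoch seconds
    size: int  # bytes


def select_for_deletion(files: list[StoredFile], max_bytes: int) -> list[StoredFile]:
    """Choose the oldest files to delete until total size is at or below max_bytes."""
    total = sum(f.size for f in files)
    victims: list[StoredFile] = []
    for item in sorted(files, key=lambda f: f.created):
        if total <= max_bytes:
            break  # usage is within the threshold; keep the remaining newer files
        victims.append(item)
        total -= item.size
    return victims
```

Deleting oldest-first keeps the most recent data available for investigation while holding storage at the configured limit.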

To manage cache storage, the NetWitness Server clears the cache related to event investigations every 24 hours. The Broker can also be configured to periodically remove its locally stored cache.
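A periodic cache clear like the 24-hour cycle described above can be sketched as a check that empties the cache once the interval has elapsed. The dictionary cache and the `clear_if_due` helper are hypothetical; they illustrate the scheduling idea, not how the NetWitness Server or Broker caches are implemented.

```python
# Clearing interval: 24 hours, matching the schedule described above.
CLEAR_INTERVAL_SECONDS = 24 * 60 * 60


def clear_if_due(cache: dict, last_cleared: float, now: float,
                 interval: float = CLEAR_INTERVAL_SECONDS) -> float:
    """Empty the whole cache once the interval has elapsed; return the new last-cleared time."""
    if now - last_cleared >= interval:
        cache.clear()  # drop every cached entry, including any original sensitive values
        return now
    return last_cleared
```

A scheduler such as a cron job or a timer would call this check periodically and keep the returned timestamp for the next call.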

Options for Data Retention Configurations

NetWitness provides alternative controls that the administrator can apply to enforce stronger restrictions on privacy-sensitive data storage when data obfuscation is enabled.

Data Storage With Data Retention Options in Effect

The following table summarizes where privacy-sensitive data is stored in the default configuration (no data privacy) and under each data retention alternative. A checkmark indicates that privacy-sensitive data is saved on the component; a blank indicates that no privacy-sensitive data is stored on the component.

| Component | Default Configuration: Original Data Stored | Data Storage Option: Original Data and Hash Stored (recommended) | Data Storage Option: Only Hash Stored | Data Storage Option: No Data Stored (all metadata is transient) |
Ingestion
| Decoder | ✓ | ✓ | | |
| Log Decoder | ✓ | ✓ | | |
Meta Aggregation
| Concentrator | ✓ | ✓ | | |
| Broker | ✓ (Cache only) | ✓ (Cache only) | | |
Real-Time Analysis
| Investigate | ✓ | ✓ (Cache only) | | |
| Event Stream Analysis | ✓ | | | |
| Malware Analysis | ✓ | | | |
| Respond | ✓ | | | |
Reporting
| Reporting Engine | ✓ | ✓ (Optional) | | |
Long-Term Analytics
| Archiver | ✓ (Optional) | ✓ (Optional) | | |
| Warehouse | ✓ (Optional) | ✓ (Optional) | | |
Content
| Live | n/a | n/a | n/a | n/a |
Fraud Analysis
| RSA Fraud and Risk Intelligence Suite | n/a | n/a | n/a | n/a |
End Point Protection
| NetWitness Endpoint | n/a | n/a | n/a | n/a |

Notes:
Cache Only means that sensitive data is in the Broker or NetWitness Server cache. Configure Data Retention provides details about automated and manual clearing of cache.
Optional means that sensitive data storage does occur, but can be limited by optional configurations. For example, to limit where sensitive data is stored, do not enable DPO access for Reporting and do not aggregate original protected data into the Archiver.

Option 1: No Original Data Saved to Disk, Only Hash Stored

If the risk of exposure is too great, administrators can eliminate the persistence of sensitive data to disk and store only an obfuscated value. In this scenario, sensitive metadata generated during parsing on the Decoders and Log Decoders is used only in memory and is not written to disk. Downstream services do not see original values and must use obfuscated values to conduct investigations and analytics.

To configure this data privacy scheme, data obfuscation must be enabled with hash values configured. You can configure individual meta keys on a Decoder or Log Decoder as transient to ensure that original values are not written to disk.

  • Original values identified as sensitive are extracted from the raw data during parsing on the Decoder and Log Decoder and are accessible to the system during parsing (parsers, rules, feeds).
  • The Decoder does not save the original values for meta keys identified as sensitive, storing only the hash of original values along with other non-sensitive metadata related to the event.
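The following sketch models this option at parse time: a watchlist check (standing in for a parser, rule, or feed) sees the original value in memory, while the record that would be persisted holds only the salted hash. The meta key names, the `.hash` suffix, the salt, the watchlist, and the salt-prepending hash scheme are all hypothetical.

```python
from __future__ import annotations

import hashlib

SALT = "hypothetical-salt-0123"  # hypothetical 22-character salt, within the 16-60 range
SENSITIVE_KEYS = {"email", "username"}  # hypothetical meta keys configured as transient


def hash_value(value: str) -> str:
    """Hex SHA-256 of the salt prepended to the value (assumed scheme)."""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()


def process_event(meta: dict[str, str], watchlist: set[str]) -> dict[str, str]:
    """Analyze original values in memory, then return only what would be written to disk."""
    persisted: dict[str, str] = {}
    # Parse-time analysis (standing in for parsers, rules, feeds) still sees originals.
    if meta.get("email") in watchlist:
        persisted["alert"] = "watchlist-match"
    for key, value in meta.items():
        if key in SENSITIVE_KEYS:
            # The original value stays in memory only; its hash is persisted instead.
            persisted[key + ".hash"] = hash_value(value)
        else:
            persisted[key] = value
    return persisted
```

The alert produced from the original value survives as non-sensitive meta, while the original email address itself never appears in the persisted record.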

A side effect of this option is some loss of analytical capability, but you can configure it to suit the needs of your environment.

  • By configuring all sensitive data as Transient, sensitive values are not persisted to disk, and the analytic capabilities using the original value are available at parse time only (parsers, rules, feeds).
  • Event stream analysis (ESA) and Malware Analysis systems must rely only on the obfuscated meta values when doing their correlation and scoring respectively.
  • Reporting Engine is limited to pulling reports using the non-sensitive and obfuscated values.
  • The data privacy officer cannot view the original value, but can use the configured hash and salt to determine if an obfuscated value represents a specific known original value.
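A data privacy officer who knows the configured hash algorithm and salt can test whether an obfuscated value corresponds to a known original value by recomputing the hash. This sketch assumes hex SHA-256 over the salt prepended to the value; the helper names are hypothetical, and the deployment's actual scheme would have to be substituted.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import Iterable


def obfuscate(value: str, salt: str) -> str:
    # Assumed scheme: hex SHA-256 of salt + value; substitute the deployment's configuration.
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()


def matches(obfuscated: str, candidate: str, salt: str) -> bool:
    """True if the candidate original value hashes to the stored obfuscated value."""
    return hmac.compare_digest(obfuscate(candidate, salt), obfuscated)


def find_original(obfuscated: str, candidates: Iterable[str], salt: str) -> str | None:
    """Return the first known candidate that maps to the obfuscated value, else None."""
    return next((c for c in candidates if matches(obfuscated, c, salt)), None)
```

This confirms a specific known value; it does not recover an unknown original value from the hash.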

Option 2: No Original or Obfuscated Values Stored (Not Recommended)

If the risk of exposure is too great, administrators can eliminate the persistence of sensitive values to disk entirely, storing neither the original nor the obfuscated value. As in Option 1, sensitive metadata generated during parsing on the Decoders and Log Decoders is used only in memory and is not written to disk. Downstream services do not see original values and have no obfuscated values with which to conduct investigations and analytics.

To configure this data privacy scheme, configure individual meta keys on a Decoder or Log Decoder as transient to ensure that original values are not written to disk.

  • Original values identified as sensitive are extracted from the raw data during parsing on the Decoder and Log Decoder and are accessible to the system during parsing (parsers, rules, feeds).
  • The Decoder does not save the original values for meta keys identified as sensitive, and stores only the non-sensitive metadata related to the event.

A side effect of this option is a significant loss of analytical capability, but you can configure it to suit the needs of your environment.

  • By configuring all sensitive data as Transient, sensitive values are not persisted to disk, and the analytic capabilities using the original value are available at parse time only (parsers, rules, feeds). See Configure Data Retention.
  • All downstream components have no visibility into the original values, obfuscated or otherwise.
  • The data privacy officer has no visibility into the original values, obfuscated or otherwise.

Optional Data Overwriting Options

Several options for overwriting data are available, and you should thoroughly understand each one before implementing data overwriting.

Option 1: Limit Disk Space for Continuous Overwriting of Older Data

If the desired data retention period, and therefore the amount of storage that data requires, is known, you can limit the size of the underlying storage hardware or partition to that amount. Reducing the disk storage or partition size limits the free space that must fill up before new data overwrites old data, so newly ingested data continually overwrites the oldest data. Either approach must be implemented at deployment time to be effective.
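The sizing described above is simple arithmetic: required capacity is the daily ingest volume multiplied by the retention period. This sketch adds an optional fractional overhead margin, which is an assumption and not a NetWitness parameter; the example figures are hypothetical.

```python
def required_capacity_gb(daily_ingest_gb: float, retention_days: float, overhead: float = 0.0) -> float:
    """Storage needed to hold retention_days of data, plus an optional fractional margin."""
    if daily_ingest_gb <= 0 or retention_days < 0 or overhead < 0:
        raise ValueError("inputs must be non-negative and daily ingest must be positive")
    return daily_ingest_gb * retention_days * (1 + overhead)


def retention_days_for(capacity_gb: float, daily_ingest_gb: float) -> float:
    """Approximate days of data a fixed-size partition holds before overwriting begins."""
    if daily_ingest_gb <= 0:
        raise ValueError("daily ingest must be positive")
    return capacity_gb / daily_ingest_gb


print(required_capacity_gb(200, 30))  # 6000.0
print(retention_days_for(6000, 200))  # 30.0
```

For example, a hypothetical 200 GB per day held for 30 days needs about 6,000 GB; conversely, a 6,000 GB partition filled at 200 GB per day retains roughly 30 days before new data begins overwriting the oldest.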

Side effects of this option are:

  • Removing disks reduces the number of drives available to distribute I/O, which degrades performance.
  • Using a smaller partition may also degrade performance somewhat, but less than removing disks does.

Option 2: Use Tiered Storage to Overwrite Data on a Scheduled Basis

If data must be overwritten on an automatic, scheduled basis, you can configure the Decoders and Concentrators to use tiered storage. The tiered storage configuration provides a mechanism for invoking a script after a database file has been removed from the application but before it is removed from the file system. If necessary, instead of moving the file to the second tier, or cold storage (the intended function in a tiered storage use case), the script can overwrite the file with a utility such as the CentOS shred utility. This tool is less effective when the database is stored on a journaling file system such as XFS, where the Core database resides, and on a RAID logical drive such as those to which the Core hosts connect.

Most other NetWitness components do not have this option, because their data is stored in a database that does not support the tiered storage mechanism. The only other component that could use this overwrite method is the Reporting Engine, because it saves reports and alerts as individual files. However, Reporting Engine charts are stored in a database, so this technique does not apply to them.
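The overwrite step that a tiered storage script could perform instead of moving a file to cold storage is sketched below in Python, as an alternative to calling shred. The way the hook passes the file path (as the first command-line argument) is an assumption, and, as with shred, overwriting in place gives no guarantee on journaling file systems such as XFS, on SSDs, or on RAID logical drives.

```python
import os
import sys


def overwrite_and_remove(path: str, passes: int = 3, chunk_size: int = 1 << 20) -> None:
    """Overwrite a file in place with random bytes, flush each pass to disk, then delete it.

    Like shred, this assumes new writes land on the same physical blocks. Journaling
    file systems, SSD wear leveling, and RAID controllers may not honor that
    assumption, so residual copies of the data may survive.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as handle:
        for _ in range(passes):
            handle.seek(0)
            remaining = size
            while remaining > 0:
                block = os.urandom(min(chunk_size, remaining))
                handle.write(block)
                remaining -= len(block)
            handle.flush()
            os.fsync(handle.fileno())  # push this pass to the storage device
    os.remove(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical hook contract: the file path arrives as the first argument.
    overwrite_and_remove(sys.argv[1])
```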

Option 3: Purge Data Using String and Pattern Redaction Option

Data purging provides a mechanism to selectively overwrite a specific subset of data in the system if sensitive data has been persisted, either on purpose or by accident. Using session identifiers, the NetWitness wipe utility writes unique patterns over the data in the meta and packet databases of Core services, which may contain raw packets or logs for existing sessions. All Core components can overwrite a subset of data selected by executing a query string, which can include regex patterns; the session identifiers returned by the query are fed into the NetWitness wipe utility.
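The query step that selects sessions for wiping can be sketched as a regex match over a meta key. The in-memory `sessions` mapping and the `sessions_to_wipe` helper are hypothetical stand-ins for a Core service query; how the resulting IDs are passed to the NetWitness wipe utility is not shown, because its command-line options are not described here.

```python
from __future__ import annotations

import re


def sessions_to_wipe(sessions: dict[int, dict[str, str]], meta_key: str, pattern: str) -> list[int]:
    """Return sorted IDs of sessions whose meta_key value matches the regex pattern."""
    regex = re.compile(pattern)
    return sorted(
        session_id
        for session_id, meta in sessions.items()
        if meta_key in meta and regex.search(meta[meta_key])
    )


sessions = {
    101: {"email": "alice@corp.example"},
    102: {"email": "bob@other.example"},
    103: {"ip.src": "10.0.0.5"},
}
print(sessions_to_wipe(sessions, "email", r"@corp\.example$"))  # [101]
```

Keeping the selection narrow matters, because each selected session must then be overwritten, and wiping many sessions is slow and affects performance.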

Note: This option is not available if the data in the Core database has been compressed (as typically done in Archiver deployments).

In most NetWitness components, the database in use does not provide a built-in redaction or secure deletion mechanism. During data retention management, the Malware Analysis component can overwrite the data object in the database with the value "private" instead of deleting it, but this is not meant to be a secure deletion mechanism.

Caution: Using this method on a large number of sessions has two drawbacks: it can be time-consuming, and it can affect performance.

See Purge Data Using String and Pattern Redaction Option for the procedure.

Limitations to Data Overwriting

The overwriting techniques described in Options 2 and 3 have limitations. To overwrite the data in the disk sectors, these options and the alternative command line overwrite tool (shred, included with CentOS) make assumptions about the disk layout. NetWitness hosts use SSDs and RAID configurations for performance and reliability, and these reduce the effectiveness of the overwrite techniques. Changing the SSD and RAID configurations to make overwriting more secure would inevitably carry a performance cost, reflected in ingest rates, query speeds, and potentially other areas. The command line overwrite tools are recommended only for special use cases in which specific data must be redacted. They are not intended for real-time or continuous use because of the performance cost they would incur.