Troubleshoot Upgrade Issues
This section describes the error messages displayed in the Hosts view when NetWitness encounters problems updating host versions or installing services on hosts. If you cannot resolve an upgrade or installation issue using the following troubleshooting solutions, contact NetWitness Customer Support.
This section provides troubleshooting instructions for the following errors that may occur during the upgrade.
- AlmaLinux OS Troubleshooting Information
- Migration of Lockbox to SecureStore failure on Admin Server, Reporting Engine, and SMS
- deploy_admin Password Expired Error
- Downloading Error
- Error Deploying Version <version-number> Missing Update Packages
- External Repo Update Error
- Host Update Failed Error
- Missing Update Packages Error
- Patch Update to Non-NW Server Error
- Reboot Host After Update from Command Line Error
Troubleshooting instructions are also provided for errors that may occur on the following hosts and services during or after an upgrade.
- Log Collector Service
- NW Server
- Orchestration
- Reporting Engine
- Event Stream Analysis
- Legacy Windows Log Collector
Problem | Unable to boot the appliance after upgrading |
Workaround |
|
AlmaLinux OS Troubleshooting Information
The AlmaLinux OS upgrade can be divided into four parts:
- Running the precheck utility to ensure the health of the system and detect any upgrade issues. This can be done any time before the upgrade using the standalone precheck-tool rpm (required only on NW Server).
Logs are recorded in /var/log/netwitness/precheck-tool/checklist.log.
- Initialization or init phase (happens only on NW Server)
For any issues during the init phase, check these logs:
  - salt minion logs - /var/log/salt/minion
  - deployment-upgrade logs - /var/log/netwitness/deployment-upgrade/chef-solo.log
Note: Please perform the init only when you plan to do the actual upgrade. It is not recommended to perform an init without upgrading the system in the same change window.
- OS Upgrade from CentOS to AlmaLinux
As the first step of the OS upgrade, salt is upgraded. Run the following command to confirm that salt has been upgraded to version 3006:
cat /var/log/yum.log | grep salt
The output should be similar to the following, where xxx represents the timestamp of the update:
xxx Updated: salt-master-3006.2-0.x86_64
xxx Updated: salt-api-3006.2-0.x86_64
xxx Updated: salt-minion-3006.2-0.x86_64
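If you prefer to check this from a script, the following is a minimal sketch that scans the text of /var/log/yum.log for salt package entries and confirms that each one is at major version 3006. The function names `salt_versions` and `salt_upgraded` are illustrative and not part of NetWitness, and the line format is assumed to match the sample output above.

```python
import re

# Matches yum.log entries such as "xxx Updated: salt-minion-3006.2-0.x86_64".
SALT_LINE = re.compile(r"(?:Updated|Installed): (salt(?:-[a-z]+)?)-(\d+)\.(\d+)")


def salt_versions(yum_log_text):
    """Return {package: (major, minor)} using the last entry seen for each salt package."""
    versions = {}
    for line in yum_log_text.splitlines():
        match = SALT_LINE.search(line)
        if match:
            name, major, minor = match.groups()
            versions[name] = (int(major), int(minor))
    return versions


def salt_upgraded(yum_log_text, expected_major=3006):
    """True only if at least one salt package was found and all are at expected_major."""
    versions = salt_versions(yum_log_text)
    return bool(versions) and all(
        major == expected_major for major, _ in versions.values()
    )


sample = """xxx Updated: salt-master-3006.2-0.x86_64
xxx Updated: salt-api-3006.2-0.x86_64
xxx Updated: salt-minion-3006.2-0.x86_64"""
print(salt_upgraded(sample))  # prints True
```

On a host, you would pass the contents of /var/log/yum.log instead of the sample string.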
For any issues with the salt upgrade, check these logs:
  - /var/log/netwitness/node-infra-server/node-infra-server.log
  - /var/log/salt/master
  - /var/log/salt/minion
Once salt has been upgraded, the leapp process will begin.
The logs can be viewed in /var/log/salt/minion:
xxx [salt.loaded.ext.module.nw_platform:445 ][INFO ][139407] [1/5] Searching for leapp config for version: 12.5.0.0
xxx [salt.loaded.ext.module.nw_platform:453 ][INFO ][139407] [2/5] Retrieving leapp config for version: 12.5.1.0
xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'config/12.5.1.0-pre-upgrade.repo'
xxx [salt.loaded.ext.module.nw_platform:467 ][INFO ][139407] [3/5] Running pre-requisites required to perform leapp upgrade
xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/netwitnessmigrate/actor.py'
xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/netwitnessmigrate/libraries/netwitnessmigrate.py'
xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/netwitnessmigrate.py'
xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/addupgradebootentry.py'
xxx [salt.loaded.ext.module.nw_platform:500 ][INFO ][139407] [4/5] Running leapp pre-upgrade
xxx [salt.loaded.ext.module.nw_platform:503 ][INFO ][139407] [5/5] Running leapp upgrade
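To see how far the leapp process has progressed, you can look for the last [n/5] marker in the minion log. The sketch below assumes the log lines follow the format shown above; `last_leapp_step` is a hypothetical helper name, not a NetWitness utility.

```python
import re

# Matches the "[n/5] description" progress markers written by nw_platform.
STEP = re.compile(r"\[(\d)/5\]\s+(.*)$")


def last_leapp_step(minion_log_text):
    """Return (step_number, description) for the last [n/5] marker, or None if absent."""
    last = None
    for line in minion_log_text.splitlines():
        if "nw_platform" not in line:
            continue  # skip salt.fileclient and unrelated lines
        match = STEP.search(line)
        if match:
            last = (int(match.group(1)), match.group(2).strip())
    return last


sample = """xxx [salt.loaded.ext.module.nw_platform:467 ][INFO ][139407] [3/5] Running pre-requisites required to perform leapp upgrade
xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/addupgradebootentry.py'
xxx [salt.loaded.ext.module.nw_platform:500 ][INFO ][139407] [4/5] Running leapp pre-upgrade"""
print(last_leapp_step(sample))  # prints (4, 'Running leapp pre-upgrade')
```

A log that stops at step 4 suggests the pre-upgrade did not finish, while a log that reaches step 5 suggests the upgrade itself was started.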
For any issues encountered during OS Upgrade, the logs below will be helpful in troubleshooting.
  - /var/log/salt/minion
  - If Preupgrade fails - /var/log/leapp/leapp-preupgrade.log
  - If Leapp upgrade fails - /var/log/leapp/leapp-upgrade.log
If leapp fails, then /var/log/leapp/leapp-report.txt will provide you with details about inhibitors.
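The inhibitors can also be pulled out of the report with a short script. This is a sketch only: it assumes the report uses the usual leapp text layout, in which each entry starts with a "Risk Factor:" line (marked "(inhibitor)" for blocking findings) followed by a "Title:" line. Verify this against the report on your own system. The function name and the sample text are made up for illustration.

```python
def inhibitor_titles(report_text):
    """Return the titles of entries whose Risk Factor line is marked as an inhibitor."""
    titles = []
    in_inhibitor = False
    for raw_line in report_text.splitlines():
        line = raw_line.strip()
        if line.startswith("Risk Factor:"):
            # Assumption: blocking findings carry the "(inhibitor)" suffix.
            in_inhibitor = "(inhibitor)" in line
        elif line.startswith("Title:") and in_inhibitor:
            titles.append(line[len("Title:"):].strip())
            in_inhibitor = False
    return titles


# Illustrative text only, not taken from a real NetWitness host.
sample = """Risk Factor: high (inhibitor)
Title: Example blocking finding
Summary: Example summary text
----------------------------------------
Risk Factor: medium
Title: Example non-blocking finding
"""
print(inhibitor_titles(sample))  # prints ['Example blocking finding']
```

On a host, you would pass the contents of /var/log/leapp/leapp-report.txt instead of the sample string.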
A few minutes after the “Running leapp upgrade” message appears in /var/log/salt/minion, the system reboots and may take 20 to 30 minutes to come back up.
Once it is up, confirm the OS by running cat /etc/almalinux-release. If the output does not show an AlmaLinux release, contact Customer Support before taking any action.
Also, if you triggered the upgrade through the UI and the status "Performing OS Migration" is displayed for any Node-X host for more than an hour, check the leapp logs and contact Customer Support.
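The release check can be scripted as follows. This is a minimal sketch that assumes the release file begins with the text "AlmaLinux release". The helper names are hypothetical, and the release strings in the usage lines are only examples of the general format.

```python
from pathlib import Path


def is_almalinux_release(text):
    """True if the release file content identifies an AlmaLinux release."""
    return text.strip().lower().startswith("almalinux release")


def os_migrated(release_file="/etc/almalinux-release"):
    """True if the release file exists and reports AlmaLinux; False otherwise."""
    path = Path(release_file)
    return path.is_file() and is_almalinux_release(path.read_text())


# Example strings for illustration; the exact release on your host may differ.
print(is_almalinux_release("AlmaLinux release 8.8 (Sapphire Caracal)"))  # prints True
print(is_almalinux_release("CentOS Linux release 7.9.2009 (Core)"))      # prints False
```

If `os_migrated()` returns False after the host has come back up, follow the guidance above and contact Customer Support before taking any action.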
- NW Software upgrade to 12.5.1.0
Once the OS migration has completed, the NW software upgrade begins and takes up to 30 minutes before the UI is functional.
The following logs appear in /var/log/salt/minion when the NW software upgrade starts:
xxx [salt.loaded.ext.module.nw_platform:276 ][INFO ][14035] Preparing node for upgrade to 12.5.1.0
xxx [salt.loaded.ext.module.nw_platform:280 ][INFO ][14035] [1/2] Searching for yum config for version: 12.5.1.0
xxx [salt.loaded.ext.module.nw_platform:287 ][INFO ][14035] [2/2] Retrieving yum config for version: 12.5.1.0
xxx [salt.fileclient :1333][INFO ][14035] Fetching file from saltenv 'base', ** done ** 'config/12.5.1.0-pre-upgrade.repo'
xxx [salt.loaded.ext.module.nw_platform:300][INFO ][14035] Upgrading chef package
xxx [salt.loaded.ext.module.nw_platform:300][INFO ][14035] Upgrading rsa-nw-config-management package
You can also refer to the config management logs at /var/log/netwitness/config-management/chef-solo.log or the UI logs at /var/netwitness/uax/logs/sa.log.
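To confirm from a script that the NW software upgrade has started, you can search the minion log for the "Preparing node for upgrade" message and the package lines that follow it. The sketch below assumes the message wording shown in the sample log lines for this phase; `software_upgrade_progress` is an illustrative name.

```python
import re

PREPARE = re.compile(r"Preparing node for upgrade to (\d+(?:\.\d+){3})")
PACKAGE = re.compile(r"Upgrading (\S+) package")


def software_upgrade_progress(minion_log_text):
    """Return (target_version, [packages upgraded so far]) for the latest upgrade run."""
    target = None
    packages = []
    for line in minion_log_text.splitlines():
        prepare = PREPARE.search(line)
        if prepare:
            # A new run started: remember its target and reset the package list.
            target = prepare.group(1)
            packages = []
            continue
        package = PACKAGE.search(line)
        if package and target is not None:
            packages.append(package.group(1))
    return target, packages


sample = """xxx [salt.loaded.ext.module.nw_platform:276 ][INFO ][14035] Preparing node for upgrade to 12.5.1.0
xxx [salt.loaded.ext.module.nw_platform:300][INFO ][14035] Upgrading chef package
xxx [salt.loaded.ext.module.nw_platform:300][INFO ][14035] Upgrading rsa-nw-config-management package"""
print(software_upgrade_progress(sample))
# prints ('12.5.1.0', ['chef', 'rsa-nw-config-management'])
```

A result of `(None, [])` means the software upgrade phase has not been logged yet.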
Migration of Lockbox to SecureStore failure on Admin Server, Reporting Engine, and SMS
For Admin Server or Jetty
Problem |
The migration of LockBox to SecureStore has failed in the Admin Server. |
Cause | Due to incomplete migration of SSV values. |
Solution |
If you are unable to access the admin server, perform the following steps to resolve the issue:
|
For Reporting Engine
Problem |
The migration of LockBox to SecureStore has failed in the Reporting Engine. |
Cause | Due to incomplete SSV values migration. |
Solution |
If you are unable to access the reporting engine, perform the following steps to resolve the issue:
|
For SMS
Problem |
The migration of LockBox to SecureStore has failed in the SMS. |
Cause | Due to incomplete SSV values migration. |
Solution |
If you are unable to access the SMS service, perform the following steps to resolve the issue:
|
deploy_admin User Password Has Expired Error
Error Message |
|
Cause | The deploy_admin user password has expired. |
Solution |
Reset your deploy_admin password. Do the following.
|
Downloading Error
Error Message |
|
Problem | When you select an update version and click Update > Update Host, the download starts but fails to complete. |
Cause | Version download files can be large and take a long time to download. If there are communication issues during the download, it fails. |
Solution |
|
Error Deploying Version <version-number> Missing Update Packages
Error Message |
|
Problem |
Error deploying version <version-number> is displayed in the Initialize Update Package for NetWitness Platform dialog after you click on Initialize Update if the update package is corrupted. |
Solution |
|
External Repo Update Error
Error Message |
You will receive an error similar to the following error while trying to update to a new version from the : |
Cause | Incorrect path specified. |
Solution |
Make sure that:
|
Host Update Failed Error
Error Message |
|
Problem | When you select an update version and click Update > Update Host, the download process is successful, but the update process fails. |
Solution |
|
Error Message |
|
Problem | When you select an update version and click Update > Check for Updates, the Unauthorized error message is displayed. As a result, the connection to the live service fails. |
Solution |
|
Missing Update Packages Error
Error Message |
Initialize Update for Version xx.x.x.x Download Packages from NetWitness Link |
Problem | Missing the following update package(s) is displayed in the Initialize Update Package for NetWitness Platform dialog when you are updating a host from the Hosts view offline and there are packages missing in the staging folder. |
Solution |
|
Patch Update to Non-NW Server Error
Error Message |
The /var/log/netwitness/orchestration-server/orchestration-server.log has an error similar to the following error: |
Problem | After you update the NW Server host to a version, you must update all non-NW Server hosts to the same version. For example, if you update the NW Server from 12.2.0.0 to 12.5.1.0 or later, the only update path for the non-NW Server hosts is the same version (that is, 12.5.1.0). If you try to update any non-NW Server host to a different version (for example, from 12.2.0.0 to 12.3.x.x), you will get this error. |
Solution |
Do any of the following:
|
Reboot Host After Update from Command Line Error
Error Message |
You will receive a message in the User Interface to reboot the host after you update and reboot the host offline. |
Cause | This error occurs when you use the CLI to reboot the host. You must use the User Interface to reboot the host. |
Solution |
Reboot the host in the Hosts view in the User Interface. |
Log Collector Service (nwlogcollector)
Log Collector installation logs are posted to /var/log/install/nwlogcollector_install.log on the host running the nwlogcollector service.
Error Message | <timestamp>.NwLogCollector_PostInstall: Lockbox Status : Failed to open lockbox: The lockbox stable value threshold was not met because the system fingerprint has changed. To reset the system fingerprint, open the lockbox using the passphrase. |
Cause | The Log Collector Lockbox failed to open after the update. |
Solution | Log in to NetWitness and reset the system fingerprint by resetting the stable system value password for the Lockbox, as described in the Reset the Stable System Value topic under the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide. |
Error Message | <timestamp> NwLogCollector_PostInstall: Lockbox Status : Not Found |
Cause | The Log Collector Lockbox is not configured after the update. |
Solution | If you use a Log Collector Lockbox, log in to NetWitness and configure the Lockbox as described in the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide. |
Error Message | <timestamp>: NwLogCollector_PostInstall: Lockbox Status : Lockbox maintenance required: The lockbox stable value threshold requires resetting. To reset the system fingerprint, select Reset Stable System Value on the settings page of the Log Collector. |
Cause | You need to reset the stable value threshold field for the Log Collector Lockbox. |
Solution | Log in to NetWitness and reset the stable system value password for the Lockbox, as described in the Reset the Stable System Value topic under the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide. |
Error Message |
The Decoder tries to start capturing events but fails. |
Cause |
The Decoder capture configuration is not valid for customers who use PF_RING capture (CentOS) and upgrade directly to 12.5.1.0 (AlmaLinux). They must first migrate PF_RING devices to DPDK and then upgrade. |
Solution |
To resolve the issue, refer to Migrate PF_RING Devices to DPDK for migration instructions. |
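The three Lockbox Status messages in this table can be triaged from a script. The sketch below is illustrative only: it matches on the message prefixes quoted above and returns a short reminder of the documented action. `triage_lockbox_status` is a hypothetical helper name, and xxx stands in for the timestamp.

```python
LOCKBOX_MARKER = "Lockbox Status :"


def triage_lockbox_status(log_line):
    """Map an NwLogCollector_PostInstall log line to the documented action, or None."""
    if LOCKBOX_MARKER not in log_line:
        return None  # not a Lockbox status line
    status = log_line.split(LOCKBOX_MARKER, 1)[1].strip()
    if status.startswith("Failed to open lockbox"):
        return "Reset the Stable System Value (system fingerprint changed)"
    if status.startswith("Not Found"):
        return "Configure the Lockbox if you use one"
    if status.startswith("Lockbox maintenance required"):
        return "Reset the Stable System Value"
    return "Unrecognized status: " + status


print(triage_lockbox_status("xxx NwLogCollector_PostInstall: Lockbox Status : Not Found"))
# prints Configure the Lockbox if you use one
```

You could run this over each line of /var/log/install/nwlogcollector_install.log and act on the first result that is not None.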
NW Server
These logs are posted to /var/netwitness/uax/logs/sa.log on the NW Server Host.
Problem |
After upgrade, you will notice one of the following:
|
Cause | NW Server Global Audit setup migration failed to migrate from 12.2.x.x or 12.3.x.x to 12.5.1.0. |
Solution |
|
Orchestration
The orchestration server logs are posted to /var/log/netwitness/orchestration-server/orchestration-server.log on the NW Server Host.
Problem |
You will see the following message in the orchestration-server.log. |
Cause | The salt minion may have been upgraded but never restarted on the failed non-NW Server host. |
Solution |
|
Problem |
When you install and orchestrate a fresh 12.5.1.0 core Node-X to an Admin Server (Node-0) that was upgraded from 12.0 or an older version to 12.5.1.0, the core services such as Concentrator, Log Decoder, Log Collector, Archiver, Decoder, Appliance, Workbench, Warehouse Connector, and Broker appear inactive under the Services column in the Admin > Hosts view. As a result, you cannot access the core services in the UI. This does not apply if you orchestrate a fresh 12.5.1.0 core Node-X to a freshly installed 12.5.1.0 Admin Server (one that was not upgraded from 12.0 or an older version). |
Cause | The 12.5.1.0 core Node-X uses a dedicated SA-server certificate instead of the common Node-0 node certificate under its trustpeers if it is orchestrated directly to an upgraded 12.5.1.0 Admin Server host. |
Solution |
|
Reporting Engine Service
Reporting Engine Update logs are posted to the /var/log/re_install.log file on the host running the Reporting Engine service.
Error Message | <timestamp> : Available free space in /var/netwitness/re-server/rsa/soc/reporting-engine [ <existing-GB> ] is less than the required space [ <required-GB> ] |
Cause | Update of the Reporting Engine failed because you do not have enough disk space. |
Solution | Free up the disk space to accommodate the required space shown in the log message. See the Add Additional Space for Large Reports topic in the Reporting Engine Configuration Guide for instructions on how to free up disk space. |
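Before retrying the update, you can confirm that enough space is now available. The following sketch uses Python's standard `shutil.disk_usage` call. It assumes the required figure from the log message is in gibibytes, which the log message does not state, and `has_required_space` is an illustrative name.

```python
import shutil


def has_required_space(path, required_gb):
    """Return (enough, free_gb) for the filesystem that contains path."""
    # Assumption: 1 GB is treated as 1024**3 bytes.
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb, free_gb


# Checks the current directory against a 1 GB requirement as a demonstration.
enough, free_gb = has_required_space(".", 1)
print("enough space" if enough else "not enough space", round(free_gb, 1))
```

On the Reporting Engine host, you would pass /var/netwitness/re-server/rsa/soc/reporting-engine as the path and the required space value from the log message as `required_gb`.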
Event Stream Analysis
Problem | After upgrading to version 12.5.1.0 or later, the ESA correlation server does not aggregate events from the configured data sources. |
Error Message | Invalid username or password at com.rsa.netwitness.streams.base.RecordSourceSubscription.run(RecordSourceSubscription.java:173) |
Solution |
To resolve the issue, do the following in the NetWitness user interface:
Note: Do the above procedure for all the configured data sources.
|
Legacy Windows Log Collector
Problem |
Legacy Windows Log Collector appears as inactive when the stack is upgraded to 12.5.1.0. |
Cause | Certificate update in the Admin Server node. |
Solution |
Refer to the Legacy Windows Log Collector section in Perform Post Upgrade Tasks. |
ESA Troubleshooting Information
ESA Rules are Not Creating Alerts
If you are not seeing any alerts, check the status of the ESA rule deployments.
- Go to (CONFIGURE) > Policies > Content > Event Stream Analysis > ESA Deployments.
The ESA Deployment panel is displayed.
- Select the required deployment from the list and click the Deployment Stats tab.
The Deployment Stats page is displayed, which shows the status of your ESA services and deployments.
- For each ESA rule deployment:
- In the Engine Stats section, look at the Events Offered and the Offered Rate. They confirm that the data is being aggregated and analyzed properly. If you see 0 for Events Offered, nothing is coming in for the deployment.
- In the Rule Stats section, look at the Rules Enabled and Rules Disabled. If there are any disabled rules, look in the Deployed Rule Stats section below to view the details of the disabled rules. Disabled rules show a red circle. Enabled rules show a green circle.
- If you notice any disabled rules that should be enabled:
- Go to (Configure) > ESA Rules > Rules tab and redeploy the ESA rule deployments that contain disabled rules.
- Go back to the Services tab and check to see if the rules are still disabled. If the rules are still disabled, check the ESA Correlation service log files, which are located at /var/log/netwitness/correlation-server/correlation-server.log.
Note: To avoid unnecessary processing overhead, the Ignore Case option has been removed from the ESA Rule Builder - Build a Statement dialog for meta keys that do not contain text data values. During the upgrade to the latest version, NetWitness Platform does not modify existing rules for the Ignore Case option. If an existing Rule Builder rule has the Ignore Case option selected for a meta key that no longer has the option available, an error occurs if you edit the statement and try to save it again without clearing the checkbox.
Example ESA Correlation Server Warning Message for Missing Meta Keys
If a warning message in the ESA Correlation server error logs indicates a difference between the default-multi-valued parameter and multi-valued parameter meta key values, the new Endpoint, UEBA, and Live content rules will not work. To fix the issue, complete the Update the Multi-Valued and Single-Valued Parameter Meta Keys for the latest Endpoint, UEBA, and RSA Live Content Rules procedure in the ESA Configuration Guide.
Multi-Valued Warning Message Example
2019-08-23 08:55:07,602 [ deployment-0] WARN Stream|[alert, alert_id, browserprint, cert_thumbprint, checksum, checksum_all, checksum_dst, checksum_src, client_all, content, context, context_all, context_dst, context_src, dir_path, dir_path_dst, dir_path_src, directory, directory_all, directory_dst, directory_src, email_dst, email_src, feed_category, feed_desc, feed_name, file_cat, file_cat_dst, file_cat_src, filename_dst, filename_src, filter, function, host_all, host_dst, host_orig, host_src, host_state, ip_orig, ipv6_orig, OS, param, param_dst, param_src, registry_key, registry_value, risk, risk_info, risk_suspicious, risk_warning, threat_category, threat_desc, threat_source, user_agent] are still MISSING from multi-valued
Single Value Warning Message Example
2019-08-23 08:55:07,602 [ deployment-0] WARN Stream|[accesses, context_target, file_attributes, logon_type_desc, packets] are still MISSING from single-valued
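To collect the missing meta keys from warnings like these without copying them by hand, a short script can extract the bracketed lists. This is a sketch that assumes the warning wording matches the two examples above; `missing_meta_keys` is a hypothetical helper name.

```python
import re

# Captures the bracketed key list and whether it refers to multi- or single-valued keys.
MISSING = re.compile(
    r"WARN\s+Stream\|\[(.*?)\] are still MISSING from (multi-valued|single-valued)"
)


def missing_meta_keys(log_text):
    """Return {'multi-valued': [...], 'single-valued': [...]} from the warning lines."""
    result = {"multi-valued": [], "single-valued": []}
    for match in MISSING.finditer(log_text):
        keys = [key.strip() for key in match.group(1).split(",") if key.strip()]
        result[match.group(2)].extend(keys)
    return result


sample = (
    "2019-08-23 08:55:07,602 [ deployment-0] WARN Stream|[accesses, context_target, "
    "file_attributes, logon_type_desc, packets] are still MISSING from single-valued"
)
print(missing_meta_keys(sample)["single-valued"])
# prints ['accesses', 'context_target', 'file_attributes', 'logon_type_desc', 'packets']
```

The resulting lists are the keys to add to the corresponding parameters when you follow the procedure in the ESA Configuration Guide.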