Troubleshoot Upgrade Issues

This section describes the error messages that the Hosts view displays when it encounters problems updating host versions or installing services on hosts. If you cannot resolve an upgrade or installation issue using the following troubleshooting solutions, contact NetWitness Customer Support.

It covers errors that may occur during the upgrade, as well as errors on specific hosts and services that may occur during or after an upgrade.

 

Problem Unable to boot the appliance after upgrading
Workaround
  1. Manually modify the GRUB boot line to FIPS=0 to get it to boot.

  2. From here, disable FIPS using the following command:

    manage-stig-controls --disable-control-groups 3 --host-all

  3. Verify the line FIPS=1 is removed from /boot/grub2/grub.cfg

    • If not, run the following command:

      grub2-mkconfig -o /boot/grub2/grub.cfg

  4. Reboot.

  5. Run the following command to enable FIPS:

    manage-stig-controls --enable-control-groups 3 --host-all

  6. Reboot again.
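
Step 3 of this workaround can be scripted. The following is a minimal sketch, assuming the setting appears in the GRUB configuration as FIPS=1 (matched case-insensitively here, because kernel command lines commonly spell it fips=1). The function name fips_line_present is illustrative and is not part of any NetWitness tooling.

```shell
# Return 0 if the given GRUB config still contains a FIPS=1 entry.
# $1: config file to check (defaults to the path named in step 3)
fips_line_present() {
    # -i accepts FIPS=1 or fips=1; the trailing group avoids matching fips=10.
    grep -qiE 'fips=1([^0-9]|$)' "${1:-/boot/grub2/grub.cfg}"
}
```

For example, running fips_line_present && grub2-mkconfig -o /boot/grub2/grub.cfg regenerates the configuration only when the entry is still present.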

AlmaLinux OS Troubleshooting Information

The AlmaLinux OS upgrade can be divided into four parts:

  1. Running the precheck utility to ensure the health of the system and detect any upgrade issues. This can be done any time before the upgrade using the standalone precheck-tool rpm. (required only on NW Server)

    Logs are recorded in this path - /var/log/netwitness/precheck-tool/checklist.log

  2. Initialization or init phase (happens only on NW Server)

    For any issues during init phase, check these logs.

    • salt minion logs - /var/log/salt/minion

    • deployment-upgrade logs - /var/log/netwitness/deployment-upgrade/chef-solo.log

    Note: Please perform the init only when you plan to do the actual upgrade. It is not recommended to perform an init without upgrading the system in the same change window.

  3. OS Upgrade from CentOS to AlmaLinux

    As the first step of the OS upgrade, salt is upgraded. Run the following command to confirm that salt was upgraded to version 3006:

    cat /var/log/yum.log | grep salt

    The output is similar to the following, where xxx represents the date and time stamp of each entry:

    xxx Updated: salt-master-3006.2-0.x86_64

    xxx Updated: salt-api-3006.2-0.x86_64

    xxx Updated: salt-minion-3006.2-0.x86_64

    For any issues with the salt upgrade, check:

    • /var/log/netwitness/node-infra-server/node-infra-server.log

    • /var/log/salt/master

    • /var/log/salt/minion

    Once salt has been upgraded, the leapp process will begin.

    The logs can be viewed in /var/log/salt/minion:

    xxx [salt.loaded.ext.module.nw_platform:445 ][INFO ][139407] [1/5] Searching for leapp config for version: 12.5.0.0

    xxx [salt.loaded.ext.module.nw_platform:453 ][INFO ][139407] [2/5] Retrieving leapp config for version: 12.5.0.0

    xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'config/12.5.0.0-pre-upgrade.repo'

    xxx [salt.loaded.ext.module.nw_platform:467 ][INFO ][139407] [3/5] Running pre-requisites required to perform leapp upgrade

    xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/netwitnessmigrate/actor.py'

    xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/netwitnessmigrate/libraries/netwitnessmigrate.py'

    xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/netwitnessmigrate.py'

    xxx [salt.fileclient :1333][INFO ][139407] Fetching file from saltenv 'base', ** done ** 'leapp/addupgradebootentry.py'

    xxx [salt.loaded.ext.module.nw_platform:500 ][INFO ][139407] [4/5] Running leapp pre-upgrade

    xxx [salt.loaded.ext.module.nw_platform:503 ][INFO ][139407] [5/5] Running leapp upgrade

    For any issues encountered during OS Upgrade, the logs below will be helpful in troubleshooting.

    • /var/log/salt/minion

    • If Preupgrade fails - /var/log/leapp/leapp-preupgrade.log

    • If Leapp upgrade fails - /var/log/leapp/leapp-upgrade.log

    If leapp fails, then /var/log/leapp/leapp-report.txt will provide you with details about inhibitors.

    A few minutes after the message "Running leapp upgrade" appears in /var/log/salt/minion, the system reboots and may take 20 to 30 minutes to come back up.

    Once it is up, confirm the OS by running the command cat /etc/almalinux-release. If the output does not show an AlmaLinux release, contact Customer Support before taking any action.

    Also, if you triggered the upgrade through the UI and the status "Performing OS Migration" is displayed for any Node-X for more than an hour, check the leapp logs and contact Customer Support.

  4. NW Software upgrade to 12.5.0.0

    Once the OS migration has completed, the NW software upgrade begins. It can take up to 30 minutes before the UI is functional.

    The following entries appear in /var/log/salt/minion when the NW software upgrade starts:

    xxx [salt.loaded.ext.module.nw_platform:276 ][INFO ][14035] Preparing node for upgrade to 12.5.0.0

    xxx [salt.loaded.ext.module.nw_platform:280 ][INFO ][14035] [1/2] Searching for yum config for version: 12.5.0.0

    xxx [salt.loaded.ext.module.nw_platform:287 ][INFO ][14035] [2/2] Retrieving yum config for version: 12.5.0.0

    xxx [salt.fileclient :1333][INFO ][14035] Fetching file from saltenv 'base', ** done ** 'config/12.5.0.0-pre-upgrade.repo'

    xxx [salt.loaded.ext.module.nw_platform:300][INFO ][14035] Upgrading chef package

    xxx [salt.loaded.ext.module.nw_platform:300][INFO ][14035] Upgrading rsa-nw-config-management package

    You can also refer to the config management logs at /var/log/netwitness/config-management/chef-solo.log or the UI logs at /var/netwitness/uax/logs/sa.log.
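
The log checks described in parts 3 and 4 can be wrapped in small helpers. This is a minimal sketch, assuming the log lines look like the samples shown above; the function names are illustrative and are not part of the NetWitness tooling.

```shell
# Print the salt packages that the yum log records as updated to 3006.x.
# $1: yum log to read (defaults to /var/log/yum.log)
salt_3006_updates() {
    grep -E 'Updated: salt-[a-z]+-3006\.' "${1:-/var/log/yum.log}"
}

# Print the most recent "[n/5]" leapp step found in the minion log.
# $1: minion log to read (defaults to /var/log/salt/minion)
last_leapp_step() {
    grep -E '\[[1-5]/5\] ' "${1:-/var/log/salt/minion}" | tail -n 1 |
        sed -E 's#.*(\[[1-5]/5\] .*)$#\1#'
}

# Return 0 if the release file names AlmaLinux.
# $1: release file to read (defaults to /etc/almalinux-release)
is_almalinux() {
    grep -qi 'almalinux' "${1:-/etc/almalinux-release}" 2>/dev/null
}
```

Run with no arguments on the host being upgraded, last_leapp_step shows how far the leapp process has progressed, and is_almalinux confirms the OS after the reboot.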

 

Migration of Lockbox to SecureStore failure on Admin Server, Reporting Engine, and SMS

For Admin Server or Jetty

Problem

The migration of LockBox to SecureStore has failed in the Admin Server.

Cause Incomplete migration of SSV values.
Solution

If you are unable to access the admin server, perform the following steps to resolve the issue:

  1. SSH to the Admin Server / Node Zero.

  2. Stop the Jetty service using the following command:

    systemctl stop jetty

  3. Move the lockbox.ss and lockbox.ss.lock files from the following paths to a separate backup folder:

    • /var/netwitness/uax

    • /root/uaxbackup

  4. Start the Jetty service using the following command:

    systemctl start jetty

 

For Reporting Engine

Problem

The migration of LockBox to SecureStore has failed in the Reporting Engine.

Cause Incomplete migration of SSV values.
Solution

If you are unable to access the reporting engine, perform the following steps to resolve the issue:

  1. SSH to the Admin Server / Node Zero.

  2. Stop the Reporting Engine service using the following command:

    systemctl stop rsasoc_re

  3. Move the lockbox.ss and lockbox.ss.lock files from the /var/netwitness/re-server/rsa/soc/reporting-engine path to a backup folder.

  4. Start the Reporting Engine service using the following command:

    systemctl start rsasoc_re

 

For SMS

Problem

The migration of LockBox to SecureStore has failed in the SMS.

Cause Incomplete migration of SSV values.
Solution

If you are unable to access the SMS service, perform the following steps to resolve the issue:

  1. SSH to the Admin Server / Node Zero.

  2. Stop the SMS service using the following command:

    systemctl stop rsa-sms

  3. Move the lockbox.ss and lockbox.ss.lock files from the /root/rsa/home path to a backup folder.

  4. Start the SMS service using the following command:

    systemctl start rsa-sms
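
The Admin Server, Reporting Engine, and SMS procedures share the same file-move step. The following is a minimal sketch of that step, assuming a backup location of your choosing; the function name move_lockbox_files is illustrative.

```shell
# Move lockbox.ss and lockbox.ss.lock out of a service directory.
# $1: directory that holds the lockbox files
# $2: backup directory (created if it does not exist)
move_lockbox_files() {
    src=$1
    dest=$2
    mkdir -p "$dest" || return 1
    for f in lockbox.ss lockbox.ss.lock; do
        # Skip a file that is not present rather than failing.
        if [ -e "$src/$f" ]; then
            mv "$src/$f" "$dest/" || return 1
        fi
    done
}
```

For SMS, this could be run between the stop and start commands as move_lockbox_files /root/rsa/home /root/lockbox-backup/sms, where the backup path is only an example. For the Admin Server, use a different backup directory for each of the two source paths so that files with the same name do not overwrite each other.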

deploy_admin User Password Has Expired Error

Error Message

credential-expired.png

Cause The deploy_admin user password has expired.
Solution

Reset your deploy_admin password. Do the following:

  1. On the NW Server host only, run the following command.
    nw-manage --update-deploy-admin-pw
    Please enter the new deploy_admin account password: <new-deploy-admin-password>
    Please confirm the new deploy_admin account password: <new-deploy-admin-password>
  2. Review the output of the nw-manage --update-deploy-admin-pw command to verify that the deploy_admin password was updated on all hosts. If the output shows that an NW host was down or failed for any reason, resolve the communication failure, and then run nw-manage --sync-deploy-admin-pw --host-key <host-identifier> to synchronize the password between the NW Server and the host that failed.
  3. On the host that failed installation or orchestration, run the nwsetup-tui command and use the new deploy_admin password in response to the Deployment Password prompt.

Downloading Error

Error Message

 Download_Error.PNG

Problem When you select an update version and click Update > Update Host, the download starts but fails to complete.
Cause Version download files can be large and take a long time to download. If communication issues occur during the download, it fails.
Solution
  1. Try to update again.
  2. If it fails again with the same error, try to update using the offline methods as described in "Offline Method from Hosts View" or "Offline Method Using Command Line Interface" in the Upgrade Guide for NetWitness Platform. Go to the NetWitness All Versions Documents page and find NetWitness Platform guides to troubleshoot issues.

  3. If you are still not able to update, contact NetWitness Customer Support.

Error Deploying Version <version-number> Missing Update Packages

Error Message

Offline-UI-Update-ErrorDeploying version.PNG

Problem

Error deploying version <version-number> is displayed in the Initialize Update Package for NetWitness Platform dialog after you click on Initialize Update if the update package is corrupted.

Solution
  1. Click Close to close the dialog.

  2. Remove the version folder from the staging folder.

  3. Make sure that the salt-master service is running.

  4. Recopy the update package zip file to the staging folder.
  5. In the Hosts view toolbar, select Check for Updates again.
    Chk4Upds.PNG

  6. Click Initialize Update.
  7. Click Update > Update Hosts from the toolbar.
  8. Click Begin Update from the Update Available dialog.
    After the host is updated, it prompts you to reboot the host.
  9. Click Reboot from the toolbar.

External Repo Update Error

Error Message

You will receive an error similar to the following while trying to update to a new version:
Repository 'nw-rsa-base': Error parsing config: Error parsing "baseurl = 'https://nw-node-zero/nwrpmrepo /<version-number>/RSA'": URL must be http, ftp, file or https not ""

Cause Incorrect path specified.
Solution

Make sure that:

  • the URL exists on the NW Server host.
  • the path is correct and contains no spaces.
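
The path check can be done before you retry the update. This is a minimal sketch that covers only the two conditions named in the error text (a supported scheme and no spaces); it does not confirm that the URL exists on the NW Server host. The function name valid_baseurl is illustrative.

```shell
# Return 0 if the URL uses a supported scheme and contains no spaces.
# Supported schemes follow the error text: http, https, ftp, file.
valid_baseurl() {
    case $1 in
        *' '*) return 1 ;;    # a space anywhere in the URL breaks parsing
    esac
    case $1 in
        http://*|https://*|ftp://*|file://*) return 0 ;;
        *) return 1 ;;
    esac
}
```

With the URL from the error message above, valid_baseurl returns a non-zero status because of the space after nwrpmrepo.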

Host Update Failed Error

Error Message


hstupdfailed.png

Problem When you select an update version and click Update > Update Host, the download process is successful, but the update process fails.
Solution
  1. Try to apply the version update to the host again.
    Often this is all you need to do.
  2. If you still cannot apply the new version update:
    Monitor the following logs on NW Server as it progresses (for example, run the tail -f command from the command line):
    /var/netwitness/uax/logs/sa.log
    /var/log/netwitness/orchestration-server/orchestration-server.log
    /var/log/netwitness/deployment-upgrade/chef-solo.log
    /var/log/netwitness/config-management/chef-solo.log

    /var/lib/netwitness/config-management/cache/chef-stacktrace.out
    The error appears in one or more of these logs.
  3. If you still cannot apply the update, gather the logs from step 2 above and contact NetWitness Customer Support.
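
When gathering the logs from step 2, it can help to pull the most recent matching lines from each file. The following is a minimal sketch; the function name recent_matches is illustrative, and the pattern you pass (for example, ERROR) is an assumption about what the relevant entries contain, not a documented log format.

```shell
# Print the last five lines matching a pattern from each readable log,
# prefixed with the file name. Missing or unreadable files are skipped.
# $1: extended regular expression to search for
# $2...: log files to scan
recent_matches() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -r "$f" ] || continue
        # Adding /dev/null makes grep print the file name on every match.
        grep -E -- "$pattern" "$f" /dev/null | tail -n 5
    done
    return 0
}
# Example (paths from step 2):
#   recent_matches 'ERROR' /var/netwitness/uax/logs/sa.log \
#       /var/log/netwitness/orchestration-server/orchestration-server.log
```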
Error Message


unauthorized_error_11.7.2.png

Problem When you select an update version and click Update > Check for Updates, the Unauthorized error message is displayed. As a result, the connection to the Live service fails.
Solution
  1. Make sure the Live test connection passes.

  2. Update https://update.netwitness.com/RSA-netwitness in AdminIcon.png (Admin) > System > Updates.

  3. SSH to the Admin Server and back up /etc/default/jetty.

  4. Update the following entry at the end of JAVA_OPTIONS in /etc/default/jetty.

    JAVA_OPTIONS="${JAVA_OPTIONS} -Drsa.nw.legacy.web.server.system.update.repo.url=https://update.netwitness.com/RSA-netwitness/ -Drsa.nw.legacy.system.update.auth.url=https://update.netwitness.com/authenticate "

  5. Restart the jetty service by running the following command:

    service jetty restart
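
Steps 3 and 4 can be checked from the command line. The following is a minimal sketch, assuming the two property names appear in the file exactly as shown in step 4; the function names and the backup file naming are illustrative.

```shell
# Copy a file next to itself with a timestamp suffix (step 3).
# $1: file to back up
backup_with_timestamp() {
    cp -p "$1" "$1.$(date +%Y%m%d%H%M%S).bak"
}

# Return 0 if the file sets both update properties from step 4.
# $1: file to check (defaults to /etc/default/jetty)
has_update_repo_options() {
    file=${1:-/etc/default/jetty}
    grep -qF -- '-Drsa.nw.legacy.web.server.system.update.repo.url=' "$file" &&
        grep -qF -- '-Drsa.nw.legacy.system.update.auth.url=' "$file"
}
```

Running has_update_repo_options before restarting the jetty service confirms that the edit was saved.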

Missing Update Packages Error

Error Message

Initialize Update for Version xx.x.x.x
Missing the following update package(s)

Download Packages from NetWitness Link

Problem Missing the following update package(s) is displayed in the Initialize Update Package for NetWitness Platform dialog when you are updating a host from the Hosts view offline and there are packages missing in the staging folder.
Solution
  1. Click Download Packages from NetWitness Community in the Initialize Update Package for NetWitness Platform dialog.
    The NetWitness Community page that contains the update files for the selected version is displayed.

  2. Select the missing packages from the staging folder.
    The Initialize Update Package for NetWitness Platform dialog is displayed telling you that it is ready to initialize the update packages.

Patch Update to Non-NW Server Error

Error Message

The /var/log/netwitness/orchestration-server/orchestration-server.log has an error similar to the following error:
API|Failure /rsa/orchestration/task/update-config-management [counter=10 reason=IllegalArgumentException::Version '12.x.x.n' is not supported

Problem After you update the NW Server host to a version, you must update all non-NW Server hosts to the same version. For example, if you update the NW Server from 12.2.0.0 to 12.5.0.0 or later, the only update path for the non-NW Server hosts is the same version (that is, 12.5.0.0). If you try to update any non-NW Server host to a different version (for example, from 12.2.0.0 to 12.3.x.x), you will get this error.
Solution

Do any of the following:

  • Update the non-NW Server host to 12.5.0.0 or later, or
  • Do not update the non-NW Server host (keep it at its current version)

Reboot Host After Update from Command Line Error

Error Message

You will receive a message in the User Interface to reboot the host after you update and reboot the host offline.
ASOC-50839.png

Cause The above error occurs when you use the CLI to reboot the host. You must use the User Interface to reboot the host.
Solution

Reboot the host in the Hosts view in the User Interface.

Log Collector Service (nwlogcollector)

Log Collector installation logs are posted to /var/log/install/nwlogcollector_install.log on the host running the nwlogcollector service.

Error Message <timestamp>.NwLogCollector_PostInstall: Lockbox Status : Failed to open lockbox: The lockbox stable value threshold was not met because the system fingerprint has changed. To reset the system fingerprint, open the lockbox using the passphrase.
Cause The Log Collector Lockbox failed to open after the update.
Solution Log in to NetWitness and reset the system fingerprint by resetting the stable system value password for the Lockbox, as described in the Reset the Stable System Value topic under the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide.

 

Error Message <timestamp> NwLogCollector_PostInstall: Lockbox Status : Not Found
Cause The Log Collector Lockbox is not configured after the update.
Solution If you use a Log Collector Lockbox, log in to NetWitness and configure the Lockbox as described in the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide.

 

Error Message <timestamp>: NwLogCollector_PostInstall: Lockbox Status : Lockbox maintenance required: The lockbox stable value threshold requires resetting. To reset the system fingerprint, select Reset Stable System Value on the settings page of the Log Collector.
Cause You need to reset the stable value threshold field for the Log Collector Lockbox.
Solution Log in to NetWitness and reset the stable system value password for the Lockbox, as described in the Reset the Stable System Value topic under the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide.
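
The three Lockbox status messages above map to two actions. The following is a minimal sketch of that mapping, assuming the log lines contain the message text shown above; the function name lockbox_action and the labels it prints are illustrative.

```shell
# Map a "Lockbox Status" log line to the action described above.
# $1: the log line; prints reset-ssv, configure, or unknown.
lockbox_action() {
    case $1 in
        *"Failed to open lockbox"*|*"Lockbox maintenance required"*)
            echo reset-ssv ;;    # reset the stable system value
        *"Lockbox Status : Not Found"*)
            echo configure ;;    # configure the Lockbox, if you use one
        *)
            echo unknown ;;
    esac
}
# Example: classify the most recent status line in the install log.
#   lockbox_action "$(grep 'Lockbox Status' \
#       /var/log/install/nwlogcollector_install.log | tail -n 1)"
```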

 

Error Message

The Decoder tries to start capturing events but fails.
Image_3.png

Cause

The Decoder capture configuration is not valid if you use PF_RING capture (CentOS) and upgrade directly to 12.5.0.0 (AlmaLinux). You must first migrate the PF_RING devices to DPDK and then upgrade.

Solution

Refer to Migrate PF_RING Devices to DPDK for migration instructions.

NW Server

These logs are posted to /var/netwitness/uax/logs/sa.log on the NW Server Host.

Problem

After upgrade, you will notice one of the following:

  • Audit logs are not getting forwarded to the configured Global Audit Setup.

  • The following message appears in sa.log:
    Syslog Configuration migration failed. Restart jetty service to fix this issue

Cause The NW Server Global Audit setup failed to migrate from 12.2.x.x or 12.3.x.x to 12.5.0.0 or later.
Solution
  1. SSH to the NW Server.
  2. Submit the following command.
    orchestration-cli-client --update-admin-node

Orchestration

The orchestration server logs are posted to /var/log/netwitness/orchestration-server/orchestration-server.log on the NW Server Host.

Problem
  1. Tried to upgrade a non-NW Server host and it failed.
  2. Retried the upgrade for this host and it failed again.

 

You will see the following message in the orchestration-server.log.
"'file' __virtual__ returned False: cannot import name HASHES"

Cause The salt minion may have been upgraded but never restarted on the failed non-NW Server host.
Solution
  1. SSH to the non-NW Server host that failed to upgrade.
  2. Submit the following commands.
    systemctl unmask salt-minion
    systemctl restart salt-minion
  3. Retry the upgrade of the non-NW Server host.
Problem

When you install and orchestrate a fresh 12.5.0.0 core Node-X to an Admin Server (Node-0) that was upgraded from 12.0 or older versions to 12.5.0.0, the core services such as Concentrator, Log Decoder, Log Collector, Archiver, Decoder, Appliance, Workbench, Warehouse Connector, and Broker appear inactive under the Services column in the Admin > Hosts view. As a result, you cannot access the core services in the UI.

This does not apply if you orchestrate a fresh 12.5.0.0 core Node-X to a freshly installed 12.5.0.0 Admin Server (one that was not upgraded from 12.0 or older versions).

Cause The 12.5.0.0 core Node-X uses a dedicated SA-server certificate instead of the common Node-0 node certificate under its trustpeers if it is orchestrated directly to an upgraded 12.5.0.0 Admin Server host.
Solution
  1. Before you bootstrap and orchestrate the 12.5.0.0 core Node-X host, run the following commands.

    mkdir -p /etc/netwitness/platform

    touch /etc/netwitness/platform/nw-upgrade-mode

  2. Perform this workaround only if you skipped the workaround in step 1 (Workaround 1). After you bootstrap and orchestrate the 12.5.0.0 core Node-X host, run the following commands.

    touch /etc/netwitness/platform/nw-upgrade-mode

    nw-manage --refresh-host --host-key <core-node-x-salt-minion-uuid>

    systemctl restart <core-service-name>

    Note:
    - Refer to the /etc/salt/minion file to find <core-node-x-salt-minion-uuid>.
    - For <core-service-name>, enter the core service name, such as nwarchiver (Archiver), nwdecoder (Decoder), nwlogcollector (Log Collector), nwappliance (Appliance), nwconcentrator (Concentrator), nwlogdecoder (Log Decoder), nwbroker (Broker), nwworkbench (Workbench), or nwwarehouseconnector (Warehouse Connector).
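
The minion UUID can be read from the file rather than copied by hand. The following is a minimal sketch, assuming the ID is stored as a top-level id: key, which is the standard Salt minion configuration layout; check the file on your host if the output is empty. The function name minion_id is illustrative.

```shell
# Print the minion ID configured in a Salt minion config file.
# $1: config file to read (defaults to /etc/salt/minion)
minion_id() {
    awk '/^id:/ { sub(/^id:[ \t]*/, ""); print; exit }' \
        "${1:-/etc/salt/minion}"
}
```

The result can then be passed to the refresh command, for example nw-manage --refresh-host --host-key "$(minion_id)".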

Reporting Engine Service 

Reporting Engine Update logs are posted to the /var/log/re_install.log file on the host running the Reporting Engine service.

Error Message <timestamp> : Available free space in /var/netwitness/re-server/rsa/soc/reporting-engine [ <existing-GB> ] is less than the required space [ <required-GB> ]
Cause Update of the Reporting Engine failed because you do not have enough disk space. 
Solution Free up the disk space to accommodate the required space shown in the log message. See the Add Additional Space for Large Reports topic in the Reporting Engine Configuration Guide for instructions on how to free up disk space.
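
After freeing space, you can compare the available space against the required value from the log message before retrying the update. This is a minimal sketch that uses the standard df utility; the function name has_free_gb is illustrative, and how the installer itself measures free space is not documented here, so treat the result as an approximation.

```shell
# Return 0 if the filesystem holding a directory has at least the
# required number of whole gigabytes available.
# $1: directory to check
# $2: required space in GB
has_free_gb() {
    # df -Pk prints one data line per filesystem; field 4 is available KB.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 { print $4 }')
    [ -n "$avail_kb" ] || return 1
    [ $((avail_kb / 1048576)) -ge "$2" ]
}
```

For example, has_free_gb /var/netwitness/re-server/rsa/soc/reporting-engine 10 succeeds only if at least 10 GB are available, where 10 stands in for the <required-GB> value from the log.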

Event Stream Analysis

Problem After upgrading to version 12.5.0.0 or later, the ESA correlation server does not aggregate events from the configured data sources.
Error Message Invalid username or password at com.rsa.netwitness.streams.base.RecordSourceSubscription.run(RecordSourceSubscription.java:173)
Solution

To resolve the issue, do the following in the NetWitness user interface:

  1. Go to ConfigureIcon.png (CONFIGURE) > Policies > Content > Event Stream Analysis > Data Sources.
    The Data Sources panel is displayed.
  2. Select the data source and click Edit Datasource in the toolbar.

    The Edit Datasource dialog is displayed.

  3. In the Edit Datasource dialog, do one of the following:

    • Select Trusted Authentication.

    • Select Use Credentials and enter the Username and Password.

  4. Click Test Connection to make sure that it can communicate with the ESA service and then click OK.

Note: Do the above procedure for all the configured data sources.

  5. Deploy all the deployments associated with the edited data sources in the Data Sources panel after you finish making changes to the data sources.

Legacy Windows Log Collector

Problem

Legacy Windows Log Collector appears as inactive when the stack is upgraded to 12.5.0.0.

Cause Certificate update in the Admin Server node.
Solution

Refer to the Legacy Windows Log Collector section in Perform Post Upgrade Tasks.

ESA Troubleshooting Information

ESA Rules are Not Creating Alerts

If you are not seeing any alerts, check the status of the ESA rule deployments.

  1. Go to ConfigureIcon.png (CONFIGURE) > Policies > Content > Event Stream Analysis > ESA Deployments.
    The ESA Deployment panel is displayed.
  2. Select the required deployment from the list and click the Deployment Stats tab.
    DeploymentStats_12.3.png
  3. The Deployment Stats page is displayed, which shows the status of your ESA services and deployments.
  4. For each ESA rule deployment:
    1. In the Engine Stats section, look at the Events Offered and the Offered Rate. They confirm that the data is being aggregated and analyzed properly. If you see 0 for Events Offered, nothing is coming in for the deployment.
    2. In the Rule Stats section, look at the Rules Enabled and Rules Disabled. If there are any disabled rules, look in the Deployed Rule Stats section below to view the details of the disabled rules. Disabled rules show a red circle. Enabled rules show a green circle.

      125_ESA.png

  5. If you notice any disabled rules that should be enabled:
    1. Go to ConfigureIcon.png (Configure) > ESA Rules > Rules tab and redeploy the ESA rule deployments that contain disabled rules.
    2. Go back to the Services tab and check to see if the rules are still disabled. If the rules are still disabled, check the ESA Correlation service log files, which are located at /var/log/netwitness/correlation-server/correlation-server.log.

Note: To avoid unnecessary processing overhead, the Ignore Case option has been removed from the ESA Rule Builder - Build a Statement dialog for meta keys that do not contain text data values. During the upgrade to the latest version, NetWitness Platform does not modify existing rules for the Ignore Case option. If an existing Rule Builder rule has the Ignore Case option selected for a meta key that no longer has the option available, an error occurs if you edit the statement and try to save it again without clearing the checkbox.

Example ESA Correlation Server Warning Message for Missing Meta Keys

If you see a warning message in the ESA Correlation server error logs indicating a difference between the default-multi-valued parameter and multi-valued parameter meta key values, the new Endpoint, UEBA, and Live content rules will not work. Completing the Update the Multi-Valued and Single-Valued Parameter Meta Keys for the latest Endpoint, UEBA, and RSA Live Content Rules procedure in the ESA Configuration Guide should fix the issue.

Multi-Valued Warning Message Example

2019-08-23 08:55:07,602 [ deployment-0] WARN Stream|[alert, alert_id, browserprint, cert_thumbprint, checksum, checksum_all, checksum_dst, checksum_src, client_all, content, context, context_all, context_dst, context_src, dir_path, dir_path_dst, dir_path_src, directory, directory_all, directory_dst, directory_src, email_dst, email_src, feed_category, feed_desc, feed_name, file_cat, file_cat_dst, file_cat_src, filename_dst, filename_src, filter, function, host_all, host_dst, host_orig, host_src, host_state, ip_orig, ipv6_orig, OS, param, param_dst, param_src, registry_key, registry_value, risk, risk_info, risk_suspicious, risk_warning, threat_category, threat_desc, threat_source, user_agent] are still MISSING from multi-valued

Single Value Warning Message Example

2019-08-23 08:55:07,602 [ deployment-0] WARN Stream|[accesses, context_target, file_attributes, logon_type_desc, packets] are still MISSING from single-valued
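
To turn a warning like the examples above into a list of meta keys to add, the bracketed list can be extracted from the log line. This is a minimal sketch, assuming the warnings keep the format shown in the two examples; the function name missing_meta_keys is illustrative.

```shell
# Print the meta keys named in "still MISSING" warnings, one per line.
# Reads log lines on standard input.
missing_meta_keys() {
    sed -n 's/.*Stream|\[\(.*\)\] are still MISSING from .*/\1/p' |
        tr ',' '\n' | sed 's/^ *//'
}
# Example: list the keys reported as missing from single-valued.
#   grep 'MISSING from single-valued' \
#       /var/log/netwitness/correlation-server/correlation-server.log |
#       missing_meta_keys
```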