Troubleshoot Upgrade Issues

This section describes the error messages that the Hosts view displays when it encounters problems updating host versions or installing services on hosts. If you cannot resolve an upgrade or installation issue using the following troubleshooting solutions, contact Customer Support.

Troubleshooting instructions are provided for errors that may occur during the upgrade, and for errors on specific hosts and services that may occur during or after an upgrade.

Problem Unable to boot the appliance after upgrading
Workaround
  1. Manually modify the GRUB boot line to FIPS=0 so that the appliance boots.

  2. After the appliance boots, disable FIPS using the following command:

    manage-stig-controls --disable-control-groups 3 --host-all

  3. Verify that the line FIPS=1 is removed from /boot/grub2/grub.cfg.

    • If it is still present, run the following command:

      grub2-mkconfig -o /boot/grub2/grub.cfg

  4. Reboot.

  5. Run the following command to enable FIPS:

    manage-stig-controls --enable-control-groups 3 --host-all

  6. Reboot again.
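
The check in step 3 can be sketched as a small shell helper. This is a minimal sketch, not part of the product: the function name has_fips_flag is hypothetical, and the match is case-insensitive on the assumption that the flag may appear as either FIPS=1 or fips=1 in the file.

```shell
# Hypothetical helper: succeeds (exit 0) if the given GRUB config
# still contains the FIPS flag, fails otherwise.
has_fips_flag() {
  local cfg="${1:-/boot/grub2/grub.cfg}"
  # -q: no output, only an exit status; -i: match FIPS=1 or fips=1
  grep -qi 'fips=1' "$cfg"
}

# Example use on the appliance (run as root):
#   if has_fips_flag; then
#     grub2-mkconfig -o /boot/grub2/grub.cfg
#   fi
```

If the helper succeeds, the flag is still present and the configuration needs to be regenerated before you reboot.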

deploy_admin User Password Has Expired Error

Error Message

credential-expired.png

Cause The deploy_admin user password has expired.
Solution

Reset your deploy_admin password. Do the following.

  1. On the NW Server host only, run the following command.
    nw-manage --update-deploy-admin-pw
    Please enter the new deploy_admin account password: <new-deploy-admin-password>
    Please confirm the new deploy_admin account password: <new-deploy-admin-password>
  2. Review the output of the nw-manage --update-deploy-admin-pw command to verify that the deploy_admin password was successfully updated on all hosts. If the output shows that an NW host is down or failed for any reason, resolve the communication failure, and then run nw-manage --sync-deploy-admin-pw --host-key <host-identifier> to synchronize the password between the NW Server and the host that failed.
  3. On the host that failed installation or orchestration, run the nwsetup-tui command and use the new deploy_admin password in response to the Deployment Password prompt.

Downloading Error

Error Message

Download_Error.PNG

Problem When you select an update version and click Update > Update Host, the download starts but fails to complete.
Cause Version download files can be large and take a long time to download. If there are communication issues during the download, it fails.
Solution
  1. Try to update again.
  2. If it fails again with the same error, try to update using the offline methods as described in "Offline Method from Hosts View" or "Offline Method Using Command Line Interface" in the Upgrade Guide for NetWitness Platform. Go to the NetWitness All Versions Documents page and find NetWitness Platform guides to troubleshoot issues.

  3. If you are still not able to update, contact Customer Support.

Error Message

If you are upgrading from NetWitness Platform 11.x.x.x to 11.6.x.x or later, the offline UI upgrade fails with the Download error message.

Solution
  1. In the Command Line Interface (CLI), do the following:

    1. SSH to NW Server.

    2. Run the following command:
      upgrade-cli-client --upgrade --host-key <ID, IP address, hostname or display name of host> --version <version number>
      For example:
      upgrade-cli-client --upgrade --host-key <ID, IP address, hostname or display name of host> --version 11.6.0.0
  2. After the NW Server is successfully updated, log in to the NW Server user interface and go to (Admin) > Hosts, where you are prompted to reboot the host.
  3. Click Reboot Host from the toolbar.

    To upgrade all the other hosts directly from the user interface:

    1. Click Begin Update from the Update Available dialog.
      After the host is upgraded, it prompts you to reboot the host.
    2. Click Reboot Host from the toolbar.

Error Deploying Version <version-number> Missing Update Packages

Error Message

Offline-UI-Update-ErrorDeploying version.PNG

Problem

Error deploying version <version-number> is displayed in the Initialize Update Package for NetWitness Platform dialog after you click on Initialize Update if the update package is corrupted.

Solution
  1. Click Close to close the dialog.

  2. Remove the version folder from the staging folder.

  3. Make sure that the salt-master service is running.

  4. Recopy the update package zip file to the staging folder.
  5. In the Hosts view toolbar, select Check for Updates again.
    Chk4Upds.PNG

  6. Click Initialize Update.
  7. Click Update > Update Hosts from the toolbar.
  8. Click Begin Update from the Update Available dialog.
    After the host is updated, it prompts you to reboot the host.
  9. Click Reboot from the toolbar.

Upgrade Failed Error

Error Message

You will receive an error in the error log similar to the following while trying to update to version 11.6 or later:
error_log.PNG

Cause Custom builds/rpms are installed for certain components on the hosts, for example, after installing Hotfixes.
Solution

To resolve the issue:

  1. SSH to Admin Server.
  2. Locate the component descriptor file by running the following command.
    cd /etc/netwitness/component-descriptor/
  3. Open the component descriptor file by running the following command.
    vi nw-component-descriptor.json
  4. Search for the "packages" section of the component that has a custom build/rpm. For example, the following shows the package details for a "concentrator" host that has a custom build/rpm.
    "concentrator": {
    "cookbook_name": "rsa-concentrator",
    "service_names": ["rsa-nw-concentrator"],
    "family": "launch",
    "default_port": xxxx,
    "description": "Concentrator",
    "packages": [{
    "name": "rsa-nw-concentrator",
    "version": "11.6.0.0-2003001075220.5.cecf24b.e.17.centos"
    },
  5. Delete the complete version details, including the comma (,) character that precedes them, in the packages section. For example, after you delete the version details, the section should look as follows.
    "packages": [{
    "name": "rsa-nw-concentrator"
    },

Note: You must delete the version details for every host that has custom builds/rpms in the component descriptor of the Admin Server.

  6. Run the upgrade process again.
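
Before you edit the descriptor, it can help to list every pinned version so that none is missed. The following is a minimal sketch under stated assumptions: the function name list_pinned_versions is hypothetical, and it assumes that each pinned version sits on a line containing the "version" key in plain ASCII quotes.

```shell
# Hypothetical helper: print every line (with its line number) that
# carries a "version" key in the component descriptor.
list_pinned_versions() {
  local descriptor="${1:-/etc/netwitness/component-descriptor/nw-component-descriptor.json}"
  # -n: prefix each match with its line number for use in vi
  grep -n '"version"' "$descriptor"
}

# Example use on the Admin Server:
#   cp -p nw-component-descriptor.json nw-component-descriptor.json.bak
#   list_pinned_versions
```

The line numbers in the output can be used to jump directly to each entry in vi. Taking a copy of the file first, as in the commented example, is a precaution and not a step from the procedure above.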

External Repo Update Error

Error Message

You will receive an error similar to the following while trying to update to a new version:
Repository 'nw-rsa-base': Error parsing config: Error parsing "baseurl = 'https://nw-node-zero/nwrpmrepo /<version-number>/RSA'": URL must be http, ftp, file or https not ""

Cause Incorrect path specified.
Solution

Make sure that:

  • The URL exists on the NW Server host.
  • The path is correct and does not contain any spaces.
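
The two checks above can be sketched as a shell function. This is an illustration only: the function name check_baseurl is hypothetical, the accepted schemes are taken from the error message, and the version 11.6.0.0 in the example is a stand-in for <version-number>. The sketch validates the form of the URL and does not confirm that the URL exists on the NW Server host.

```shell
# Hypothetical helper: reject a repository base URL that contains
# whitespace or does not start with http, https, ftp, or file.
check_baseurl() {
  local url="$1"
  case "$url" in
    *[[:space:]]*)
      echo "baseurl contains whitespace: '$url'" >&2
      return 1
      ;;
  esac
  case "$url" in
    http://*|https://*|ftp://*|file://*)
      return 0
      ;;
    *)
      echo "baseurl must start with http, https, ftp, or file: '$url'" >&2
      return 1
      ;;
  esac
}

# Example use:
#   check_baseurl "https://nw-node-zero/nwrpmrepo/11.6.0.0/RSA"   # succeeds
#   check_baseurl "https://nw-node-zero/nwrpmrepo /11.6.0.0/RSA"  # fails: space in path
```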

Host Update Failed Error

Error Message


hstupdfailed.png

Problem When you select an update version and click Update > Update Host, the download process is successful, but the update process fails.
Solution
  1. Try to apply the version update to the host again.
    Often this is all you need to do.
  2. If you still cannot apply the new version update:
    Monitor the following logs on NW Server as it progresses (for example, run the tail -f command from the command line):
    /var/netwitness/uax/logs/sa.log
    /var/log/netwitness/orchestration-server/orchestration-server.log
    /var/log/netwitness/deployment-upgrade/chef-solo.log
    /var/log/netwitness/config-management/chef-solo.log

    /var/lib/netwitness/config-management/cache/chef-stacktrace.out
    The error appears in one or more of these logs.
  3. If you still cannot apply the update, gather the logs from step 2 above and contact Customer Support.
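
As an alternative to watching each log with tail -f, the logs from step 2 can be searched in one pass. The following is a minimal sketch: the function name scan_update_logs is hypothetical, and both the search pattern ERROR and the 200-line window are assumptions about what is useful, not documented log formats.

```shell
# Hypothetical helper: search the tail of each update log for a pattern
# and prefix every match with the name of the log it came from.
scan_update_logs() {
  local pattern="$1"
  shift
  # With no files given, fall back to the logs named in step 2.
  if [ "$#" -eq 0 ]; then
    set -- \
      /var/netwitness/uax/logs/sa.log \
      /var/log/netwitness/orchestration-server/orchestration-server.log \
      /var/log/netwitness/deployment-upgrade/chef-solo.log \
      /var/log/netwitness/config-management/chef-solo.log \
      /var/lib/netwitness/config-management/cache/chef-stacktrace.out
  fi
  local f
  for f in "$@"; do
    # Skip logs that do not exist or cannot be read on this host.
    [ -r "$f" ] || continue
    tail -n 200 "$f" | grep -- "$pattern" | awk -v f="$f" '{ print f ": " $0 }'
  done
  return 0
}

# Example use on the NW Server:
#   scan_update_logs ERROR
```
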
Error Message

Unauthorized

Problem When you select an update version and click Update > Check for Updates, the Unauthorized error message is displayed. As a result, the connection to the Live service fails.
Solution
  1. Make sure the Live test connection passes.

  2. Update https://update.netwitness.com/RSA-netwitness in (Admin) > System > Updates.

  3. SSH to the Admin Server and back up /etc/default/jetty.

  4. Update the following entry at the end of JAVA_OPTIONS in /etc/default/jetty.

    JAVA_OPTIONS="${JAVA_OPTIONS} -Drsa.nw.legacy.web.server.system.update.repo.url=https://update.netwitness.com/RSA-netwitness/ -Drsa.nw.legacy.system.update.auth.url=https://update.netwitness.com/authenticate "

  5. Restart the jetty service. Run the following command.

    service jetty restart
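
The backup in step 3 can be sketched as follows. This is one possible approach, not a documented procedure: the function name backup_file and the timestamped .bak suffix are assumptions chosen for illustration.

```shell
# Hypothetical helper: copy a file to a timestamped backup next to it
# and print the backup path.
backup_file() {
  local src="$1"
  local dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  # -p: preserve mode, ownership, and timestamps of the original
  cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}

# Example use on the Admin Server before editing JAVA_OPTIONS:
#   backup_file /etc/default/jetty
```

Keeping the backup in the same directory makes it straightforward to restore the original file if the jetty service fails to restart.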

Missing Update Packages Error

Error Message

Initialize Update for Version xx.x.x.x
Missing the following update package(s)

Download Packages from NetWitness Link

Problem Missing the following update package(s) is displayed in the Initialize Update Package for NetWitness Platform dialog when you are updating a host from the Hosts view offline and there are packages missing in the staging folder.
Solution
  1. Click Download Packages from NetWitness Community in the Initialize Update Package for NetWitness Platform dialog.
    The NetWitness Community page that contains the update files for the selected version is displayed.

  2. Select the missing packages from the staging folder.
    The Initialize Update Package for NetWitness Platform dialog is displayed telling you that it is ready to initialize the update packages.

OpenSSL 1.1.x

Error Message

The following example illustrates an ssh error that can occur when the ssh client is run from a host with OpenSSL 1.1.x installed:
$ ssh root@10.1.2.3
ssh_dispatch_run_fatal: Connection to 10.1.2.3 port 22: message authentication code incorrect

Problem

Advanced users who want to ssh to a NetWitness Platform host from a client that is using OpenSSL 1.1.x encounter this error because of an incompatibility between CentOS 7.x and OpenSSL 1.1.x. For example:

$ rpm -q openssl
openssl-1.1.1-8.el8.x86_64

Solution

Specify the compatible cipher list on the command line. For example:

$ ssh -oCiphers=aes128-ctr,aes192-ctr,aes256-ctr root@10.1.2.3

I've read & consent to terms in IS user agreement.

root@10.1.2.3's password:

Last login: Mon Oct 21 19:03:23 2019
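
To avoid typing the cipher list on every connection, the same setting can be stored in the ssh client configuration. The following is a minimal sketch: the function name ssh_cipher_stanza is hypothetical, and it assumes that the client reads per-host settings from ~/.ssh/config. Host and Ciphers are standard ssh_config keywords.

```shell
# Hypothetical helper: print an ssh_config stanza that pins the
# compatible cipher list for one host.
ssh_cipher_stanza() {
  local host="$1"
  printf 'Host %s\n    Ciphers aes128-ctr,aes192-ctr,aes256-ctr\n' "$host"
}

# Example use on the client:
#   ssh_cipher_stanza 10.1.2.3 >> ~/.ssh/config
#   ssh root@10.1.2.3
```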

Patch Update to Non-NW Server Error

Error Message

The /var/log/netwitness/orchestration-server/orchestration-server.log has an error similar to the following error:
API|Failure /rsa/orchestration/task/update-config-management [counter=10 reason=IllegalArgumentException::Version '11.x.x.n' is not supported

Problem After you update the NW Server host to a version, you must update all non-NW Server hosts to the same version. For example, if you update the NW Server from 11.4.0.0 to 11.6.0.0 or later, the only update path for the non-NW Server hosts is the same version (that is, 11.6.0.0). If you try to update any non-NW Server host to a different version (for example, from 11.4.0.0 to an 11.4.x.x) you will get this error.
Solution

Do one of the following:

  • Update the non-NW Server host to 11.6.0.0 or later, or
  • Do not update the non-NW Server host (keep it at its current version)

Reboot Host After Update from Command Line Error

Error Message

You will receive a message in the User Interface to reboot the host after you update and reboot the host offline.
ASOC-50839.png

Cause The above error occurs when you use the CLI to reboot the host. You must use the User Interface to reboot the host.
Solution

Reboot the host in the Hosts view in the User Interface.

Reporting Engine Restarts After Upgrade

Problem

In some cases, after you upgrade to 11.6 or later from versions of 11.x, such as 11.4, the Reporting Engine service attempts to restart continuously without success.

Cause

The database files for live charts, alert status, or report status may not be loaded successfully as the files may be corrupted.

Solution

To resolve the issue:

  1. Check which database files are corrupted:

    Navigate to the file located at /var/netwitness/reserver/rsa/soc/reporting-engine/logs/reporting-engine.log and check the following blocks:

    • If the live charts db file is corrupted, the following logs are displayed:

      livecharts_650x195.png

    • If the alert status db file is corrupted, the following logs are displayed:

      alertstatus_636x238.png

    • If the report status db file is corrupted, the following logs are displayed:

      org.h2.jdbc.JdbcSQLException: File corrupted while reading record: null. Possible solution: use the recovery tool [90030-196]

  2. To resolve the live charts database file corruption, do the following:

    1. Stop the Reporting Engine service.

    2. Move the livechart.mv.db file from /var/netwitness/reserver/rsa/soc/reporting-engine/livecharts folder to a temporary location.

    3. Restart the Reporting Engine service.

      Note: Some live charts data may be lost on performing the above steps.

  3. To resolve the alert status or report status database file corruption, perform the following steps:

    1. Stop the Reporting Engine service.
    2. Replace the corrupted db file with the latest alertstatusmanager.mv.db or reportstatusmanager.mv.db file from /var/netwitness/reserver/rsa/soc/reporting-engine/archives folder.
    3. Restart the Reporting Engine service.

For more information, see the Knowledge Base article Reporting Engine restarts After upgrade to NetWitness Platform 11.4.
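
The log check in step 1 can be partly sketched in shell. This sketch searches only for the report status message quoted above, because the live charts and alert status messages are not reproduced in text here. The function name find_h2_corruption is hypothetical.

```shell
# Hypothetical helper: list the lines (with line numbers) in the
# Reporting Engine log that report H2 file corruption.
find_h2_corruption() {
  local log="${1:-/var/netwitness/reserver/rsa/soc/reporting-engine/logs/reporting-engine.log}"
  # -F: treat the message as a fixed string; -n: show line numbers
  grep -n -F 'File corrupted while reading record' "$log"
}

# Example use on the host that runs the Reporting Engine service:
#   find_h2_corruption
```

No output means that this particular message was not logged. It does not rule out corruption of the live charts or alert status database files.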

 

Problem After you upgrade to version 11.6 or later, the Reporting Engine service does not restart.
Cause The Reporting Engine service may not start due to any of the following reasons.
- workspace.xml not updated.
- Time is not converted properly in livechart h2 database.
- JCR (Jackrabbit repository) is corrupted with primary key violation.
Solution

To resolve the issue, run the Reporting Engine Migration Recovery tool (rsa-nw-re-migration-recovery.sh) on the Admin Server where the Reporting Engine service is installed.

Note: You can find the Reporting Engine Migration Recovery tool in the below location.
/opt/rsa/soc/reporting-engine-<version number>-<Tag>/nwtools
For example:
/opt/rsa/soc/reporting-engine-11.6.0.0-<Tag>/nwtools

1. SSH to Admin Server.

2. To untar the RE (Reporting Engine) tool, run the following command.
tar -xvf rsa-nw-re-recovery-tool-bundle.tar

3. (Optional) If you want to untar the RE tool file in some other directory, you can create a directory and untar the RE tool. Run the following commands.

mkdir <NAME OF THE DIRECTORY>
tar -xvf rsa-nw-re-recovery-tool-bundle.tar --directory <PATH OF THE DIRECTORY>

4. To run the script, run the following command.
./<PATH OF THE DIRECTORY>/rsa-nw-re-recovery-tool.sh

For more information, see the Knowledge Base article Reporting Engine Migration Recovery Tool.

Log Collector Service (nwlogcollector)

Log Collector installation logs are posted to /var/log/install/nwlogcollector_install.log on the host running the nwlogcollector service.

Error Message <timestamp>.NwLogCollector_PostInstall: Lockbox Status : Failed to open lockbox: The lockbox stable value threshold was not met because the system fingerprint has changed. To reset the system fingerprint, open the lockbox using the passphrase.
Cause The Log Collector Lockbox failed to open after the update.
Solution Log in to NetWitness and reset the system fingerprint by resetting the stable system value password for the Lockbox, as described in the Reset the Stable System Value topic under the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide.

 

Error Message <timestamp> NwLogCollector_PostInstall: Lockbox Status : Not Found
Cause The Log Collector Lockbox is not configured after the update.
Solution If you use a Log Collector Lockbox, log in to NetWitness and configure the Lockbox as described in the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide.

 

Error Message <timestamp>: NwLogCollector_PostInstall: Lockbox Status : Lockbox maintenance required: The lockbox stable value threshold requires resetting. To reset the system fingerprint, select Reset Stable System Value on the settings page of the Log Collector.
Cause You need to reset the stable value threshold field for the Log Collector Lockbox.
Solution Log in to NetWitness and reset the stable system value password for the Lockbox, as described in the Reset the Stable System Value topic under the Configure Lockbox Security Settings topic in the Log Collection Configuration Guide.

 

Error Message

Decoder tries to start capture events but fails.
Image_3.png

Solution

To resolve the issue:

  1. SSH to the Decoder host.
  2. Run the following commands.
    yum reinstall pfring*
    systemctl restart nwdecoder

NW Server

These logs are posted to /var/netwitness/uax/logs/sa.log on the NW Server Host.

Problem

After upgrade, you will notice one of the following:

  • Audit logs are not getting forwarded to the configured Global Audit Setup.

  • The following message is seen in sa.log.
    Syslog Configuration migration failed. Restart jetty service to fix this issue

Cause The NW Server Global Audit setup failed to migrate from 11.4.x.x or 11.5.x.x to 11.6.0.0 or later.
Solution
  1. SSH to the NW Server.
  2. Submit the following command.
    orchestration-cli-client --update-admin-node

Orchestration

The orchestration server logs are posted to /var/log/netwitness/orchestration-server/orchestration-server.log on the NW Server Host.

Problem
  1. Tried to upgrade a non-NW Server host and it failed.
  2. Retried the upgrade for this host and it failed again.

 

You will see the following message in the orchestration-server.log.
"'file' __virtual__ returned False: cannot import name HASHES"

Cause The Salt minion may have been upgraded and never restarted on the failed non-NW Server host.
Solution
  1. SSH to the non-NW Server host that failed to upgrade.
  2. Submit the following commands.
    systemctl unmask salt-minion
    systemctl restart salt-minion
  3. Retry the upgrade of the non-NW Server host.
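
The commands in step 2 can be combined with a status check so that the upgrade is retried only when the minion is actually running. This is a minimal sketch: the function name restart_salt_minion is hypothetical, and the final is-active check is an addition for illustration, not part of the documented steps.

```shell
# Hypothetical helper: unmask and restart the Salt minion, then confirm
# that systemd reports the unit as active.
restart_salt_minion() {
  systemctl unmask salt-minion &&
    systemctl restart salt-minion &&
    systemctl is-active --quiet salt-minion
}

# Example use on the non-NW Server host (run as root):
#   restart_salt_minion && echo "salt-minion is running"
```
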
Problem

When you install and orchestrate a fresh 12.3.1.0 core Node-X to the Admin server (Node-0) upgraded from 12.0 or older versions to 12.3.1.0, the core services such as Concentrator, Log Decoder, Log Collector, Archiver, Decoder, Appliance, Workbench, Warehouse Connector, and Broker appear inactive under the Services column in the Admin > Hosts view. As a result, you cannot access the core services in the UI.

This is not applicable if you are orchestrating a fresh 12.3.1.0 core Node-X to a freshly installed 12.3.1.0 Admin Server (not upgraded from 12.0 or older versions to 12.3.1.0).

Cause The 12.3.1.0 core Node-X uses a dedicated SA-server certificate instead of the common Node-0 node certificate under its trustpeers if it is orchestrated directly to an upgraded 12.3.1.0 Admin Server host.
Solution
  1. Before you bootstrap and orchestrate the 12.3.1.0 core Node-X host, run the following commands.

    mkdir -p /etc/netwitness/platform

    touch /etc/netwitness/platform/nw-upgrade-mode

  2. Perform this workaround only if you skip the above workaround (Workaround 1). Run the following commands after you bootstrap and orchestrate the 12.3.1.0 core Node-X host.

    touch /etc/netwitness/platform/nw-upgrade-mode

    nw-manage --refresh-host --host-key <core-node-x-salt-minion-uuid>

    systemctl restart <core-service-name>

    Note:
    - Refer to the file /etc/salt/minion to find <core-node-x-salt-minion-uuid>.
    - You must enter the core service name such as nwarchiver (Archiver), nwdecoder (Decoder), nwlogcollector (Log Collector), nwappliance (Appliance), nwconcentrator (Concentrator), nwlogdecoder (Log Decoder), nwbroker (Broker), nwworkbench (Workbench), and nwwarehouseconnector (Warehouse Connector) in <core-service-name>.
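
The service names listed in the note can be captured in a small lookup so that the correct unit name is passed to systemctl. This is a minimal sketch: the function name core_unit_name is hypothetical, and the mapping is taken directly from the note above.

```shell
# Hypothetical helper: map a core service display name to the unit name
# used in <core-service-name>.
core_unit_name() {
  case "$1" in
    "Archiver")            echo nwarchiver ;;
    "Decoder")             echo nwdecoder ;;
    "Log Collector")       echo nwlogcollector ;;
    "Appliance")           echo nwappliance ;;
    "Concentrator")        echo nwconcentrator ;;
    "Log Decoder")         echo nwlogdecoder ;;
    "Broker")              echo nwbroker ;;
    "Workbench")           echo nwworkbench ;;
    "Warehouse Connector") echo nwwarehouseconnector ;;
    *)
      echo "unknown core service: $1" >&2
      return 1
      ;;
  esac
}

# Example use on the core Node-X host:
#   systemctl restart "$(core_unit_name 'Log Decoder')"
```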

Reporting Engine Service 

Reporting Engine Update logs are posted to the /var/log/re_install.log file on the host running the Reporting Engine service.

Error Message <timestamp> : Available free space in /var/netwitness/re-server/rsa/soc/reporting-engine [ <existing-GB> ] is less than the required space [ <required-GB> ]
Cause Update of the Reporting Engine failed because you do not have enough disk space. 
Solution Free up the disk space to accommodate the required space shown in the log message. See the Add Additional Space for Large Reports topic in the Reporting Engine Configuration Guide for instructions on how to free up disk space.
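
The free space can be checked before you retry the update. The following is a minimal sketch: the function names free_gb and has_required_space are hypothetical, and the required size must be taken from the log message because it is not stated here.

```shell
# Hypothetical helper: print the free space, in whole GB, of the
# filesystem that holds the given path.
free_gb() {
  local path="$1"
  # df -P: one line per filesystem; -k: sizes in 1024-byte blocks.
  # Column 4 of the second line is the available space in KB.
  df -Pk "$path" | awk 'NR == 2 { print int($4 / 1048576) }'
}

# Hypothetical helper: succeed if the path has at least the required GB free.
has_required_space() {
  local path="$1" required_gb="$2"
  [ "$(free_gb "$path")" -ge "$required_gb" ]
}

# Example use, with <required-GB> taken from the log message:
#   has_required_space /var/netwitness/re-server/rsa/soc/reporting-engine <required-GB>
```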

Event Stream Analysis

Problem After upgrading to version 12.3.1.0 or later, the ESA correlation server does not aggregate events from the configured data sources.
Error Message Invalid username or password at com.rsa.netwitness.streams.base.RecordSourceSubscription.run(RecordSourceSubscription.java:173)
Solution

To resolve the issue:

In the NetWitness user interface,

  1. Go to (CONFIGURE) > Policies > Content > Event Stream Analysis > Data Sources.
    The Data Sources panel is displayed.
  2. Select the data source and click Edit Datasource in the toolbar.

    The Edit Datasource dialog is displayed.

  3. In the Edit Datasource dialog, do one of the following:

    • Select Trusted Authentication.

    • Select Use Credentials and enter the Username and Password.

  4. Click Test Connection to make sure that it can communicate with the ESA service and then click OK.

Note: Repeat the above procedure for all the configured data sources.

  5. After you finish making changes to the data sources, deploy all the deployments associated with the edited data sources in the Data Sources panel.

Legacy Windows Log Collector

Problem
  • Legacy Windows Log Collector appears as inactive post upgrade of SA to 12.3.1.0 version and Legacy Windows Log Collector to 11.6.x or 11.7.x versions.

  • Legacy Windows Log Collector appears as inactive when the stack is upgraded to 12.3.1.0.

Cause Certificate update in the SA node.
Solution

Refer to the Legacy Windows Log Collector section in Perform Post Upgrade Tasks.

User Entity Behavior Analytics

Problem

The Context Hub Server Config page ((Admin) > Services > select the ContextHub Server > View > Config) keeps loading after the upgrade if the RSA Endpoint (ECAT Data Sources) is not removed from the Context Hub Server before upgrading from 11.7 and older versions to 12.0, 12.1, or 12.1.x.x versions.

Cause In 12.0 and later versions, the ECAT Integration such as the RSA Endpoint (ECAT Data Sources) in the ContextHub Server is not supported by NetWitness Platform.
Solution

Do the following to access the Context Hub Server Config page.

  1. SSH to the Admin Server.

  2. Log in to the MongoDB.

  3. Go to the ContextHub collection and search for the RSA Endpoint document.

  4. Delete the entry RSA Endpoint from the Admin Server Mongo.

  5. Restart the Mongo. Run the following command.

    service mongo restart

  6. Restart the Context Hub service. Run the following command.

    service rsa-nw-contexthub-server restart

     

    The Config page is loaded properly.

    Note: You must restart the Context Hub service from the ESA box.

ESA Troubleshooting Information

ESA Rules are Not Creating Alerts

If you are not seeing any alerts, check the status of the ESA rule deployments.

  1. Go to (CONFIGURE) > Policies > Content > Event Stream Analysis > ESA Deployments.
    The ESA Deployment panel is displayed.
  2. Select the required deployment from the list and click the Deployment Stats tab.
    DeploymentStats_12.3_1372x749.png
  3. The Deployment Stats page is displayed, which shows the status of your ESA services and deployments.
  4. For each ESA rule deployment:
    1. In the Engine Stats section, look at the Events Offered and the Offered Rate. They confirm that the data is being aggregated and analyzed properly. If you see 0 for Events Offered, nothing is coming in for the deployment.
    2. In the Rule Stats section, look at the Rules Enabled and Rules Disabled. If there are any disabled rules, look in the Deployed Rule Stats section below to view the details of the disabled rules. Disabled rules show a white circle. Enabled rules show a green circle.

    DeploymentStatsDisplay_12.3_1379x755.png

  5. If you notice any disabled rules that should be enabled:
    1. Go to (Configure) > ESA Rules > Rules tab and redeploy the ESA rule deployments that contain disabled rules.
    2. Go back to the Services tab and check to see if the rules are still disabled. If the rules are still disabled, check the ESA Correlation service log files, which are located at /var/log/netwitness/correlation-server/correlation-server.log.

Note: To avoid unnecessary processing overhead, the Ignore Case option has been removed from the ESA Rule Builder - Build a Statement dialog for meta keys that do not contain text data values. During the upgrade to 11.4 or later, NetWitness Platform does not modify existing rules for the Ignore Case option. If an existing Rule Builder rule has the Ignore Case option selected for a meta key that no longer has the option available, an error occurs if you try to edit the statement and try to save it again without clearing the checkbox.

Example ESA Correlation Server Warning Message for Missing Meta Keys

If a warning message in the ESA Correlation server error logs indicates a difference between the default-multi-valued parameter and multi-valued parameter meta key values, the new Endpoint, UEBA, and Live content rules will not work. Completing the Update the Multi-Valued and Single-Valued Parameter Meta Keys for the latest Endpoint, UEBA, and RSA Live Content Rules procedure in the ESA Configuration Guide should fix the issue.

Multi-Valued Warning Message Example

2019-08-23 08:55:07,602 [ deployment-0] WARN Stream|[alert, alert_id, browserprint, cert_thumbprint, checksum, checksum_all, checksum_dst, checksum_src, client_all, content, context, context_all, context_dst, context_src, dir_path, dir_path_dst, dir_path_src, directory, directory_all, directory_dst, directory_src, email_dst, email_src, feed_category, feed_desc, feed_name, file_cat, file_cat_dst, file_cat_src, filename_dst, filename_src, filter, function, host_all, host_dst, host_orig, host_src, host_state, ip_orig, ipv6_orig, OS, param, param_dst, param_src, registry_key, registry_value, risk, risk_info, risk_suspicious, risk_warning, threat_category, threat_desc, threat_source, user_agent] are still MISSING from multi-valued

Single Value Warning Message Example

2019-08-23 08:55:07,602 [ deployment-0] WARN Stream|[accesses, context_target, file_attributes, logon_type_desc, packets] are still MISSING from single-valued
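
The missing meta keys can be pulled out of these warnings for easier review. The following is a minimal sketch that assumes the warnings keep the exact layout of the two examples above. The function name missing_meta_keys is hypothetical.

```shell
# Hypothetical helper: read log lines on stdin and print, for each
# "still MISSING" warning, the parameter name and the listed meta keys.
missing_meta_keys() {
  # Capture the bracketed key list after "Stream|" and the parameter
  # name at the end of the line, then print "<parameter>: <keys>".
  sed -n 's/.*Stream|\[\(.*\)] are still MISSING from \(.*\)$/\2: \1/p'
}

# Example use on the ESA host:
#   missing_meta_keys < /var/log/netwitness/correlation-server/correlation-server.log
```

For the single-valued example above, the sketch prints the parameter name followed by the five listed keys on one line.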