The following sections outline potential errors that may arise during the deployment of SASE. Each section includes example outputs of these errors. It is essential to consult the relevant installation and deployment steps for additional details before attempting to resolve the issues.
Missing cloud credentials
Output example
In this example, nw-create-cloud-hybrid aborted because the token file was not found in its default location.
[root@js116-adminserver errorCapture]# nw-create-cloud-hybrid --enable-cloud-sase --cloud-provider gcp --namespace js116
[2024-10-29T15:44:17+00:00] <16789> (INFO) Creating /var/log/netwitness/sase directory.
[2024-10-29T15:44:25+00:00] <16789> (INFO) Using cloud provider: gcp
[2024-10-29T15:44:25+00:00] <16789> (ERROR) Please specify the gcp Service Account Key Token file.
[root@js116-adminserver errorCapture]#
Solution
- Generate the cloud credentials.
- Issue appropriate permissions to the identity of the credentials.
- Install the credentials on the Administration server.
Invalid cloud credentials
Output example
This is an example of an initial attempt to deploy SASE with a malformed or bad token.
[2024-10-29T15:44:25+00:00] <16789> (INFO) Using cloud provider: gcp
[2024-10-29T15:44:25+00:00] <16789> (ERROR) Please specify the gcp Service Account Key Token file.
[2024-10-29T15:49:53+00:00] <19177> (ERROR) Failed to get attribute: hybrid_deployment from Ohai node: nw_host
[2024-10-29T15:51:18+00:00] <19670> (INFO) Using cloud provider: gcp
[2024-10-29T15:51:18+00:00] <19670> (INFO) Template File: /root/.sase/sase-deployment-models.yml is properly formatted.
[2024-10-29T15:51:19+00:00] <19670> (INFO) Template File: /root/.sase/host-models.yml is properly formatted.
[2024-10-29T15:51:45+00:00] <19670> (ERROR) Unable to set the gcp project for the gcloud cli
[2024-10-29T15:53:39+00:00] <21137> (INFO) Using cloud provider: gcp
[2024-10-29T15:53:40+00:00] <21137> (INFO) Template File: /root/.sase/sase-deployment-models.yml is properly formatted.
[2024-10-29T15:53:40+00:00] <21137> (INFO) Template File: /root/.sase/host-models.yml is properly formatted.
[2024-10-29T15:53:40+00:00] <21137> (ERROR) Unable to set the gcp project for the gcloud cli
[2024-10-29T15:56:58+00:00] <79418> (INFO) Using cloud provider: gcp
[2024-10-29T15:56:58+00:00] <79418> (INFO) Template File: /root/.sase/sase-deployment-models.yml is properly formatted.
[2024-10-29T15:56:58+00:00] <79418> (INFO) Template File: /root/.sase/host-models.yml is properly formatted.
[2024-10-29T15:56:59+00:00] <79418> (ERROR) Unable to set the gcp project for the gcloud cli
Subsequent attempts that use a malformed or bad token display this behavior.
nw-create-cloud-hybrid --enable-cloud-sase --cloud-provider gcp --namespace js116
[2024-10-29T15:56:58+00:00] <79418> (INFO) Using cloud provider: gcp
[2024-10-29T15:56:58+00:00] <79418> (INFO) Template File: /root/.sase/sase-deployment-models.yml is properly formatted.
[2024-10-29T15:56:58+00:00] <79418> (INFO) Template File: /root/.sase/host-models.yml is properly formatted.
parse error: Invalid literal at line 2, column 0
[2024-10-29T15:56:59+00:00] <79418> (ERROR) Unable to set the gcp project for the gcloud cli
[root@js116-adminserver ~]#
Solution
- For GCP:
  - Check with your cloud administrator to ensure the token was correctly generated and transmitted to you.
  - Install a corrected token.
  - Ensure the token is not group- or world-readable: chmod 600 /root/.gcp/gcp-auth-token.json
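Before rerunning the command, the token file can be sanity-checked locally. The sketch below is illustrative only: the path matches the chmod example above, and the required keys reflect the standard fields of a GCP service-account key file (a malformed token is what produces the "parse error: Invalid literal" line in the output).

```python
import json
import os
import stat

# Default token location used in this guide; adjust if yours differs.
TOKEN_PATH = "/root/.gcp/gcp-auth-token.json"

# Fields present in every GCP service-account key file.
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}

def check_token(path):
    """Return a list of problems found with the token file (empty if OK)."""
    problems = []
    if not os.path.isfile(path):
        return [f"token file not found at {path}"]
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:
        problems.append(f"permissions too open ({oct(mode)}); run: chmod 600 {path}")
    try:
        with open(path) as fh:
            data = json.load(fh)
    except json.JSONDecodeError as exc:
        # A truncated or hand-edited token typically fails right here.
        return problems + [f"not valid JSON: {exc}"]
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    elif data.get("type") != "service_account":
        problems.append("'type' should be 'service_account'")
    return problems

if __name__ == "__main__":
    for p in check_token(TOKEN_PATH) or ["token file looks well-formed"]:
        print(p)
```

A clean run prints "token file looks well-formed"; anything else points at the specific defect to fix before retrying the deployment.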
Improperly formed sase-deployment-models.yml file
Output example
An example syntax error (a missing single quote) in the models file results in the below error.
vpn_provider: 'Netskope
The output resulting from an error in the models file.
...
File "/usr/lib64/python3.6/site-packages/yaml/parser.py", line 439, in parse_block_mapping_key
"expected <block end>, but found %r" % token.id, token.start_mark)
yaml.parser.ParserError: while parsing a block mapping
in "/root/.sase/sase-deployment-models.yml", line 18, column 7
expected <block end>, but found '<scalar>'
in "/root/.sase/sase-deployment-models.yml", line 61, column 23
[2024-10-29T16:11:44+00:00] <84831> (ERROR) Unable to parse yaml to json
[root@js116-adminserver ~]#
Solution
- Correct the structure of the model file found in /root/.sase/sase-deployment-models.yml.
- Ensure any host file key references in the model file are found in the host file at /root/.sase/host-models.yml, and correct any discrepancies.
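The parser errors above come from PyYAML, so the same library can pre-check the files before rerunning the deployment. A minimal sketch (assumes PyYAML is installed, which the traceback above indicates it is on the Administration server); it reports the same line/column diagnostics shown in the error output:

```python
import yaml  # PyYAML, the parser whose errors appear in the output above

# Paths from the output above.
FILES = [
    "/root/.sase/sase-deployment-models.yml",
    "/root/.sase/host-models.yml",
]

def validate(path):
    """Parse a YAML file; return None on success, or the error message."""
    try:
        with open(path) as fh:
            yaml.safe_load(fh)
        return None
    except (OSError, yaml.YAMLError) as exc:
        # YAMLError messages carry the line/column marks seen in the log.
        return str(exc)

if __name__ == "__main__":
    for path in FILES:
        err = validate(path)
        print(path, "OK" if err is None else f"FAILED:\n{err}")
```

Note that a clean parse only proves the YAML is well-formed; it does not verify that the model file's host references actually exist in the host file.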
Improperly formed host-models.yml file
Output example
A missing colon in the disk name raises the below error.
decodersmall:
disk_name decodersmall
disk_type: pd-standard
disk_size: 690
An incorrect host-models file will result in the error below.
...
File "/usr/lib64/python3.6/site-packages/yaml/scanner.py", line 116, in check_token
self.fetch_more_tokens()
File "/usr/lib64/python3.6/site-packages/yaml/scanner.py", line 220, in fetch_more_tokens
return self.fetch_value()
File "/usr/lib64/python3.6/site-packages/yaml/scanner.py", line 576, in fetch_value
self.get_mark())
yaml.scanner.ScannerError: mapping values are not allowed here
in "/root/.sase/host-models.yml", line 13, column 20
[2024-10-29T17:03:28+00:00] <101933> (ERROR) Unable to parse yaml to json
Solution
- Correct the YAML syntax of the host file at /root/.sase/host-models.yml. In the example above, the disk_name line is missing its colon and should read disk_name: decodersmall.
Missing image file (lite)
Output example
An example of the error message displayed when the image file is missing. The ‘lite’ image is used to create the specified nodes.
...
Plan: 6 to add, 0 to change, 0 to destroy.
Changes to Outputs:
+ network = "js116-ppn-network"
+ ppn_server_ext_ip = (known after apply)
╷
│ Error: error retrieving image information: googleapi: Error 404: The resource 'projects/nw-nwp-dev/global/images/rsa-nw-12-5-1-0-21738-lite' was not found, notFound
│
│ with data.google_compute_image.nw_image,
│ on compute.tf line 12, in data "google_compute_image" "nw_image":
│ 12: data "google_compute_image" "nw_image" {
│
╵
[2024-10-29T17:08:35+00:00] <104309> (ERROR) Deployment of js116-ppn-server failed
[2024-10-29T17:08:35+00:00] <103741> (ERROR) Failed to create ppn-server in region us-east1
[root@js116-adminserver ~]#
Solution
- Ensure that a lite image exists in your image repository.
- Contact support to ensure that the correct image has been copied to your project's cloud image repository.
- Contact your cloud administrator to ensure the identity of the token has rights to use the image.
Failure of nw-create-cloud-hybrid --disable-cloud-sase
Output example
In this example, a simulated failure was created by terminating the ssh connection to the adminserver while nw-create-cloud-hybrid --disable-cloud-sase was running. In this situation, running the command again detects the already-removed assets and continues successfully. However, for the reasons described below, a failure could still occur, leaving assets unremoved.
A simulated network failure occurred while a subnet was being removed.
[2024-10-29T20:08:05+00:00] <193469> (ERROR) Undeployment of subnet failed
[2024-10-29T20:08:05+00:00] <187807> (ERROR) Failed to delete subnet 10.10.20.0/24 in region us-east1
Solution
- Depending upon the point of failure of --disable-cloud-sase, various cloud assets may be left unremoved.
- Should repeated runs of --disable-cloud-sase fail, remove the cloud assets manually.
- The following is a list of assets that might be left unremoved. Using your project's cloud console, carefully remove the cloud assets. Asset names are prefixed with the "nw" namespace, which helps identify cloud assets created by the SASE automation. Depending upon your model file, you may have assets in multiple regions; check every region you deployed to in order to confirm that all assets are removed. If unsure, check your model file to see where your hosts were configured to be deployed.
Firewall Rule(s) to allow UDP 4242 egress from adminserver to ppn-server not present or are malformed
Output example
nw-create-cloud-hybrid --enable-cloud-sase
… (truncated)
The error output seen when the adminserver cannot reach the ppn-server.
Waiting for nebula service to start. in 5 seconds.....[2024-11-05T20:08:53+00:00] <95647> (INFO) nebula service is running
[2024-11-05T20:08:53+00:00] <96262> (INFO) nebula service is running
[2024-11-05T20:08:53+00:00] <96262> (INFO) Successfully connected to 172.30.30.2 on port 22.
[2024-11-05T20:08:58+00:00] <96262> (ERROR) Unable to connect to 172.30.30.1 on ssh port
[2024-11-05T20:08:58+00:00] <93656> (ERROR) Failed to validate NetWitness Overlay Network connections
[root@js116-adminserver ~]#
Solution
- Create or correct the firewall rule that allows UDP 4242 egress from the adminserver to the ppn-server, then rerun nw-create-cloud-hybrid --enable-cloud-sase.
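As the output above shows, the installer validates the overlay by attempting ssh (TCP 22) connections to the overlay addresses. A minimal reachability probe for the same check, using the addresses from that output (adjust both the addresses and the timeout for your deployment; note this probes the ssh validation step, while the overlay itself rides on UDP 4242):

```python
import socket

# Overlay addresses taken from the output above; adjust for your deployment.
PEERS = [("172.30.30.2", 22), ("172.30.30.1", 22)]

def can_reach(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for host, port in PEERS:
        state = "reachable" if can_reach(host, port) else "NOT reachable"
        print(f"{host}:{port} {state}")
```

If the probe fails after the firewall rule is corrected, retest before rerunning the full deployment, since the validation step is near the end of the run.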
Firewall Rule(s) to allow TCP 443 egress to cloud api endpoints are not present or are malformed
Output example
nw-create-cloud-hybrid --enable-cloud-sase
… (truncated)
The output appears frozen when TCP 443 egress is not allowed from the adminserver. Sometimes this occurs at the line Template File: /root/.sase/host-models.yml is properly formatted.
[root@js116-adminserver ~]# nw-create-cloud-hybrid --enable-cloud-sase --cloud-provider gcp --namespace js116
[2024-11-05T20:30:28+00:00] <106035> (INFO) Using cloud provider: gcp
[2024-11-05T20:30:28+00:00] <106035> (INFO) Template File: /root/.sase/sase-deployment-models.yml is properly formatted.
[2024-11-05T20:30:28+00:00] <106035> (INFO) Template File: /root/.sase/host-models.yml is properly formatted.
[2024-11-05T20:30:29+00:00] <106412> (INFO) ssh key-pair is already created, skipping..
[2024-11-05T20:30:29+00:00] <106412> (INFO) terraform rpm already installed, skipping...
[2024-11-05T20:30:30+00:00] <106412> (INFO) google-cloud-cli rpm already installed, skipping...
[2024-11-05T20:30:31+00:00] <106412> (INFO) Installing package: nebula
Solution
- Create or correct the firewall rule that allows TCP 443 egress from the adminserver to the cloud provider's API endpoints, then rerun nw-create-cloud-hybrid --enable-cloud-sase.
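TCP 443 egress can be verified from the adminserver before rerunning the command. The sketch below attempts a full TLS connection and HTTP exchange, which also catches intercepting proxies that accept the TCP connection but break the handshake. The endpoint list is an assumption, illustrative GCP API hosts rather than an exhaustive list of what your deployment contacts:

```python
import http.client

# Illustrative endpoints the gcloud CLI and Terraform commonly talk to;
# adjust for your provider and region.
ENDPOINTS = ["storage.googleapis.com", "compute.googleapis.com"]

def https_ok(host, timeout=5.0):
    """Return True if a TLS connection to host:443 completes and answers HTTP."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", "/")
        conn.getresponse()
        return True
    except OSError:  # covers DNS failure, refusal, timeout, and TLS errors
        return False
    finally:
        conn.close()

if __name__ == "__main__":
    for host in ENDPOINTS:
        print(host, "reachable" if https_ok(host) else "BLOCKED or unreachable")
```

A host that times out here is consistent with the frozen output shown above, since the installer blocks on exactly this kind of outbound call.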
Insufficient permissions/roles on the cloud service account
Output example
... (truncated)
google_compute_subnetwork.nw_ppn_server_subnetwork: Creating...
google_compute_firewall.nw_ppn_ingress: Creating...
google_compute_firewall.nw_ssh: Creating...
google_compute_firewall.nw_ppn_egress: Creating...
google_compute_firewall.nw_ppn_ingress: Still creating... [10s elapsed]
google_compute_subnetwork.nw_ppn_server_subnetwork: Still creating... [10s elapsed]
google_compute_firewall.nw_ssh: Still creating... [10s elapsed]
google_compute_firewall.nw_ppn_egress: Still creating... [10s elapsed]
╷
│ Error: Failed to save state
│
│ Error saving state: Failed to upload state to
│ gs://nw-cloud-artifacts-1c1013f4/terraform/js116-ppn-server/default.tfstate: googleapi: Error 403:
│ nw-sase-automation@nw-nwp-dev.iam.gserviceaccount.com does not have storage.objects.create access to the Google
│ Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., forbidden
╵
╷
│ Error: Failed to persist state to backend
│
│ The error shown above has prevented Terraform from writing the updated state to the configured backend. To
│ allow for recovery, the state has been written to the file "errored.tfstate" in the current working directory.
│
│ Running "terraform apply" again at this point will create a forked state, making it harder to recover.
│
│ To retry writing this state, use the following command:
│ terraform state push errored.tfstate
│
╵
╷
│ Error: Error waiting to create Subnetwork: Error waiting for Creating Subnetwork: error while retrieving operation:
googleapi: Error 403: Required 'compute.regionOperations.get' permission for 'projects/nw-nwp-dev/regions/us-east1/operations/operation-1730910031081-62640e58a6e7f-3b2b21c4-c8362697', forbidden
│
│ with google_compute_subnetwork.nw_ppn_server_subnetwork,
│ on network.tf line 11, in resource "google_compute_subnetwork" "nw_ppn_server_subnetwork":
│ 11: resource "google_compute_subnetwork" "nw_ppn_server_subnetwork" {
│
╵
╷
│ Error: Error waiting to create Firewall: Error waiting for Creating Firewall: error while retrieving operation: googleapi: Error 403: Required 'compute.globalOperations.get' permission for 'projects/nw-nwp-dev/global/operations/operation-1730910031083-62640e58a78a8-1055e424-a8fbfd4a', forbidden
│
│ with google_compute_firewall.nw_ssh,
│ on network.tf line 21, in resource "google_compute_firewall" "nw_ssh":
│ 21: resource "google_compute_firewall" "nw_ssh" {
│
╵
╷
│ Error: Error waiting to create Firewall: Error waiting for Creating Firewall: error while retrieving operation: googleapi: Error 403: Required 'compute.globalOperations.get' permission for 'projects/nw-nwp-dev/global/operations/operation-1730910031083-62640e58a78ad-e0ed2c5a-83b29a5a', forbidden
│
│ with google_compute_firewall.nw_ppn_ingress,
│ on network.tf line 36, in resource "google_compute_firewall" "nw_ppn_ingress":
│ 36: resource "google_compute_firewall" "nw_ppn_ingress" {
│
╵
╷
│ Error: Error waiting to create Firewall: Error waiting for Creating Firewall: error while retrieving operation: googleapi: Error 403: Required 'compute.globalOperations.get' permission for 'projects/nw-nwp-dev/global/operations/operation-1730910031084-62640e58a7b66-08f858df-24b6bfc1', forbidden
│
│ with google_compute_firewall.nw_ppn_egress,
│ on network.tf line 51, in resource "google_compute_firewall" "nw_ppn_egress":
│ 51: resource "google_compute_firewall" "nw_ppn_egress" {
│
╵
Releasing state lock. This may take a few moments...
╷
│ Error: Error releasing the state lock
│
│ Error message: 2 errors occurred:
│ * googleapi: Error 403: nw-sase-automation@nw-nwp-dev.iam.gserviceaccount.com does not have
│ storage.objects.delete access to the Google Cloud Storage object. Permission 'storage.objects.delete' denied on
│ resource (or it may not exist)., forbidden
│ * googleapi: got HTTP response code 403 with body: <?xml version='1.0'
│ encoding='UTF-8'?><Error><Code>AccessDenied</Code><Message>Access
│ denied.</Message><Details>nw-sase-automation@nw-nwp-dev.iam.gserviceaccount.com does not have
│ storage.objects.get access to the Google Cloud Storage object. Permission 'storage.objects.get' denied on
│ resource (or it may not exist).</Details></Error>
│
│
│
│ Terraform acquires a lock when accessing your state to prevent others
│ running Terraform to potentially modify the state at the same time. An
│ error occurred while releasing this lock. This could mean that the lock
│ did or did not release properly. If the lock didn't release properly,
│ Terraform may not be able to run future commands since it'll appear as if
│ the lock is held.
│
│ In this scenario, please call the "force-unlock" command to unlock the
│ state manually. This is a very dangerous operation since if it is done
│ erroneously it could result in two people modifying state at the same time.
│ Only call this command if you're certain that the unlock above failed and
│ that no one else is holding a lock.
╵
[2024-11-06T16:20:42+00:00] <19843> (ERROR) Deployment of js116-ppn-server failed
[2024-11-06T16:20:42+00:00] <19348> (ERROR) Failed to create ppn-server in region us-east1
[root@js116-adminserver ~]#
The output shows that the service account has insufficient permissions; in particular, the storage (bucket) permissions are missing ("does not have storage.objects.create").
Solution
- Ask your cloud administrator to grant the service account the permissions named in the output: the storage object permissions (storage.objects.create, storage.objects.get, storage.objects.delete) and the compute operation permissions (compute.regionOperations.get, compute.globalOperations.get).
- Once the permissions are in place, rerun nw-create-cloud-hybrid --enable-cloud-sase.
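Every missing permission is named explicitly in the googleapi 403 messages, so they can be collected mechanically from a captured log rather than read out one error at a time. A small sketch; the two regular expressions simply match the two phrasings seen in the output above:

```python
import re

# The two phrasings googleapi 403 errors use in the output above:
#   Permission 'X' denied on resource ...
#   Required 'X' permission for ...
PATTERNS = [
    re.compile(r"Permission '([\w.]+)' denied"),
    re.compile(r"Required '([\w.]+)' permission"),
]

def missing_permissions(log_text):
    """Return the sorted set of permissions named in googleapi 403 errors."""
    found = set()
    for pattern in PATTERNS:
        found.update(pattern.findall(log_text))
    return sorted(found)
```

Feeding the captured Terraform output to missing_permissions() yields a deduplicated list you can hand to your cloud administrator.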