This article explains how to recover when the Nex!™ routing layer stops responding and your browser reports that the "site can't be reached" (connection timeout).

1 - Context

If you get a connection timeout error when attempting to access an app on Nex!™, the most likely cause is that all of the routing racks have stopped serving requests. In Chrome, this typically shows up as a "This site can't be reached" page stating that the site took too long to respond (ERR_CONNECTION_TIMED_OUT).

2 - Checking Nginx on the routing racks

The following steps assume that you have set up the Nex!™ command line client and that you have administrative access to the platform.

You can display the list of routing racks and access one of them using the following commands:

# Display all routing racks
nex-cli racks --type routing

# Access rack by internal IP
nex-cli racks:ssh 10.1.1.1

The first thing to do is to check that Nginx is up. If it is not, attempt to restart it.

# Check if any nginx process is running
# (the [n] pattern stops grep from matching its own process)
ps -ef | grep '[n]ginx'

# Perform a full restart of nginx
service nginx restart

# Check if nginx is up
ps -ef | grep '[n]ginx'

If Nginx does not come back up after the restart, start it in the foreground to diagnose the issue:

# Run the following command
# This will start nginx in foreground mode. Any startup issue will appear here.
/usr/local/openresty/nginx/sbin/nginx

When Nginx fails to start, the cause is usually that the default SSL certificates are not set up properly. If the above command reports an error related to SSL, the private key or the public key, go to section 3.

3 - Fixing the default SSL certificates

If the default Nginx certificates are not set up properly (e.g. a missing public or private key, or keys that do not match), Nginx will refuse to start. This problem typically indicates that the Nex!™ Orchestrator configuration is incorrect.

3.1 - Immediate resolution

The fastest way to resolve this issue is to manually reinstall the Nginx default SSL certificates. For that you will need access to:

Your wildcard certificate public key: e.g. mydomain.crt
Your wildcard certificate private key: e.g. mydomain.key
Your Certificate Authority (CA) chain: e.g. my_cert_provider_bundle.crt
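Before installing anything, it can be worth confirming that the private key actually belongs to the certificate, since keys that do not match are one of the causes of Nginx refusing to start. The sketch below assumes the `openssl` command line tool is available; `cert_matches_key` is a hypothetical helper name, not part of Nex!™. It compares the public key embedded in the certificate with the public key derived from the private key, which works for both RSA and EC keys.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nex!): succeeds only if the private key
# in $2 belongs to the certificate in $1.
# Returns 0 on match, 1 on mismatch, 2 if either file cannot be read.
cert_matches_key() {
  local cert_pub key_pub
  # Public key embedded in the (first) certificate of the file
  cert_pub=$(openssl x509 -in "$1" -noout -pubkey) || return 2
  # Public key derived from the private key
  key_pub=$(openssl pkey -in "$2" -pubout) || return 2
  [ "$cert_pub" = "$key_pub" ]
}
```

For example: cert_matches_key mydomain.crt mydomain.key && echo "key matches certificate". Because `openssl x509` only reads the first certificate in a file, the same check can be run on the routing rack against /etc/nginx/default-certificates/default.crt and default.key, where a match also confirms that your own certificate comes before the CA chain.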

On the routing rack, copy the content of your certificate private key into the following file (e.g. using vim):

/etc/nginx/default-certificates/default.key

On your local computer, concatenate your certificate public key and the CA chain:

# Cat both files to obtain a chained certificate
cat mydomain.crt my_cert_provider_bundle.crt

On the routing rack, copy the output of the cat command above into the following file (e.g. using vim):

/etc/nginx/default-certificates/default.crt
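A plain `cat` joins files byte for byte, so if mydomain.crt does not end with a newline, its "-----END CERTIFICATE-----" line gets glued to the following "-----BEGIN CERTIFICATE-----" line, which OpenSSL may then fail to parse. A minimal sketch of a safer concatenation, assuming a POSIX shell with `awk`; `build_chain` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: concatenate a certificate and its CA bundle into a
# chained PEM. awk '1' prints every input line followed by a newline, so a
# file without a final newline cannot merge its last line into the next file.
build_chain() {
  awk 1 "$@"
}
```

For example: build_chain mydomain.crt my_cert_provider_bundle.crt > chained.crt, then copy the content of chained.crt into default.crt. Counting the "-----BEGIN CERTIFICATE-----" lines in the result (e.g. with grep -c) is a quick way to check that the whole chain made it in.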

Once done, start Nginx in the foreground to check that it now starts without errors:

# This will start nginx in foreground mode. Any startup issue will appear here.
# Use Ctrl-C to exit the process
/usr/local/openresty/nginx/sbin/nginx

Finally, restart the Nginx service to bring it back up:

service nginx restart

Once Nginx is back up and you have verified that your web applications are accessible again, proceed to section 3.2 to resolve the issue permanently.

3.2 - Permanent resolution

As mentioned at the beginning of this section, this kind of SSL issue is symptomatic of a configuration issue with the orchestrator itself. You should therefore review the SSL certificate configuration variables in Nex!™ to ensure they match your certificate and private key.

a) Using Ansible

If you are using Maestrano's Ansible framework to deploy the Nex!™ orchestrator, review the SSL configuration in your *_secret.yml file. See this example configuration file to understand what to look for:

https://github.com/maestrano/mno-deploy-nexmin/blob/master/ansible/vars/nexmin_prd_secrets.yml#L111

The following variables must be set properly:

cert_key_cube_default: your wildcard certificate private key, with its line breaks written as newline characters ("\n"). The value must be enclosed in SINGLE quotes.
cert_chained_cube_default: the concatenation of your certificate and your Certificate Authority bundle. The value must be a single line, with line breaks written as newline characters ("\n"), and must be enclosed in SINGLE quotes.
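Turning multi-line PEM files into this single-line "\n" form by hand is error-prone. A minimal sketch, assuming a POSIX shell with `awk`; `pem_one_line` and `ansible_ssl_vars` are hypothetical helper names, and the variable names are the ones listed above:

```shell
#!/bin/sh
# Hypothetical helper: print PEM file(s) as one line in which every line
# break is written as the two characters \n (backslash + n).
# NF skips blank lines, sub() drops Windows carriage returns, and "%s\\n"
# in the awk program prints a literal backslash followed by n.
pem_one_line() {
  awk 'NF { sub(/\r$/, ""); printf "%s\\n", $0 }' "$@"
}

# Hypothetical helper: emit the two Ansible variables wrapped in SINGLE quotes.
# $1 is the private key; the remaining arguments are the certificate followed
# by the CA bundle, in that order.
ansible_ssl_vars() {
  local key_file="$1"
  shift
  printf "cert_key_cube_default: '%s'\n" "$(pem_one_line "$key_file")"
  printf "cert_chained_cube_default: '%s'\n" "$(pem_one_line "$@")"
}
```

For example: ansible_ssl_vars mydomain.key mydomain.crt my_cert_provider_bundle.crt, then paste the two output lines into your *_secret.yml file. Single quotes matter here because YAML keeps backslashes literally inside single-quoted strings, so the "\n" sequences are passed through as-is instead of being expanded when Ansible parses the file.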

Once your mno-deploy-* configuration package has been rebuilt (e.g. using Codeship), redeploy the orchestrator itself to update its configuration. You can do this through your usual deployment tool (e.g. Rundeck) or by running the following commands directly on the Nex!™ orchestrator boxes:

# E.g. on AWS
bash <(curl -s http://169.254.169.254/latest/user-data)

# E.g. on Azure
bash /opt/maestrano/redeploy.sh

b) Using Rails configuration

If you have deployed the Nex!™ orchestrator manually or using any other deployment framework (e.g. Chef), you will need to modify your deployment variables to ensure that the Nex!™ config/application.yml file is set up properly.

On one of the orchestrator boxes, navigate to the Nex!™ configuration folder:

# Go to the Rails config folder
cd /apps/nex/current/config

# Edit the application.yml file
vi application.yml

Ensure that the following configuration parameters are set properly:

ssl_cert_key_cube_default: your wildcard certificate private key, with its line breaks written as newline characters ("\n"). The value must be enclosed in DOUBLE quotes.
ssl_cert_chained_cube_default: the concatenation of your certificate and your Certificate Authority bundle. The value must be a single line, with line breaks written as newline characters ("\n"), and must be enclosed in DOUBLE quotes.
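To double-check what the orchestrator will actually read, you can decode a value from application.yml back into a regular multi-line PEM. This sketch assumes each value sits on a single line in the double-quoted "\n" form described above; `yaml_pem_value` is a hypothetical helper name, and it is not a full YAML parser:

```shell
#!/bin/sh
# Hypothetical helper: print the PEM stored under key $2 in YAML file $1,
# assuming the value is one double-quoted line using "\n" escapes.
# Returns 1 if the key is not found.
yaml_pem_value() {
  local file="$1" key="$2" line
  line=$(grep "^[[:space:]]*$key:" "$file" | head -n 1)
  [ -n "$line" ] || return 1
  line=${line#*: }     # drop the (possibly indented) "key: " prefix
  line=${line#\"}      # drop the opening double quote
  line=${line%\"}      # drop the closing double quote
  printf '%b' "$line"  # %b expands the \n escapes into real line breaks
}
```

For example, yaml_pem_value application.yml ssl_cert_chained_cube_default > /tmp/check.crt produces a file you can inspect with openssl (e.g. openssl x509 -in /tmp/check.crt -noout -subject -enddate). Delete such files afterwards, especially one decoded from ssl_cert_key_cube_default, since it contains your private key.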

c) Final steps

Finally, instruct the Nex!™ orchestrator to reconfigure the routing racks. You can do so from the Rails console on one of the Nex!™ orchestrator boxes:

# Access one of the Nex!™ orchestrator boxes via SSH

# Access the rails console
cd /apps/nex/current
bundle exec rails c <uat|production>

# Reconfigure all routing racks
RoutingRack.where(status: 'running').each(&:sync_base_config)

Nex!™ Routing - Site can't be reached