Investigating container issues

This articles provides a general methodology for investigating container issues on Nex!™. It lists the main log files and system commands to use in case you have difficulties with a given container.





1 - High Level investigation using the CLI

Retrieve the container ID that crashed from the app name

# For an app
nex-cli cubes --app my_app

# For an addon
nex-cli cubes --addon my_app

###### Example of output
+--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+
|                                                   Cube Instances                                                    |
+--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+
| ID                                   | STATUS  | CONTAINER | REGION         | STORAGE | IP         | PORT | CLUSTER |
+--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+
| 89526d2e-0e4b-4aac-ba75-5d10c49e6282 | running | running   | ap-southeast-1 | false   | 10.0.1.1   | -    | -       |
| 1945a65c-b79f-428e-b820-0dc4d3878bd1 | running | running   | ap-southeast-1 | false   | 10.0.1.1   | -    | -       |
+--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+


Lookup the container events from the CLI. If a container failed on start you will find crash logs in the event log.

# Retrieve events for a given container ID
nex-cli cubes:events 7618b01c-fb14-4f8a-b08b-2281f9c5c27c

###### Example of output
2017-05-18 06:01:38 UTC | container    | ubuntu          | 1296   | INFO   | Nginx has been reloaded and has deregistered 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:38 UTC | container    | ubuntu          | 1296   | INFO   | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been deregistered successfully
2017-05-18 06:01:38 UTC | container    | ubuntu          | -      | INFO   | Log drain has stopped watching 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:38 UTC | container    | ubuntu          | 8114   | INFO   | Stopping container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:44 UTC | container    | ubuntu          | 8114   | INFO   | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been stopped successfully
2017-05-18 06:01:44 UTC | container    | ubuntu          | 26596  | INFO   | Destroying 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:44 UTC | container    | ubuntu          | 26596  | INFO   | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been destroyed successfully
2017-05-18 06:02:01 UTC | status       | system          | -      | INFO   | Node is now stopped...


2 - Looking up the system logs

In the event where the `cubes:events` logs do not provide enough information you will need to jump on the Compute Rack and investigate the system logs.

# Jump onto the rack (adapt the IP address) 
nex-cli racks:ssh 10.0.0.1

# Put your container ID in a variable
ctx_id=7618b01c-fb14-4f8a-b08b-2281f9c5c27c


2.1 - Process grep-ping

If your container is starting, provisioning or running you may want to check the current process activity

# Grep processes
ps -ef | grep $ctx_id

###### Example of output
# Processes below are related to the log drain
xxxx 23627     1  0 May18 ?        00:00:00 /bin/bash /usr/local/bin/nex-docker-log 8a2a21ad-168c-45aa-9537-fc32d90c69a3
xxxx 23652 23627  0 May18 ?        00:00:22 /bin/bash /usr/local/bin/nex-docker-log-stream 8a2a21ad-168c-45aa-9537-fc32d90c69a3
xxxx 23650 23627  0 May18 ?        00:00:06 docker logs --tail 50 --follow=true --timestamps=true 8a2a21ad-168c-45aa-9537-fc32d90c69a3

# This is monit performing a health check on the container
xxxx     28644  1793  0 01:32 ?        00:00:00 /bin/bash /etc/monit/bin/check-cube 8a2a21ad-168c-45aa-9537-fc32d90c69a3

# This is someone logged in inside the container
johnd+ 32188 32183  0 May18 pts/80   00:00:00 /bin/bash /usr/local/bin/nex-attach 8a2a21ad-168c-45aa-9537-fc32d90c69a3
xxxx   32200 32188  0 May18 pts/80   00:00:00 sudo docker exec -it 8a2a21ad-168c-45aa-9537-fc32d90c69a3 /bin/sh -c if command -v bash > /dev/null; then bash; else sh; fi
xxxx   32201 32200  0 May18 pts/80   00:00:12 docker exec -it 8a2a21ad-168c-45aa-9537-fc32d90c69a3 /bin/sh -c if command -v bash > /dev/null; then bash; else sh; fi


2.2 Nex!™ Agent logs

Check the Nex!™ agent logs. The agent logs provide insights on the control actions (start, restart etc.) recently taken for this container.  

# Check the Nex!™ agent logs for this container
cat /var/log/nex/nex.log | grep $ctx_id

##### Example of output
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Creating local container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 (using nex-create)
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-create | Container for 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been created successfully
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been created
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Starting cube...
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Starting container 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:02:14 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-log-drain | Log drain is now watching 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:02:14 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Waiting for container to be up and running... (healthcheck)
2017-05-18 06:05:05 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Container is now up.
2017-05-18 06:05:05 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-monit-monitor | Local container /containers/8a2a21ad-168c-45aa-9537-fc32d90c69a3 exists
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-monit-monitor | Monit is now monitoring 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been started successfully
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been started successfully
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | No persistent storage. Skipping container backup.
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Local container /containers/8a2a21ad-168c-45aa-9537-fc32d90c69a3 exists
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Checked that container was running
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Cube IP is: 172.17.0.59
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | Host DNS is: 296fd6b7-e0d2-4516-8f8a-da874abb9089.app.internal
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | File /etc/nginx/sites-enabled/app-296fd6b7-e0d2-4516-8f8a-da874abb9089 has been created
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | Nginx virtual host for cluster app-296fd6b7-e0d2-4516-8f8a-da874abb9089 has been updated successfully
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | * Reloading nginx configuration nginx
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | ...done.
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Nginx has been reloaded and is now proxying traffic to 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been registered successfully
2017-05-18 06:05:10 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been registered successfully
2017-05-18 06:05:10 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Cube 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been setup successfully


2.3 Monit logs

Check the monit logs. The monit logs provide some level of information on why a container got restarted (e.g. healthcheck fail)

# Grep monit logs (grep out verbose information)
cat /var/log/monit.log | grep $ctx_id | egrep -v "(monitor on|monitor action|unmonitor on|unmonitor action)"


2.4 Docker container logs 

If your container is running you may want to inspect it with Docker and tail the logs

# Inspect container
docker inspect $ctx_id

# Tail last 100 log entries (--tail=100) and follow log trail (-f)
docker logs --tail=100 -f $ctx_id


2.5 Docker daemon logs 

If the above still does not help understanding the issue you may want to have a look at the Docker daemon logs directly

# Retrieve the internal Docker (short) id of your container
dk_ctx_id=$(docker inspect --format="{{ .Config.Hostname }}" $ctx_id)

# Grep docker daemon logs based on container name or internal id
cat /var/log/upstart/docker.log | egrep "($ctx_id|$dk_ctx_id)"