Investigating container issues
This articles provides a general methodology for investigating container issues on Nex!™. It lists the main log files and system commands to use in case you have difficulties with a given container.
1 - High Level investigation using the CLI
Retrieve the container ID that crashed from the app name
# For an app nex-cli cubes --app my_app # For an addon nex-cli cubes --addon my_app ###### Example of output +--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+ | Cube Instances | +--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+ | ID | STATUS | CONTAINER | REGION | STORAGE | IP | PORT | CLUSTER | +--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+ | 89526d2e-0e4b-4aac-ba75-5d10c49e6282 | running | running | ap-southeast-1 | false | 10.0.1.1 | - | - | | 1945a65c-b79f-428e-b820-0dc4d3878bd1 | running | running | ap-southeast-1 | false | 10.0.1.1 | - | - | +--------------------------------------+---------+-----------+----------------+---------+------------+------+---------+
Lookup the container events from the CLI. If a container failed on start you will find crash logs in the event log.
# Retrieve events for a given container ID nex-cli cubes:events 7618b01c-fb14-4f8a-b08b-2281f9c5c27c ###### Example of output 2017-05-18 06:01:38 UTC | container | ubuntu | 1296 | INFO | Nginx has been reloaded and has deregistered 7618b01c-fb14-4f8a-b08b-2281f9c5c27c 2017-05-18 06:01:38 UTC | container | ubuntu | 1296 | INFO | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been deregistered successfully 2017-05-18 06:01:38 UTC | container | ubuntu | - | INFO | Log drain has stopped watching 7618b01c-fb14-4f8a-b08b-2281f9c5c27c 2017-05-18 06:01:38 UTC | container | ubuntu | 8114 | INFO | Stopping container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c 2017-05-18 06:01:44 UTC | container | ubuntu | 8114 | INFO | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been stopped successfully 2017-05-18 06:01:44 UTC | container | ubuntu | 26596 | INFO | Destroying 7618b01c-fb14-4f8a-b08b-2281f9c5c27c 2017-05-18 06:01:44 UTC | container | ubuntu | 26596 | INFO | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been destroyed successfully 2017-05-18 06:02:01 UTC | status | system | - | INFO | Node is now stopped...
2 - Looking up the system logs
In the event where the `cubes:events` logs do not provide enough information you will need to jump on the Compute Rack and investigate the system logs.
# Jump onto the rack (adapt the IP address) nex-cli racks:ssh 10.0.0.1 # Put your container ID in a variable ctx_id=7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2.1 - Process grep-ping
If your container is starting, provisioning or running you may want to check the current process activity
# Grep processes ps -ef | grep $ctx_id ###### Example of output # Processes below are related to the log drain xxxx 23627 1 0 May18 ? 00:00:00 /bin/bash /usr/local/bin/nex-docker-log 8a2a21ad-168c-45aa-9537-fc32d90c69a3 xxxx 23652 23627 0 May18 ? 00:00:22 /bin/bash /usr/local/bin/nex-docker-log-stream 8a2a21ad-168c-45aa-9537-fc32d90c69a3 xxxx 23650 23627 0 May18 ? 00:00:06 docker logs --tail 50 --follow=true --timestamps=true 8a2a21ad-168c-45aa-9537-fc32d90c69a3 # This is monit performing a health check on the container xxxx 28644 1793 0 01:32 ? 00:00:00 /bin/bash /etc/monit/bin/check-cube 8a2a21ad-168c-45aa-9537-fc32d90c69a3 # This is someone logged in inside the container johnd+ 32188 32183 0 May18 pts/80 00:00:00 /bin/bash /usr/local/bin/nex-attach 8a2a21ad-168c-45aa-9537-fc32d90c69a3 xxxx 32200 32188 0 May18 pts/80 00:00:00 sudo docker exec -it 8a2a21ad-168c-45aa-9537-fc32d90c69a3 /bin/sh -c if command -v bash > /dev/null; then bash; else sh; fi xxxx 32201 32200 0 May18 pts/80 00:00:12 docker exec -it 8a2a21ad-168c-45aa-9537-fc32d90c69a3 /bin/sh -c if command -v bash > /dev/null; then bash; else sh; fi
2.2 Nex!™ Agent logs
Check the Nex!™ agent logs. The agent logs provide insights on the control actions (start, restart etc.) recently taken for this container.
# Check the Nex!™ agent logs for this container cat /var/log/nex/nex.log | grep $ctx_id ##### Example of output 2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Creating local container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 (using nex-create) 2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-create | Container for 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been created successfully 2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been created 2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Starting cube... 2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Starting container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 2017-05-18 06:02:14 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-log-drain | Log drain is now watching 8a2a21ad-168c-45aa-9537-fc32d90c69a3 2017-05-18 06:02:14 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Waiting for container to be up and running... (healthcheck) 2017-05-18 06:05:05 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Container is now up. 2017-05-18 06:05:05 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-monit-monitor | Local container /containers/8a2a21ad-168c-45aa-9537-fc32d90c69a3 exists 2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-monit-monitor | Monit is now monitoring 8a2a21ad-168c-45aa-9537-fc32d90c69a3 2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been started successfully 2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been started successfully 2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | No persistent storage. Skipping container backup. 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Local container /containers/8a2a21ad-168c-45aa-9537-fc32d90c69a3 exists 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Checked that container was running 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Cube IP is: 172.17.0.59 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | Host DNS is: 296fd6b7-e0d2-4516-8f8a-da874abb9089.app.internal 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | File /etc/nginx/sites-enabled/app-296fd6b7-e0d2-4516-8f8a-da874abb9089 has been created 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | Nginx virtual host for cluster app-296fd6b7-e0d2-4516-8f8a-da874abb9089 has been updated successfully 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | * Reloading nginx configuration nginx 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | ...done. 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Nginx has been reloaded and is now proxying traffic to 8a2a21ad-168c-45aa-9537-fc32d90c69a3 2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been registered successfully 2017-05-18 06:05:10 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been registered successfully 2017-05-18 06:05:10 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Cube 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been setup successfully
2.3 Monit logs
Check the monit logs. The monit logs provide some level of information on why a container got restarted (e.g. healthcheck fail)
# Grep monit logs (grep out verbose information) cat /var/log/monit.log | grep $ctx_id | egrep -v "(monitor on|monitor action|unmonitor on|unmonitor action)"
2.4 Docker container logs
If your container is running you may want to inspect it with Docker and tail the logs
# Inspect container docker inspect $ctx_id # Tail last 100 log entries (--tail=100) and follow log trail (-f) docker logs --tail=100 -f $ctx_id
2.5 Docker daemon logs
If the above still does not help understanding the issue you may want to have a look at the Docker daemon logs directly
# Retrieve the internal Docker (short) id of your container dk_ctx_id=$(docker inspect --format="{{ .Config.Hostname }}" $ctx_id) # Grep docker daemon logs based on container name or internal id cat /var/log/upstart/docker.log | egrep "($ctx_id|$dk_ctx_id)"