This articles provides a general methodology for investigating container issues on Nex!™. It lists the main log files and system commands to use in case you have difficulties with a given container.
Image Added
...
...
1 - High Level investigation using the CLI
Retrieve the container ID that crashed from the app name
...
Code Block |
---|
|
# Retrieve events for a given container ID
nex-cli cubes:events 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
###### Example of output
2017-05-18 06:01:38 UTC | container | ubuntu | 1296 | INFO | Nginx has been reloaded and has deregistered 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:38 UTC | container | ubuntu | 1296 | INFO | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been deregistered successfully
2017-05-18 06:01:38 UTC | container | ubuntu | - | INFO | Log drain has stopped watching 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:38 UTC | container | ubuntu | 8114 | INFO | Stopping container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:44 UTC | container | ubuntu | 8114 | INFO | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been stopped successfully
2017-05-18 06:01:44 UTC | container | ubuntu | 26596 | INFO | Destroying 7618b01c-fb14-4f8a-b08b-2281f9c5c27c
2017-05-18 06:01:44 UTC | container | ubuntu | 26596 | INFO | Container 7618b01c-fb14-4f8a-b08b-2281f9c5c27c has been destroyed successfully
2017-05-18 06:02:01 UTC | status | system | - | INFO | Node is now stopped... |
2 - Looking up the system logs
In the event where the `cubes:events` logs do not provide enough information you will need to jump on the Compute Rack and investigate the system logs.
Code Block |
---|
|
# Jump onto the rack (adapt the IP address)
nex-cli racks:ssh 10.0.0.1
# Put your container ID in a variable
ctx_id=7618b01c-fb14-4f8a-b08b-2281f9c5c27c |
2.1 - Process grep-ping
If your container is starting, provisioning or running you may want to check the current process activity
Code Block |
---|
|
# Grep processes
ps -ef | grep $ctx_id
###### Example of output
# Processes below are related to the log drain
xxxx 23627 1 0 May18 ? 00:00:00 /bin/bash /usr/local/bin/nex-docker-log 8a2a21ad-168c-45aa-9537-fc32d90c69a3
xxxx 23652 23627 0 May18 ? 00:00:22 /bin/bash /usr/local/bin/nex-docker-log-stream 8a2a21ad-168c-45aa-9537-fc32d90c69a3
xxxx 23650 23627 0 May18 ? 00:00:06 docker logs --tail 50 --follow=true --timestamps=true 8a2a21ad-168c-45aa-9537-fc32d90c69a3
# This is monit performing a health check on the container
xxxx 28644 1793 0 01:32 ? 00:00:00 /bin/bash /etc/monit/bin/check-cube 8a2a21ad-168c-45aa-9537-fc32d90c69a3
# This is someone logged in inside the container
johnd+ 32188 32183 0 May18 pts/80 00:00:00 /bin/bash /usr/local/bin/nex-attach 8a2a21ad-168c-45aa-9537-fc32d90c69a3
xxxx 32200 32188 0 May18 pts/80 00:00:00 sudo docker exec -it 8a2a21ad-168c-45aa-9537-fc32d90c69a3 /bin/sh -c if command -v bash > /dev/null; then bash; else sh; fi
xxxx 32201 32200 0 May18 pts/80 00:00:12 docker exec -it 8a2a21ad-168c-45aa-9537-fc32d90c69a3 /bin/sh -c if command -v bash > /dev/null; then bash; else sh; fi |
2.2 Nex!™ Agent logs
Check the Nex!™ agent logs. The agent logs provide insights on the control actions (start, restart etc.) recently taken for this container.
Code Block |
---|
# Check the Nex!™ agent logs for this container
cat /var/log/nex/nex.log | grep $ctx_id
##### Example of output
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Creating local container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 (using nex-create)
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-create | Container for 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been created successfully
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been created
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Starting cube...
2017-05-18 06:02:13 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Starting container 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:02:14 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-log-drain | Log drain is now watching 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:02:14 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Waiting for container to be up and running... (healthcheck)
2017-05-18 06:05:05 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Container is now up.
2017-05-18 06:05:05 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-monit-monitor | Local container /containers/8a2a21ad-168c-45aa-9537-fc32d90c69a3 exists
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-monit-monitor | Monit is now monitoring 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-start | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been started successfully
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been started successfully
2017-05-18 06:05:08 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | No persistent storage. Skipping container backup.
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Local container /containers/8a2a21ad-168c-45aa-9537-fc32d90c69a3 exists
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Checked that container was running
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Cube IP is: 172.17.0.59
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | Host DNS is: 296fd6b7-e0d2-4516-8f8a-da874abb9089.app.internal
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | File /etc/nginx/sites-enabled/app-296fd6b7-e0d2-4516-8f8a-da874abb9089 has been created
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register-host | Nginx virtual host for cluster app-296fd6b7-e0d2-4516-8f8a-da874abb9089 has been updated successfully
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | * Reloading nginx configuration nginx
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | ...done.
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Nginx has been reloaded and is now proxying traffic to 8a2a21ad-168c-45aa-9537-fc32d90c69a3
2017-05-18 06:05:09 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-register | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been registered successfully
2017-05-18 06:05:10 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Container 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been registered successfully
2017-05-18 06:05:10 UTC | INFO | 8a2a21ad-168c-45aa-9537-fc32d90c69a3 | nex-setup | Cube 8a2a21ad-168c-45aa-9537-fc32d90c69a3 has been setup successfully |
2.3 Monit logs
Check the monit logs. The monit logs provide some level of information on why a container got restarted (e.g. healthcheck fail)
Code Block |
---|
|
# Grep monit logs (grep out verbose information)
cat /var/log/monit.log | grep $ctx_id | egrep -v "(monitor on|monitor action|unmonitor on|unmonitor action)" |
2.4 Docker container logs
If your container is running you may want to inspect it with Docker and tail the logs
Code Block |
---|
|
# Inspect container
docker inspect $ctx_id
# Tail last 100 log entries (--tail=100) and follow log trail (-f)
docker logs --tail=100 -f $ctx_id |
2.5 Docker daemon logs
If the above still does not help understanding the issue you may want to have a look at the Docker daemon logs directly
...