Working around deployment timeout

This article explains how to work around the deployment time limit for long-running deployments with Nex!™ (e.g. a migration taking more than 10 minutes).


1 - Deployment time limit

Nex!™ uses Docker health-checks to check the status of starting/running containers. When deploying a container (new app, scale up etc.) it is expected that the container will not immediately become healthy. Nex!™ therefore allows containers to be unhealthy for 10 minutes upon deployment before killing them. This limit typically guards against misconfigured changes to database settings (e.g. stale database pool) or to associated web service configuration (e.g. developer platform config).

The deployment time limit of 10 minutes only applies to applications with a Docker health-check configured, which is typically the case for applications relying on maestrano/web-ruby or maestrano/web-jruby. Applications that do not have a health-check are always considered "healthy" by the Nex!™ orchestrator (even if they are not).
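The rule described above can be modelled in a few lines of shell. This is only an illustrative sketch, not the Nex!™ orchestrator code: the function name deployment_verdict, the "none" status used for containers without a health-check, and the choice to kill exactly at 600 seconds are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative model of the deployment time limit (NOT actual Nex! code).
# $1: health status ("starting", "healthy", "unhealthy", or "none" when
#     the container has no health-check; "none" is our own convention)
# $2: seconds elapsed since the container was deployed
# Prints "keep" or "kill".
deployment_verdict() {
  status="$1"
  elapsed="$2"
  limit=600  # 10 minutes, as stated in this article

  case "$status" in
    healthy|none)
      # Healthy containers, and containers without a health-check,
      # are never killed by the time limit
      echo keep
      ;;
    *)
      # Not healthy yet: tolerated only until the limit is reached
      if [ "$elapsed" -ge "$limit" ]; then
        echo kill
      else
        echo keep
      fi
      ;;
  esac
}
```

For example, `deployment_verdict unhealthy 30` prints keep, whereas `deployment_verdict unhealthy 600` prints kill.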

2 - Working around the limit

When a deployment involves a long-running task (e.g. a long-running migration), it can be useful to deactivate this 10-minute deployment limit.

The simplest way to deactivate this limit is to force the container health-check to return a zero status code. The container is then considered healthy and will not be killed by the orchestrator once the 10-minute timeout is reached.
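To make the idea concrete, here is a minimal sketch of a health-check that can be forced to return zero. It is an assumption about how such a bypass could be written, not the script actually shipped in the maestrano/web-ruby or maestrano/web-jruby images. The function name healthcheck and the probe command passed to it are hypothetical.

```shell
#!/bin/sh
# Hypothetical health-check sketch (NOT the script shipped in the
# maestrano/web-ruby or maestrano/web-jruby images).
# Arguments: the probe command to run when health-checks are active.
# Returns 0 (healthy) or 1 (unhealthy).
healthcheck() {
  # Bypass: report healthy without probing the application at all
  if [ "${NO_HEALTHCHECK:-false}" = "true" ]; then
    return 0
  fi

  # Normal path: healthy only if the probe command succeeds
  if "$@" >/dev/null 2>&1; then
    return 0
  fi
  return 1
}
```

A container could then call something like `healthcheck curl -fsS http://localhost:3000/` from its Docker HEALTHCHECK instruction (the URL and port are placeholders). With NO_HEALTHCHECK=true the probe is never run, which is why the container looks healthy even while the web process is down.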

Deactivating the Docker health-check on maestrano/web-ruby and maestrano/web-jruby images is easy. Simply set the NO_HEALTHCHECK=true environment variable on your application:

# Deactivate Docker health-checks on my_app
nex-cli apps:vars my_app -a NO_HEALTHCHECK=true

Once your migration is done you can remove this environment variable by running the following command:

# Re-activate Docker health-checks on my_app (delete env var)
nex-cli apps:vars my_app -d NO_HEALTHCHECK
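Because it is easy to forget the second command, the two steps can be wrapped so that the health-check is always re-activated. The following is a sketch under assumptions: it only uses the two nex-cli commands shown in this article, while the helper names and the NEX_CLI override (handy for a dry run with NEX_CLI=echo) are hypothetical.

```shell
#!/bin/sh
# Sketch only: wraps the two nex-cli commands from this article.
# NEX_CLI is a hypothetical override, e.g. NEX_CLI=echo for a dry run.
NEX_CLI="${NEX_CLI:-nex-cli}"

# Deactivate Docker health-checks on app $1
disable_healthcheck() {
  "$NEX_CLI" apps:vars "$1" -a NO_HEALTHCHECK=true
}

# Re-activate Docker health-checks on app $1 (delete env var)
enable_healthcheck() {
  "$NEX_CLI" apps:vars "$1" -d NO_HEALTHCHECK
}

# Usage: with_healthcheck_disabled APP COMMAND [ARGS...]
# Runs COMMAND with health-checks deactivated on APP, then re-activates
# them whether COMMAND succeeded or not. Returns COMMAND's exit status.
with_healthcheck_disabled() {
  app="$1"
  shift
  disable_healthcheck "$app" || return 1
  if "$@"; then
    status=0
  else
    status=$?
  fi
  enable_healthcheck "$app" || return 1
  return "$status"
}
```

The command passed to with_healthcheck_disabled should be whatever step blocks until your migration has finished. This article does not prescribe one, so it is left to you (it can be as simple as a script that waits for you to press Enter).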

Potential Application Outage

When the health-check is forced to succeed, Nex!™ considers your application up and running even though your migrations / long-running tasks are still in progress. This is likely to cause an outage: your web processes (e.g. puma processes) will not start until the task is finished, yet Nex!™ will route traffic to your containers because they are flagged as healthy.