Docker Compose Production Deployment: Health Checks, Restart Policies, and Resource Limits

At 3 AM, a server alert blasted me out of bed.

I opened my laptop to find all containers showing running status, with healthy green dots. But when I tried to access the service? 502 Bad Gateway.

The database container hadn’t finished starting yet, but the application container had already frantically tried to connect. Connection failed, service down. The container was still “running”, but the service was long dead.

That was my real experience the first time I deployed Docker Compose to production.

Getting it running and keeping it running are two completely different things. Your docker-compose up might start everything with a single command, but that doesn’t mean it can stand tall when memory explodes or processes crash in the middle of the night.

The three essentials of Docker Compose production deployment—health checks, restart policies, and resource limits—are what fill this gap. This article shares the configuration methods I’ve learned from trial and error, complete with copy-paste ready YAML templates to help you transform your containers from “barely running” to “rock solid.”

1. Health Checks: Determining If a Container Is Truly Ready

The running status shown by docker ps only tells you the container process exists. But can it actually serve requests? You don’t know.

Health checks are Docker’s way of giving your containers regular “physical exams”: sending HTTP requests, testing database connections, or running scripts to see if the service is genuinely alive.

How Health Checks Work

Docker runs the check command inside your container at the interval you define. An exit code of 0 means healthy; a non-zero exit code means the check failed. After the configured number of consecutive failures (retries), the container gets marked as unhealthy.

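Here’s the key: Health check failures don’t automatically trigger restarts. They just reveal the status, telling you “this one has issues.” To get closer to self-healing, combine health checks with depends_on conditional startup (so dependents wait for readiness) and restart policies (so a process that exits is brought back). Keep in mind that an unhealthy container whose process is still running is not restarted by the restart policy alone.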

Four Parameters You Need to Understand

healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
  interval: 30s      # How often to check
  timeout: 10s       # Timeout for each check
  retries: 3         # Consecutive failures to mark unhealthy
  start_period: 60s  # Grace period during startup (failures don't count)

I once ignored the start_period parameter. The result? My app started slowly, needed 40 seconds to connect to the database, but health checks started after 10 seconds. Three consecutive failures, immediately marked as unhealthy. After adding start_period: 60s, I gave the application enough initialization time.
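To see how the timing parameters interact, here is the same check annotated with a rough timeline. The numbers are approximate: Docker runs the first check one interval after the container starts, and each later check one interval after the previous one finishes. The 40-second startup is the scenario from the anecdote above.

```yaml
healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
  interval: 30s      # First check runs about 30s after the container starts
  timeout: 10s       # A check that hangs longer than this counts as a failure
  retries: 3         # 3 consecutive counted failures => unhealthy
  start_period: 60s  # Failures in the first 60s do not count toward retries
# Rough timeline for an app that needs ~40s to become ready:
#   t ~ 30s  check fails, but it falls inside start_period, so it is ignored
#   t ~ 60s  check passes, container becomes healthy
# Once running, a hung service is flagged after roughly
#   retries x (interval + timeout) = 3 x (30s + 10s) = 120s in the worst case.
```

If two minutes is too slow for your service, shorten interval rather than dropping retries to 1, so a single slow response does not flip the status.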

Common Health Check Commands

Web Service:

healthcheck:
  test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
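If the application image does not ship curl (common with Alpine-based images), a similar check can be sketched with BusyBox wget. This assumes the image includes wget and that the app exposes the same /health endpoint on port 3000.

```yaml
healthcheck:
  # BusyBox wget exits non-zero when the request fails or the server returns an HTTP error
  test: ["CMD-SHELL", "wget -q -O /dev/null http://localhost:3000/health || exit 1"]
  interval: 30s
  timeout: 10s
  retries: 3
  start_period: 30s
```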

PostgreSQL:

healthcheck:
  test: ["CMD-SHELL", "pg_isready -U postgres"]
  interval: 10s
  timeout: 5s
  retries: 5

Redis:

healthcheck:
  test: ["CMD", "redis-cli", "ping"]
  interval: 10s
  timeout: 3s
  retries: 3

Combining with depends_on for Dependency Startup

This is where health checks really shine: making dependent services wait until they’re truly ready.

services:
  app:
    depends_on:
      db:
        condition: service_healthy  # Wait for DB health check to pass
      redis:
        condition: service_healthy  # Wait for Redis health check to pass

I used to just use depends_on: [db, redis], but the app container would start while the database was still initializing. Connection failed, immediate error and exit. After switching to condition: service_healthy, the application patiently waits until the database can respond to pg_isready before starting. Peace at last.
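One detail that is easy to miss: service_healthy only works if the dependency itself defines a healthcheck. The sketch below shows both halves together and adds a one-off migration job gated with service_completed_successfully. The migrate service, its npm command, and the password value are hypothetical placeholders, not part of the original setup.

```yaml
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_PASSWORD: example      # Placeholder value for illustration only
    healthcheck:                      # Required: service_healthy needs a healthcheck on db
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5

  migrate:                            # Hypothetical one-off migration job
    image: myapp:latest
    command: ["npm", "run", "migrate"]  # Assumed migration command
    restart: "no"
    depends_on:
      db:
        condition: service_healthy

  app:
    image: myapp:latest
    depends_on:
      db:
        condition: service_healthy
      migrate:
        condition: service_completed_successfully  # Start only after migrate exits with code 0
```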

2. Restart Policies: Giving Containers Self-Healing Ability

When a container crashes, who brings it back up?

Manual docker restart? Try that at 3 AM when you get an alert.

Restart policies let the Docker daemon handle this for you. After a container exits, Docker automatically decides whether to restart it.

Comparing Four Policies

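| Policy         | Behavior                                | Use Case                 |
|----------------|-----------------------------------------|--------------------------|
| no             | It crashes, it stays crashed            | Temporary testing, CI/CD |
| always         | Restart no matter how it exits          | Core services            |
| on-failure     | Only restart on abnormal exit           | Task-based containers    |
| unless-stopped | Always restart unless manually stopped  | Production favorite      |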

Production Choice: unless-stopped

restart: unless-stopped

Why recommend unless-stopped over always?

The difference is how each one behaves after a manual docker stop:

  • always: After manual stop, if the system or Docker service restarts, the container automatically starts again
  • unless-stopped: After manual stop, it stays stopped—won’t come back to life on its own

Imagine this: You manually stop a container for maintenance, then the server reboots and it starts running again. You might just want to say: what the hell.

Retry Limits with on-failure

on-failure can set retry limits:

restart: on-failure:5  # Restart at most 5 times

If a container fails to start 5 consecutive times, Docker gives up. Perfect for scenarios where repeated crashes might be caused by external issues (database unreachable, config errors)—preventing infinite restart loops.

How to Choose? Simple Guidelines

  • Core services (Web, API, Database): unless-stopped
  • Background tasks, scheduled scripts: on-failure
  • Development debugging, temporary runs: no
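Applied to a single file, the guidelines above might look like this. The service and image names (report-job, myapp-jobs, debug-shell) are made up for illustration.

```yaml
services:
  api:                         # Core service: keep it up unless an operator stops it
    image: myapp:latest
    restart: unless-stopped

  report-job:                  # Hypothetical background task
    image: myapp-jobs:latest   # Assumed image name
    restart: on-failure:5      # Retry non-zero exits, give up after 5 attempts

  debug-shell:                 # Hypothetical throwaway container for debugging
    image: alpine:3.19
    command: ["sleep", "3600"]
    restart: "no"              # Quoted so YAML does not parse it as boolean false
```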

One pitfall: Restart policies only handle “should the container restart after exiting”. They don’t care about “is the service actually usable”. For that, you need health checks.

3. Resource Limits: Preventing Runaway Containers

Have you ever experienced this: one container has a memory leak, eats up all the server's memory, and the OOM Killer takes out the other containers along with it?

I have. That feeling… well.

Resource limits set a “ceiling” for each container: a container that exceeds its memory limit gets killed, and one that hits its CPU limit gets throttled, which protects the other services.

limits vs reservations

deploy:
  resources:
    limits:
      cpus: '1.0'      # Maximum 1 CPU
      memory: 512M     # Maximum 512MB memory
    reservations:
      cpus: '0.5'      # Reserve at least 0.5 CPU
      memory: 256M     # Reserve at least 256MB memory

  • limits: Hard caps. Exceed the memory limit and the process gets killed (OOM); hit the CPU limit and it gets throttled
  • reservations: Soft limits that tell the scheduler “this container needs at least this much”

To put it plainly: limits is “can’t exceed”, reservations is “should get at least this much”.

How to Set CPU Limits?

cpus: '1.0'   # Maximum 1 full CPU core
cpus: '0.5'   # Maximum 50% of one core
cpus: '2.0'   # Maximum 2 cores

CPU limits are enforced by throttling: a container that hits its limit gets slowed down, not killed. So err on the higher side.

How to Set Memory Limits?

memory: 512M    # 512MB
memory: 2G      # 2GB

Memory limits are hard limits. Exceed them, and the container gets killed by OOM Killer—no negotiation.

My empirical values:

  • Node.js applications: At least 512M, production recommended 1G
  • Python applications: 256M - 512M
  • PostgreSQL: Based on connections and data volume, 1G - 4G
  • Redis: 256M - 512M, larger if used for caching
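A container limit works best when the process inside knows about it. As a sketch, assuming the app is a Node.js service and Redis is used purely as a cache, the runtime's own memory cap can be set a little below the container limit so the process backs off before the OOM Killer steps in. The specific numbers are illustrative, not measured.

```yaml
services:
  app:
    image: myapp:latest
    environment:
      # Assumed Node.js app: cap the V8 old-space heap (in MB) below the container limit
      NODE_OPTIONS: "--max-old-space-size=768"
    deploy:
      resources:
        limits:
          memory: 1G

  redis:
    image: redis:7-alpine
    # Evict least-recently-used keys once Redis holds about 384MB of data
    command: ["redis-server", "--maxmemory", "384mb", "--maxmemory-policy", "allkeys-lru"]
    deploy:
      resources:
        limits:
          memory: 512M
```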

A Practical Configuration

services:
  app:
    image: myapp:latest
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.25'
          memory: 256M

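This configuration means: the app can use at most 1 CPU and 1GB memory, while asking for at least 0.25 CPU and 256MB memory. Under Swarm the scheduler uses reservations to place the container; on a single host they act as a hint rather than a strict guarantee.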

Note: deploy configuration originated with Docker Swarm. For single-machine deployment, Docker Compose V2 (the docker compose plugin) applies deploy.resources limits directly, while the legacy docker-compose V1 ignores deploy unless you pass the --compatibility flag. On a single machine you can also use the service-level mem_limit and cpus attributes instead.
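For reference, the service-level attributes mentioned in the note can be sketched like this. It mirrors the practical configuration above; treat it as an alternative spelling rather than something to combine with deploy.resources on the same service.

```yaml
services:
  app:
    image: myapp:latest
    cpus: 1.0              # Same idea as deploy.resources.limits.cpus
    mem_limit: 1g          # Same idea as deploy.resources.limits.memory
    mem_reservation: 256m  # Same idea as deploy.resources.reservations.memory
```

Whichever style you pick, docker stats shows the effective cap in its MEM USAGE / LIMIT column, which is a quick way to confirm the limit was actually applied.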

4. Complete Template: Production-Ready YAML You Can Copy

Each element is useful on its own, but combined they create production-grade configuration. Here’s a complete example with Web service + PostgreSQL + Redis that you can copy and adapt.

Complete Example

version: '3.8'  # Informative only; Compose V2 ignores it and warns that it is obsolete

services:
  # Web Application
  app:
    image: myapp:latest
    restart: unless-stopped
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    deploy:
      resources:
        limits:
          cpus: '1.0'
          memory: 1G
        reservations:
          cpus: '0.25'
          memory: 256M
    logging:
      driver: json-file
      options:
        max-size: "10m"   # Max 10MB per log file
        max-file: "3"     # Keep at most 3 log files

  # PostgreSQL Database
  db:
    image: postgres:15
    restart: unless-stopped
    environment:
      POSTGRES_USER: appuser
      POSTGRES_PASSWORD: apppassword  # Example only; use a secret or env file in production
      POSTGRES_DB: appdb
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U appuser -d appdb"]  # Match POSTGRES_USER and POSTGRES_DB above
      interval: 10s
      timeout: 5s
      retries: 5
      start_period: 30s
    deploy:
      resources:
        limits:
          cpus: '2.0'
          memory: 2G
        reservations:
          cpus: '0.5'
          memory: 512M
    volumes:
      - pgdata:/var/lib/postgresql/data

  # Redis Cache
  redis:
    image: redis:7-alpine
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 512M
        reservations:
          memory: 128M

volumes:
  pgdata:

Configuration Separation Technique

Development and production environments usually need different configurations. Manage them with two separate files:

# compose.yaml - Development
version: '3.8'
services:
  app:
    build: .
    ports:
      - "3000:3000"
    restart: "no"  # Don't auto-restart during development

# compose.production.yaml - Production Override
version: '3.8'
services:
  app:
    image: myapp:v1.2.3  # Use pre-built image in production
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 1G

Start production environment:

docker-compose -f compose.yaml -f compose.production.yaml up -d

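The two files merge: values in compose.production.yaml override the same keys in compose.yaml, while keys set only in compose.yaml (such as build and ports here) carry over unchanged.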
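To make the merge concrete, here is a simplified view of what the app service ends up as once both files are combined. This is a hand-written illustration, not literal command output; the real output normalizes paths and port syntax.

```yaml
# Simplified view of the merged app service
services:
  app:
    build: .                 # From compose.yaml (not overridden, so it carries over)
    image: myapp:v1.2.3      # From compose.production.yaml
    ports:
      - "3000:3000"          # From compose.yaml
    restart: unless-stopped  # Production value replaces "no"
    healthcheck:             # From compose.production.yaml
      test: ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 3
    deploy:
      resources:
        limits:
          memory: 1G         # From compose.production.yaml
```

Running docker-compose -f compose.yaml -f compose.production.yaml config prints the effective configuration, which is worth checking before the first production deploy.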

Log Management: Preventing Disk Full

Docker’s default log driver is json-file, and logs grow indefinitely. Without limits, your disk gets eaten by logs after a few months.

logging:
  driver: json-file
  options:
    max-size: "10m"   # Max 10MB per file
    max-file: "3"     # Max 3 files, total 30MB max

Each container gets max 30MB of logs with automatic rotation. I add this to every service now—saves me from manual cleanup later.
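Repeating the same logging block in every service gets tedious. One way to avoid that, sketched here with a YAML anchor on an extension field, is to define the block once and reference it. The name x-logging and the anchor default-logging are arbitrary choices.

```yaml
# Top-level keys starting with x- are extension fields that Compose does not treat as services
x-logging: &default-logging
  driver: json-file
  options:
    max-size: "10m"
    max-file: "3"

services:
  app:
    image: myapp:latest
    logging: *default-logging   # Reuse the shared block
  db:
    image: postgres:15
    logging: *default-logging
  redis:
    image: redis:7-alpine
    logging: *default-logging
```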

Summary

After all that, the core logic of the three essentials is:

Health checks detect problems → Restart policies auto-recover → Resource limits contain failures

With this combination, your Docker Compose application can:

  1. Wait for dependencies to be truly ready at startup, instead of blindly rushing in
  2. Get back up on its own after crashing—you don’t need to wake up at 3 AM
  3. Prevent one runaway container from taking down the entire server

Now go check your docker-compose.yml. What’s missing? Add it.

If you haven’t started using Docker Compose for production deployment yet, bring these three configurations next time. You’ll thank yourself.

Configure Docker Compose Production Deployment Essentials

Add health checks, restart policies, and resource limits to Docker Compose for production-grade stable deployment

⏱️ Estimated time: 15 min

  1. Add health check configuration

    Add healthcheck configuration for each service:

    • test: Check command (curl, pg_isready, redis-cli ping, etc.)
    • interval: Check interval (recommended 10-30s)
    • timeout: Timeout duration (recommended 5-10s)
    • retries: Number of consecutive failures before marking unhealthy (recommended 3-5)
    • start_period: Startup grace period (set 30-60s based on app startup time)

  2. Configure dependency conditional startup

    Use the condition parameter of depends_on:

    • Change depends_on: [db] to depends_on: db: condition: service_healthy
    • Ensure dependent services have health checks configured
    • Application will wait for dependencies to be truly available before starting

  3. Set restart policy

    Choose restart policy based on service type:

    • Core services (Web/API/Database): restart: unless-stopped
    • Background tasks/scheduled scripts: restart: on-failure:5
    • Development debugging: restart: "no"
    • Avoid using always (may unexpectedly restart after manual stop)

  4. Configure resource limits

    Set limits and reservations in deploy.resources:

    • limits: Hard caps; exceeding limits.memory gets the container killed by OOM Killer, hitting limits.cpus gets it throttled
    • reservations: Soft limits, the minimum resources the container asks for
    • Node.js applications: limits.memory recommended at least 512M
    • Databases: Set 1G-4G based on connections and data volume

  5. Add log rotation configuration

    Prevent log files from filling up disk:

    • logging.driver: json-file (default driver)
    • logging.options.max-size: "10m" (max 10MB per file)
    • logging.options.max-file: "3" (keep 3 files)
    • Total log cap 30MB per container, automatic rotation

FAQ

Will a container automatically restart after health check failure?
No. Health checks only mark the container as unhealthy and don't trigger restarts on their own. Combine them with depends_on's service_healthy condition, so dependents wait for readiness, and with a restart policy (like unless-stopped), which only intervenes when the container process crashes and exits.

What's the difference between unless-stopped and always?
The key difference lies in behavior after a manual stop: with the always policy, after manually executing docker stop, if the server or Docker service restarts, the container will automatically start again; with the unless-stopped policy, after a manual stop, the container remains stopped and won't come back to life on its own. Production environments recommend unless-stopped.

What's the difference between limits and reservations in resource limits?
limits are hard caps: a container that exceeds limits.memory will be killed by OOM Killer, and one that hits limits.cpus gets throttled rather than killed. reservations are soft limits that tell the Docker scheduler this container needs at least this much, but the container can use more.

What does the start_period parameter do?
start_period is a startup grace period. Health check failures during this time don't count toward retries. For slow-starting applications (like those needing 40 seconds to connect to the database), setting start_period: 60s prevents the app from being marked unhealthy immediately after startup.

How to use different configurations for development and production?
Use two configuration files for separation:

• compose.yaml: Development config (build: ., restart: "no")
• compose.production.yaml: Production override (image: xxx, restart: unless-stopped)
• Start command: docker-compose -f compose.yaml -f compose.production.yaml up -d
• The latter overrides the former's configuration, achieving configuration separation

Can I use deploy.resources for single-machine deployment?
Yes. While deploy configuration originated with Docker Swarm, Docker Compose V2 applies deploy.resources limits on a single machine. With the legacy docker-compose V1, you need the --compatibility flag, or the service-level mem_limit/cpus attributes. Recommend upgrading to the latest Docker Compose version.

8 min read · Published on: Apr 24, 2026 · Modified on: Apr 25, 2026
