Introduction to Multi-Stage Builds
In our previous lecture, we explored how to adapt Docker for production environments. One of the key challenges we identified was creating images that are optimized for production (small, secure, performant) while still supporting an efficient development workflow.
Multi-stage builds provide an elegant solution to this challenge. Think of them as a way to use multiple "workbenches" during the construction process, but only shipping the final product—leaving behind all the tools, scraps, and intermediate components that were necessary for building but aren't needed for running the application.
Multi-stage builds allow us to:
- Use all the tools needed for building and testing in early stages
- Copy only the necessary artifacts to the final image
- Significantly reduce the size and attack surface of production images
- Maintain a clean, readable Dockerfile that documents the build process
Multi-Stage Build Basics
The Problem: Single-Stage Builds
Let's start by understanding the limitations of traditional single-stage builds:
# Single-stage Dockerfile example
FROM node:18
WORKDIR /app
# Install build dependencies
COPY package*.json ./
RUN npm install
# Copy source code
COPY . .
# Build the application
RUN npm run build
# Start the application
CMD ["npm", "start"]
This approach has several drawbacks:
- Bloated images: Includes build tools, source code, and dependencies
- Larger attack surface: Unnecessary components increase security risk
- Slower deployments: Larger images take longer to transfer and deploy
- Inefficient caching: Changes to source often invalidate dependency layers
The Solution: Multi-Stage Builds
Here's the same application built using a multi-stage approach:
# Multi-stage Dockerfile example
# Stage 1: Build stage
FROM node:18 AS builder
WORKDIR /app
# Install dependencies
COPY package*.json ./
RUN npm install
# Copy source code and build
COPY . .
RUN npm run build
# Stage 2: Production stage
FROM node:18-alpine
WORKDIR /app
# Install production dependencies only
COPY package*.json ./
RUN npm install --omit=dev
# Copy built assets from builder stage
COPY --from=builder /app/dist ./dist
# Start the application
CMD ["node", "dist/server.js"]
Key improvements:
- Smaller final image: Only includes runtime components
- Reduced attack surface: Build tools and source code not included
- Better caching: Build dependencies separate from runtime dependencies
- Clear separation of concerns: Build vs. runtime environments
Syntax and Structure
The basic syntax of a multi-stage build includes:
- Multiple FROM statements: Each FROM statement starts a new stage
- Stage names with AS: Name stages for reference (e.g., FROM image AS stagename)
- COPY --from: Copy artifacts from a previous stage (by name or by index) or from another image
Figure: a builder stage (...build steps...) leads into a test stage (FROM image2 AS test, ...test steps...); both feed a final stage (FROM image3, ...final steps...), which receives artifacts through COPY --from=builder and COPY --from=test.
You can have as many stages as needed; in practice, most multi-stage builds use two to four.
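A minimal sketch of this structure follows. It assumes a Node.js project whose build script writes dist/server.js and whose test script runs the test suite; the image tags and paths are illustrative, not prescriptive.

```dockerfile
# Stage 0: full toolchain; "AS builder" names the stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 1: starts FROM the builder stage and inherits its filesystem
FROM builder AS test
RUN npm test

# Stage 2: the final stage (the default build target, because it comes last)
FROM node:18-alpine AS production
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
# Copy by stage name; "--from=0" (the first FROM) would also work,
# but indexes change whenever stages are added or reordered
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```

Because the production stage copies nothing from test, BuildKit skips the test stage in a plain docker build .; build it explicitly with docker build --target test . (the legacy builder, by contrast, builds every stage that precedes the target).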
Multi-Stage Build Use Cases
Compiled Language Applications
One of the most common use cases is for compiled languages like Go, Rust, or C++:
# Golang multi-stage example
# Build stage
FROM golang:1.20 AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .
# Final stage
FROM alpine:3.18
RUN apk --no-cache add ca-certificates
WORKDIR /root/
# Copy the binary from builder
COPY --from=builder /app/app .
# Run the binary
CMD ["./app"]
Benefits for compiled languages:
- Dramatic size reduction: From ~1GB (Go toolchain) to ~20MB (final binary)
- No source code in production: Source remains in build stage only
- Minimal runtime: Only the compiled binary and essential libraries
Frontend Applications
For frontend applications, multi-stage builds can separate the build process from the web server:
# React application example
# Build stage
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Production stage
FROM nginx:alpine
# Copy build output to nginx
COPY --from=builder /app/build /usr/share/nginx/html
# Copy custom nginx config if needed
COPY nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
This approach:
- Separates the Node.js build environment from the nginx runtime
- Reduces final image size dramatically (30-50MB vs 1GB+)
- Simplifies the runtime environment (static files only)
Full-Stack Applications
For full-stack applications, you can combine multiple build stages:
# Full-stack application example
# Frontend build stage
FROM node:18 AS frontend-builder
WORKDIR /app/frontend
COPY frontend/package*.json ./
RUN npm ci
COPY frontend/ .
RUN npm run build
# Backend build stage
FROM maven:3.8.6-openjdk-17 AS backend-builder
WORKDIR /app/backend
COPY backend/pom.xml .
RUN mvn dependency:go-offline
COPY backend/src ./src
RUN mvn package -DskipTests
# Final stage
# JRE-only runtime (the openjdk images on Docker Hub are deprecated)
FROM eclipse-temurin:17-jre
WORKDIR /app
# Copy backend JAR
COPY --from=backend-builder /app/backend/target/*.jar app.jar
# Copy frontend build
COPY --from=frontend-builder /app/frontend/build ./public
EXPOSE 8080
CMD ["java", "-jar", "app.jar"]
This multi-stage approach:
- Builds frontend and backend separately with appropriate tools
- Combines them in a single runtime image
- Avoids including any build tools in the final image
Advanced Multi-Stage Techniques
Targeting Specific Stages
You can build specific stages using the --target flag:
# Build only up to the test stage
docker build --target test -t myapp:test .
# Build the complete image
docker build -t myapp:latest .
This is useful for:
- CI/CD pipelines: Run tests in one stage, build production image if tests pass
- Development: Use a dev-focused stage for local development
- Debugging: Create larger images with debugging tools for troubleshooting
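A sketch of a Dockerfile laid out for --target, assuming the project defines a dev npm script (for example a file watcher) and a build script that writes dist/server.js; both script names, the port, and the paths are assumptions. The build command for each target is shown in the comments.

```dockerfile
# Shared stage with all dependencies, including devDependencies
FROM node:18 AS base
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .

# Development target:  docker build --target dev -t myapp:dev .
# Typically run with the source bind-mounted, e.g.
#   docker run -p 3000:3000 -v "$PWD":/app -v /app/node_modules myapp:dev
FROM base AS dev
ENV NODE_ENV=development
CMD ["npm", "run", "dev"]

# Build step used only by the production target
FROM base AS build
RUN npm run build

# Production target (last stage, so also the default):  docker build -t myapp:latest .
FROM node:18-alpine AS production
WORKDIR /app
ENV NODE_ENV=production
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```

Putting production last means a bare docker build . still produces the deployable image, so the dev image is never built by accident.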
Parallel Building with BuildKit
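Docker BuildKit resolves the stages into a dependency graph and runs stages that don't depend on each other in parallel. In the full-stack example above, for instance, frontend-builder and backend-builder can build at the same time. BuildKit also skips stages that the target stage does not depend on.
BuildKit is the default builder in Docker Desktop and in Docker Engine 23.0 and later; on older engines, enable it explicitly: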
# Enable BuildKit
export DOCKER_BUILDKIT=1
# Or use with specific build
DOCKER_BUILDKIT=1 docker build -t myapp .
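A smaller sketch of independent stages, assuming a Go module at the root of the build context: a compile stage and a certificate stage share nothing, so BuildKit can run them concurrently, and an empty scratch image combines their outputs.

```dockerfile
# Independent stage 1: compile a statically linked binary
FROM golang:1.20 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o /out/app .

# Independent stage 2: fetch CA certificates (no dependency on "build")
FROM alpine:3.18 AS certs
RUN apk --no-cache add ca-certificates

# The final stage waits for both, then copies one artifact from each
FROM scratch
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
COPY --from=build /out/app /app
ENTRYPOINT ["/app"]
```

Whether the two stages actually overlap depends on the builder's resources, but the final stage starts only after both finish. scratch contains no shell, so this final stage cannot use RUN, and the binary must be statically linked (hence CGO_ENABLED=0).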
Cross-Platform Building
Multi-stage builds work well with multi-platform images:
# Build for multiple platforms
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:latest .
Each stage can target a specific platform if needed:
# Multi-platform example
# The builder runs natively on the build machine and cross-compiles
FROM --platform=$BUILDPLATFORM golang:1.20 AS builder
ARG TARGETPLATFORM
ARG BUILDPLATFORM
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN echo "Building on $BUILDPLATFORM for $TARGETPLATFORM"
# Platform-specific build commands
RUN case "$TARGETPLATFORM" in \
"linux/amd64") GOARCH=amd64 ;; \
"linux/arm64") GOARCH=arm64 ;; \
*) echo "Unsupported platform: $TARGETPLATFORM" && exit 1 ;; \
esac && \
CGO_ENABLED=0 GOOS=linux GOARCH=$GOARCH go build -o app .
# Final stage (runs on the target platform)
FROM --platform=$TARGETPLATFORM alpine:3.18
WORKDIR /app
COPY --from=builder /app/app .
CMD ["./app"]
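For the common case, a shorter variant is possible. BuildKit also predefines TARGETOS and TARGETARCH, whose values (for example linux and arm64) match Go's GOOS and GOARCH names for platforms such as linux/amd64 and linux/arm64, so the case statement can be dropped. A sketch under the same assumption (a Go module at the context root); ARM variants such as v6 versus v7 are exposed separately as TARGETVARIANT and are not handled here.

```dockerfile
# The builder runs natively on the build machine ($BUILDPLATFORM)
FROM --platform=$BUILDPLATFORM golang:1.20 AS builder
# Predefined by BuildKit per target platform, e.g. "linux" and "arm64"
ARG TARGETOS
ARG TARGETARCH
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# Go cross-compiles directly from these values; no case statement needed
RUN CGO_ENABLED=0 GOOS=$TARGETOS GOARCH=$TARGETARCH go build -o /out/app .

# Without --platform, FROM resolves to the target platform's variant of the image
FROM alpine:3.18
COPY --from=builder /out/app /usr/local/bin/app
CMD ["app"]
```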
Using External Build Context
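COPY --from is not limited to stages in the same Dockerfile. It can also reference other images and, with BuildKit, named build contexts:
# Copy a file from an external image (pulled from a registry if not present locally)
COPY --from=nginx:latest /etc/nginx/nginx.conf /nginx.conf
# Copy from a stage, or from a named build context passed with --build-context
COPY --from=app-builder /app/build ./public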
This enables interesting patterns like:
- Copying configuration from official images
- Including pre-built components from other images
- Creating composite images from multiple sources
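A sketch of a named build context, assuming a sibling directory ../shared-config that holds an nginx.conf and a public/ directory in the main context (both hypothetical). The build command in the comment uses the --build-context flag of docker buildx build.

```dockerfile
# syntax=docker/dockerfile:1
# Build with a second, named context pointing at a sibling directory:
#   docker buildx build --build-context shared-config=../shared-config -t myapp .
FROM nginx:alpine
# "shared-config" is not a stage in this file; it resolves to the named context
COPY --from=shared-config nginx.conf /etc/nginx/conf.d/default.conf
# Files from the main build context are copied as usual
COPY public/ /usr/share/nginx/html/
```

Inside the Dockerfile, a named context is referenced exactly like a stage, so the same COPY --from line works whether the name points at a stage or at a context supplied on the command line.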
Optimizing Multi-Stage Builds
Layer Optimization
Properly ordering operations for maximum cache efficiency:
# BEFORE: Inefficient caching
FROM node:18 AS builder
WORKDIR /app
# Copying everything at once means any file change
# invalidates the npm install cache
COPY . .
RUN npm install
RUN npm run build
# AFTER: Better layer caching
FROM node:18 AS builder
WORKDIR /app
# Copy dependency files first
COPY package*.json ./
RUN npm install
# Copy source code and build
COPY . .
RUN npm run build
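Ordering only helps while package*.json is unchanged; when it does change, npm ci downloads every package again. With BuildKit, a cache mount can keep npm's download cache between builds. A minimal sketch of the AFTER version with that addition, assuming the build runs as root so npm's cache lives in /root/.npm:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
# The cache mount persists npm's download cache across builds; it is not
# part of the image, so it adds nothing to the layer size
RUN --mount=type=cache,target=/root/.npm npm ci
COPY . .
RUN npm run build
```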
Selecting Appropriate Base Images
Different stages can use different base images optimized for their purpose:
Figure: the build stage uses a full SDK (large but includes all tools), the test stage a test environment (includes test frameworks), and the production stage a minimal runtime (small, secure).
Common patterns:
- Build stage: Full SDK/toolchain images (node:18, golang:1.20)
- Test stage: Images with test frameworks (cypress/included, phpunit)
- Production stage: Minimal runtime (alpine, distroless, scratch)
# Example of different base images per stage
# Build stage - full SDK
FROM node:18 AS builder
# ... build steps ...
# Test stage - includes test frameworks
FROM cypress/included:12.8.1 AS test
# ... test steps ...
# Production stage - minimal runtime
FROM node:18-alpine AS production
# ... production setup ...
Minimizing the Final Image
Techniques to further reduce the final image size:
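# Minimal final stage example
# For a static website (assumes a "builder" stage that outputs /app/build)
FROM nginx:alpine
# Replace the default server config with a custom minimal one
COPY nginx.conf /etc/nginx/conf.d/default.conf
# Copy only what's needed
COPY --from=builder /app/build /usr/share/nginx/html
# Make the static files read-only (chmod needs root, so do it before any USER switch)
RUN chmod -R 555 /usr/share/nginx/html
# Clean up in the same RUN that creates files: deleting them in a later
# layer hides them but does not make the image smaller
# Non-root: the stock image starts the nginx master as root (port 80, PID file)
# and runs worker processes as the "nginx" user. Adding "USER nginx" here would
# also require an unprivileged listen port and writable PID/cache paths;
# the nginxinc/nginx-unprivileged image is preconfigured for that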
Additional techniques:
- Use distroless images: No shell or package manager (gcr.io/distroless/base)
- Use scratch images: Empty base for statically compiled applications
- Multi-stage compression: Compress assets in build stage before copying
- Precise file selection: Copy specific files rather than entire directories
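A minimal sketch of the distroless option, assuming a Go program with no cgo dependencies. The static-debian12:nonroot variant used here ships CA certificates, time zone data, and an /etc/passwd with an unprivileged user, but no shell or package manager; gcr.io/distroless/base is the variant that additionally includes glibc.

```dockerfile
# Build a static Go binary (CGO disabled, so it needs no libc)
FROM golang:1.20 AS builder
WORKDIR /src
COPY . .
RUN CGO_ENABLED=0 go build -o /out/app .

# Distroless runtime: no shell, no package manager
FROM gcr.io/distroless/static-debian12:nonroot
COPY --from=builder /out/app /app
# The :nonroot tag already runs as an unprivileged user (uid 65532)
ENTRYPOINT ["/app"]
```

Compared with scratch, there is no need to copy certificates or create a user by hand; the trade-off is that debugging inside the running container is harder without a shell.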
Real-World Multi-Stage Examples
Example 1: Python Application
# Python application with multi-stage build
# Build stage
FROM python:3.10 AS builder
WORKDIR /app
# Install build dependencies
RUN pip install --no-cache-dir poetry
# Copy dependency definitions
COPY pyproject.toml poetry.lock ./
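# Configure Poetry to install into the system Python; --no-root skips installing
# the project itself (its code is not copied yet) and --only main needs Poetry 1.2+
RUN poetry config virtualenvs.create false \
&& poetry install --no-interaction --no-ansi --no-root --only main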
# Copy application code
COPY . .
# Generate static files if needed
RUN python manage.py collectstatic --noinput
# Final stage
FROM python:3.10-slim
WORKDIR /app
# Install runtime dependencies
COPY --from=builder /usr/local/lib/python3.10/site-packages /usr/local/lib/python3.10/site-packages
COPY --from=builder /usr/local/bin/ /usr/local/bin/
# Copy application code
COPY --from=builder /app /app
# Copy static files
COPY --from=builder /app/staticfiles /app/staticfiles
# Create non-root user
RUN useradd -m appuser
USER appuser
# Run the application
CMD ["gunicorn", "myproject.wsgi:application", "--bind", "0.0.0.0:8000"]
Example 2: Java Spring Boot Application
# Java Spring Boot application with multi-stage build
# Build stage
FROM maven:3.8.6-openjdk-17 AS builder
WORKDIR /app
# Copy pom file and download dependencies
COPY pom.xml .
RUN mvn dependency:go-offline
# Copy source code and build
COPY src ./src
RUN mvn package -DskipTests
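# Test stage (with BuildKit, this only runs when targeted with --target test,
# because the production stage does not copy anything from it)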
FROM maven:3.8.6-openjdk-17 AS test
WORKDIR /app
COPY --from=builder /app /app
RUN mvn test
# Production stage
FROM eclipse-temurin:17-jre-alpine
WORKDIR /app
# Copy the JAR file
COPY --from=builder /app/target/*.jar app.jar
# Create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
USER appuser
# Configure JVM options
ENV JAVA_OPTS="-XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0"
# Expose port
EXPOSE 8080
# Run the application; "exec" replaces the shell so java runs as PID 1 and receives SIGTERM
ENTRYPOINT ["sh", "-c", "exec java $JAVA_OPTS -jar app.jar"]
Example 3: Rust Web Service
# Rust web service with multi-stage build
# Build stage
FROM rust:1.70 AS builder
WORKDIR /app
# Cache dependencies: build once with the real manifests and a placeholder main.rs
# (assumes the package in Cargo.toml is named "app", so the binary is target/release/app)
COPY Cargo.toml Cargo.lock ./
RUN mkdir src && \
echo 'fn main() {}' > src/main.rs && \
cargo build --release && \
rm -rf src
# Copy the real source code
COPY src ./src
# Build the application; touching main.rs makes Cargo rebuild the app crate
# instead of reusing the placeholder build
RUN touch src/main.rs && cargo build --release
# Production stage
FROM debian:bookworm-slim AS runtime
WORKDIR /app
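# Install CA certificates and the OpenSSL runtime library (libssl3 on bookworm;
# drop it if none of your crates link OpenSSL). The -dev package only adds
# headers needed at build time.
RUN apt-get update && \
apt-get install -y --no-install-recommends ca-certificates libssl3 && \
rm -rf /var/lib/apt/lists/*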
# Copy the binary from builder
COPY --from=builder /app/target/release/app /app/app
# Create non-root user
RUN useradd -ms /bin/bash appuser
USER appuser
# Environment variables
ENV RUST_LOG=info
# Expose port
EXPOSE 8080
# Run the application
CMD ["./app"]
Example 4: Full-Stack JavaScript Application
# Full-stack JavaScript application with multi-stage build
# Frontend build stage
FROM node:18 AS frontend-builder
WORKDIR /app/frontend
# Install dependencies
COPY frontend/package*.json ./
RUN npm ci
# Build frontend
COPY frontend/ .
RUN npm run build
# Backend build stage
FROM node:18 AS backend-builder
WORKDIR /app/backend
# Install dependencies
COPY backend/package*.json ./
RUN npm ci
# Build backend
COPY backend/ .
RUN npm run build
# Production stage
FROM node:18-alpine
# Create non-root user
RUN addgroup -g 1001 appuser && \
adduser -u 1001 -G appuser -s /bin/sh -D appuser
WORKDIR /app
# Install production dependencies for backend
COPY backend/package*.json ./
RUN npm ci --omit=dev
# Copy build output already owned by appuser; a separate "RUN chown -R"
# would duplicate every file into an extra layer
COPY --from=backend-builder --chown=appuser:appuser /app/backend/dist ./dist
COPY --from=frontend-builder --chown=appuser:appuser /app/frontend/build ./public
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 3000
# Run the application
CMD ["node", "dist/server.js"]
Practical Exercise: Converting to Multi-Stage Builds
Exercise Brief
In this exercise, you'll convert a single-stage Dockerfile to a multi-stage build to optimize a React + Node.js application.
Starting Point: Single-Stage Dockerfile
# Single-stage Dockerfile
FROM node:18
WORKDIR /app
# Install all dependencies
COPY package*.json ./
RUN npm install
# Copy all files
COPY . .
# Build the React frontend
RUN npm run build
# Expose port
EXPOSE 3000
# Start the application
CMD ["npm", "start"]
Your Task
Convert this to a multi-stage build that:
- Uses a builder stage for compiling the React application
- Uses a production stage with a minimal runtime environment
- Separates development and production dependencies
- Implements proper security practices
- Optimizes for image size and performance
Solution Outline
Here's a sample solution you can use as a reference:
# Multi-stage Dockerfile for React + Node.js application
# Build stage
FROM node:18 AS builder
WORKDIR /app
# Install all dependencies (including dev dependencies)
COPY package*.json ./
RUN npm ci
# Copy source code
COPY . .
# Build the React application
RUN npm run build
# Production stage
FROM node:18-alpine
# Create non-root user
RUN addgroup -g 1001 appuser && \
adduser -u 1001 -G appuser -s /bin/sh -D appuser
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install production dependencies only
RUN npm ci --omit=dev && \
npm cache clean --force
# Copy built assets from builder stage
COPY --from=builder --chown=appuser:appuser /app/build ./build
COPY --from=builder --chown=appuser:appuser /app/server ./server
# Switch to non-root user
USER appuser
# Set production environment
ENV NODE_ENV=production
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
CMD wget --no-verbose --tries=1 --spider http://localhost:3000/health || exit 1
# Start the application
CMD ["node", "server/index.js"]
Challenge Extension
Once you've completed the basic task, extend your solution to:
- Add a testing stage that runs unit and integration tests
- Implement layer caching optimizations for faster builds
- Add configuration for different environments (dev, staging, prod)
- Create a multi-architecture build for different platforms
- Implement security scanning in the build process
Multi-Stage Build Best Practices
Naming Conventions
- Use descriptive stage names: builder, test, security-scan, production
- Be consistent across projects for maintainability
- Document the purpose of each stage with comments
Performance Optimization
- Order operations for maximum cache efficiency
- Use BuildKit for parallel execution and better caching
- Be selective about what files to copy between stages
- Combine RUN commands to reduce layer count
Security Considerations
- Never include secrets in any stage
- Scan artifacts between stages
- Use minimal base images for production stage
- Run as non-root in the final stage
- Set appropriate permissions on copied files
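Secrets deserve care even in stages that are discarded: a file copied into a builder stage never reaches the final image, but it still sits in that stage's layers and in the build cache. BuildKit secret mounts avoid this. A sketch assuming a private npm registry whose token lives in ~/.npmrc; the secret id npmrc is an arbitrary name chosen for this example.

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
# The secret is mounted only for this RUN instruction and is not written
# to any layer. Pass it at build time with:
#   docker build --secret id=npmrc,src=$HOME/.npmrc .
RUN --mount=type=secret,id=npmrc,target=/root/.npmrc npm ci
COPY . .
RUN npm run build
```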
Maintainability
- Keep stages focused: Each stage should have a clear purpose
- Use comments to explain complex or non-obvious steps
- Consider extracting complex logic to helper scripts
- Standardize patterns across your organization
- Document build arguments for customization
Troubleshooting Multi-Stage Builds
Common Issues and Solutions
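| Issue | Possible Causes | Solutions |
|---|---|---|
| Missing files in final image | Wrong source path in COPY --from; the build wrote its output to a different directory; files excluded by .dockerignore | Build the earlier stage with --target and inspect its paths; copy exact paths |
| Permission issues | Files copied as root while the final stage runs as a non-root USER; chmod or chown placed after the USER switch | Use COPY --chown=user:group; run root-only steps before USER |
| Build failures | Misspelled stage name in COPY --from (Docker then tries to pull an image with that name); a tool missing from that stage's base image | Check stage names; install build tools in the stage that uses them |
| Runtime crashes | Shared libraries or CA certificates missing from the minimal base; binary built against glibc but run on musl-based Alpine | Build statically (e.g. CGO_ENABLED=0 for Go) or match the builder and runtime image families |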
Debugging Techniques
Some effective strategies for debugging multi-stage builds:
# Adding a debug stage
FROM node:18 AS builder
# ... build steps ...
# Debug stage (can be targeted with --target debug)
FROM ubuntu:22.04 AS debug
# Install debugging tools
RUN apt-get update && apt-get install -y curl wget netcat-openbsd procps
# Copy artifacts from builder
COPY --from=builder /app/build /app/build
# Interactive shell for inspection
CMD ["bash"]
# Build and enter debug stage
docker build --target debug -t myapp:debug .
docker run -it myapp:debug bash
# Inside the container: inspect output files
ls -la /app/build
# Check environment
env | grep MY_VAR
# Test connections
nc -zv api-server 8080
Layer Inspection
Use Docker's layer inspection tools to understand what's in each layer:
# View layers in an image
docker history myapp:latest
# Analyze image size
docker images myapp:latest
# Use dive for interactive layer exploration
# https://github.com/wagoodman/dive
dive myapp:latest
Conclusion
Multi-stage builds represent a significant improvement in Docker image creation, offering an elegant solution to the challenge of balancing development flexibility with production optimization.
Key takeaways from this lecture include:
- Separation of Concerns: Keep build tools separate from runtime environments
- Optimized Images: Create smaller, more secure production images
- Build Flexibility: Customize the build process for different environments
- Improved Security: Reduce attack surface by excluding unnecessary components
- Better Performance: Smaller images deploy faster and use fewer resources
In our next lecture, we'll explore how to use Docker Compose for orchestrating multi-container applications in production environments.