Docker Architecture and Components

Introduction to Docker

Docker is a platform that enables developers to build, package, and run applications in containers. It has become synonymous with containerization because it made containers accessible and practical for everyday development and deployment scenarios.

Our previous lecture introduced containerization concepts broadly. Today we'll focus on Docker specifically, exploring the architecture and key components that make it work.

graph TD
    A[Docker Platform] --> B[Docker Engine]
    A --> C[Docker Hub]
    A --> D[Docker Compose]
    A --> E[Docker Desktop]
    B --> F[Container Runtime]
    B --> G[Image Management]
    B --> H[Networking]
    B --> I[Storage]

Docker Architecture Overview

Docker uses a client-server architecture, with a client component that communicates with a server (daemon) component using a REST API. This separation allows the Docker client to run on a different system than the Docker daemon, enabling remote management of Docker hosts.
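To make the client/daemon split more concrete, here is a minimal sketch of how a few CLI commands roughly translate into Docker Engine API requests. The endpoint paths follow the public Engine API, but the function name cli_to_api and the one-to-one mapping are simplifying assumptions for illustration; the real client sends more parameters and sometimes several requests per command.

```python
from urllib.parse import urlencode


def cli_to_api(command: str, **params: str) -> list[tuple[str, str]]:
    """Return the (HTTP method, path) pairs a CLI command roughly maps to.

    Hypothetical helper for illustration; it is not part of any Docker SDK.
    """
    if command == "ps":
        return [("GET", "/containers/json")]
    if command == "pull":
        query = urlencode(
            {"fromImage": params["image"], "tag": params.get("tag", "latest")}
        )
        return [("POST", f"/images/create?{query}")]
    if command == "run":
        # "docker run" is a create request followed by a start request
        # (preceded by a pull if the image is missing locally).
        return [
            ("POST", "/containers/create"),
            ("POST", "/containers/{id}/start"),
        ]
    raise ValueError(f"no mapping sketched for: {command}")


if __name__ == "__main__":
    for method, path in cli_to_api("pull", image="ubuntu", tag="20.04"):
        print(method, path)
```

Because the client only produces requests like these, it can run on any machine that can reach the daemon's socket or network endpoint.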

Think of this architecture like a restaurant: The client (you) places an order (command), the server (daemon) receives the order and delegates tasks to the kitchen staff (containerd, runc) who prepare your meal (container). The pantry (registry) provides ingredients (images) when needed.

Core Architectural Components

Docker Client: The command-line interface or API that users interact with
Docker Daemon: The background service that manages Docker objects
containerd: A container runtime that manages container lifecycle operations
runc: A low-level container runtime that interfaces with the operating system
Docker Registry: A service that stores and distributes Docker images

Docker Client

The Docker client is the primary way users interact with Docker. When you run commands like docker run or docker build, you're using the Docker client, which sends these commands to the Docker daemon for execution.

Common Client Commands

# Run a container
docker run nginx

# List running containers
docker ps

# Build an image
docker build -t myapp .

# Pull an image from a registry
docker pull ubuntu:20.04

# Push an image to a registry
docker push myusername/myapp:1.0

The client is like the remote control for your TV. You press buttons on the remote (issue commands), but the TV itself (the daemon) does the actual work of changing channels or adjusting volume.

Docker Client Configuration

The Docker client can be configured to connect to different Docker daemons, allowing you to manage containers on remote systems. This is done using environment variables or configuration files:

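# Connect to a remote Docker daemon over TLS
# (2376 is the conventional TLS port; 2375 is the conventional unencrypted port)
export DOCKER_HOST=tcp://192.168.1.100:2376

# Verify the daemon's certificate and present a client certificate
export DOCKER_TLS_VERIFY=1
export DOCKER_CERT_PATH=/path/to/certs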
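As a rough illustration of how these settings determine where the client connects, the sketch below summarizes a set of environment variables. DOCKER_HOST and DOCKER_TLS_VERIFY are real client variables; the helper describe_docker_host, its return format, and the decision to treat only tcp:// endpoints as remote are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Usual default endpoint on Linux when DOCKER_HOST is not set.
DEFAULT_HOST = "unix:///var/run/docker.sock"


def describe_docker_host(env: dict[str, str]) -> dict[str, object]:
    """Summarize the endpoint a client would use (hypothetical helper)."""
    parts = urlsplit(env.get("DOCKER_HOST", DEFAULT_HOST))
    remote = parts.scheme == "tcp"
    # Any non-empty value counts as "enabled" in this sketch.
    tls_verify = bool(env.get("DOCKER_TLS_VERIFY"))
    return {
        "transport": parts.scheme,
        "address": f"{parts.hostname}:{parts.port}" if remote else parts.path,
        "tls_verify": tls_verify,
        # A TCP endpoint without TLS exposes root-equivalent access in plaintext.
        "warning": remote and not tls_verify,
    }


if __name__ == "__main__":
    print(describe_docker_host({"DOCKER_HOST": "tcp://192.168.1.100:2375"}))
    print(describe_docker_host({}))
```

With no variables set, the sketch reports the local Unix socket; a tcp:// host without TLS verification is flagged with a warning.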

Docker Daemon

The Docker daemon (dockerd) is a persistent background process that manages Docker objects such as images, containers, networks, and volumes. It listens for Docker API requests and processes them accordingly.

If the Docker client is the remote control, the daemon is the TV's internal circuitry that actually performs the work. It constantly listens for incoming commands and carries them out.

Daemon Responsibilities

Creating and managing Docker objects (containers, images, etc.)
Building images (when instructed by the client)
Managing container lifecycle (creating, starting, stopping, etc.)
Handling networking between containers
Managing persistent storage for containers

Daemon Configuration

The Docker daemon can be configured using a JSON configuration file, typically located at /etc/docker/daemon.json:

{
  "debug": true,
  "tls": true,
  "tlscert": "/var/docker/server.pem",
  "tlskey": "/var/docker/serverkey.pem",
  "hosts": ["tcp://192.168.1.10:2376"]
}
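Before restarting the daemon with a file like this, it can help to sanity-check it. The following sketch loads a daemon.json document and flags a few TLS-related pitfalls. The keys (hosts, tls, tlsverify, tlscert, tlskey) are real daemon options, but check_daemon_config and its rules are illustrative assumptions, not an official validator.

```python
import json


def check_daemon_config(text: str) -> list[str]:
    """Return human-readable warnings for a daemon.json document (sketch only)."""
    config = json.loads(text)
    problems = []
    tcp_hosts = [h for h in config.get("hosts", []) if h.startswith("tcp://")]
    tls_enabled = bool(config.get("tls") or config.get("tlsverify"))

    if tcp_hosts and not tls_enabled:
        problems.append("TCP listener without TLS: traffic is unencrypted")
    if tls_enabled:
        for key in ("tlscert", "tlskey"):
            if key not in config:
                problems.append(f"TLS enabled but '{key}' is missing")
    if tcp_hosts and config.get("tls") and not config.get("tlsverify"):
        # "tls" alone encrypts traffic but does not authenticate clients.
        problems.append("'tls' without 'tlsverify': clients are not authenticated")
    return problems


if __name__ == "__main__":
    sample = """
    {
      "debug": true,
      "tls": true,
      "tlscert": "/var/docker/server.pem",
      "tlskey": "/var/docker/serverkey.pem",
      "hosts": ["tcp://192.168.1.10:2376"]
    }
    """
    for problem in check_daemon_config(sample):
        print("warning:", problem)
```

Run against the sample configuration, the sketch reports a single warning: the traffic is encrypted, but clients are not authenticated unless tlsverify and a CA certificate are also configured.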

Security Considerations

By default, the Docker daemon runs with root privileges, so anyone who can reach the daemon effectively has root access to the host system. Securing a Docker installation is therefore essential:

Use TLS for remote connections
Implement proper user namespace mapping
Apply principle of least privilege
Regularly update Docker to patch security vulnerabilities

containerd and runc

In 2016, Docker restructured its architecture to extract core container runtime functionality into separate components: containerd and runc. This modularization allowed these components to be used independently of Docker and contributed to the standardization of container runtimes.

containerd

containerd is a daemon that manages the complete container lifecycle on a single host:

Image transfer and storage
Container execution and supervision
Network and storage attachment

In our restaurant analogy, if the Docker daemon is the head chef coordinating the kitchen, containerd is the station chef responsible for implementing the cooking processes.

runc

runc is a lightweight, portable container runtime that implements the Open Container Initiative (OCI) specification. It's responsible for the low-level work of actually creating containers:

Creating container namespaces and cgroups
Configuring container capabilities
Setting up the container's filesystem
Executing the container process

Continuing our restaurant analogy, runc is the cook who actually prepares the individual dishes according to specific recipes.
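To give a feel for what runc works from, the sketch below builds a stripped-down version of the OCI runtime configuration (the config.json found in a container bundle). The field names follow the OCI runtime specification, but the helper minimal_oci_config and the chosen values are assumptions for illustration; a real configuration (for example, one generated by runc spec) contains more fields, such as mounts, user, and capabilities.

```python
import json


def minimal_oci_config(args: list[str], memory_limit_bytes: int) -> dict:
    """Build a reduced, illustrative OCI runtime config (not a complete one)."""
    return {
        "ociVersion": "1.0.2",
        # The process to execute inside the container.
        "process": {"args": args, "cwd": "/", "env": ["PATH=/usr/bin:/bin"]},
        # The container's root filesystem, relative to the bundle directory.
        "root": {"path": "rootfs", "readonly": True},
        "linux": {
            # Namespaces isolate what the process can see.
            "namespaces": [
                {"type": t} for t in ("pid", "network", "ipc", "uts", "mount")
            ],
            # cgroup settings limit what the process can use.
            "resources": {"memory": {"limit": memory_limit_bytes}},
        },
    }


if __name__ == "__main__":
    config = minimal_oci_config(["nginx", "-g", "daemon off;"], 256 * 1024 * 1024)
    print(json.dumps(config, indent=2))
```

Each item in the runc list above has a counterpart here: namespaces and cgroup limits under linux, the filesystem under root, and the process to execute under process.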

containerd-shim

The containerd-shim is a small process that sits between containerd and runc. Its main purposes are:

Allowing runc to exit once the container has started, so no long-running runtime process is needed for each container
Reporting the container's exit status back to containerd
Keeping the container's STDIO streams open even if containerd crashes or is restarted

This component is like the kitchen expediter who ensures that finished dishes are properly presented and delivered to the customer, even if the chef is busy with other orders.

Docker Images and Layers

Docker images are read-only templates used to create containers. They're composed of filesystem layers that represent the file changes at each step of the image creation process.

graph TD
    A["Base Image Layer<br/>e.g., ubuntu:20.04"] --> B[Add Node.js Layer]
    B --> C[Add Application Code Layer]
    C --> D[Configure Environment Layer]
    D --> E[Final Image]
    E --> F["Container 1<br/>with R/W Layer"]
    E --> G["Container 2<br/>with R/W Layer"]
    E --> H["Container 3<br/>with R/W Layer"]

Image Layering System

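Each instruction in a Dockerfile adds a step to the image's history. Instructions that change the filesystem (such as RUN, COPY, and ADD) produce a new layer; instructions such as EXPOSE and CMD only update the image's metadata:

# Layer 1: Base image
FROM ubuntu:20.04

# Layer 2: Update packages and install Node.js in a single RUN,
# so a cached package index is never reused with a changed install list
RUN apt-get update && apt-get upgrade -y && apt-get install -y nodejs npm

# Layer 3: Set working directory (creates /app if it does not exist)
WORKDIR /app

# Layer 4: Copy application code
COPY . .

# Layer 5: Install dependencies
RUN npm install

# Metadata only: document the port the application listens on
EXPOSE 3000

# Metadata only: set startup command
CMD ["npm", "start"]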

Each layer only stores the changes from the previous layer, which makes image distribution more efficient. When you pull an image, Docker only downloads the layers you don't already have locally.
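The download-only-what-is-missing behavior can be sketched in a few lines. The digests below are made-up placeholders, and layers_to_download is a hypothetical helper rather than Docker's actual pull logic, which also handles manifests, compression, and verification.

```python
def layers_to_download(manifest_layers: list[str], local_layers: set[str]) -> list[str]:
    """Return the layers from an image manifest that are not stored locally."""
    return [digest for digest in manifest_layers if digest not in local_layers]


if __name__ == "__main__":
    # Placeholder digests; real ones are full SHA-256 hashes.
    local = {"sha256:base", "sha256:node"}
    new_image = ["sha256:base", "sha256:node", "sha256:app-v2"]
    print(layers_to_download(new_image, local))
```

In this example only the application layer is fetched, because the base and Node.js layers are already present from an earlier pull.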

Union File System

Docker uses a union file system (on most Linux installations, OverlayFS) to combine these layers into a single, coherent filesystem for the container. This is similar to how transparent overlays work in image editing software: each layer is stacked on top of the previous ones, and when the same file exists in several layers, the version in the highest layer wins.

Read-Only Layers and Copy-on-Write

All image layers are read-only. When a container runs, Docker adds a writable layer on top of the image layers. Any changes made within the container are stored in this writable layer using a copy-on-write mechanism:

If a container process needs to read a file, it reads from the existing file in the lower image layers.
If a process needs to modify a file, Docker first copies the file from the image layer to the writable container layer, then makes the change.
All future reads will see the modified version of the file from the container layer.

This is like working with a photocopy of an important document instead of the original. You can make all the notes and edits you want on your copy, but the original remains unchanged for others to use.
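The layering and copy-on-write behavior described above can be modeled with Python's collections.ChainMap, which looks keys up across a stack of dictionaries but writes only to the first one. This is a toy model under obvious assumptions: files are dictionary entries with placeholder contents, whole files are replaced rather than copied and edited, and deletions (which real union filesystems record as "whiteout" entries) are not modeled.

```python
from collections import ChainMap

# Read-only image layers, lowest first (contents are placeholders).
IMAGE_LAYERS = [
    {"/etc/os-release": "Ubuntu 20.04"},
    {"/usr/bin/node": "node binary"},
    {"/app/server.js": "console.log('v1')"},
]


def start_container(image_layers):
    """Stack a fresh writable dict on top of the shared image layers."""
    # ChainMap searches left to right, so the writable layer comes first,
    # followed by the image layers from newest to oldest.
    return ChainMap({}, *reversed(image_layers))


container_1 = start_container(IMAGE_LAYERS)
container_2 = start_container(IMAGE_LAYERS)

# Reads fall through to the image layers.
print(container_1["/etc/os-release"])

# Writes land only in container_1's writable layer.
container_1["/app/server.js"] = "console.log('patched')"
print(container_1["/app/server.js"])
print(container_2["/app/server.js"])
```

After the write, container_1 sees the patched file while container_2 and the image layers still hold the original, which mirrors how multiple containers can share one image safely.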

Docker Storage

Docker provides several options for managing data in containers, each with different use cases and characteristics.

Storage Types

graph TD
    A[Docker Storage] --> B[Volumes]
    A --> C[Bind Mounts]
    A --> D[tmpfs Mounts]
    B --> B1[Created and managed by Docker]
    B --> B2[Stored in /var/lib/docker/volumes/]
    C --> C1[Any directory on the host]
    C --> C2[Host path mounted into container]
    D --> D1[Stored in host memory]
    D --> D2[Never written to host filesystem]

Volumes

Volumes are the preferred way to persist data in Docker:

Created and managed by Docker
Stored in a part of the host filesystem that's managed by Docker
Not affected by container lifecycle (persists after container is removed)
Can be shared among multiple containers
Can be backed up or restored easily

# Create a volume
docker volume create my-data

# Run a container with a volume
docker run -v my-data:/app/data nginx

# List volumes
docker volume ls

# Inspect a volume
docker volume inspect my-data

Bind Mounts

Bind mounts directly map a host path into a container:

Use specific paths on the host filesystem
Depend on the host filesystem having a specific directory structure
Offer native filesystem performance on Linux hosts (on Docker Desktop for macOS and Windows, file sharing adds overhead)
Useful for development environments for immediate code updates

# Run a container with a bind mount
docker run -v /host/path:/container/path nginx

tmpfs Mounts

tmpfs mounts store data in the host's memory only:

Data exists only in host memory (never written to disk)
Useful for storing sensitive information that shouldn't persist
Provides fast I/O for temporary files

# Run a container with a tmpfs mount
docker run --tmpfs /app/temp nginx

Choosing the right storage option is like choosing the right type of notebook: volumes are like a dedicated journal that stays on your bookshelf, bind mounts are like sticky notes you place on various surfaces around your house, and tmpfs mounts are like a whiteboard that is wiped clean as soon as the container stops.
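One detail that often confuses newcomers is that the same -v flag creates either a volume or a bind mount depending on what the source looks like. The sketch below captures the rule of thumb for Linux-style paths and prints the equivalent, more explicit --mount syntax. The helpers classify_mount and to_mount_flag are hypothetical, and the rule is simplified: options such as :ro and Windows paths are ignored.

```python
def classify_mount(spec: str) -> dict[str, str]:
    """Classify a '-v' argument as a named volume, anonymous volume, or bind mount."""
    parts = spec.split(":")
    if len(parts) == 1:
        # Only a container path: Docker creates an anonymous volume.
        return {"type": "volume", "source": "", "target": parts[0]}
    source, target = parts[0], parts[1]
    # An absolute host path means a bind mount; a bare name means a named volume.
    kind = "bind" if source.startswith("/") else "volume"
    return {"type": kind, "source": source, "target": target}


def to_mount_flag(mount: dict[str, str]) -> str:
    """Render the classification using the explicit --mount syntax."""
    fields = [f"type={mount['type']}"]
    if mount["source"]:
        fields.append(f"source={mount['source']}")
    fields.append(f"target={mount['target']}")
    return "--mount " + ",".join(fields)


if __name__ == "__main__":
    for spec in ("my-data:/app/data", "/host/path:/container/path"):
        print(spec, "=>", to_mount_flag(classify_mount(spec)))
```

The --mount form spells out the type explicitly, which is why it is often recommended when clarity matters more than brevity.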

Docker Networking

Docker provides a networking system that allows containers to communicate with each other and with the outside world. It offers several built-in network drivers to accommodate different scenarios.

graph TD
    subgraph "Host System"
        A[Docker Engine] --- B[Network Drivers]
        B --- C[bridge]
        B --- D[host]
        B --- E[none]
        B --- F[overlay]
        B --- G[macvlan]
    end
    C --- H[Container 1]
    C --- I[Container 2]
    D --- J[Container 3]
    E --- K[Container 4]
    F --- L[Container 5]
    F --- M[Container 6 on Different Host]
    G --- N[Container 7]

Network Drivers

bridge: The default network driver. Containers on the same bridge network can communicate with each other and are isolated from containers on other networks.
host: Removes network isolation between the container and the host. The container uses the host's networking directly.
none: Disables networking for the container.
overlay: Connects multiple Docker daemons across hosts, enabling swarm services to communicate.
macvlan: Assigns a MAC address to each container, making it appear as a physical device on the network.

Network Commands

# List networks
docker network ls

# Create a new network
docker network create my-network

# Run a container on a specific network
docker run --network=my-network nginx

# Connect a running container to a network
docker network connect my-network container-name

# Inspect a network
docker network inspect my-network

Container Communication

Containers on the same user-defined network can reach each other using container names as hostnames, which Docker resolves through an embedded DNS server. (On the default bridge network, containers can reach each other only by IP address.)

# Run a web server container
docker run -d --name web --network my-network nginx

# Run another container and access the web server
docker run --network my-network alpine wget -O- http://web

Docker networking is like a sophisticated telephone exchange. Different types of connections (network drivers) serve different purposes, but they all enable communication between callers (containers) based on specific rules and directories (DNS).

Docker Registries

Docker registries are services that store and distribute Docker images. They're a crucial part of the Docker ecosystem, enabling collaboration and deployment across different environments.

Types of Registries

Docker Hub: The default public registry operated by Docker, Inc.
Private Registries: Self-hosted or cloud-based private registries for proprietary images
Cloud Provider Registries: Registry services provided by cloud platforms (Amazon ECR, Google Artifact Registry, Azure Container Registry)

Working with Registries

# Pull an image from Docker Hub
docker pull nginx:latest

# Tag an image for a registry
docker tag my-app:1.0 username/my-app:1.0

# Push an image to Docker Hub
docker push username/my-app:1.0

# Pull from a private registry
docker pull registry.example.com/my-app:1.0

Registry Authentication

# Log in to Docker Hub
docker login

# Log in to a private registry
docker login registry.example.com

Docker registries function like package distribution centers. Developers deliver their packaged applications (images) to the center, which then stores them in organized shelves (repositories) and delivers them to customers (users) when requested.
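The commands above rely on defaults hidden inside an image name: when no registry is given Docker Hub is assumed, official images live under library/, and a missing tag means latest. The sketch below spells out those rules. parse_image_reference is a hypothetical helper that approximates the naming rules; it does not handle digests (name@sha256:...) or validate characters.

```python
def parse_image_reference(ref: str) -> dict[str, str]:
    """Split an image reference into registry, repository, and tag (approximation)."""
    registry = "docker.io"
    remainder = ref

    # The first path component is a registry host only if it looks like one.
    first, slash, rest = ref.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest

    # A colon after the last slash separates the tag from the repository.
    repository, tag = remainder, "latest"
    colon = remainder.rfind(":")
    if colon > remainder.rfind("/"):
        repository, tag = remainder[:colon], remainder[colon + 1:]

    # Official images on Docker Hub live in the "library" namespace.
    if registry == "docker.io" and "/" not in repository:
        repository = "library/" + repository

    return {"registry": registry, "repository": repository, "tag": tag}


if __name__ == "__main__":
    for ref in ("nginx", "username/my-app:1.0", "registry.example.com/my-app:1.0"):
        print(ref, "=>", parse_image_reference(ref))
```

This is why docker pull nginx and docker pull registry.example.com/my-app:1.0 contact different servers even though the command is the same.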

Docker Compose

Docker Compose is a tool for defining and running multi-container Docker applications. It uses a YAML file to configure application services, networks, and volumes, allowing you to start all services with a single command.

Core Features

Define multiple containers in a single file
Create named volumes for persistent data
Configure custom networks for service isolation
Set environment variables and service dependencies
Scale services to multiple containers

Sample Docker Compose File (docker-compose.yml)

version: '3'

services:
  # Web server service
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./website:/usr/share/nginx/html
    depends_on:
      - app

  # Application service
  app:
    build: ./app
    environment:
      - NODE_ENV=production
      - DB_HOST=db
    depends_on:
      - db

  # Database service
  db:
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=mysecretpassword
      - POSTGRES_USER=myuser
      - POSTGRES_DB=myapp

volumes:
  postgres_data:
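The depends_on entries in this file form a dependency graph, and Compose starts services in an order that respects it. The sketch below derives such an order with a topological sort from Python's standard library (graphlib, Python 3.9 or later). The DEPENDS_ON dictionary mirrors the sample file; start_order is a hypothetical helper, not Compose's actual implementation.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, mirroring the sample Compose file.
DEPENDS_ON = {
    "web": {"app"},
    "app": {"db"},
    "db": set(),
}


def start_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service starts after its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    order = start_order(DEPENDS_ON)
    print("start:", order)
    print("stop: ", list(reversed(order)))
```

For the sample file this yields db, then app, then web. Note that the plain list form of depends_on controls start order only; it does not wait for the database to be ready to accept connections.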

Common Commands

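# Start services (Compose v2 syntax; the older standalone tool is invoked as docker-compose)
docker compose up

# Start services in detached mode
docker compose up -d

# Stop and remove the services' containers and networks
docker compose down

# View logs
docker compose logs

# Scale a service
docker compose up -d --scale app=3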

Docker Compose is like a blueprint and construction manager for a complex building. The YAML file is the blueprint that specifies how everything should be arranged, and the compose command is the construction manager that ensures all components are built and connected according to the plan.

Docker in Modern Development Workflows

Docker has transformed development workflows by providing a standardized environment across different stages of development and deployment.

Local Development

Create identical development environments for all team members
Eliminate "works on my machine" problems
Quickly prototype with different technology stacks
Isolate development dependencies from host system

Testing and CI/CD

Run tests in standardized environments
Build once, test everywhere
Integrate with CI/CD pipelines for automated testing and deployment
Ensure consistency between test and production environments

Deployment

Deploy the same container images across different environments
Scale horizontally by running multiple container instances
Perform blue-green deployments or canary releases
Integrate with orchestration platforms like Kubernetes

Practice Activities

Activity 1: Explore Docker Architecture

Install Docker on your system if you haven't already
Run docker info to view information about your Docker installation
Identify the storage driver, logging driver, and network driver configurations
Find the location of Docker's data directory on your system

Activity 2: Investigate Layer Caching

Create a Dockerfile with multiple RUN instructions
Build the image and observe the build time
Make a small change to one of the middle layers and rebuild
Observe which layers are rebuilt and which are reused from the build cache
Optimize your Dockerfile to improve caching
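As a starting point for the last step, the sketch below models the basic cache rule: once one instruction's inputs change, that step and every step after it are rebuilt. The function rebuilt_steps and the two instruction lists are illustrative assumptions; a real builder decides what changed by comparing instruction text and file checksums.

```python
def rebuilt_steps(instructions: list[str], changed: set[int]) -> list[str]:
    """Return the instructions that must run again, given indexes of changed steps."""
    if not changed:
        return []
    # Everything from the first changed step onward loses its cache.
    return instructions[min(changed):]


if __name__ == "__main__":
    # Source code is copied before dependencies are installed.
    naive = ["FROM ubuntu:20.04", "COPY . .", "RUN npm install"]
    # Dependency manifests are copied first, source code last.
    optimized = [
        "FROM ubuntu:20.04",
        "COPY package*.json ./",
        "RUN npm install",
        "COPY . .",
    ]

    # Simulate editing a source file (not package.json) in each layout.
    print(rebuilt_steps(naive, {1}))
    print(rebuilt_steps(optimized, {3}))
```

In the first layout a source edit forces npm install to run again; in the second, only the final COPY is repeated, which is the kind of improvement the activity is looking for.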

Activity 3: Set Up a Multi-Container Application

Create a docker-compose.yml file for a simple web application with a frontend and backend
Configure appropriate networks for the containers to communicate
Set up a volume for persistent data
Use environment variables for configuration
Start the application with Docker Compose and verify it works correctly

Summary

In this lecture, we've explored the architecture and key components of Docker:

Docker uses a client-server architecture with the Docker client communicating with the Docker daemon
The Docker daemon delegates container management to containerd and runc
Docker images are composed of read-only layers, with containers adding a writable layer
Docker provides multiple storage options, including volumes, bind mounts, and tmpfs mounts
Docker networking enables container communication with several driver options
Docker registries store and distribute container images
Docker Compose simplifies multi-container application management

Understanding Docker's architecture is essential for effectively containerizing applications and troubleshooting issues. In our next lecture, we'll dive into Docker installation and configuration.