Introduction to Docker
Docker is a platform that enables developers to build, package, and run applications in containers. It has become synonymous with containerization because it made containers accessible and practical for everyday development and deployment scenarios.
While our previous lecture introduced containerization concepts broadly, today we'll dive into Docker specifically, exploring its architecture and key components that make it work.
Docker Architecture Overview
Docker uses a client-server architecture. A client component communicates with a server component (the daemon) through a REST API, carried over a Unix socket or a network interface. This separation allows the Docker client to run on a different system than the Docker daemon, enabling remote management of Docker hosts.
Think of this architecture like a restaurant: The client (you) places an order (command), the server (daemon) receives the order and delegates tasks to the kitchen staff (containerd, runc) who prepare your meal (container). The pantry (registry) provides ingredients (images) when needed.
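To see the client-server split concretely, here is a minimal Python sketch that skips the docker CLI and sends a REST request straight to the daemon. It makes three assumptions:

- The host runs Linux.
- dockerd listens on the default Unix socket /var/run/docker.sock.
- Your user is allowed to open that socket.

GET /containers/json is the Engine API endpoint behind docker ps. The names UnixHTTPConnection and docker_api_get are illustrative, not part of any Docker library.

```python
import http.client
import json
import os
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """An HTTPConnection that talks to a Unix domain socket instead of TCP (illustrative helper)."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        # "localhost" only fills the Host header; the socket path decides where bytes go.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def docker_api_get(path: str, socket_path: str = "/var/run/docker.sock"):
    """Send GET <path> to the daemon and return (HTTP status, decoded JSON body)."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, json.loads(resp.read())
    finally:
        conn.close()


if __name__ == "__main__":
    sock_path = "/var/run/docker.sock"  # assumed default location on Linux
    if not os.path.exists(sock_path):
        print(f"No Docker socket at {sock_path}; is dockerd running?")
    else:
        try:
            # Roughly the request `docker ps` makes.
            status, body = docker_api_get("/containers/json", sock_path)
            if status != 200:
                print("Daemon returned", status, body)
            else:
                for c in body:
                    print(c["Id"][:12], c["Image"], ", ".join(c["Names"]))
        except OSError as exc:
            print(f"Could not reach the daemon: {exc}")
```

The real CLI also adds an API version prefix such as /v1.43/ to the path. Otherwise every docker command ends up as an HTTP request like this one. That is also why the same client can target a daemon on another machine.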
Core Architectural Components
- Docker Client: The command-line interface or API that users interact with
- Docker Daemon: The background service that manages Docker objects
- containerd: A container runtime that manages container lifecycle operations
- runc: A low-level container runtime that interfaces with the operating system
- Docker Registry: A service that stores and distributes Docker images
Docker Client
The Docker client is the primary way users interact with Docker. When you run commands like docker run or docker build, you're using the Docker client, which sends these commands to the Docker daemon for execution.
Common Client Commands
# Run a container
docker run nginx
# List running containers
docker ps
# Build an image
docker build -t myapp .
# Pull an image from a registry
docker pull ubuntu:20.04
# Push an image to a registry
docker push myusername/myapp:1.0
The client is like the remote control for your TV. You press buttons on the remote (issue commands), but the TV itself (the daemon) does the actual work of changing channels or adjusting volume.
Docker Client Configuration
The Docker client can be configured to connect to different Docker daemons, allowing you to manage containers on remote systems. This is done using environment variables or configuration files:
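# Connect to a remote Docker daemon over plain TCP (port 2375 by convention; unencrypted, so only use it on trusted networks)
export DOCKER_HOST=tcp://192.168.1.100:2375
# Use TLS for secure connections (port 2376 by convention) and verify the daemon's certificate
export DOCKER_HOST=tcp://192.168.1.100:2376
export DOCKER_TLS_VERIFY=1
# Directory containing ca.pem, cert.pem, and key.pem
export DOCKER_CERT_PATH=/path/to/certs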
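These environment variables decide which daemon the client dials. The Python function below is a hypothetical, simplified model of that decision, not the CLI's actual code. Its scope is narrow:

- It honors only DOCKER_HOST and DOCKER_TLS_VERIFY.
- It ignores Docker contexts and ssh:// hosts.
- It assumes the Linux default socket when DOCKER_HOST is unset.

```python
from collections.abc import Mapping
from typing import NamedTuple, Optional
from urllib.parse import urlsplit

DEFAULT_HOST = "unix:///var/run/docker.sock"  # Linux default; an assumption of this sketch


class DaemonTarget(NamedTuple):
    scheme: str           # "unix" or "tcp"
    address: str          # socket path for unix, host name or IP for tcp
    port: Optional[int]   # None for Unix sockets
    tls: bool             # whether the client should use TLS and verify the daemon


def resolve_daemon(env: Mapping[str, str]) -> DaemonTarget:
    """Hypothetical, simplified model of how the client picks its daemon."""
    url = urlsplit(env.get("DOCKER_HOST") or DEFAULT_HOST)
    tls = bool(env.get("DOCKER_TLS_VERIFY"))  # any non-empty value turns verification on here
    if url.scheme == "unix":
        return DaemonTarget("unix", url.path, None, tls)
    if url.scheme == "tcp":
        if url.hostname is None or url.port is None:
            raise ValueError(f"tcp:// address needs a host and a port: {url.geturl()}")
        return DaemonTarget("tcp", url.hostname, url.port, tls)
    raise ValueError(f"scheme not handled by this sketch: {url.scheme!r}")


if __name__ == "__main__":
    import os
    try:
        print(resolve_daemon(os.environ))
    except ValueError as exc:
        print(exc)
```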
Docker Daemon
The Docker daemon (dockerd) is a persistent background process that manages Docker objects such as images, containers, networks, and volumes. It listens for Docker API requests and processes them accordingly.
If the Docker client is the remote control, the daemon is the TV's internal circuitry that actually performs the work. It constantly listens for incoming commands and carries them out.
Daemon Responsibilities
- Creating and managing Docker objects (containers, images, etc.)
- Building images (when instructed by the client)
- Managing container lifecycle (creating, starting, stopping, etc.)
- Handling networking between containers
- Managing persistent storage for containers
Daemon Configuration
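The Docker daemon can be configured using a JSON configuration file, located by default at /etc/docker/daemon.json on Linux:
{
  "debug": true,
  "tls": true,
  "tlsverify": true,
  "tlscacert": "/var/docker/ca.pem",
  "tlscert": "/var/docker/server.pem",
  "tlskey": "/var/docker/serverkey.pem",
  "hosts": ["unix:///var/run/docker.sock", "tcp://192.168.1.10:2376"]
}
The TLS-related keys work together:

- "tls" on its own encrypts the connection but accepts any client.
- "tlsverify" together with "tlscacert" makes the daemon accept only clients whose certificates are signed by that CA.

Keeping the Unix socket in "hosts" preserves local access. On systemd-based distributions, the default docker.service unit already passes a -H flag. If "hosts" is also set in daemon.json, the daemon will not start until that flag is removed from the unit.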
Security Considerations
By default, the Docker daemon runs with root privileges. This means anyone who can send it commands effectively has root access to the host system, for example members of the docker group or anyone who can reach an unprotected remote API. This underscores the importance of properly securing Docker installations:
- Use TLS for remote connections
- Enable user namespace remapping (the userns-remap daemon option) or run Docker in rootless mode
- Apply the principle of least privilege
- Regularly update Docker to patch security vulnerabilities
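The TLS advice can be turned into an automated check. Below is a hypothetical Python audit, not a Docker tool, that reads a daemon.json-style dictionary and flags TCP listeners that do not require client certificates. It uses the real daemon.json keys hosts, tls, tlsverify and tlscacert. The warning rules are assumptions that mirror the advice in this section. The sample input enables tls but not tlsverify.

```python
import json


def audit_daemon_config(config: dict) -> list[str]:
    """Return warnings for risky remote-access settings in a daemon.json dict (sketch)."""
    warnings = []
    tcp_hosts = [h for h in config.get("hosts", []) if h.startswith("tcp://")]
    for host in tcp_hosts:
        if not config.get("tls") and not config.get("tlsverify"):
            # Plain TCP: anyone who can connect can control the daemon.
            warnings.append(f"{host}: remote API without TLS gives root-equivalent access to anyone who can connect")
        elif not config.get("tlsverify"):
            # Encrypted, but the daemon does not check who the client is.
            warnings.append(f"{host}: TLS is on but clients are not verified (set tlsverify and tlscacert)")
        elif not config.get("tlscacert"):
            warnings.append(f"{host}: tlsverify is set but tlscacert is missing")
    return warnings


example = json.loads("""
{
  "debug": true,
  "tls": true,
  "tlscert": "/var/docker/server.pem",
  "tlskey": "/var/docker/serverkey.pem",
  "hosts": ["tcp://192.168.1.10:2376"]
}
""")

for warning in audit_daemon_config(example):
    print(warning)
```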
containerd and runc
In 2016, Docker restructured its architecture to extract core container runtime functionality into separate components: containerd and runc. This modularization allowed these components to be used independently of Docker and contributed to the standardization of container runtimes.
containerd
containerd is a daemon that manages the complete container lifecycle on a single host:
- Image transfer and storage
- Container execution and supervision
- Network and storage attachment
In our restaurant analogy, if the Docker daemon is the head chef coordinating the kitchen, containerd is the station chef responsible for implementing the cooking processes.
runc
runc is a lightweight, portable container runtime that implements the Open Container Initiative (OCI) specification. It's responsible for the low-level work of actually creating containers:
- Creating container namespaces and cgroups
- Configuring container capabilities
- Setting up the container's filesystem
- Executing the container process
Continuing our restaurant analogy, runc is the cook who actually prepares the individual dishes according to specific recipes.
containerd-shim
The containerd-shim is a small process that sits between containerd and runc. Its main purposes are:
- Allowing containers to run without a constantly running container runtime (runc)
- Reporting container exit status back to containerd
- Keeping STDIO streams open even if containerd crashes
This component is like the kitchen expediter who ensures that finished dishes are properly presented and delivered to the customer, even if the chef is busy with other orders.
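The shim's role can be modeled in a few lines of Python. This is a toy analogy inside a single Python process, not containerd's code:

- launch() plays runc: it starts the process and returns at once.
- The hypothetical Shim class keeps the child's output stream drained, reaps the child, and records its exit status for whoever asks later.

```python
import subprocess
import sys
import threading
from typing import Optional


class Shim:
    """Toy stand-in for containerd-shim: owns a child process after its launcher returns."""

    def __init__(self, argv: list[str]):
        self.proc = subprocess.Popen(argv, stdout=subprocess.PIPE, text=True)
        self.output: list[str] = []
        self.exit_status: Optional[int] = None
        # Supervise in the background, so nobody has to stay attached for the child to progress.
        self._supervisor = threading.Thread(target=self._supervise, daemon=True)
        self._supervisor.start()

    def _supervise(self) -> None:
        # Keep draining STDOUT so the child never blocks on a full pipe.
        for line in self.proc.stdout:
            self.output.append(line.rstrip("\n"))
        self.proc.stdout.close()
        # Reap the child and remember its exit status for later queries.
        self.exit_status = self.proc.wait()

    def wait(self, timeout: Optional[float] = None) -> Optional[int]:
        self._supervisor.join(timeout)
        return self.exit_status


def launch(argv: list[str]) -> Shim:
    """Plays runc's part: start the process, hand it to a shim, and return immediately."""
    return Shim(argv)


if __name__ == "__main__":
    shim = launch([sys.executable, "-c", "print('hello from the container'); raise SystemExit(3)"])
    print("launcher returned; shim is supervising")
    print("exit status:", shim.wait(timeout=30))
    print("captured output:", shim.output)
```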
Docker Images and Layers
Docker images are read-only templates used to create containers. They're composed of filesystem layers that represent the file changes at each step of the image creation process.
Figure: Base Image (e.g., ubuntu:20.04) → Add Node.js Layer → Add Application Code Layer → Configure Environment Layer → Final Image. Containers 1, 2, and 3 are each created from the final image with their own R/W layer.
Image Layering System
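Each Dockerfile instruction adds a step to the image's build history. Only instructions that change the filesystem (RUN, COPY, and ADD) create new filesystem layers; instructions such as EXPOSE and CMD only record metadata in the image configuration:
# Base image: its layers form the bottom of the stack
FROM ubuntu:20.04
# Layer: refresh the package index and install Node.js in one RUN, so the index is never stale
RUN apt-get update && \
    apt-get install -y --no-install-recommends nodejs npm && \
    rm -rf /var/lib/apt/lists/*
# Set the working directory (created if it does not exist)
WORKDIR /app
# Layer: copy only the dependency manifests first
COPY package*.json ./
# Layer: install dependencies; reused from cache until package*.json changes
RUN npm install
# Layer: copy the rest of the application code
COPY . .
# Metadata only: document the port the app listens on
EXPOSE 3000
# Metadata only: set the default startup command
CMD ["npm", "start"]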
Each layer only stores the changes from the previous layer, which makes image distribution more efficient. When you pull an image, Docker only downloads the layers you don't already have locally.
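Here is a rough Python model of content-addressed layers. Each layer is identified by the SHA-256 digest of its content, and a pull fetches only the digests missing from the local store. It is a simplification: real layers are compressed tar archives listed in an image manifest, while here each layer is just a byte string with made-up contents.

```python
import hashlib


def digest(layer: bytes) -> str:
    """Content address of a layer, in the sha256:<hex> form registries use."""
    return "sha256:" + hashlib.sha256(layer).hexdigest()


def layers_to_download(manifest: list[str], local_store: set[str]) -> list[str]:
    """Return the manifest's layer digests that are not already stored locally, in order."""
    return [d for d in manifest if d not in local_store]


# Hypothetical layer contents for two versions of an image that share a base.
base = b"ubuntu:20.04 root filesystem"
node = b"nodejs and npm packages"
app_v1 = b"application code v1"
app_v2 = b"application code v2"

local = {digest(base), digest(node), digest(app_v1)}   # version 1 was pulled earlier
manifest_v2 = [digest(base), digest(node), digest(app_v2)]

missing = layers_to_download(manifest_v2, local)
print(missing)  # only the changed application layer needs to be fetched
```

Because identical content always hashes to the same digest, two images built from the same base automatically share those base layers on disk and over the network.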
Union File System
Docker uses a union filesystem to combine these layers into a single, coherent filesystem for the container; on most Linux hosts this is OverlayFS, through the overlay2 storage driver. This is similar to how transparent overlays work in image editing software. Each layer is stacked on top of the previous ones, and when a file exists in more than one layer, the copy in the highest layer wins.
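The precedence rule can be sketched in Python by treating each layer as a dictionary from file path to contents and searching from the top layer down. This is a simplified model with made-up layer contents. It ignores directories, permissions, and deleted files, which OverlayFS represents with whiteout entries.

```python
from typing import Optional


def union_lookup(path: str, layers: list[dict[str, str]]) -> Optional[str]:
    """Return the contents visible at `path`; layers[0] is the top (highest-precedence) layer."""
    for layer in layers:
        if path in layer:
            return layer[path]
    return None  # the file exists in no layer


def merged_view(layers: list[dict[str, str]]) -> dict[str, str]:
    """Flatten the stack into what the container sees."""
    view: dict[str, str] = {}
    for layer in reversed(layers):  # apply the bottom layer first so upper layers overwrite it
        view.update(layer)
    return view


base = {"/etc/os-release": "Ubuntu 20.04", "/app/config": "defaults"}
runtime = {"/usr/bin/node": "node binary"}
app = {"/app/config": "production settings", "/app/server.js": "server code"}

stack = [app, runtime, base]  # top to bottom
print(union_lookup("/app/config", stack))
```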
Read-Only Layers and Copy-on-Write
All image layers are read-only. When a container runs, Docker adds a writable layer on top of the image layers. Any changes made within the container are stored in this writable layer using a copy-on-write mechanism:
- If a container process needs to read a file, it reads from the existing file in the lower image layers.
- If a process needs to modify a file, Docker first copies the file from the image layer to the writable container layer, then makes the change.
- All future reads will see the modified version of the file from the container layer.
This is like working with a photocopy of an important document instead of the original. You can make all the notes and edits you want on your copy, but the original remains unchanged for others to use.
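Here is a small Python model of copy-on-write. It assumes file-level granularity, as with OverlayFS: reads fall through to the image layers, and the first write to a file copies it up into the container's own writable layer. The Container class and its data are hypothetical. Two containers share one image to show that the original stays untouched.

```python
from types import MappingProxyType


class Container:
    """Toy container: shared read-only image layers plus a private writable layer."""

    def __init__(self, image_layers):
        # Image layers are shared between containers, so expose them read-only.
        self.image_layers = [MappingProxyType(layer) for layer in image_layers]
        self.writable = {}

    def read(self, path):
        if path in self.writable:
            return self.writable[path]
        for layer in self.image_layers:  # top to bottom
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def append(self, path, text):
        # Copy-up: bring the current contents into the writable layer before modifying them.
        if path not in self.writable:
            self.writable[path] = self.read(path)
        self.writable[path] += text


image = [{"/etc/motd": "Welcome"}]
c1 = Container(image)
c2 = Container(image)
c1.append("/etc/motd", " to container 1")
print(c1.read("/etc/motd"), "|", c2.read("/etc/motd"))
```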
Docker Storage
Docker provides several options for managing data in containers, each with different use cases and characteristics.
Storage Types
Volumes
Volumes are the preferred way to persist data in Docker:
- Created and managed by Docker
- Stored in a part of the host filesystem that's managed by Docker
- Not affected by container lifecycle (persists after container is removed)
- Can be shared among multiple containers
- Can be backed up or restored easily
# Create a volume
docker volume create my-data
# Run a container with a volume
docker run -v my-data:/app/data nginx
# List volumes
docker volume ls
# Inspect a volume
docker volume inspect my-data
Bind Mounts
Bind mounts directly map a host path into a container:
- Use specific paths on the host filesystem
- Depend on the host filesystem having a specific directory structure
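- On Linux, run at native host-filesystem speed, which suits large datasets (on Docker Desktop for macOS and Windows, bind mounts are slower than volumes)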
- Useful for development environments for immediate code updates
# Run a container with a bind mount
docker run -v /host/path:/container/path nginx
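The volume and bind-mount commands above both use the -v flag. The part before the first colon decides which kind of mount you get. The Python function below is a simplified model of that rule, written for this lecture rather than taken from Docker:

- An absolute host path means a bind mount.
- Anything else must be a valid volume name. The character rule is assumed to match the one Docker reports for invalid local volume names.

The model assumes Linux-style paths. It ignores anonymous volumes (-v with only a container path), Windows drive letters, and the --mount syntax.

```python
import re
from typing import NamedTuple

# Assumed naming rule for local volumes: alphanumeric start, then [a-zA-Z0-9_.-].
VOLUME_NAME = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9_.-]+$")


class MountSpec(NamedTuple):
    kind: str        # "volume" or "bind"
    source: str
    target: str
    read_only: bool


def parse_v_flag(value: str) -> MountSpec:
    """Classify a `-v source:target[:mode]` value (simplified model of the rule)."""
    parts = value.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected source:target[:mode], got {value!r}")
    source, target = parts[0], parts[1]
    mode = parts[2] if len(parts) == 3 else "rw"
    if not target.startswith("/"):
        raise ValueError(f"container path must be absolute: {target!r}")
    if source.startswith("/"):
        kind = "bind"        # absolute host path
    elif VOLUME_NAME.match(source):
        kind = "volume"      # named volume managed by Docker
    else:
        raise ValueError(f"{source!r} is neither an absolute path nor a valid volume name")
    return MountSpec(kind, source, target, read_only="ro" in mode.split(","))


print(parse_v_flag("my-data:/app/data"))
print(parse_v_flag("/host/path:/container/path:ro"))
```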
tmpfs Mounts
tmpfs mounts store data in the host's memory only:
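- Data lives only in host memory and is discarded when the container stops; it is never written to the container's writable layer (though the kernel may page it out to swap)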
- Useful for storing sensitive information that shouldn't persist
- Provides fast I/O for temporary files
# Run a container with a tmpfs mount
docker run --tmpfs /app/temp nginx
Choosing the right storage option is like choosing the right type of notebook: volumes are like a dedicated journal that stays on your bookshelf, bind mounts are like sticky notes you place on various surfaces around your house, and tmpfs mounts are like an erasable whiteboard that clears when powered off.
Docker Networking
Docker provides a networking system that allows containers to communicate with each other and with the outside world. It offers several built-in network drivers to accommodate different scenarios.
Network Drivers
- bridge: The default network driver. Containers on the same bridge network can communicate with each other and are isolated from containers that are not attached to it.
- host: Removes network isolation between the container and the host. The container uses the host's networking directly.
- none: Disables networking for the container.
- overlay: Connects multiple Docker daemons across hosts, enabling swarm services to communicate.
- macvlan: Assigns a MAC address to each container, making it appear as a physical device on the network.
Network Commands
# List networks
docker network ls
# Create a new network
docker network create my-network
# Run a container on a specific network
docker run --network=my-network nginx
# Connect a running container to a network
docker network connect my-network container-name
# Inspect a network
docker network inspect my-network
Container Communication
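Containers on the same user-defined network (such as one created with docker network create) can reach each other using container names as hostnames, which Docker resolves through an embedded DNS server. Containers on the default bridge network do not get this name resolution: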
# Run a web server container
docker run -d --name web --network my-network nginx
# Run another container and access the web server
docker run --network my-network alpine wget -O- http://web
Docker networking is like a sophisticated telephone exchange. Different types of connections (network drivers) serve different purposes, but they all enable communication between callers (containers) based on specific rules and directories (DNS).
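The name-resolution rule from this section can be modeled in Python as a per-network name table. A lookup succeeds only when both containers share a user-defined network, and the default bridge network offers no name lookup at all. Everything here is a hypothetical simplification of Docker's embedded DNS server, including the class, the function and the IP addresses.

```python
from typing import Optional


class Network:
    def __init__(self, name: str, user_defined: bool = True):
        self.name = name
        self.user_defined = user_defined
        self.members: dict[str, str] = {}  # container name -> IP address on this network

    def attach(self, container: str, ip: str) -> None:
        self.members[container] = ip


def resolve(caller: str, target: str, networks: list[Network]) -> Optional[str]:
    """Return target's IP as seen by caller, or None if the name cannot be resolved."""
    for net in networks:
        # Name-based discovery only exists on user-defined networks shared by both containers.
        if net.user_defined and caller in net.members and target in net.members:
            return net.members[target]
    return None


default_bridge = Network("bridge", user_defined=False)
my_network = Network("my-network")

my_network.attach("web", "172.18.0.2")
my_network.attach("client", "172.18.0.3")
default_bridge.attach("legacy-web", "172.17.0.2")
default_bridge.attach("legacy-client", "172.17.0.3")

networks = [default_bridge, my_network]
print(resolve("client", "web", networks))
print(resolve("legacy-client", "legacy-web", networks))
```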
Docker Registries
Docker registries are services that store and distribute Docker images. They're a crucial part of the Docker ecosystem, enabling collaboration and deployment across different environments.
Types of Registries
- Docker Hub: The default public registry operated by Docker, Inc.
- Private Registries: Self-hosted or cloud-based private registries for proprietary images
- Cloud Provider Registries: Registry services provided by cloud platforms, such as Amazon ECR, Azure Container Registry, and Google Artifact Registry (which has replaced the older Google Container Registry)
Working with Registries
# Pull an image from Docker Hub
docker pull nginx:latest
# Tag an image for a registry
docker tag my-app:1.0 username/my-app:1.0
# Push an image to Docker Hub
docker push username/my-app:1.0
# Pull from a private registry
docker pull registry.example.com/my-app:1.0
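Image names like those above are shorthand. The Python sketch below models how a reference is split into registry, repository and tag:

- If the first path component contains a dot or a colon, or is localhost, it names a registry.
- Otherwise Docker Hub (docker.io) is assumed, and single-name official images get the library/ prefix.
- A missing tag defaults to latest.

Digests (@sha256:...) and other edge cases are deliberately left out. ImageRef and parse_image_ref are illustrative names.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_ref(ref: str) -> ImageRef:
    """Split an image reference into (registry, repository, tag); simplified, no digests."""
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        registry, remainder = first, rest
    else:
        registry, remainder = "docker.io", ref
        if "/" not in remainder:
            remainder = "library/" + remainder  # official images live under library/
    # The tag is whatever follows the last colon, provided no slash comes after it.
    name, colon, tag = remainder.rpartition(":")
    if not colon or "/" in tag:
        name, tag = remainder, "latest"
    return ImageRef(registry, name, tag)


for ref in ["nginx:latest", "username/my-app:1.0", "registry.example.com/my-app:1.0"]:
    print(ref, "->", parse_image_ref(ref))
```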
Registry Authentication
# Log in to Docker Hub
docker login
# Log in to a private registry
docker login registry.example.com
Docker registries function like package distribution centers. Developers deliver their packaged applications (images) to the center, which then stores them in organized shelves (repositories) and delivers them to customers (users) when requested.
Docker Compose
Docker Compose is a tool for defining and running multi-container Docker applications. It uses a YAML file to configure application services, networks, and volumes, allowing you to start all services with a single command.
Core Features
- Define multiple containers in a single file
- Create named volumes for persistent data
- Configure custom networks for service isolation
- Set environment variables and service dependencies
- Scale services to multiple containers
Sample Docker Compose File (docker-compose.yml)
# The top-level "version" key is obsolete in current Docker Compose and is omitted here
services:
  # Web server service
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./website:/usr/share/nginx/html
    depends_on:
      - app
  # Application service
  app:
    build: ./app
    environment:
      - NODE_ENV=production
      - DB_HOST=db
    depends_on:
      - db
  # Database service
  db:
    image: postgres:13
    volumes:
      - postgres_data:/var/lib/postgresql/data
    environment:
      - POSTGRES_PASSWORD=mysecretpassword
      - POSTGRES_USER=myuser
      - POSTGRES_DB=myapp
volumes:
  postgres_data:
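Compose uses depends_on to decide start order, which amounts to a topological sort of the service graph. The Python sketch below reproduces that ordering for the sample file using the standard library's graphlib (Python 3.9+). It illustrates the idea and is not Compose's implementation.

```python
from graphlib import TopologicalSorter

# Service -> services it depends on, taken from the depends_on entries in the sample file.
depends_on = {
    "web": {"app"},
    "app": {"db"},
    "db": set(),
}

start_order = list(TopologicalSorter(depends_on).static_order())
stop_order = list(reversed(start_order))  # dependents are stopped before their dependencies
print("start:", start_order)
print("stop: ", stop_order)
```

With the short depends_on syntax used in the sample, Compose waits only for a dependency's container to start, not for the service inside it to be ready. Waiting for readiness requires the long syntax with condition: service_healthy, plus a healthcheck on the dependency.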
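Common Commands
These examples use the docker compose plugin (Compose V2); the older standalone docker-compose binary accepts the same subcommands.
# Start services
docker compose up
# Start services in detached mode
docker compose up -d
# Stop and remove the containers and networks (named volumes are kept unless you add -v)
docker compose down
# View logs
docker compose logs
# Scale a service
docker compose up -d --scale app=3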
Docker Compose is like a blueprint and construction manager for a complex building. The YAML file is the blueprint that specifies how everything should be arranged, and the compose command is the construction manager that ensures all components are built and connected according to the plan.
Docker in Modern Development Workflows
Docker has transformed development workflows by providing a standardized environment across different stages of development and deployment.
Local Development
- Create identical development environments for all team members
- Eliminate "works on my machine" problems
- Quickly prototype with different technology stacks
- Isolate development dependencies from host system
Testing and CI/CD
- Run tests in standardized environments
- Build once, test everywhere
- Integrate with CI/CD pipelines for automated testing and deployment
- Ensure consistency between test and production environments
Deployment
- Deploy the same container images across different environments
- Scale horizontally by running multiple container instances
- Perform blue-green deployments or canary releases
- Integrate with orchestration platforms like Kubernetes
Practice Activities
Activity 1: Explore Docker Architecture
- Install Docker on your system if you haven't already
- Run docker info to view information about your Docker installation
- Identify the storage driver, logging driver, and network driver configurations
- Find the location of Docker's data directory on your system
Activity 2: Investigate Layer Caching
- Create a Dockerfile with multiple RUN instructions
- Build the image and observe the build time
- Make a small change to one of the middle layers and rebuild
- Observe which layers are rebuilt and which are pulled from cache
- Optimize your Dockerfile to improve caching
Activity 3: Set Up a Multi-Container Application
- Create a docker-compose.yml file for a simple web application with a frontend and backend
- Configure appropriate networks for the containers to communicate
- Set up a volume for persistent data
- Use environment variables for configuration
- Start the application with Docker Compose and verify it works correctly
Summary
In this lecture, we've explored the architecture and key components of Docker:
- Docker uses a client-server architecture with the Docker client communicating with the Docker daemon
- The Docker daemon delegates container management to containerd and runc
- Docker images are composed of read-only layers, with containers adding a writable layer
- Docker provides multiple storage options, including volumes, bind mounts, and tmpfs mounts
- Docker networking enables container communication with several driver options
- Docker registries store and distribute container images
- Docker Compose simplifies multi-container application management
Understanding Docker's architecture is essential for effectively containerizing applications and troubleshooting issues. In our next lecture, we'll dive into Docker installation and configuration.