Relational Database Design

Principles and practices for creating effective relational data models

Introduction to Relational Database Design

Relational databases remain the backbone of most business applications, from e-commerce platforms to healthcare systems. Their enduring popularity stems from their ability to reliably store and query structured data while maintaining data integrity.

Good relational database design is like architecture—when done well, it creates a solid foundation that supports your application as it grows and evolves. Poor design, however, leads to performance problems, data inconsistencies, and inflexible systems that resist change.

In this lecture, we'll explore the principles and practices that guide effective relational database design, focusing on:

By the end of this lecture, you'll have the knowledge to design database schemas that are efficient, maintainable, and aligned with business requirements.

The Relational Model

Developed by E.F. Codd at IBM in 1970, the relational model revolutionized database management by providing a mathematical foundation for organizing and accessing data. This model has three key components:

Core Components

classDiagram class Relation { Table/Entity } class Tuple { Row/Record } class Attribute { Column/Field } Relation "1" --> "*" Tuple : contains Relation "1" --> "*" Attribute : defines Tuple "1" --> "*" Attribute : has values for

Let's illustrate these concepts with a simple example of a "Customers" table:

Customers (Relation/Table)
customer_id first_name last_name email date_joined
Attributes/Columns
1 John Smith john@example.com 2024-01-15
2 Sarah Johnson sarah@example.com 2024-02-20
3 Michael Lee michael@example.com 2024-03-05
Tuples/Rows

Properties of Relations

In a properly designed relational database, relations have these important properties:

Think of a relation as a spreadsheet with strict rules—every cell must contain a single value of the right type, column headers must be unique, and both the ordering of rows and columns has no meaning.

Relational Schema

A relational schema is the blueprint of a database that defines its structure. It specifies:

A schema is typically represented using this notation:


TableName(column1:type, column2:type, ...)

Example:
Customers(customer_id:integer, first_name:varchar, last_name:varchar, email:varchar, date_joined:date)
Orders(order_id:integer, customer_id:integer, order_date:date, total_amount:decimal)
            

In practice, schemas are implemented using SQL statements like CREATE TABLE:


CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    date_joined DATE NOT NULL
);

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
            

Entity-Relationship Modeling

Before writing SQL to create tables, database designers typically create an Entity-Relationship (ER) model—a visual representation of the database structure that's easier to understand and discuss with stakeholders.

Core Components of ER Models

erDiagram CUSTOMER ||--o{ ORDER : places CUSTOMER { int customer_id PK string first_name string last_name string email date date_joined } ORDER { int order_id PK int customer_id FK date order_date decimal total_amount } ORDER ||--|{ ORDER_ITEM : contains PRODUCT ||--o{ ORDER_ITEM : included_in ORDER_ITEM { int order_id PK,FK int product_id PK,FK int quantity decimal price } PRODUCT { int product_id PK string name string description decimal price int stock }

Types of Attributes

In relational databases, multi-valued attributes typically require separate tables, and composite attributes are often split into individual columns.

Types of Relationships

Relationships in ER models have cardinality—the number of instances of one entity that can relate to instances of another entity.

graph TD subgraph "One-to-One" A1[Person] --- B1[Passport] end subgraph "One-to-Many" A2[Customer] --- B2[Order 1] A2 --- B3[Order 2] A2 --- B4[Order 3] end subgraph "Many-to-Many" A3[Student 1] --- B5[Course A] A3 --- B6[Course B] A4[Student 2] --- B5 A4 --- B6 end

When implementing these relationships in a relational database:

Converting ER Diagrams to Relational Schemas

The process of moving from an ER diagram to a relational schema follows these general steps:

  1. Each entity type becomes a table
  2. Attributes become columns in the tables
  3. Primary key attributes become primary keys in the tables
  4. Relationships are implemented through foreign keys
  5. Many-to-many relationships require junction tables

Let's illustrate with a simple example showing how a many-to-many relationship is implemented:


-- Students table
CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

-- Courses table
CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    title VARCHAR(100) NOT NULL,
    credits INT NOT NULL,
    department VARCHAR(50) NOT NULL
);

-- Junction table for many-to-many relationship
CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE NOT NULL,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
            

The junction table Enrollments resolves the many-to-many relationship between Students and Courses, and also stores relationship-specific attributes like enrollment_date and grade.

Keys and Relationships

Keys are fundamental to relational database design, serving as identifiers and creating relationships between tables.

Types of Keys

classDiagram class Customers { customer_id (Primary Key) email (Candidate Key) first_name last_name } class Orders { order_id (Primary Key) customer_id (Foreign Key) order_date total_amount } class Enrollments { student_id + course_id (Composite Primary Key) student_id (Foreign Key) course_id (Foreign Key) enrollment_date grade } Customers "1" -- "*" Orders : places

Referential Integrity

Referential integrity ensures that relationships between tables remain consistent. It's enforced through foreign key constraints:

Common referential actions include:

Here's an example implementing referential integrity with CASCADE:


CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (customer_id) 
        REFERENCES Customers(customer_id)
        ON DELETE CASCADE  -- If a customer is deleted, all their orders are deleted too
        ON UPDATE CASCADE  -- If a customer_id is updated, it's updated in Orders too
);
            

In a real-world application like an e-commerce platform, referential integrity ensures that no order exists without a corresponding customer, maintaining data consistency throughout the system.

Natural vs. Surrogate Keys

The choice between natural and surrogate keys is an important design decision:

Natural Keys Surrogate Keys
Have business meaning (e.g., ISBN, SSN) No business meaning (e.g., auto-incrementing ID)
Can be complex (multiple columns) Typically simple (single column)
Can change (problematic!) Never change
Can be lengthy (e.g., UUIDs) Usually compact integers
May reveal sensitive information Reveals nothing about the entity

Modern practice generally favors surrogate keys for most tables, especially for:

A real-world analogy is your passport or driver's license number versus your name. Your name has meaning, but it might not be unique, and it could change (e.g., through marriage). The ID number has no meaning but uniquely and permanently identifies you in the system.

Normalization

Normalization is a systematic approach to database design that reduces data redundancy and improves data integrity. It involves organizing data into multiple related tables and defining relationships between them.

Reasons for Normalization

Imagine a spreadsheet where each row contains a customer's information along with all their orders and the products in each order. This approach would have serious problems:

Normalization solves these problems by decomposing tables into smaller, related tables according to specific rules called normal forms.

Dependency Types

To understand normalization, we first need to understand functional dependencies:

Normal Forms

Normalization progresses through several "normal forms," each addressing specific types of dependencies:

First Normal Form (1NF)

Example: Converting a table with a multi-valued attribute (phone numbers) into 1NF:

Before normalization:

customer_id name phone_numbers
1 John Smith 555-1234, 555-5678

After 1NF:

customer_id name phone_number
1 John Smith 555-1234
1 John Smith 555-5678

Or better yet, with a separate table for phone numbers:

Customers
customer_id name
1 John Smith
PhoneNumbers
customer_id phone_number
1 555-1234
1 555-5678

Second Normal Form (2NF)

Example: Converting a composite key table with partial dependencies into 2NF:

Before 2NF:

student_id course_id student_name course_name grade
101 CS101 Alice Johnson Intro to Programming A

Here, student_name depends only on student_id, and course_name depends only on course_id, creating partial dependencies.

After 2NF:

Students
student_id student_name
101 Alice Johnson
Courses
course_id course_name
CS101 Intro to Programming
Enrollments
student_id course_id grade
101 CS101 A

Third Normal Form (3NF)

Example: Converting a table with transitive dependencies into 3NF:

Before 3NF:

order_id product_id product_name category_id category_name quantity
1001 P100 Laptop C10 Electronics 1

Here, category_name depends on category_id, which depends on product_id, creating a transitive dependency.

After 3NF:

Categories
category_id category_name
C10 Electronics
Products
product_id product_name category_id
P100 Laptop C10
OrderItems
order_id product_id quantity
1001 P100 1

Beyond 3NF

Additional normal forms exist (BCNF, 4NF, 5NF, 6NF), but 3NF is sufficient for most applications. Higher normal forms address more specialized dependency issues.

When to Denormalize

While normalization improves data integrity, sometimes denormalization (deliberately introducing redundancy) is justified for performance reasons:

However, denormalization should be done carefully and only after the normalized design is understood:

A real-world example is an e-commerce product page. The product details might include the average rating, which is technically derived data (calculated from individual reviews). However, storing this pre-calculated value avoids having to compute it on every page view.

Practical Database Design Patterns

Beyond normalization, several common patterns and techniques are useful for real-world database design.

Inheritance Patterns

Relational databases don't directly support inheritance like object-oriented languages, but several patterns can model it:

Example of Table Per Type for a Person → Employee/Customer hierarchy:


CREATE TABLE People (
    person_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    date_of_birth DATE
);

CREATE TABLE Employees (
    person_id INT PRIMARY KEY,
    hire_date DATE NOT NULL,
    department VARCHAR(50) NOT NULL,
    salary DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (person_id) REFERENCES People(person_id)
);

CREATE TABLE Customers (
    person_id INT PRIMARY KEY,
    customer_since DATE NOT NULL,
    credit_limit DECIMAL(10, 2),
    FOREIGN KEY (person_id) REFERENCES People(person_id)
);
            

Historical Data Patterns

For systems that need to track changes over time, these patterns are useful:

Example of a temporal table for product prices:


CREATE TABLE ProductPrices (
    product_id INT NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    valid_from TIMESTAMP NOT NULL,
    valid_to TIMESTAMP,
    PRIMARY KEY (product_id, valid_from),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

-- Insert a new price (setting the previous one's end date)
BEGIN TRANSACTION;

-- Set end date for current price
UPDATE ProductPrices
SET valid_to = CURRENT_TIMESTAMP
WHERE product_id = 123 AND valid_to IS NULL;

-- Insert new price
INSERT INTO ProductPrices (product_id, price, valid_from, valid_to)
VALUES (123, 29.99, CURRENT_TIMESTAMP, NULL);

COMMIT;
            

Tree and Hierarchical Data

Hierarchical data (like organizational charts, category trees, or file systems) can be challenging in relational databases. Several patterns exist:

Example of an Adjacency List for categories:


CREATE TABLE Categories (
    category_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT,
    FOREIGN KEY (parent_id) REFERENCES Categories(category_id)
);

INSERT INTO Categories (category_id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Desktop PCs', 2),
(5, 'Smartphones', 1);

-- Find all subcategories of Electronics
WITH RECURSIVE CategoryTree AS (
    -- Base case: start with Electronics
    SELECT category_id, name, parent_id
    FROM Categories
    WHERE category_id = 1
    
    UNION ALL
    
    -- Recursive step: find children of each category
    SELECT c.category_id, c.name, c.parent_id
    FROM Categories c
    JOIN CategoryTree ct ON c.parent_id = ct.category_id
)
SELECT * FROM CategoryTree;
            

Common Design Pitfalls

Even experienced database designers can make these common mistakes:

Overusing Surrogate Keys

While surrogate keys are generally good practice, overusing them can lead to problems:

Best practice: Use surrogate keys for most entities, but still enforce uniqueness on natural candidate keys.

Poor Naming Conventions

Inconsistent or unclear naming makes databases harder to understand and maintain:

Best practice: Adopt consistent naming conventions that align with business terminology.

Ignoring Indexing Needs

Proper indexing is crucial for performance but often overlooked:

Best practice: Analyze query patterns early and index accordingly, while monitoring performance impacts.

Inappropriate Denormalization

Premature or excessive denormalization creates maintenance headaches:

Best practice: Start with a normalized design, then selectively denormalize based on performance testing.

Neglecting Data Types

Using inappropriate data types leads to wasted space and performance issues:

Best practice: Choose the most appropriate data type for each column based on its content and usage patterns.

Case Study: E-Commerce Database Design

Let's apply what we've learned to design a simplified database for an e-commerce application:

erDiagram USERS ||--o{ ADDRESSES : have USERS ||--o{ ORDERS : place USERS { int user_id PK string email UK string password_hash string first_name string last_name datetime created_at boolean is_active } ADDRESSES { int address_id PK int user_id FK string address_line1 string address_line2 string city string state string postal_code string country boolean is_default string address_type } PRODUCTS { int product_id PK int category_id FK string sku UK string name string description decimal price int stock_quantity string image_url boolean is_active datetime created_at } CATEGORIES ||--o{ PRODUCTS : contain CATEGORIES { int category_id PK int parent_id FK string name string description } ORDERS ||--|{ ORDER_ITEMS : contain ORDERS { int order_id PK int user_id FK int shipping_address_id FK int billing_address_id FK datetime order_date string status decimal total_amount string payment_method string tracking_number } ORDER_ITEMS { int order_item_id PK int order_id FK int product_id FK int quantity decimal price decimal subtotal } PRODUCTS ||--o{ ORDER_ITEMS : included_in REVIEWS }|--|| USERS : written_by REVIEWS }|--|| PRODUCTS : about REVIEWS { int review_id PK int user_id FK int product_id FK int rating string comment datetime created_at }

Let's examine some of the design decisions:

SQL implementation for key portions:


-- Users and Addresses
CREATE TABLE Users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
);

CREATE TABLE Addresses (
    address_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    address_line1 VARCHAR(100) NOT NULL,
    address_line2 VARCHAR(100),
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50),
    postal_code VARCHAR(20) NOT NULL,
    country VARCHAR(50) NOT NULL,
    is_default BOOLEAN DEFAULT FALSE,
    address_type ENUM('shipping', 'billing', 'both') NOT NULL,
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

-- Products and Categories
CREATE TABLE Categories (
    category_id INT AUTO_INCREMENT PRIMARY KEY,
    parent_id INT,
    name VARCHAR(50) NOT NULL,
    description TEXT,
    FOREIGN KEY (parent_id) REFERENCES Categories(category_id)
);

CREATE TABLE Products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    category_id INT NOT NULL,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INT NOT NULL DEFAULT 0,
    image_url VARCHAR(255),
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);

-- Orders and OrderItems
CREATE TABLE Orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    shipping_address_id INT NOT NULL,
    billing_address_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled') NOT NULL DEFAULT 'pending',
    total_amount DECIMAL(10, 2) NOT NULL,
    payment_method VARCHAR(50) NOT NULL,
    tracking_number VARCHAR(100),
    FOREIGN KEY (user_id) REFERENCES Users(user_id),
    FOREIGN KEY (shipping_address_id) REFERENCES Addresses(address_id),
    FOREIGN KEY (billing_address_id) REFERENCES Addresses(address_id)
);

CREATE TABLE OrderItems (
    order_item_id INT AUTO_INCREMENT PRIMARY KEY,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    subtotal DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES Orders(order_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

-- Reviews
CREATE TABLE Reviews (
    review_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    product_id INT NOT NULL,
    rating INT NOT NULL CHECK (rating BETWEEN 1 AND 5),
    comment TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES Users(user_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id),
    UNIQUE (user_id, product_id) -- One review per user per product
);
            

This schema demonstrates a well-normalized design that would serve as a solid foundation for an e-commerce application. In a real-world scenario, you might add more tables (like payment processing information, inventory history, etc.) and possibly denormalize certain areas based on performance requirements.

Practice Activities

Activity 1: Normalization Exercise

Consider the following un-normalized table for a library management system:


Books(book_id, title, author_name, author_birth_date, publisher_name, publisher_address, publisher_phone, isbn, publication_date, genre, shelf_location, copy_number, borrower_id, borrower_name, borrower_email, borrow_date, return_date)
            

Apply normalization to create a properly structured relational schema:

  1. Identify entities and their attributes
  2. Identify primary keys for each entity
  3. Create foreign key relationships
  4. Ensure the design meets 3NF requirements
  5. Draw an ER diagram for your solution

Activity 2: E-Learning Platform Design

Design a database schema for an e-learning platform with these requirements:

Create an ER diagram and write SQL CREATE TABLE statements for your design.

Activity 3: Design Critique

Review the following database design for a blog platform and identify potential problems:


-- Users table
CREATE TABLE Users (
    id INT PRIMARY KEY,
    username VARCHAR(50),
    password VARCHAR(50),
    email VARCHAR(100),
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    bio TEXT,
    joined_date DATE
);

-- Posts table
CREATE TABLE Posts (
    id INT PRIMARY KEY,
    title VARCHAR(100),
    content TEXT,
    author_id INT,
    author_username VARCHAR(50),
    author_email VARCHAR(100),
    created_date DATETIME,
    category VARCHAR(50),
    tags VARCHAR(255),
    status VARCHAR(20)
);

-- Comments table
CREATE TABLE Comments (
    id INT PRIMARY KEY,
    post_id INT,
    commenter_name VARCHAR(50),
    commenter_email VARCHAR(100),
    content TEXT,
    created_date DATETIME,
    is_approved BOOLEAN
);
            

For each issue you identify:

  1. Explain why it's a problem
  2. Suggest how to fix it
  3. Provide the revised SQL statements

Summary

In this lecture, we've explored the principles and practices of relational database design:

A well-designed relational database is the foundation of reliable, efficient applications. By following these principles, you can create database schemas that not only meet current needs but can also adapt to changing requirements over time.

Further Reading