Relational Database Design

Introduction to Relational Database Design

Relational databases remain the backbone of most business applications, from e-commerce platforms to healthcare systems. Their enduring popularity stems from their ability to reliably store and query structured data while maintaining data integrity.

Good relational database design is like architecture—when done well, it creates a solid foundation that supports your application as it grows and evolves. Poor design, however, leads to performance problems, data inconsistencies, and inflexible systems that resist change.

In this lecture, we'll explore the principles and practices that guide effective relational database design, focusing on:

The relational model and its core components
Entity-Relationship modeling
Normalization and its importance
Keys and relationships
Practical design patterns and considerations

By the end of this lecture, you'll have the knowledge to design database schemas that are efficient, maintainable, and aligned with business requirements.

The Relational Model

Developed by E.F. Codd at IBM in 1970, the relational model revolutionized database management by providing a mathematical foundation for organizing and accessing data. This model has three key components:

Core Components

Relations (Tables): Collections of related data organized into rows and columns. Each table represents an entity or concept.
Tuples (Rows): Individual records within a table, representing specific instances of the entity.
Attributes (Columns): Properties or characteristics of an entity, each with a defined data type.

classDiagram class Relation { Table/Entity } class Tuple { Row/Record } class Attribute { Column/Field } Relation "1" --> "*" Tuple : contains Relation "1" --> "*" Attribute : defines Tuple "1" --> "*" Attribute : has values for

Let's illustrate these concepts with a simple example of a "Customers" table:

**Customers** (Relation/Table)
customer_id	first_name	last_name	email	date_joined
Attributes/Columns
1	John	Smith	john@example.com	2024-01-15
2	Sarah	Johnson	sarah@example.com	2024-02-20
3	Michael	Lee	michael@example.com	2024-03-05
Tuples/Rows

Properties of Relations

In a properly designed relational database, relations have these important properties:

Atomic Values: Each cell contains a single, indivisible value (not a list or set)
Unique Rows: No duplicate rows are allowed in a table
Column Homogeneity: All values in a column have the same data type
Column Order Insignificance: The order of columns doesn't affect the meaning of the data
Row Order Insignificance: The order of rows doesn't matter
Unique Column Names: Each column within a table has a distinct name

Think of a relation as a spreadsheet with strict rules—every cell must contain a single value of the right type, column headers must be unique, and both the ordering of rows and columns has no meaning.

Relational Schema

A relational schema is the blueprint of a database that defines its structure. It specifies:

The tables in the database
The columns in each table and their data types
Constraints on the data (e.g., NOT NULL, UNIQUE)
Relationships between tables

A schema is typically represented using this notation:


TableName(column1:type, column2:type, ...)

Example:
Customers(customer_id:integer, first_name:varchar, last_name:varchar, email:varchar, date_joined:date)
Orders(order_id:integer, customer_id:integer, order_date:date, total_amount:decimal)

In practice, schemas are implemented using SQL statements like CREATE TABLE:


CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    date_joined DATE NOT NULL
);

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

Entity-Relationship Modeling

Before writing SQL to create tables, database designers typically create an Entity-Relationship (ER) model—a visual representation of the database structure that's easier to understand and discuss with stakeholders.

Core Components of ER Models

Entities: Objects or concepts about which we want to store data (e.g., customers, products, orders). Represented as rectangles in ER diagrams.
Attributes: Properties or characteristics of entities. Represented as ovals connected to entities.
Relationships: Connections between entities. Represented as diamonds connecting entities.

erDiagram CUSTOMER ||--o{ ORDER : places CUSTOMER { int customer_id PK string first_name string last_name string email date date_joined } ORDER { int order_id PK int customer_id FK date order_date decimal total_amount } ORDER ||--|{ ORDER_ITEM : contains PRODUCT ||--o{ ORDER_ITEM : included_in ORDER_ITEM { int order_id PK,FK int product_id PK,FK int quantity decimal price } PRODUCT { int product_id PK string name string description decimal price int stock }

Types of Attributes

Simple vs. Composite: Simple attributes can't be divided (e.g., age), while composite attributes can be split into smaller parts (e.g., address).
Single-valued vs. Multi-valued: Single-valued attributes hold one value (e.g., date of birth), while multi-valued attributes can hold multiple values (e.g., phone numbers).
Derived: Attributes whose values can be calculated from other attributes (e.g., age derived from date of birth).
Key Attributes: Attributes that uniquely identify an entity (e.g., customer_id).

In relational databases, multi-valued attributes typically require separate tables, and composite attributes are often split into individual columns.

Types of Relationships

Relationships in ER models have cardinality—the number of instances of one entity that can relate to instances of another entity.

One-to-One (1:1): One entity instance relates to exactly one instance of another entity.
Example: A person has one passport, and each passport belongs to one person.
One-to-Many (1:N): One entity instance relates to multiple instances of another entity.
Example: A customer can place many orders, but each order is placed by one customer.
Many-to-Many (N:M): Multiple instances of one entity relate to multiple instances of another entity.
Example: A student can enroll in many courses, and a course can have many students.

graph TD subgraph "One-to-One" A1[Person] --- B1[Passport] end subgraph "One-to-Many" A2[Customer] --- B2[Order 1] A2 --- B3[Order 2] A2 --- B4[Order 3] end subgraph "Many-to-Many" A3[Student 1] --- B5[Course A] A3 --- B6[Course B] A4[Student 2] --- B5 A4 --- B6 end

When implementing these relationships in a relational database:

One-to-One: Either table can include a foreign key referencing the other, typically the entity that depends on the other.
One-to-Many: The "many" side includes a foreign key referencing the "one" side.
Many-to-Many: Requires a junction table (also called bridge or association table) with foreign keys to both entities.

Converting ER Diagrams to Relational Schemas

The process of moving from an ER diagram to a relational schema follows these general steps:

Each entity type becomes a table
Attributes become columns in the tables
Primary key attributes become primary keys in the tables
Relationships are implemented through foreign keys
Many-to-many relationships require junction tables

Let's illustrate with a simple example showing how a many-to-many relationship is implemented:


-- Students table
CREATE TABLE Students (
    student_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

-- Courses table
CREATE TABLE Courses (
    course_id INT PRIMARY KEY,
    title VARCHAR(100) NOT NULL,
    credits INT NOT NULL,
    department VARCHAR(50) NOT NULL
);

-- Junction table for many-to-many relationship
CREATE TABLE Enrollments (
    student_id INT,
    course_id INT,
    enrollment_date DATE NOT NULL,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES Students(student_id),
    FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);

The junction table Enrollments resolves the many-to-many relationship between Students and Courses, and also stores relationship-specific attributes like enrollment_date and grade.

Keys and Relationships

Keys are fundamental to relational database design, serving as identifiers and creating relationships between tables.

Types of Keys

Primary Key: A column or combination of columns that uniquely identifies each row in a table.
Example: customer_id in a Customers table.
Foreign Key: A column or combination of columns that refers to the primary key of another table.
Example: customer_id in an Orders table referencing the Customers table.
Candidate Key: A column or combination of columns that could potentially serve as a primary key.
Example: In a Students table, both student_id and email might be candidate keys.
Composite Key: A primary key composed of multiple columns.
Example: In an Enrollments table, the combination of student_id and course_id.
Surrogate Key: An artificial primary key, typically an auto-incrementing ID, with no business meaning.
Example: An auto-generated product_id in a Products table.
Natural Key: A primary key that has business meaning.
Example: A social security number or ISBN.

classDiagram class Customers { customer_id (Primary Key) email (Candidate Key) first_name last_name } class Orders { order_id (Primary Key) customer_id (Foreign Key) order_date total_amount } class Enrollments { student_id + course_id (Composite Primary Key) student_id (Foreign Key) course_id (Foreign Key) enrollment_date grade } Customers "1" -- "*" Orders : places

Referential Integrity

Referential integrity ensures that relationships between tables remain consistent. It's enforced through foreign key constraints:

Every foreign key value must match a primary key value in the referenced table, or be NULL if allowed
Actions can be specified when a referenced row is deleted or updated

Common referential actions include:

CASCADE: If a primary key is updated/deleted, the foreign key values are automatically updated/deleted too
RESTRICT or NO ACTION: Prevents deletion/update of a primary key if it's referenced by foreign keys
SET NULL: Sets the foreign key values to NULL if the referenced primary key is deleted/updated
SET DEFAULT: Sets the foreign key values to a default value if the referenced primary key is deleted/updated

Here's an example implementing referential integrity with CASCADE:


CREATE TABLE Customers (
    customer_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL
);

CREATE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE NOT NULL,
    total_amount DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (customer_id) 
        REFERENCES Customers(customer_id)
        ON DELETE CASCADE  -- If a customer is deleted, all their orders are deleted too
        ON UPDATE CASCADE  -- If a customer_id is updated, it's updated in Orders too
);

In a real-world application like an e-commerce platform, referential integrity ensures that no order exists without a corresponding customer, maintaining data consistency throughout the system.

Natural vs. Surrogate Keys

The choice between natural and surrogate keys is an important design decision:

Natural Keys	Surrogate Keys
Have business meaning (e.g., ISBN, SSN)	No business meaning (e.g., auto-incrementing ID)
Can be complex (multiple columns)	Typically simple (single column)
Can change (problematic!)	Never change
Can be lengthy (e.g., UUIDs)	Usually compact integers
May reveal sensitive information	Reveals nothing about the entity

Modern practice generally favors surrogate keys for most tables, especially for:

Performance (integer joins are faster)
Simplicity (especially in complex joins)
Stability (natural keys might need to change)
Privacy (avoiding exposure of sensitive identifiers)

A real-world analogy is your passport or driver's license number versus your name. Your name has meaning, but it might not be unique, and it could change (e.g., through marriage). The ID number has no meaning but uniquely and permanently identifies you in the system.

Normalization

Normalization is a systematic approach to database design that reduces data redundancy and improves data integrity. It involves organizing data into multiple related tables and defining relationships between them.

Reasons for Normalization

Imagine a spreadsheet where each row contains a customer's information along with all their orders and the products in each order. This approach would have serious problems:

Data Redundancy: Customer data would be repeated for each order
Update Anomalies: Changing a customer's address would require updating multiple rows
Insertion Anomalies: You couldn't add a customer without an order
Deletion Anomalies: Deleting an order might lose customer information

Normalization solves these problems by decomposing tables into smaller, related tables according to specific rules called normal forms.

Dependency Types

To understand normalization, we first need to understand functional dependencies:

Functional Dependency: If knowing the value of attribute A lets you determine the value of attribute B, then B is functionally dependent on A (written as A → B).
Example: customer_id → customer_name (knowing the customer ID lets you determine the customer name)
Partial Dependency: When a non-key attribute depends on only part of a composite primary key.
Example: In an Enrollments table with primary key (student_id, course_id), if enrollment_date depends only on student_id
Transitive Dependency: When a non-key attribute depends on another non-key attribute.
Example: In a Students table, if dormitory_phone depends on dormitory_name, which depends on student_id

Normal Forms

Normalization progresses through several "normal forms," each addressing specific types of dependencies:

First Normal Form (1NF)

Eliminate repeating groups
Create separate tables for each set of related data
Identify each record with a primary key
Each cell must contain a single, atomic value

Example: Converting a table with a multi-valued attribute (phone numbers) into 1NF:

Before normalization:

customer_id	name	phone_numbers
1	John Smith	555-1234, 555-5678

After 1NF:

customer_id	name	phone_number
1	John Smith	555-1234
1	John Smith	555-5678

Or better yet, with a separate table for phone numbers:

Customers
customer_id	name
1	John Smith

PhoneNumbers
customer_id	phone_number
1	555-1234
1	555-5678

Second Normal Form (2NF)

Must be in 1NF
Remove partial dependencies (attributes that depend on only part of the primary key)

Example: Converting a composite key table with partial dependencies into 2NF:

Before 2NF:

student_id	course_id	student_name	course_name	grade
101	CS101	Alice Johnson	Intro to Programming	A

Here, student_name depends only on student_id, and course_name depends only on course_id, creating partial dependencies.

After 2NF:

Students
student_id	student_name
101	Alice Johnson

Courses
course_id	course_name
CS101	Intro to Programming

Enrollments
student_id	course_id	grade
101	CS101	A

Third Normal Form (3NF)

Must be in 2NF
Remove transitive dependencies (non-key attributes depending on other non-key attributes)

Example: Converting a table with transitive dependencies into 3NF:

Before 3NF:

order_id	product_id	product_name	category_id	category_name	quantity
1001	P100	Laptop	C10	Electronics	1

Here, category_name depends on category_id, which depends on product_id, creating a transitive dependency.

After 3NF:

Categories
category_id	category_name
C10	Electronics

Products
product_id	product_name	category_id
P100	Laptop	C10

OrderItems
order_id	product_id	quantity
1001	P100	1

Beyond 3NF

Additional normal forms exist (BCNF, 4NF, 5NF, 6NF), but 3NF is sufficient for most applications. Higher normal forms address more specialized dependency issues.

When to Denormalize

While normalization improves data integrity, sometimes denormalization (deliberately introducing redundancy) is justified for performance reasons:

Read-Heavy Applications: When a system performs many more reads than writes
Reporting Databases: For analytical queries that aggregate data across many tables
Performance Bottlenecks: When joins become too expensive for frequent queries

However, denormalization should be done carefully and only after the normalized design is understood:

Begin with a properly normalized design
Identify specific performance issues through testing
Selectively denormalize only where needed
Implement additional controls to maintain data consistency

A real-world example is an e-commerce product page. The product details might include the average rating, which is technically derived data (calculated from individual reviews). However, storing this pre-calculated value avoids having to compute it on every page view.

Practical Database Design Patterns

Beyond normalization, several common patterns and techniques are useful for real-world database design.

Inheritance Patterns

Relational databases don't directly support inheritance like object-oriented languages, but several patterns can model it:

Single Table Inheritance: All entities are stored in one table with a type discriminator.
Pros: Simple queries, no joins needed
Cons: Many nullable columns, potential wasted space
Table Per Type: Base attributes in a parent table, type-specific attributes in child tables.
Pros: Normalized design, avoids null columns
Cons: Requires joins for complete entity data
Table Per Concrete Type: Each entity type gets its own complete table.
Pros: Simple queries for specific types
Cons: Duplicated column definitions, complex queries for "all types"

Example of Table Per Type for a Person → Employee/Customer hierarchy:


CREATE TABLE People (
    person_id INT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    date_of_birth DATE
);

CREATE TABLE Employees (
    person_id INT PRIMARY KEY,
    hire_date DATE NOT NULL,
    department VARCHAR(50) NOT NULL,
    salary DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (person_id) REFERENCES People(person_id)
);

CREATE TABLE Customers (
    person_id INT PRIMARY KEY,
    customer_since DATE NOT NULL,
    credit_limit DECIMAL(10, 2),
    FOREIGN KEY (person_id) REFERENCES People(person_id)
);

Historical Data Patterns

For systems that need to track changes over time, these patterns are useful:

Audit Tables: Separate tables that record changes to main tables.
Example: customers_audit to record all changes to the customers table
Temporal Tables: Adding valid_from and valid_to timestamps to track when data was valid.
Example: price_history table to track product price changes over time
Event Sourcing: Store events that led to state changes rather than just the current state.
Example: account_transactions vs. just account_balance

Example of a temporal table for product prices:


CREATE TABLE ProductPrices (
    product_id INT NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    valid_from TIMESTAMP NOT NULL,
    valid_to TIMESTAMP,
    PRIMARY KEY (product_id, valid_from),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

-- Insert a new price (setting the previous one's end date)
BEGIN TRANSACTION;

-- Set end date for current price
UPDATE ProductPrices
SET valid_to = CURRENT_TIMESTAMP
WHERE product_id = 123 AND valid_to IS NULL;

-- Insert new price
INSERT INTO ProductPrices (product_id, price, valid_from, valid_to)
VALUES (123, 29.99, CURRENT_TIMESTAMP, NULL);

COMMIT;

Tree and Hierarchical Data

Hierarchical data (like organizational charts, category trees, or file systems) can be challenging in relational databases. Several patterns exist:

Adjacency List: Each item references its parent (simplest).
Pros: Easy to understand and implement, good for single-level operations
Cons: Requires recursive queries for tree traversal (not supported in all DBMSs)
Path Enumeration: Store the path from root to node (e.g., "1/4/7").
Pros: Easy retrieval of full paths, simple ancestor queries
Cons: Maintenance overhead when restructuring the tree
Nested Sets: Assign left and right values based on a tree traversal.
Pros: Efficient for read-heavy scenarios, especially subtree operations
Cons: Complex updates, not intuitive
Closure Table: Store all relationships, direct and indirect.
Pros: Efficient for all operations, intuitive
Cons: Additional storage requirements, maintenance overhead

Example of an Adjacency List for categories:


CREATE TABLE Categories (
    category_id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    parent_id INT,
    FOREIGN KEY (parent_id) REFERENCES Categories(category_id)
);

INSERT INTO Categories (category_id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Desktop PCs', 2),
(5, 'Smartphones', 1);

-- Find all subcategories of Electronics
WITH RECURSIVE CategoryTree AS (
    -- Base case: start with Electronics
    SELECT category_id, name, parent_id
    FROM Categories
    WHERE category_id = 1
    
    UNION ALL
    
    -- Recursive step: find children of each category
    SELECT c.category_id, c.name, c.parent_id
    FROM Categories c
    JOIN CategoryTree ct ON c.parent_id = ct.category_id
)
SELECT * FROM CategoryTree;

Common Design Pitfalls

Even experienced database designers can make these common mistakes:

Overusing Surrogate Keys

While surrogate keys are generally good practice, overusing them can lead to problems:

Not enforcing natural uniqueness constraints (e.g., allowing duplicate email addresses)
Using surrogate keys for junction tables when composite keys would be more appropriate
Exposing internal surrogate keys to users instead of using natural keys for business operations

Best practice: Use surrogate keys for most entities, but still enforce uniqueness on natural candidate keys.

Poor Naming Conventions

Inconsistent or unclear naming makes databases harder to understand and maintain:

Inconsistent plural/singular naming (Orders vs. Product)
Unclear abbreviations (cust_nm vs. customer_name)
Inconsistent use of prefixes/suffixes
Names that don't reflect business concepts

Best practice: Adopt consistent naming conventions that align with business terminology.

Ignoring Indexing Needs

Proper indexing is crucial for performance but often overlooked:

Not indexing foreign keys (causing slow joins)
Not indexing frequently filtered columns
Creating too many indexes (slowing down writes)
Not considering composite indexes for multi-column conditions

Best practice: Analyze query patterns early and index accordingly, while monitoring performance impacts.

Inappropriate Denormalization

Premature or excessive denormalization creates maintenance headaches:

Denormalizing before understanding normal forms
Introducing redundancy without clear performance benefits
Failing to implement mechanisms to keep redundant data in sync

Best practice: Start with a normalized design, then selectively denormalize based on performance testing.

Neglecting Data Types

Using inappropriate data types leads to wasted space and performance issues:

Using VARCHAR when CHAR would be more appropriate (for fixed-length data)
Using TEXT for short strings
Using inappropriate numeric precision (DECIMAL(10,2) for dollar amounts vs. INTEGER for cents)
Using string types for dates or numeric data

Best practice: Choose the most appropriate data type for each column based on its content and usage patterns.

Case Study: E-Commerce Database Design

Let's apply what we've learned to design a simplified database for an e-commerce application:

erDiagram USERS ||--o{ ADDRESSES : have USERS ||--o{ ORDERS : place USERS { int user_id PK string email UK string password_hash string first_name string last_name datetime created_at boolean is_active } ADDRESSES { int address_id PK int user_id FK string address_line1 string address_line2 string city string state string postal_code string country boolean is_default string address_type } PRODUCTS { int product_id PK int category_id FK string sku UK string name string description decimal price int stock_quantity string image_url boolean is_active datetime created_at } CATEGORIES ||--o{ PRODUCTS : contain CATEGORIES { int category_id PK int parent_id FK string name string description } ORDERS ||--|{ ORDER_ITEMS : contain ORDERS { int order_id PK int user_id FK int shipping_address_id FK int billing_address_id FK datetime order_date string status decimal total_amount string payment_method string tracking_number } ORDER_ITEMS { int order_item_id PK int order_id FK int product_id FK int quantity decimal price decimal subtotal } PRODUCTS ||--o{ ORDER_ITEMS : included_in REVIEWS }|--|| USERS : written_by REVIEWS }|--|| PRODUCTS : about REVIEWS { int review_id PK int user_id FK int product_id FK int rating string comment datetime created_at }

Let's examine some of the design decisions:

Normalized Structure: The design follows 3NF with separate tables for users, addresses, products, categories, orders, order items, and reviews.
Surrogate Keys: Each table uses an auto-incrementing ID as its primary key.
Hierarchical Data: Categories uses a self-referential adjacency list pattern to represent the category hierarchy.
Junction Tables: ORDER_ITEMS serves as a junction table between ORDERS and PRODUCTS, with additional attributes for quantity and price.
Flexible Address Handling: Multiple addresses per user, with separate shipping and billing addresses for orders.

SQL implementation for key portions:


-- Users and Addresses
CREATE TABLE Users (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    email VARCHAR(100) UNIQUE NOT NULL,
    password_hash VARCHAR(255) NOT NULL,
    first_name VARCHAR(50) NOT NULL,
    last_name VARCHAR(50) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    is_active BOOLEAN DEFAULT TRUE
);

CREATE TABLE Addresses (
    address_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    address_line1 VARCHAR(100) NOT NULL,
    address_line2 VARCHAR(100),
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50),
    postal_code VARCHAR(20) NOT NULL,
    country VARCHAR(50) NOT NULL,
    is_default BOOLEAN DEFAULT FALSE,
    address_type ENUM('shipping', 'billing', 'both') NOT NULL,
    FOREIGN KEY (user_id) REFERENCES Users(user_id)
);

-- Products and Categories
CREATE TABLE Categories (
    category_id INT AUTO_INCREMENT PRIMARY KEY,
    parent_id INT,
    name VARCHAR(50) NOT NULL,
    description TEXT,
    FOREIGN KEY (parent_id) REFERENCES Categories(category_id)
);

CREATE TABLE Products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    category_id INT NOT NULL,
    sku VARCHAR(50) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    description TEXT,
    price DECIMAL(10, 2) NOT NULL,
    stock_quantity INT NOT NULL DEFAULT 0,
    image_url VARCHAR(255),
    is_active BOOLEAN DEFAULT TRUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);

-- Orders and OrderItems
CREATE TABLE Orders (
    order_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    shipping_address_id INT NOT NULL,
    billing_address_id INT NOT NULL,
    order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled') NOT NULL DEFAULT 'pending',
    total_amount DECIMAL(10, 2) NOT NULL,
    payment_method VARCHAR(50) NOT NULL,
    tracking_number VARCHAR(100),
    FOREIGN KEY (user_id) REFERENCES Users(user_id),
    FOREIGN KEY (shipping_address_id) REFERENCES Addresses(address_id),
    FOREIGN KEY (billing_address_id) REFERENCES Addresses(address_id)
);

CREATE TABLE OrderItems (
    order_item_id INT AUTO_INCREMENT PRIMARY KEY,
    order_id INT NOT NULL,
    product_id INT NOT NULL,
    quantity INT NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    subtotal DECIMAL(10, 2) NOT NULL,
    FOREIGN KEY (order_id) REFERENCES Orders(order_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

-- Reviews
CREATE TABLE Reviews (
    review_id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    product_id INT NOT NULL,
    rating INT NOT NULL CHECK (rating BETWEEN 1 AND 5),
    comment TEXT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (user_id) REFERENCES Users(user_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id),
    UNIQUE (user_id, product_id) -- One review per user per product
);

This schema demonstrates a well-normalized design that would serve as a solid foundation for an e-commerce application. In a real-world scenario, you might add more tables (like payment processing information, inventory history, etc.) and possibly denormalize certain areas based on performance requirements.

Practice Activities

Activity 1: Normalization Exercise

Consider the following un-normalized table for a library management system:


Books(book_id, title, author_name, author_birth_date, publisher_name, publisher_address, publisher_phone, isbn, publication_date, genre, shelf_location, copy_number, borrower_id, borrower_name, borrower_email, borrow_date, return_date)

Apply normalization to create a properly structured relational schema:

Identify entities and their attributes
Identify primary keys for each entity
Create foreign key relationships
Ensure the design meets 3NF requirements
Draw an ER diagram for your solution

Activity 2: E-Learning Platform Design

Design a database schema for an e-learning platform with these requirements:

Users can be students, instructors, or administrators (with different attributes)
Courses have modules, and modules have lessons
Students can enroll in multiple courses
Instructors can teach multiple courses
Students can submit assignments for each course
Instructors can grade assignments and provide feedback
The system tracks student progress through courses

Create an ER diagram and write SQL CREATE TABLE statements for your design.

Activity 3: Design Critique

Review the following database design for a blog platform and identify potential problems:


-- Users table
CREATE TABLE Users (
    id INT PRIMARY KEY,
    username VARCHAR(50),
    password VARCHAR(50),
    email VARCHAR(100),
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    bio TEXT,
    joined_date DATE
);

-- Posts table
CREATE TABLE Posts (
    id INT PRIMARY KEY,
    title VARCHAR(100),
    content TEXT,
    author_id INT,
    author_username VARCHAR(50),
    author_email VARCHAR(100),
    created_date DATETIME,
    category VARCHAR(50),
    tags VARCHAR(255),
    status VARCHAR(20)
);

-- Comments table
CREATE TABLE Comments (
    id INT PRIMARY KEY,
    post_id INT,
    commenter_name VARCHAR(50),
    commenter_email VARCHAR(100),
    content TEXT,
    created_date DATETIME,
    is_approved BOOLEAN
);

For each issue you identify:

Explain why it's a problem
Suggest how to fix it
Provide the revised SQL statements

Summary

In this lecture, we've explored the principles and practices of relational database design:

The Relational Model: Tables (relations), rows (tuples), and columns (attributes) as the foundation of relational databases
Entity-Relationship Modeling: Visual representations of database structure with entities, attributes, and relationships
Keys and Relationships: Primary keys, foreign keys, and their role in establishing table relationships, with considerations for surrogate vs. natural keys
Normalization: The process of organizing data to reduce redundancy and improve integrity, with an understanding of normal forms (1NF, 2NF, 3NF)
Design Patterns: Common patterns for inheritance, historical data, and hierarchical structures
Design Pitfalls: Common mistakes to avoid in database design
Practical Application: Applying these principles to a real-world e-commerce database design

A well-designed relational database is the foundation of reliable, efficient applications. By following these principles, you can create database schemas that not only meet current needs but can also adapt to changing requirements over time.