Introduction to Relational Database Design
Relational databases remain the backbone of most business applications, from e-commerce platforms to healthcare systems. Their enduring popularity stems from their ability to reliably store and query structured data while maintaining data integrity.
Good relational database design is like architecture—when done well, it creates a solid foundation that supports your application as it grows and evolves. Poor design, however, leads to performance problems, data inconsistencies, and inflexible systems that resist change.
In this lecture, we'll explore the principles and practices that guide effective relational database design, focusing on:
- The relational model and its core components
- Entity-Relationship modeling
- Normalization and its importance
- Keys and relationships
- Practical design patterns and considerations
By the end of this lecture, you'll have the knowledge to design database schemas that are efficient, maintainable, and aligned with business requirements.
The Relational Model
Developed by E.F. Codd at IBM in 1970, the relational model revolutionized database management by providing a mathematical foundation for organizing and accessing data. This model has three key components:
Core Components
- Relations (Tables): Collections of related data organized into rows and columns. Each table represents an entity or concept.
- Tuples (Rows): Individual records within a table, representing specific instances of the entity.
- Attributes (Columns): Properties or characteristics of an entity, each with a defined data type.
Let's illustrate these concepts with a simple example of a "Customers" table:
| customer_id | first_name | last_name | date_joined | |
|---|---|---|---|---|
| Attributes/Columns | ||||
| 1 | John | Smith | john@example.com | 2024-01-15 |
| 2 | Sarah | Johnson | sarah@example.com | 2024-02-20 |
| 3 | Michael | Lee | michael@example.com | 2024-03-05 |
| Tuples/Rows | ||||
Properties of Relations
In a properly designed relational database, relations have these important properties:
- Atomic Values: Each cell contains a single, indivisible value (not a list or set)
- Unique Rows: No duplicate rows are allowed in a table
- Column Homogeneity: All values in a column have the same data type
- Column Order Insignificance: The order of columns doesn't affect the meaning of the data
- Row Order Insignificance: The order of rows doesn't matter
- Unique Column Names: Each column within a table has a distinct name
Think of a relation as a spreadsheet with strict rules—every cell must contain a single value of the right type, column headers must be unique, and both the ordering of rows and columns has no meaning.
Relational Schema
A relational schema is the blueprint of a database that defines its structure. It specifies:
- The tables in the database
- The columns in each table and their data types
- Constraints on the data (e.g., NOT NULL, UNIQUE)
- Relationships between tables
A schema is typically represented using this notation:
TableName(column1:type, column2:type, ...)
Example:
Customers(customer_id:integer, first_name:varchar, last_name:varchar, email:varchar, date_joined:date)
Orders(order_id:integer, customer_id:integer, order_date:date, total_amount:decimal)
In practice, schemas are implemented using SQL statements like CREATE TABLE:
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
date_joined DATE NOT NULL
);
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT NOT NULL,
order_date DATE NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
Entity-Relationship Modeling
Before writing SQL to create tables, database designers typically create an Entity-Relationship (ER) model—a visual representation of the database structure that's easier to understand and discuss with stakeholders.
Core Components of ER Models
- Entities: Objects or concepts about which we want to store data (e.g., customers, products, orders). Represented as rectangles in ER diagrams.
- Attributes: Properties or characteristics of entities. Represented as ovals connected to entities.
- Relationships: Connections between entities. Represented as diamonds connecting entities.
Types of Attributes
- Simple vs. Composite: Simple attributes can't be divided (e.g., age), while composite attributes can be split into smaller parts (e.g., address).
- Single-valued vs. Multi-valued: Single-valued attributes hold one value (e.g., date of birth), while multi-valued attributes can hold multiple values (e.g., phone numbers).
- Derived: Attributes whose values can be calculated from other attributes (e.g., age derived from date of birth).
- Key Attributes: Attributes that uniquely identify an entity (e.g., customer_id).
In relational databases, multi-valued attributes typically require separate tables, and composite attributes are often split into individual columns.
Types of Relationships
Relationships in ER models have cardinality—the number of instances of one entity that can relate to instances of another entity.
-
One-to-One (1:1): One entity instance relates to exactly one instance of another entity.
Example: A person has one passport, and each passport belongs to one person. -
One-to-Many (1:N): One entity instance relates to multiple instances of another entity.
Example: A customer can place many orders, but each order is placed by one customer. -
Many-to-Many (N:M): Multiple instances of one entity relate to multiple instances of another entity.
Example: A student can enroll in many courses, and a course can have many students.
When implementing these relationships in a relational database:
- One-to-One: Either table can include a foreign key referencing the other, typically the entity that depends on the other.
- One-to-Many: The "many" side includes a foreign key referencing the "one" side.
- Many-to-Many: Requires a junction table (also called bridge or association table) with foreign keys to both entities.
Converting ER Diagrams to Relational Schemas
The process of moving from an ER diagram to a relational schema follows these general steps:
- Each entity type becomes a table
- Attributes become columns in the tables
- Primary key attributes become primary keys in the tables
- Relationships are implemented through foreign keys
- Many-to-many relationships require junction tables
Let's illustrate with a simple example showing how a many-to-many relationship is implemented:
-- Students table
CREATE TABLE Students (
student_id INT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL
);
-- Courses table
CREATE TABLE Courses (
course_id INT PRIMARY KEY,
title VARCHAR(100) NOT NULL,
credits INT NOT NULL,
department VARCHAR(50) NOT NULL
);
-- Junction table for many-to-many relationship
CREATE TABLE Enrollments (
student_id INT,
course_id INT,
enrollment_date DATE NOT NULL,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES Students(student_id),
FOREIGN KEY (course_id) REFERENCES Courses(course_id)
);
The junction table Enrollments resolves the many-to-many relationship between Students and Courses, and also stores relationship-specific attributes like enrollment_date and grade.
Keys and Relationships
Keys are fundamental to relational database design, serving as identifiers and creating relationships between tables.
Types of Keys
-
Primary Key: A column or combination of columns that uniquely identifies each row in a table.
Example: customer_id in a Customers table. -
Foreign Key: A column or combination of columns that refers to the primary key of another table.
Example: customer_id in an Orders table referencing the Customers table. -
Candidate Key: A column or combination of columns that could potentially serve as a primary key.
Example: In a Students table, both student_id and email might be candidate keys. -
Composite Key: A primary key composed of multiple columns.
Example: In an Enrollments table, the combination of student_id and course_id. -
Surrogate Key: An artificial primary key, typically an auto-incrementing ID, with no business meaning.
Example: An auto-generated product_id in a Products table. -
Natural Key: A primary key that has business meaning.
Example: A social security number or ISBN.
Referential Integrity
Referential integrity ensures that relationships between tables remain consistent. It's enforced through foreign key constraints:
- Every foreign key value must match a primary key value in the referenced table, or be NULL if allowed
- Actions can be specified when a referenced row is deleted or updated
Common referential actions include:
- CASCADE: If a primary key is updated/deleted, the foreign key values are automatically updated/deleted too
- RESTRICT or NO ACTION: Prevents deletion/update of a primary key if it's referenced by foreign keys
- SET NULL: Sets the foreign key values to NULL if the referenced primary key is deleted/updated
- SET DEFAULT: Sets the foreign key values to a default value if the referenced primary key is deleted/updated
Here's an example implementing referential integrity with CASCADE:
CREATE TABLE Customers (
customer_id INT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL
);
CREATE TABLE Orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE NOT NULL,
total_amount DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (customer_id)
REFERENCES Customers(customer_id)
ON DELETE CASCADE -- If a customer is deleted, all their orders are deleted too
ON UPDATE CASCADE -- If a customer_id is updated, it's updated in Orders too
);
In a real-world application like an e-commerce platform, referential integrity ensures that no order exists without a corresponding customer, maintaining data consistency throughout the system.
Natural vs. Surrogate Keys
The choice between natural and surrogate keys is an important design decision:
| Natural Keys | Surrogate Keys |
|---|---|
| Have business meaning (e.g., ISBN, SSN) | No business meaning (e.g., auto-incrementing ID) |
| Can be complex (multiple columns) | Typically simple (single column) |
| Can change (problematic!) | Never change |
| Can be lengthy (e.g., UUIDs) | Usually compact integers |
| May reveal sensitive information | Reveals nothing about the entity |
Modern practice generally favors surrogate keys for most tables, especially for:
- Performance (integer joins are faster)
- Simplicity (especially in complex joins)
- Stability (natural keys might need to change)
- Privacy (avoiding exposure of sensitive identifiers)
A real-world analogy is your passport or driver's license number versus your name. Your name has meaning, but it might not be unique, and it could change (e.g., through marriage). The ID number has no meaning but uniquely and permanently identifies you in the system.
Normalization
Normalization is a systematic approach to database design that reduces data redundancy and improves data integrity. It involves organizing data into multiple related tables and defining relationships between them.
Reasons for Normalization
Imagine a spreadsheet where each row contains a customer's information along with all their orders and the products in each order. This approach would have serious problems:
- Data Redundancy: Customer data would be repeated for each order
- Update Anomalies: Changing a customer's address would require updating multiple rows
- Insertion Anomalies: You couldn't add a customer without an order
- Deletion Anomalies: Deleting an order might lose customer information
Normalization solves these problems by decomposing tables into smaller, related tables according to specific rules called normal forms.
Dependency Types
To understand normalization, we first need to understand functional dependencies:
-
Functional Dependency: If knowing the value of attribute A lets you determine the value of attribute B,
then B is functionally dependent on A (written as A → B).
Example: customer_id → customer_name (knowing the customer ID lets you determine the customer name) -
Partial Dependency: When a non-key attribute depends on only part of a composite primary key.
Example: In an Enrollments table with primary key (student_id, course_id), if enrollment_date depends only on student_id -
Transitive Dependency: When a non-key attribute depends on another non-key attribute.
Example: In a Students table, if dormitory_phone depends on dormitory_name, which depends on student_id
Normal Forms
Normalization progresses through several "normal forms," each addressing specific types of dependencies:
First Normal Form (1NF)
- Eliminate repeating groups
- Create separate tables for each set of related data
- Identify each record with a primary key
- Each cell must contain a single, atomic value
Example: Converting a table with a multi-valued attribute (phone numbers) into 1NF:
Before normalization:
| customer_id | name | phone_numbers |
|---|---|---|
| 1 | John Smith | 555-1234, 555-5678 |
After 1NF:
| customer_id | name | phone_number |
|---|---|---|
| 1 | John Smith | 555-1234 |
| 1 | John Smith | 555-5678 |
Or better yet, with a separate table for phone numbers:
| customer_id | name |
|---|---|
| 1 | John Smith |
| customer_id | phone_number |
|---|---|
| 1 | 555-1234 |
| 1 | 555-5678 |
Second Normal Form (2NF)
- Must be in 1NF
- Remove partial dependencies (attributes that depend on only part of the primary key)
Example: Converting a composite key table with partial dependencies into 2NF:
Before 2NF:
| student_id | course_id | student_name | course_name | grade |
|---|---|---|---|---|
| 101 | CS101 | Alice Johnson | Intro to Programming | A |
Here, student_name depends only on student_id, and course_name depends only on course_id, creating partial dependencies.
After 2NF:
| student_id | student_name |
|---|---|
| 101 | Alice Johnson |
| course_id | course_name |
|---|---|
| CS101 | Intro to Programming |
| student_id | course_id | grade |
|---|---|---|
| 101 | CS101 | A |
Third Normal Form (3NF)
- Must be in 2NF
- Remove transitive dependencies (non-key attributes depending on other non-key attributes)
Example: Converting a table with transitive dependencies into 3NF:
Before 3NF:
| order_id | product_id | product_name | category_id | category_name | quantity |
|---|---|---|---|---|---|
| 1001 | P100 | Laptop | C10 | Electronics | 1 |
Here, category_name depends on category_id, which depends on product_id, creating a transitive dependency.
After 3NF:
| category_id | category_name |
|---|---|
| C10 | Electronics |
| product_id | product_name | category_id |
|---|---|---|
| P100 | Laptop | C10 |
| order_id | product_id | quantity |
|---|---|---|
| 1001 | P100 | 1 |
Beyond 3NF
Additional normal forms exist (BCNF, 4NF, 5NF, 6NF), but 3NF is sufficient for most applications. Higher normal forms address more specialized dependency issues.
When to Denormalize
While normalization improves data integrity, sometimes denormalization (deliberately introducing redundancy) is justified for performance reasons:
- Read-Heavy Applications: When a system performs many more reads than writes
- Reporting Databases: For analytical queries that aggregate data across many tables
- Performance Bottlenecks: When joins become too expensive for frequent queries
However, denormalization should be done carefully and only after the normalized design is understood:
- Begin with a properly normalized design
- Identify specific performance issues through testing
- Selectively denormalize only where needed
- Implement additional controls to maintain data consistency
A real-world example is an e-commerce product page. The product details might include the average rating, which is technically derived data (calculated from individual reviews). However, storing this pre-calculated value avoids having to compute it on every page view.
Practical Database Design Patterns
Beyond normalization, several common patterns and techniques are useful for real-world database design.
Inheritance Patterns
Relational databases don't directly support inheritance like object-oriented languages, but several patterns can model it:
-
Single Table Inheritance: All entities are stored in one table with a type discriminator.
Pros: Simple queries, no joins needed
Cons: Many nullable columns, potential wasted space -
Table Per Type: Base attributes in a parent table, type-specific attributes in child tables.
Pros: Normalized design, avoids null columns
Cons: Requires joins for complete entity data -
Table Per Concrete Type: Each entity type gets its own complete table.
Pros: Simple queries for specific types
Cons: Duplicated column definitions, complex queries for "all types"
Example of Table Per Type for a Person → Employee/Customer hierarchy:
CREATE TABLE People (
person_id INT PRIMARY KEY,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
email VARCHAR(100) UNIQUE NOT NULL,
date_of_birth DATE
);
CREATE TABLE Employees (
person_id INT PRIMARY KEY,
hire_date DATE NOT NULL,
department VARCHAR(50) NOT NULL,
salary DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (person_id) REFERENCES People(person_id)
);
CREATE TABLE Customers (
person_id INT PRIMARY KEY,
customer_since DATE NOT NULL,
credit_limit DECIMAL(10, 2),
FOREIGN KEY (person_id) REFERENCES People(person_id)
);
Historical Data Patterns
For systems that need to track changes over time, these patterns are useful:
-
Audit Tables: Separate tables that record changes to main tables.
Example: customers_audit to record all changes to the customers table -
Temporal Tables: Adding valid_from and valid_to timestamps to track when data was valid.
Example: price_history table to track product price changes over time -
Event Sourcing: Store events that led to state changes rather than just the current state.
Example: account_transactions vs. just account_balance
Example of a temporal table for product prices:
CREATE TABLE ProductPrices (
product_id INT NOT NULL,
price DECIMAL(10, 2) NOT NULL,
valid_from TIMESTAMP NOT NULL,
valid_to TIMESTAMP,
PRIMARY KEY (product_id, valid_from),
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);
-- Insert a new price (setting the previous one's end date)
BEGIN TRANSACTION;
-- Set end date for current price
UPDATE ProductPrices
SET valid_to = CURRENT_TIMESTAMP
WHERE product_id = 123 AND valid_to IS NULL;
-- Insert new price
INSERT INTO ProductPrices (product_id, price, valid_from, valid_to)
VALUES (123, 29.99, CURRENT_TIMESTAMP, NULL);
COMMIT;
Tree and Hierarchical Data
Hierarchical data (like organizational charts, category trees, or file systems) can be challenging in relational databases. Several patterns exist:
-
Adjacency List: Each item references its parent (simplest).
Pros: Easy to understand and implement, good for single-level operations
Cons: Requires recursive queries for tree traversal (not supported in all DBMSs) -
Path Enumeration: Store the path from root to node (e.g., "1/4/7").
Pros: Easy retrieval of full paths, simple ancestor queries
Cons: Maintenance overhead when restructuring the tree -
Nested Sets: Assign left and right values based on a tree traversal.
Pros: Efficient for read-heavy scenarios, especially subtree operations
Cons: Complex updates, not intuitive -
Closure Table: Store all relationships, direct and indirect.
Pros: Efficient for all operations, intuitive
Cons: Additional storage requirements, maintenance overhead
Example of an Adjacency List for categories:
CREATE TABLE Categories (
category_id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
parent_id INT,
FOREIGN KEY (parent_id) REFERENCES Categories(category_id)
);
INSERT INTO Categories (category_id, name, parent_id) VALUES
(1, 'Electronics', NULL),
(2, 'Computers', 1),
(3, 'Laptops', 2),
(4, 'Desktop PCs', 2),
(5, 'Smartphones', 1);
-- Find all subcategories of Electronics
WITH RECURSIVE CategoryTree AS (
-- Base case: start with Electronics
SELECT category_id, name, parent_id
FROM Categories
WHERE category_id = 1
UNION ALL
-- Recursive step: find children of each category
SELECT c.category_id, c.name, c.parent_id
FROM Categories c
JOIN CategoryTree ct ON c.parent_id = ct.category_id
)
SELECT * FROM CategoryTree;
Common Design Pitfalls
Even experienced database designers can make these common mistakes:
Overusing Surrogate Keys
While surrogate keys are generally good practice, overusing them can lead to problems:
- Not enforcing natural uniqueness constraints (e.g., allowing duplicate email addresses)
- Using surrogate keys for junction tables when composite keys would be more appropriate
- Exposing internal surrogate keys to users instead of using natural keys for business operations
Best practice: Use surrogate keys for most entities, but still enforce uniqueness on natural candidate keys.
Poor Naming Conventions
Inconsistent or unclear naming makes databases harder to understand and maintain:
- Inconsistent plural/singular naming (Orders vs. Product)
- Unclear abbreviations (cust_nm vs. customer_name)
- Inconsistent use of prefixes/suffixes
- Names that don't reflect business concepts
Best practice: Adopt consistent naming conventions that align with business terminology.
Ignoring Indexing Needs
Proper indexing is crucial for performance but often overlooked:
- Not indexing foreign keys (causing slow joins)
- Not indexing frequently filtered columns
- Creating too many indexes (slowing down writes)
- Not considering composite indexes for multi-column conditions
Best practice: Analyze query patterns early and index accordingly, while monitoring performance impacts.
Inappropriate Denormalization
Premature or excessive denormalization creates maintenance headaches:
- Denormalizing before understanding normal forms
- Introducing redundancy without clear performance benefits
- Failing to implement mechanisms to keep redundant data in sync
Best practice: Start with a normalized design, then selectively denormalize based on performance testing.
Neglecting Data Types
Using inappropriate data types leads to wasted space and performance issues:
- Using VARCHAR when CHAR would be more appropriate (for fixed-length data)
- Using TEXT for short strings
- Using inappropriate numeric precision (DECIMAL(10,2) for dollar amounts vs. INTEGER for cents)
- Using string types for dates or numeric data
Best practice: Choose the most appropriate data type for each column based on its content and usage patterns.
Case Study: E-Commerce Database Design
Let's apply what we've learned to design a simplified database for an e-commerce application:
Let's examine some of the design decisions:
- Normalized Structure: The design follows 3NF with separate tables for users, addresses, products, categories, orders, order items, and reviews.
- Surrogate Keys: Each table uses an auto-incrementing ID as its primary key.
- Hierarchical Data: Categories uses a self-referential adjacency list pattern to represent the category hierarchy.
- Junction Tables: ORDER_ITEMS serves as a junction table between ORDERS and PRODUCTS, with additional attributes for quantity and price.
- Flexible Address Handling: Multiple addresses per user, with separate shipping and billing addresses for orders.
SQL implementation for key portions:
-- Users and Addresses
CREATE TABLE Users (
user_id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(100) UNIQUE NOT NULL,
password_hash VARCHAR(255) NOT NULL,
first_name VARCHAR(50) NOT NULL,
last_name VARCHAR(50) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
is_active BOOLEAN DEFAULT TRUE
);
CREATE TABLE Addresses (
address_id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
address_line1 VARCHAR(100) NOT NULL,
address_line2 VARCHAR(100),
city VARCHAR(50) NOT NULL,
state VARCHAR(50),
postal_code VARCHAR(20) NOT NULL,
country VARCHAR(50) NOT NULL,
is_default BOOLEAN DEFAULT FALSE,
address_type ENUM('shipping', 'billing', 'both') NOT NULL,
FOREIGN KEY (user_id) REFERENCES Users(user_id)
);
-- Products and Categories
CREATE TABLE Categories (
category_id INT AUTO_INCREMENT PRIMARY KEY,
parent_id INT,
name VARCHAR(50) NOT NULL,
description TEXT,
FOREIGN KEY (parent_id) REFERENCES Categories(category_id)
);
CREATE TABLE Products (
product_id INT AUTO_INCREMENT PRIMARY KEY,
category_id INT NOT NULL,
sku VARCHAR(50) UNIQUE NOT NULL,
name VARCHAR(100) NOT NULL,
description TEXT,
price DECIMAL(10, 2) NOT NULL,
stock_quantity INT NOT NULL DEFAULT 0,
image_url VARCHAR(255),
is_active BOOLEAN DEFAULT TRUE,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);
-- Orders and OrderItems
CREATE TABLE Orders (
order_id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
shipping_address_id INT NOT NULL,
billing_address_id INT NOT NULL,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
status ENUM('pending', 'processing', 'shipped', 'delivered', 'cancelled') NOT NULL DEFAULT 'pending',
total_amount DECIMAL(10, 2) NOT NULL,
payment_method VARCHAR(50) NOT NULL,
tracking_number VARCHAR(100),
FOREIGN KEY (user_id) REFERENCES Users(user_id),
FOREIGN KEY (shipping_address_id) REFERENCES Addresses(address_id),
FOREIGN KEY (billing_address_id) REFERENCES Addresses(address_id)
);
CREATE TABLE OrderItems (
order_item_id INT AUTO_INCREMENT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
price DECIMAL(10, 2) NOT NULL,
subtotal DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES Orders(order_id),
FOREIGN KEY (product_id) REFERENCES Products(product_id)
);
-- Reviews
CREATE TABLE Reviews (
review_id INT AUTO_INCREMENT PRIMARY KEY,
user_id INT NOT NULL,
product_id INT NOT NULL,
rating INT NOT NULL CHECK (rating BETWEEN 1 AND 5),
comment TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (user_id) REFERENCES Users(user_id),
FOREIGN KEY (product_id) REFERENCES Products(product_id),
UNIQUE (user_id, product_id) -- One review per user per product
);
This schema demonstrates a well-normalized design that would serve as a solid foundation for an e-commerce application. In a real-world scenario, you might add more tables (like payment processing information, inventory history, etc.) and possibly denormalize certain areas based on performance requirements.
Practice Activities
Activity 1: Normalization Exercise
Consider the following un-normalized table for a library management system:
Books(book_id, title, author_name, author_birth_date, publisher_name, publisher_address, publisher_phone, isbn, publication_date, genre, shelf_location, copy_number, borrower_id, borrower_name, borrower_email, borrow_date, return_date)
Apply normalization to create a properly structured relational schema:
- Identify entities and their attributes
- Identify primary keys for each entity
- Create foreign key relationships
- Ensure the design meets 3NF requirements
- Draw an ER diagram for your solution
Activity 2: E-Learning Platform Design
Design a database schema for an e-learning platform with these requirements:
- Users can be students, instructors, or administrators (with different attributes)
- Courses have modules, and modules have lessons
- Students can enroll in multiple courses
- Instructors can teach multiple courses
- Students can submit assignments for each course
- Instructors can grade assignments and provide feedback
- The system tracks student progress through courses
Create an ER diagram and write SQL CREATE TABLE statements for your design.
Activity 3: Design Critique
Review the following database design for a blog platform and identify potential problems:
-- Users table
CREATE TABLE Users (
id INT PRIMARY KEY,
username VARCHAR(50),
password VARCHAR(50),
email VARCHAR(100),
first_name VARCHAR(50),
last_name VARCHAR(50),
bio TEXT,
joined_date DATE
);
-- Posts table
CREATE TABLE Posts (
id INT PRIMARY KEY,
title VARCHAR(100),
content TEXT,
author_id INT,
author_username VARCHAR(50),
author_email VARCHAR(100),
created_date DATETIME,
category VARCHAR(50),
tags VARCHAR(255),
status VARCHAR(20)
);
-- Comments table
CREATE TABLE Comments (
id INT PRIMARY KEY,
post_id INT,
commenter_name VARCHAR(50),
commenter_email VARCHAR(100),
content TEXT,
created_date DATETIME,
is_approved BOOLEAN
);
For each issue you identify:
- Explain why it's a problem
- Suggest how to fix it
- Provide the revised SQL statements
Summary
In this lecture, we've explored the principles and practices of relational database design:
- The Relational Model: Tables (relations), rows (tuples), and columns (attributes) as the foundation of relational databases
- Entity-Relationship Modeling: Visual representations of database structure with entities, attributes, and relationships
- Keys and Relationships: Primary keys, foreign keys, and their role in establishing table relationships, with considerations for surrogate vs. natural keys
- Normalization: The process of organizing data to reduce redundancy and improve integrity, with an understanding of normal forms (1NF, 2NF, 3NF)
- Design Patterns: Common patterns for inheritance, historical data, and hierarchical structures
- Design Pitfalls: Common mistakes to avoid in database design
- Practical Application: Applying these principles to a real-world e-commerce database design
A well-designed relational database is the foundation of reliable, efficient applications. By following these principles, you can create database schemas that not only meet current needs but can also adapt to changing requirements over time.
Further Reading
- Database Design for Mere Mortals by Michael J. Hernandez
- Database Systems: The Complete Book by Garcia-Molina, Ullman, and Widom
- Ten Common Database Design Mistakes by SQL Server Central
- Use The Index, Luke! - A guide to database performance for developers
- All About Keys in SQL by Vertabelo