Introduction to Databases
In our digital world, data is everywhere. From the messages you send to friends, to your shopping history, to the records of your bank transactions—all this information needs to be stored somewhere. This is where databases come in.
A database is an organized collection of structured information, or data, typically stored electronically in a computer system. Databases are designed to efficiently store, retrieve, manage, and update data.
Think of a database as a digital filing cabinet. A traditional filing cabinet contains folders organized in drawers. Similarly, a database contains tables (folders) with records (documents) arranged in an organized way for easy access and management.
Why We Need Databases
Before diving into database types, let's understand why we need specialized database systems instead of just saving data in files:
- Data Integrity: Databases enforce rules to maintain accurate and consistent data
- Concurrent Access: Multiple users can access and modify data simultaneously
- Efficient Querying: Databases are optimized for searching and retrieving information
- Security: Databases provide access control mechanisms to protect sensitive data
- Backup and Recovery: Data can be backed up and restored when necessary
- Scalability: Databases can grow to handle increasing amounts of data and users
Imagine trying to manage a library's collection with just text files—finding books, tracking checkouts, and managing returns would be incredibly inefficient. A database system enables librarians to instantly search by title, author, category, or availability status, and ensures the data stays accurate even when multiple librarians are updating records simultaneously.
The Evolution of Databases
Databases have evolved significantly over time, adapting to changing data storage and processing needs.
Historical Development
This evolution reflects how data requirements have changed over time. For example, early business databases needed to handle structured accounting data, while today's applications need to process diverse data types like user-generated content, real-time analytics, and IoT sensor data.
From Files to Databases
Before modern databases, applications stored data in flat files. Consider how a business might store customer information:
# customer_data.txt (Flat file approach)
1,John Smith,42,New York,1985-03-12
2,Sarah Johnson,35,Chicago,1990-07-28
3,Michael Lee,29,San Francisco,1996-11-05
This simple approach becomes problematic as data grows:
- Difficult to update specific fields without rewriting the entire file
- No built-in way to validate data (e.g., preventing duplicate customer IDs)
- Inefficient for searching (e.g., finding all customers in Chicago)
- Prone to corruption if two processes write to the file simultaneously
- No way to establish relationships between different data files
Databases solve these challenges through structured data organization, indexing, query languages, concurrency control, and more.
Key Database Concepts
Let's explore some fundamental concepts that apply to most database systems:
Database Management System (DBMS)
A Database Management System is software that interacts with the database, applications, and users to capture and analyze data. The DBMS manages everything from data storage to security and performance optimization.
Think of a DBMS as a librarian who knows where every book is located, manages checkout procedures, ensures books are returned to the correct shelves, and maintains the catalog system—all while serving multiple patrons simultaneously.
Schema
A database schema is the blueprint or structure that defines how data is organized within a database. It specifies:
- What tables/collections exist
- What fields each table/collection contains
- The data types of those fields
- Constraints on the data (e.g., required fields, unique values)
- Relationships between different tables/collections
A schema is like an architect's plan for a building. Before construction begins, the plan establishes the structure, dimensions, and connections between rooms. Similarly, a database schema defines the structure before data is added.
ACID Properties
ACID is an acronym representing four critical properties that guarantee reliable database transactions:
- Atomicity: A transaction is all or nothing—if any part fails, the entire transaction fails
- Consistency: A transaction can only bring the database from one valid state to another
- Isolation: Concurrent transactions cannot affect each other
- Durability: Once a transaction is committed, it remains so even in the event of system failure
These properties are especially important in systems like banking. Imagine transferring $100 from your checking to savings account. This involves two operations: deducting from checking and adding to savings. ACID properties ensure that both operations succeed or fail together (Atomicity), the total remains the same (Consistency), other transactions don't interfere (Isolation), and the changes remain permanent once completed (Durability).
Transactions
A transaction is a sequence of database operations that are treated as a single logical unit of work. Either all the operations complete successfully, or none of them take effect.
Think of a transaction like moving to a new home. The process involves multiple steps: packing belongings, transporting them, unpacking at the new location, and returning the moving truck. If any step fails (e.g., the truck breaks down), you haven't successfully "moved"—you're in a transitional state that needs resolution.
A typical database transaction might look like:
BEGIN TRANSACTION; -- Start the transaction
UPDATE accounts SET balance = balance - 100 WHERE account_id = 123; -- Deduct $100
UPDATE accounts SET balance = balance + 100 WHERE account_id = 456; -- Add $100
-- Check if both operations succeeded
IF (@@ERROR = 0)
COMMIT TRANSACTION; -- Make changes permanent
ELSE
ROLLBACK TRANSACTION; -- Undo all changes
Normalization
Normalization is the process of organizing data in a database to reduce redundancy and improve data integrity. This involves dividing large tables into smaller ones and defining relationships between them.
Imagine a bookstore inventory system. Without normalization, you might store the publisher's name, address, and contact information with every book record—creating massive redundancy. With normalization, you'd create separate tables for books and publishers, with books referencing their publisher through an ID.
We'll explore normalization in depth in the next lecture on relational database design.
Indexes
Indexes are special data structures that improve the speed of data retrieval operations. They work similar to the index in a book, allowing the database to quickly locate specific records without scanning every row.
For example, if you frequently search for customers by last name, you might create an index on the last_name column. Without this index, finding "Smith" would require checking every customer record. With the index, the database can jump directly to the "S" section and find all Smiths much faster.
However, indexes come with trade-offs:
- They speed up read operations but slow down writes (as indexes must be updated)
- They consume additional storage space
- Too many indexes can degrade overall performance
Major Types of Databases
Now that we understand core database concepts, let's explore the major categories of database systems.
Relational Databases
Relational databases organize data into tables (relations) with rows and columns. They use Structured Query Language (SQL) for defining, manipulating, and querying data.
Key Characteristics
- Data is organized in tables with predefined schemas
- Relationships between tables are established through keys
- ACID compliance ensures data integrity
- SQL provides a standardized way to interact with the database
- Strong support for complex queries and transactions
Common Relational Database Systems
- MySQL: Open-source, widely used for web applications
- PostgreSQL: Advanced open-source RDBMS with powerful features
- Oracle Database: Enterprise-grade commercial database system
- Microsoft SQL Server: Microsoft's enterprise database solution
- SQLite: Lightweight, file-based database ideal for embedded systems and local storage
Use Cases
- Financial systems requiring transaction integrity
- Customer relationship management (CRM) systems
- Content management systems
- E-commerce platforms
- Enterprise resource planning (ERP) systems
A bank's core banking system is a perfect example of relational database usage. It requires precise transaction handling, complex relationships between accounts, customers, and transactions, and robust reporting capabilities.
NoSQL Databases
NoSQL ("Not Only SQL") databases emerged to address limitations of relational databases, particularly for handling large volumes of unstructured or semi-structured data, and for applications requiring high scalability and availability.
Key Characteristics
- Flexible schemas (or schema-less design)
- Horizontally scalable (can add more servers to handle load)
- Typically sacrifice some ACID guarantees for performance and scalability
- Optimized for specific data models and access patterns
- Often used in distributed systems
Types of NoSQL Databases
NoSQL databases come in several varieties, each optimized for different use cases:
Document Stores
Document databases store data in flexible, JSON-like documents. Each document can have a different structure, and the schema can evolve over time.
// MongoDB document example (user record)
{
"_id": ObjectId("5f8d0c0b1c9d440000a7df7b"),
"username": "johndoe",
"email": "john@example.com",
"profile": {
"firstName": "John",
"lastName": "Doe",
"age": 32,
"interests": ["hiking", "photography", "cooking"]
},
"lastLogin": ISODate("2025-03-15T14:30:00Z"),
"loginHistory": [
{ "date": ISODate("2025-03-15T14:30:00Z"), "ip": "192.168.1.1" },
{ "date": ISODate("2025-03-10T09:15:00Z"), "ip": "192.168.1.1" }
]
}
Examples: MongoDB, CouchDB, Firebase Firestore
Use cases: Content management systems, user profiles, product catalogs, real-time analytics
Key-Value Stores
Key-value databases are the simplest NoSQL databases. They store data as a collection of key-value pairs, where each key is unique and maps to a specific value.
// Redis key-value examples
SET user:1000 "{"name":"John Doe","email":"john@example.com"}"
SET session:a1b2c3d4 "{"userId":1000,"expires":"2025-03-16T14:30:00Z"}"
SET product:5 "{"name":"Wireless Keyboard","price":49.99}"
Examples: Redis, Amazon DynamoDB, Riak
Use cases: Caching, session storage, real-time leaderboards, shopping carts
Column-Family Stores
Column-family databases store data in tables, rows, and columns, but unlike relational databases, the columns can vary from row to row, and they're designed for massive scalability.
// Cassandra column-family example
CREATE TABLE users (
user_id uuid PRIMARY KEY,
first_name text,
last_name text,
email text
);
// Data might be physically stored like this
{
"user123": {
"info": {
"first_name": "John",
"last_name": "Doe"
},
"contact": {
"email": "john@example.com",
"phone": "555-123-4567"
}
}
}
Examples: Apache Cassandra, HBase, ScyllaDB
Use cases: Time-series data, IoT applications, write-heavy applications, systems requiring massive scale
Graph Databases
Graph databases excel at managing highly connected data. They store entities as nodes and relationships as edges, making them ideal for navigating complex relationships.
// Neo4j Cypher query creating a social network relationship
CREATE (john:Person {name: 'John', age: 32})
CREATE (mary:Person {name: 'Mary', age: 30})
CREATE (john)-[:FRIENDS_WITH {since: '2020-01-15'}]->(mary)
CREATE (mary)-[:FRIENDS_WITH {since: '2020-01-15'}]->(john)
Examples: Neo4j, ArangoDB, Amazon Neptune
Use cases: Social networks, recommendation engines, fraud detection, network analysis, knowledge graphs
A social media platform is a perfect example of a graph database application. Efficiently finding "friends of friends" or "people you may know" is complex in a relational database but natural in a graph database, which can easily traverse relationship connections.
NewSQL Databases
NewSQL databases attempt to bring together the benefits of traditional relational databases (ACID compliance, SQL interface) with the scalability advantages of NoSQL systems.
Key Characteristics
- SQL interface for queries
- ACID transaction guarantees
- Horizontally scalable architecture
- No-shared-architecture to eliminate bottlenecks
- High-performance for both OLTP and OLAP workloads
Examples: Google Spanner, CockroachDB, Amazon Aurora, MemSQL
Use cases: Global applications requiring both strong consistency and high availability, financial systems that need to scale
Choosing the Right Database
Selecting the appropriate database for your application is a critical decision that can significantly impact development effort, performance, and scalability. Here are some factors to consider:
Data Structure
- Highly structured data with clear relationships: Relational databases excel here
- Semi-structured or evolving structure: Document databases offer flexibility
- Simple key-based lookups: Key-value stores are fastest
- Highly connected data: Graph databases are the natural choice
Query Patterns
- Complex queries with joins: Relational databases have powerful query capabilities
- Simple lookups by ID: Key-value stores are optimized for this
- Relationship traversal: Graph databases excel at "friend-of-friend" type queries
- High-volume write operations: Column-family stores often handle this well
Scalability Requirements
- Vertical scaling (bigger servers): Traditional relational databases
- Horizontal scaling (more servers): NoSQL or NewSQL databases
- Global distribution: NewSQL or specialized NoSQL solutions
Consistency Requirements
- Strong consistency needed: Relational or NewSQL databases
- Eventual consistency acceptable: Many NoSQL databases
Development Considerations
- Team expertise: Consider what technologies your team already knows
- Ecosystem and tooling: Mature databases have better tools and community support
- Integration with existing systems: Compatibility with your tech stack
Real-World Database Selection Examples
Let's examine how different applications choose their database systems:
- E-commerce Platform (Amazon): Uses a mix of databases - relational databases for order processing and inventory, key-value stores for shopping carts and sessions, and document databases for product catalogs.
- Social Media (LinkedIn): Uses graph databases for the social network connections, document stores for user profiles, and column-family stores for activity feeds and analytics.
- Financial Institution: Primarily relies on relational databases with ACID guarantees for account transactions, possibly with NewSQL for scaling without sacrificing consistency.
- Real-time Analytics Dashboard: Might use a column-family store like Cassandra for time-series data collection, combined with in-memory databases for fast query responses.
Practical Database Usage in Web Development
Now that we understand different database types, let's look at how databases are used within a typical web application architecture.
Three-Tier Architecture
Most web applications follow a three-tier architecture:
The database sits in the data tier, servicing requests from the application tier.
Database Connection Patterns
In web applications, database connections are typically managed in one of these ways:
- Connection Pool: Maintains a pool of database connections that are reused across requests, avoiding the overhead of creating new connections for each request.
- ORM (Object-Relational Mapping): Tools like Sequelize (JavaScript), SQLAlchemy (Python), or Hibernate (Java) that map database records to application objects.
- Query Builders: Libraries that help construct SQL queries programmatically, like Knex.js.
Here's a simplified example of using an ORM (Sequelize) in a Node.js application:
// Define a model representing a database table
const User = sequelize.define('User', {
username: {
type: DataTypes.STRING,
allowNull: false,
unique: true
},
email: {
type: DataTypes.STRING,
allowNull: false,
validate: {
isEmail: true
}
},
status: {
type: DataTypes.ENUM('active', 'inactive', 'banned'),
defaultValue: 'active'
}
});
// Use the model to interact with the database
async function createUser(userData) {
try {
const newUser = await User.create({
username: userData.username,
email: userData.email
});
return newUser;
} catch (error) {
console.error('Failed to create user:', error);
throw error;
}
}
Database Administration Tasks
Beyond just using a database, developers often need to handle these common tasks:
- Backups: Regularly backing up database to prevent data loss
- Migrations: Scripts to update database schema as application evolves
- Monitoring: Tracking database performance and resource usage
- Indexing: Creating and managing indexes to optimize query performance
- Security: Managing access control and protecting sensitive data
Here's an example of a database migration script using a migration tool (like Knex.js):
// Migration to add a new column to the users table
exports.up = function(knex) {
return knex.schema.table('users', function(table) {
table.string('phone_number', 15);
table.index('phone_number');
});
};
// Migration to undo the changes if needed
exports.down = function(knex) {
return knex.schema.table('users', function(table) {
table.dropIndex('phone_number');
table.dropColumn('phone_number');
});
};
Database Trends and Future Directions
The database landscape continues to evolve. Here are some trends to watch:
- Cloud-Native Databases: Purpose-built for cloud environments, offering automatic scaling and high availability (e.g., Amazon Aurora, Azure Cosmos DB, Google Spanner).
- Serverless Databases: Pay-per-use databases that automatically scale up and down based on demand (e.g., Amazon DynamoDB, Azure Cosmos DB, FaunaDB).
- Multi-Model Databases: Supporting multiple data models (relational, document, graph) within a single database system (e.g., ArangoDB, OrientDB).
- Time-Series Databases: Optimized for handling time-stamped data from IoT devices, monitoring systems, and financial markets (e.g., InfluxDB, TimescaleDB).
- Machine Learning Integration: Databases with built-in ML capabilities for in-database analytics (e.g., PostgreSQL with MADlib, MongoDB Atlas with ML features).
- Blockchain Databases: Incorporating blockchain technology for immutable, decentralized data storage (e.g., BigchainDB).
As data volumes continue to grow and application requirements become more complex, we'll likely see further specialization and innovation in database technology.
Practice Activities
Activity 1: Database Type Matching
For each of the following scenarios, identify the most appropriate type of database and explain your reasoning:
- A banking application that processes customer transactions
- A social network application that needs to find "friends of friends"
- A high-traffic e-commerce site's shopping cart system
- A content management system with flexible content types
- An IoT application collecting sensor data from thousands of devices
- A globally distributed financial trading platform
Activity 2: Database Concept Application
Consider a simple library management system and describe:
- What tables/collections would you need?
- What relationships would exist between them?
- What ACID properties would be important for this system?
- How would you handle concurrent users trying to check out the same book?
- Would you use a relational or NoSQL database, and why?
Activity 3: Database Technology Research
Choose one database system from each of these categories:
- Relational Database
- Document Database
- Graph Database
For each database:
- Research its key features and capabilities
- Find its typical use cases
- Identify major companies using this technology
- Compare its query language/syntax with the others
- Briefly describe how you would implement a simple user management system with it
Summary
In this lecture, we've covered fundamental database concepts and types:
- The evolution and purpose of database systems
- Key database concepts like ACID properties, transactions, and schemas
- Major database types including relational, document, key-value, column-family, and graph databases
- Factors to consider when choosing a database for your application
- How databases integrate with web applications
- Current trends and future directions in database technology
Understanding these concepts provides a foundation for making informed decisions about data storage and management in your applications. In the next lecture, we'll dive deeper into relational database design principles, including normalization and entity-relationship modeling.
Further Reading
- MySQL Technical Whitepapers
- NoSQL Explained by MongoDB
- Graph Database Fundamentals
- What is Distributed SQL?
- Oracle: What is a Database?
- "Designing Data-Intensive Applications" by Martin Kleppmann (book)
- "Seven Databases in Seven Weeks" by Luc Perkins, Eric Redmond, and Jim Wilson (book)