MongoDB Document Structure

Understanding NoSQL Document Databases and MongoDB Architecture

Introduction to MongoDB

MongoDB is a popular, open-source NoSQL database that stores data in flexible, JSON-like documents. Instead of using tables and rows as in traditional relational databases, MongoDB uses collections and documents, allowing for a more natural representation of data in many modern applications.

Analogy: MongoDB vs. Traditional Databases

Think of the difference between MongoDB and traditional relational databases like the difference between a filing cabinet and a spreadsheet:

  • Relational Database (Spreadsheet):
    • Data is organized in rigid tables with predefined columns
    • Each row must conform to the same structure
    • Adding a new column affects the entire table
    • Relationships between tables are defined explicitly
  • MongoDB (Filing Cabinet):
    • Data is stored in folders (collections) containing documents
    • Each document can have its own unique structure
    • You can add new fields to some documents without affecting others
    • Related information can be embedded directly within documents

Both approaches have their strengths, but MongoDB's flexibility makes it particularly well-suited for:

  • Rapidly evolving data structures
  • Applications with complex, hierarchical data
  • Development workflows with frequent schema changes
  • Large-scale, distributed systems where horizontal scaling is important
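
To make the filing-cabinet analogy concrete, the sketch below models two documents from one collection as plain JavaScript objects. It does not talk to a database; the `products` array and its field names are illustrative assumptions. The point is only that one document can carry a field the other lacks.

```javascript
// Two hypothetical product documents from the same collection.
// The second has an extra "color" field; the first is unaffected by it.
const products = [
  { name: "Laptop", price: 999.99, in_stock: true },
  { name: "Smartphone", price: 699.99, in_stock: true, color: "black" },
];

// Collect every field name that appears in at least one document.
const allFields = [...new Set(products.flatMap((doc) => Object.keys(doc)))];

// For each document, list the fields it lacks relative to that union.
const missingByDocument = products.map((doc) => ({
  name: doc.name,
  missing: allFields.filter((field) => !(field in doc)),
}));

console.log(allFields);
console.log(missingByDocument);
```

In a relational table the `color` column would have to exist for every row; here it simply exists on the documents that need it.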

Key Features of MongoDB

MongoDB features at a glance:

  • Flexible Schema
    • No Fixed Schema
    • Dynamic Fields
    • Embedded Documents
  • Query Abilities
    • Rich Queries
    • Indexing
    • Geospatial
  • Scalability
    • Horizontal Scaling
    • Sharding
    • Replica Sets
  • Performance
    • In-Memory Storage
    • GridFS
    • Caching
  • Aggregation
    • Aggregation Pipeline
    • Map-Reduce
    • Transactions

MongoDB vs. Relational Databases

| Concept | Relational (SQL) Database | MongoDB (NoSQL) |
| --- | --- | --- |
| Data Structure | Tables with rows and columns | Collections with documents |
| Schema | Fixed, predefined | Dynamic, flexible |
| Relationships | Through foreign keys and joins | Through embedded documents or references |
| Query Language | SQL (Structured Query Language) | JSON-based query language |
| Scaling | Vertical scaling (larger servers) | Horizontal scaling (more servers) |
| Transactions | ACID transactions by default | Multi-document ACID transactions (since v4.0) |
| Join Operations | Native support for complex joins | $lookup aggregation stage (less efficient) |
| Data Integrity | Enforced through constraints | Application-enforced (mostly) |
| Best For | Complex relationships, transactions | Rapid development, flexible schema, scaling |

When to Choose MongoDB

When to Consider Alternatives

MongoDB Document Structure

Understanding Documents and Collections

Diagram: MongoDB data organization

Database (e.g., e-commerce)
  • Collection: products
    • { _id: ObjectId("abc123"), name: "Laptop", price: 999.99, in_stock: true }
    • { _id: ObjectId("def456"), name: "Smartphone", price: 699.99, in_stock: true, color: "black" }
  • Collection: orders
    • { _id: ObjectId("ghi789"), customer_id: 1001, date: ISODate("2023-"), items: [ { prod_id: "abc123", qty: 1 }, { prod_id: "def456", qty: 2 } ] }
    • { _id: ObjectId("jkl012"), customer_id: 1002, ...

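The basic components of MongoDB's structure are:

  • Database: a container for collections (e.g., e-commerce)
  • Collection: a group of documents (e.g., products, orders)
  • Document: a single record made of field-value pairs, stored as BSON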

Document Format: BSON

MongoDB stores data in BSON (Binary JSON) format, which extends the JSON model to provide additional data types and efficiency. BSON documents can contain a variety of data types:

| Data Type | Description | Example in JavaScript |
| --- | --- | --- |
| String | UTF-8 character string | "Hello, MongoDB!" |
| Integer | 32-bit or 64-bit integer | 42 |
| Double | 64-bit floating point | 3.14159 |
| Boolean | true or false | true |
| Array | Ordered list of values | ["red", "green", "blue"] |
| Object | Embedded document | { name: "John", age: 30 } |
| null | Null value | null |
| Date | DateTime value | new Date() |
| ObjectId | Unique identifier | ObjectId("507f1f77bcf86cd799439011") |
| Binary Data | Binary data | Buffer.from("binary") |
| Regular Expression | JavaScript RegExp | /pattern/i |
| Timestamp | Internal timestamp | Timestamp(1412180887, 1) |
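
One way to see why the extra data types matter is to look at what plain JSON does to a date. The sketch below uses only built-in JavaScript (no BSON library is involved), and the `original` object is a made-up example: after a JSON round trip, the Date comes back as a string, which is the kind of type loss that BSON's dedicated Date type avoids.

```javascript
// A plain JavaScript object with a Date field and a numeric field.
const original = {
  title: "Order",
  createdAt: new Date("2023-05-20T14:25:16.789Z"),
  total: 42,
};

// Serialize to JSON text and parse it back.
const roundTripped = JSON.parse(JSON.stringify(original));

console.log(original.createdAt instanceof Date); // true
console.log(typeof roundTripped.createdAt);      // "string"
console.log(roundTripped.createdAt);             // "2023-05-20T14:25:16.789Z"
```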

Document Structure: Key Concepts

Example MongoDB Document:


{
  "_id": ObjectId("5f8a76e910bd12b4e4c9a0f1"),
  "username": "johndoe",
  "email": "john@example.com",
  "profile": {
    "firstName": "John",
    "lastName": "Doe",
    "birthDate": ISODate("1990-07-15T00:00:00Z"),
    "address": {
      "street": "123 Main St",
      "city": "New York",
      "state": "NY",
      "zipCode": "10001"
    }
  },
  "interests": ["programming", "hiking", "photography"],
  "accountCreated": ISODate("2020-10-17T09:34:33.123Z"),
  "isActive": true,
  "loginCount": 42,
  "lastLogin": ISODate("2023-05-20T14:25:16.789Z")
}
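
MongoDB queries address nested fields and array positions with dot notation, for example "profile.address.city". The sketch below imitates that lookup on a plain object so the idea can be tried without a database. `getByPath` is a hypothetical helper written for this illustration, not a MongoDB API, and the `user` object is a trimmed copy of the document above.

```javascript
// Hypothetical helper: resolves a dot-notation path such as
// "profile.address.city" against a plain object or array.
function getByPath(doc, path) {
  return path.split(".").reduce(
    (value, key) =>
      value !== null && typeof value === "object" ? value[key] : undefined,
    doc
  );
}

const user = {
  username: "johndoe",
  profile: {
    firstName: "John",
    address: { city: "New York", zipCode: "10001" },
  },
  interests: ["programming", "hiking", "photography"],
};

console.log(getByPath(user, "profile.address.city")); // "New York"
console.log(getByPath(user, "interests.0"));          // "programming"
console.log(getByPath(user, "profile.phone"));        // undefined
```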
          

Key concepts of MongoDB document structure:

Data Modeling in MongoDB

Embedded Documents vs. References

One of the most important decisions in MongoDB schema design is whether to embed related data or use references:

Data modeling approaches compared:

  • Embedding
    • Pros:
      • Better read performance
      • Retrieves related data in a single query
      • Atomic operations on a single document
    • Cons:
      • Document size limit (16 MB per document)
      • Duplicate data
      • Complex updates
  • Referencing
    • Pros:
      • No duplication
      • Smaller documents
      • Good for many-to-many relationships
    • Cons:
      • Requires multiple queries
      • Joins in application code
      • No atomic updates across documents unless a multi-document transaction is used

Embedded Document Approach:


// User document with embedded addresses
{
  "_id": ObjectId("..."),
  "username": "johndoe",
  "email": "john@example.com",
  "addresses": [
    {
      "type": "home",
      "street": "123 Main St",
      "city": "New York",
      "state": "NY",
      "zipCode": "10001"
    },
    {
      "type": "work",
      "street": "456 Business Ave",
      "city": "New York",
      "state": "NY",
      "zipCode": "10002"
    }
  ]
}
          

Reference Approach:


// User document with references to addresses
{
  "_id": ObjectId("5f8a76e910bd12b4e4c9a0f1"),
  "username": "johndoe",
  "email": "john@example.com",
  "addressIds": [
    ObjectId("5f8a77a110bd12b4e4c9a0f2"),
    ObjectId("5f8a77a110bd12b4e4c9a0f3")
  ]
}

// Address documents in a separate collection
{
  "_id": ObjectId("5f8a77a110bd12b4e4c9a0f2"),
  "userId": ObjectId("5f8a76e910bd12b4e4c9a0f1"),
  "type": "home",
  "street": "123 Main St",
  "city": "New York",
  "state": "NY",
  "zipCode": "10001"
}

{
  "_id": ObjectId("5f8a77a110bd12b4e4c9a0f3"),
  "userId": ObjectId("5f8a76e910bd12b4e4c9a0f1"),
  "type": "work",
  "street": "456 Business Ave",
  "city": "New York",
  "state": "NY",
  "zipCode": "10002"
}
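
With references, assembling a user together with their addresses takes two lookups plus a join step in application code. The sketch below imitates that flow with in-memory arrays standing in for the two collections. String ids replace ObjectId values to keep it driver-free, and `resolveAddresses` is a hypothetical helper, not part of any MongoDB API.

```javascript
// In-memory stand-ins for the "users" and "addresses" collections.
const user = { _id: "u1", username: "johndoe", addressIds: ["a1", "a2"] };
const addresses = [
  { _id: "a1", userId: "u1", type: "home", city: "New York" },
  { _id: "a2", userId: "u1", type: "work", city: "New York" },
  { _id: "a3", userId: "u2", type: "home", city: "Boston" },
];

// Index addresses by _id so each reference resolves in constant time.
const addressById = new Map(addresses.map((a) => [a._id, a]));

// Second "query": fetch the addresses whose _id appears in addressIds.
function resolveAddresses(userDoc) {
  return userDoc.addressIds
    .map((id) => addressById.get(id))
    .filter(Boolean); // drop references that point at nothing
}

// Join step: attach the resolved addresses to a copy of the user.
const userWithAddresses = { ...user, addresses: resolveAddresses(user) };
console.log(userWithAddresses.addresses.map((a) => a.type)); // [ 'home', 'work' ]
```

With the embedded approach this join step disappears, because the addresses already live inside the user document.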
          

When to Embed vs. When to Reference

| Use Embedding When | Use References When |
| --- | --- |
| Entities have a "contains" relationship | Entities have a "refers to" relationship |
| One entity always appears with another | Entities can be queried independently |
| Entities have a one-to-few relationship | Entities have a one-to-many or many-to-many relationship |
| Embedded data doesn't grow without bound | Related data set can grow very large |
| Fast reads are a priority | Data consistency is more important than read performance |
| Atomic updates are required | Related data is accessed infrequently |
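
As a rough aid, some of the criteria above can be written down as a small decision function. This is only a sketch of a rule of thumb: `suggestModel` and its three boolean property names are invented for this illustration, and real schema decisions usually weigh more factors than these.

```javascript
// Hypothetical heuristic based on three of the criteria listed above.
// Any one "reference" signal is treated as enough to prefer references.
function suggestModel({ growsWithoutBound, queriedIndependently, manyToMany }) {
  if (growsWithoutBound || queriedIndependently || manyToMany) {
    return "reference";
  }
  return "embed";
}

// A user's handful of addresses: bounded, always read with the user.
console.log(
  suggestModel({ growsWithoutBound: false, queriedIndependently: false, manyToMany: false })
); // "embed"

// Comments on a popular post: the list can grow indefinitely.
console.log(
  suggestModel({ growsWithoutBound: true, queriedIndependently: false, manyToMany: false })
); // "reference"
```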

Common Data Modeling Patterns

One-to-One Relationship

Typically implemented with embedding:


{
  "_id": ObjectId("..."),
  "user": "johndoe",
  "profile": {
    "firstName": "John",
    "lastName": "Doe",
    "bio": "Software developer"
  }
}
            

One-to-Few Relationship

Best implemented with embedding:


{
  "_id": ObjectId("..."),
  "name": "ACME Inc.",
  "contacts": [
    { "name": "John Doe", "position": "CEO", "email": "john@acme.com" },
    { "name": "Jane Smith", "position": "CFO", "email": "jane@acme.com" }
  ]
}
            

One-to-Many Relationship

Can be implemented with either approach depending on size:

Parent referencing children:


// Author document
{
  "_id": ObjectId("author123"),
  "name": "Stephen King",
  "bookIds": [
    ObjectId("book1"),
    ObjectId("book2"),
    ObjectId("book3")
  ]
}
            

Or child referencing parent (more common):


// Book documents
{
  "_id": ObjectId("book1"),
  "title": "The Shining",
  "authorId": ObjectId("author123")
}
            

Many-to-Many Relationship

Typically implemented with references and sometimes a separate collection:


// Student
{
  "_id": ObjectId("student1"),
  "name": "Alice",
  "courseIds": [
    ObjectId("course1"),
    ObjectId("course2")
  ]
}

// Course
{
  "_id": ObjectId("course1"),
  "name": "Database Design",
  "studentIds": [
    ObjectId("student1"),
    ObjectId("student2")
  ]
}
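
The "separate collection" variant mentioned above can be sketched as follows. Instead of keeping id arrays on both sides, a hypothetical `enrollments` collection holds one document per student-course pair. The arrays and function names below are assumptions made for illustration, and plain strings stand in for ObjectId values.

```javascript
// Hypothetical "enrollments" documents: one per (student, course) pair.
const enrollments = [
  { studentId: "student1", courseId: "course1" },
  { studentId: "student1", courseId: "course2" },
  { studentId: "student2", courseId: "course1" },
];

// All courses a given student is enrolled in.
function courseIdsForStudent(studentId) {
  return enrollments
    .filter((e) => e.studentId === studentId)
    .map((e) => e.courseId);
}

// All students enrolled in a given course.
function studentIdsForCourse(courseId) {
  return enrollments
    .filter((e) => e.courseId === courseId)
    .map((e) => e.studentId);
}

console.log(courseIdsForStudent("student1")); // [ 'course1', 'course2' ]
console.log(studentIdsForCourse("course1"));  // [ 'student1', 'student2' ]
```

The trade-off is that each relationship is stored once, so the two sides cannot drift out of sync, at the cost of an extra lookup.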
            

MongoDB ObjectIDs

The _id field is a special field that serves as a primary key for MongoDB documents. By default, MongoDB generates a unique ObjectId for this field.

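MongoDB ObjectId structure (12 bytes, shown as 24 hex characters, e.g., 507f191e810c19729de860ea):

  • Timestamp (4 bytes): seconds since the Unix epoch
  • Random value (5 bytes): generated once per process, so it is unique to the machine and process (older MongoDB versions split these bytes into a 3-byte machine ID and a 2-byte process ID)
  • Counter (3 bytes): incrementing counter, initialized to a random value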

Key facts about ObjectIds:

Working with ObjectIds:


// Creating a new ObjectId
const { ObjectId } = require('mongodb');
const newId = new ObjectId();

console.log(newId.toString());  // e.g., "507f191e810c19729de860ea"

// Getting creation timestamp from ObjectId
const timestamp = newId.getTimestamp();
console.log(timestamp);  // e.g., 2023-05-21T12:34:56.000Z

// Creating an ObjectId from a string
const existingId = new ObjectId("507f191e810c19729de860ea");

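// Creating an ObjectId for a specific time
// (timestamp bytes set, the remaining 8 bytes zeroed)
const specificTime = new Date("2023-01-01");
const seconds = Math.floor(specificTime.getTime() / 1000);
const timeBasedId = new ObjectId(seconds.toString(16).padStart(8, "0") + "0000000000000000");
// The Node.js driver also offers a shortcut: ObjectId.createFromTime(seconds)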
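
The byte layout can also be explored without the driver. The sketch below works on the 24-character hex string directly: the first 8 hex characters are the timestamp, the next 10 are the middle 5 bytes (a per-process random value in current MongoDB versions), and the last 6 are the counter. `parseObjectIdHex` and `minObjectIdHexForDate` are hypothetical helper names introduced for this illustration.

```javascript
// Split a 24-character hex ObjectId string into its three parts.
function parseObjectIdHex(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte timestamp
  return {
    createdAt: new Date(seconds * 1000),
    random: hex.slice(8, 18),                    // middle 5 bytes, kept as hex
    counter: parseInt(hex.slice(18), 16),        // 3-byte counter
  };
}

// Build the smallest ObjectId hex string for a given date:
// timestamp bytes set, all remaining bytes zero.
function minObjectIdHexForDate(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  return seconds.toString(16).padStart(8, "0") + "0".repeat(16);
}

const parts = parseObjectIdHex("507f191e810c19729de860ea");
console.log(parts.createdAt.toISOString()); // "2012-10-17T20:46:22.000Z"
console.log(parts.random);                  // "810c19729d"

console.log(minObjectIdHexForDate(new Date("2023-01-01T00:00:00Z")));
// "63b0cd000000000000000000"
```

Because the timestamp comes first, a value like the one returned by `minObjectIdHexForDate` can serve as a lower bound when filtering documents by creation time on `_id`.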
          

Setting Up MongoDB

Installation Options

Local Installation

For Windows:

  1. Download the MongoDB installer from the official MongoDB website
  2. Run the installer and follow the setup wizard
  3. MongoDB is installed as a service by default
  4. Data is stored in C:\Program Files\MongoDB\Server\[version]\data by default

For macOS (using Homebrew):


# Install MongoDB
brew tap mongodb/brew
brew install mongodb-community

# Start MongoDB service
brew services start mongodb-community
            

For Linux (Ubuntu):


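# Import the MongoDB public key
# (apt-key is deprecated on newer Ubuntu releases; this form matches the 5.0-era instructions)
wget -qO - https://www.mongodb.org/static/pgp/server-5.0.asc | sudo apt-key add -

# Create the list file for MongoDB
# ("focal" is the Ubuntu 20.04 codename and "5.0" is the MongoDB release series; adjust both to your system)
echo "deb [ arch=amd64,arm64 ] https://repo.mongodb.org/apt/ubuntu focal/mongodb-org/5.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-5.0.list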

# Reload local package database
sudo apt-get update

# Install MongoDB packages
sudo apt-get install -y mongodb-org

# Start MongoDB service
sudo systemctl start mongod

# Enable MongoDB to start on boot
sudo systemctl enable mongod
            

MongoDB Atlas

MongoDB Atlas is a cloud-hosted MongoDB service:

  1. Go to MongoDB Atlas
  2. Create a free account and sign in
  3. Create a new cluster (the free tier is sufficient for learning)
  4. Configure database access (username and password)
  5. Configure network access (the IP access list, formerly called the IP whitelist)
  6. Get your connection string from the "Connect" button

Sample connection string format:

mongodb+srv://username:password@cluster0.mongodb.net/myDatabase?retryWrites=true&w=majority

Docker Container

Run MongoDB in a Docker container:


# Pull the MongoDB image
docker pull mongo

# Run MongoDB container
docker run -d -p 27017:27017 --name mongodb mongo:latest

# Alternatively, run with a persistent data volume
# (use this instead of the command above; container names must be unique)
docker run -d -p 27017:27017 --name mongodb \
  -v mongodb_data:/data/db mongo:latest

# Connect to MongoDB shell in the container
docker exec -it mongodb mongosh
            

MongoDB Compass

MongoDB Compass is an interactive tool for querying, exploring, and visualizing your MongoDB data:

  1. Download MongoDB Compass from the official website
  2. Install and launch the application
  3. Connect to your MongoDB instance:
    • For local MongoDB: mongodb://localhost:27017
    • For MongoDB Atlas: Use the connection string provided by Atlas

MongoDB Connection Options

Common connection string parameters:

  • retryWrites=true: Retry certain write operations once if they fail because of a transient network or failover error
  • w=majority: Require acknowledgment from a majority of replica set members
  • readPreference=primary: Read only from the primary node (the default)
  • authSource=admin: Database to use for authentication
  • tls=true: Use a TLS connection (ssl=true is the older, equivalent spelling)
  • connectTimeoutMS=30000: Connection timeout in milliseconds
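
Connection strings are easy to get wrong when a password contains characters such as @, : or /, because those must be percent-encoded. The sketch below assembles a string from its parts using only built-in JavaScript. `buildConnectionString` is a hypothetical helper, and the user name, password, and host are placeholders rather than real values.

```javascript
// Hypothetical helper: assembles a mongodb+srv connection string.
// Credentials are percent-encoded; options become the query string.
function buildConnectionString({ username, password, host, database, options = {} }) {
  const credentials = `${encodeURIComponent(username)}:${encodeURIComponent(password)}`;
  const query = new URLSearchParams(options).toString();
  const base = `mongodb+srv://${credentials}@${host}/${database}`;
  return query ? `${base}?${query}` : base;
}

const uri = buildConnectionString({
  username: "appUser",             // placeholder
  password: "p@ss:word/1",         // placeholder with special characters
  host: "cluster0.mongodb.net",    // placeholder host
  database: "myDatabase",
  options: { retryWrites: "true", w: "majority" },
});

console.log(uri);
// mongodb+srv://appUser:p%40ss%3Aword%2F1@cluster0.mongodb.net/myDatabase?retryWrites=true&w=majority
```

In practice the credentials would come from environment variables or a secrets store rather than being written in source code.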

MongoDB Shell Basics

MongoDB provides a JavaScript shell interface called mongosh (MongoDB Shell) for interacting with the database.

Connecting to MongoDB:


# Connect to local MongoDB
mongosh

# Connect to a specific database
mongosh myDatabase

# Connect with authentication
mongosh --username myUsername --password myPassword --authenticationDatabase admin

# Connect to MongoDB Atlas
mongosh "mongodb+srv://username:password@cluster0.mongodb.net/myDatabase"
          

Basic MongoDB Shell Commands:


// Show all databases
show dbs

// Switch to a database (it is created when you first store data in it)
use myDatabase

// Show collections in the current database
show collections

// Create a collection
db.createCollection("users")

// Insert a document
db.users.insertOne({
  username: "johndoe",
  email: "john@example.com",
  age: 30,
  active: true
})

// Find documents
db.users.find()                  // All documents
db.users.find({age: 30})         // With a filter
db.users.findOne({username: "johndoe"})  // Single document

// Update a document
db.users.updateOne(
  { username: "johndoe" },
  { $set: { email: "john.doe@example.com" } }
)

// Delete a document
db.users.deleteOne({ username: "johndoe" })

// Count documents
db.users.countDocuments()

// Drop a collection
db.users.drop()

// Drop a database
db.dropDatabase()
          

Query Formatting and Navigation:


// Format results for better readability
// (mongosh pretty-prints by default; pretty() mainly matters in the legacy mongo shell)
db.users.find().pretty()

// Limit results
db.users.find().limit(5)

// Skip results (for pagination)
db.users.find().skip(5).limit(5)

// Sort results (1 for ascending, -1 for descending)
db.users.find().sort({ age: 1 })
db.users.find().sort({ lastLogin: -1 })

// Combine these operations
db.users.find({ active: true })
  .sort({ lastLogin: -1 })
  .skip(10)
  .limit(10)
  .pretty()
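
The skip and limit values in the combined query above correspond to "page 2 with 10 results per page". The sketch below shows that arithmetic and imitates the same filter, sort, skip, and limit sequence on an in-memory array. `pageToSkipLimit`, `getPage`, and the generated `users` data are assumptions made for illustration; nothing here contacts a database.

```javascript
// Convert a 1-based page number into skip/limit values.
function pageToSkipLimit(page, pageSize) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  return { skip: (page - 1) * pageSize, limit: pageSize };
}

// 25 made-up users; a higher lastLogin value means a more recent login.
const users = Array.from({ length: 25 }, (_, i) => ({
  id: i + 1,
  active: true,
  lastLogin: i,
}));

// In-memory imitation of find({ active: true }).sort({ lastLogin: -1 }).skip().limit()
function getPage(page, pageSize) {
  const { skip, limit } = pageToSkipLimit(page, pageSize);
  return users
    .filter((u) => u.active)                    // filter() returns a new array
    .sort((a, b) => b.lastLogin - a.lastLogin)  // descending, like -1
    .slice(skip, skip + limit);
}

console.log(pageToSkipLimit(2, 10));          // { skip: 10, limit: 10 }
console.log(getPage(2, 10).map((u) => u.id)); // ids 15 down to 6
console.log(getPage(3, 10).length);           // 5 (the last, partial page)
```

On large collections, large skip values get slow because the skipped documents are still scanned, so filtering on an indexed field (for example, "lastLogin less than the last value seen") is a common alternative.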
          

Practical Activities

Activity 1: Setting Up MongoDB

  1. Install MongoDB locally or create a MongoDB Atlas account
  2. Install MongoDB Compass for visual exploration
  3. Connect to your MongoDB instance using the MongoDB Shell
  4. Create a new database for your web development projects
  5. Create several test collections

Activity 2: Document Structure Practice

Design document structures for the following scenarios:

  1. A blog platform with users, posts, and comments
  2. An e-commerce store with products, categories, and orders
  3. A social media application with users, posts, and friend relationships
  4. A library management system with books, authors, and borrowing records

For each scenario:

Activity 3: Basic MongoDB Operations

Using the MongoDB Shell:

  1. Create a collection called "contacts"
  2. Insert at least 5 contact documents with various fields (name, email, phone, address, etc.)
  3. Query contacts based on different criteria (exact match, range, etc.)
  4. Update contact information
  5. Delete a contact
  6. Practice with sorting, limiting, and skipping results

Key Takeaways

Next Steps

In our next lecture, we'll explore CRUD operations in MongoDB in more detail and learn how to perform these operations using Node.js with the MongoDB native driver.