You run an e-commerce store with orders, customers, and products in separate collections. Every day you need the total spent per customer, the best-selling category, or products never ordered. In MongoDB, the answer isn't a simple query – it's an aggregation pipeline.
We, at Meteora Web, have used the MongoDB pipeline for years to analyze inventory, generate revenue reports, and build real-time dashboards. With a background in accounting, we know that aggregated data without context is worthless. That's why we want to show you how to use $group, $match, $lookup and $project to extract real insights, without jumping between collections.
This guide is for developers who already know MongoDB basics and want to master the pipeline. No abstract theory – just working code and the reasoning behind every stage.
How does $match work to filter documents in the MongoDB aggregation pipeline?
$match is the entry filter. Place it as early as possible in the pipeline: fewer documents passed to later stages means faster execution. It takes the same query syntax as find(), but runs inside the pipeline.
Practical example: orders from a specific month
Assume an orders collection with fields date, total, customer_id. We only want orders from January 2026.
db.orders.aggregate([
  { $match: { date: { $gte: ISODate("2026-01-01"), $lt: ISODate("2026-02-01") } } }
])
Note: $match can use indexes when it is the first stage of the pipeline. If date is indexed, the filter becomes an index scan instead of a full collection scan. We often see pipelines that run slowly because a $match on raw document fields comes after $group. Move it ahead of $group and the grouping stage has far fewer documents to process.
Action step: Check your existing pipelines – is the first stage a $match? If not, move it to the beginning.
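One way to run that check is sketched below. The helper name startsWithMatch is ours, not a MongoDB API, and the index on date is an assumption about what you want on the orders collection. The guard around db lets the snippet load outside mongosh without errors.

```javascript
// Hypothetical helper: true when the first pipeline stage is a $match.
function startsWithMatch(pipeline) {
  return pipeline.length > 0 &&
    Object.prototype.hasOwnProperty.call(pipeline[0], "$match");
}

// January 2026 orders, same filter as above (new Date also works in mongosh).
const januaryOrders = [
  { $match: { date: { $gte: new Date("2026-01-01T00:00:00Z"),
                      $lt: new Date("2026-02-01T00:00:00Z") } } },
  { $group: { _id: "$customer_id", total_spent: { $sum: "$total" } } }
];

console.log(startsWithMatch(januaryOrders));

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  // Assumption: you want an index on date; skip this line if one exists.
  db.orders.createIndex({ date: 1 });
  // Look for IXSCAN (index used) versus COLLSCAN (full scan) in the plan.
  printjson(db.orders.explain("executionStats").aggregate(januaryOrders));
}
```

The helper only inspects the stage order. Whether the index is actually used is something the explain output has to confirm.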
How powerful is $group for aggregating and calculating metrics in the MongoDB pipeline?
$group groups documents by a key and applies accumulators like $sum, $avg, $max, $push. It's the heart of statistics.
Example: total sales per customer
db.orders.aggregate([
  { $match: { date: { $gte: ISODate("2026-01-01"), $lt: ISODate("2026-02-01") } } },
  { $group: { _id: "$customer_id", total_spent: { $sum: "$total" }, order_count: { $sum: 1 } } }
])
The result: for each customer, total spent and order count. Note: $group cannot use indexes directly, but if you've filtered with $match first, it works on a reduced set.
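To make the accumulator semantics concrete, here is a plain JavaScript sketch that mimics the $group stage in memory. The sample orders are made up, and the function groupByCustomer is ours. It illustrates what $sum computes, not how MongoDB implements it.

```javascript
// Made-up sample orders; field names follow the orders collection above.
const sampleOrders = [
  { customer_id: "c1", total: 40 },
  { customer_id: "c2", total: 15 },
  { customer_id: "c1", total: 60 }
];

// Mimics { $group: { _id: "$customer_id",
//                    total_spent: { $sum: "$total" },
//                    order_count: { $sum: 1 } } } on an in-memory array.
function groupByCustomer(orders) {
  const groups = new Map();
  for (const order of orders) {
    const group = groups.get(order.customer_id) ||
      { _id: order.customer_id, total_spent: 0, order_count: 0 };
    group.total_spent += order.total; // $sum: "$total"
    group.order_count += 1;           // $sum: 1
    groups.set(order.customer_id, group);
  }
  return [...groups.values()];
}

console.log(groupByCustomer(sampleOrders));
```

One difference to keep in mind: MongoDB does not guarantee the order of documents coming out of $group, so don't rely on it.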
Key accumulators:
- $sum – numeric sum
- $avg – average
- $min / $max – extreme values
- $push – creates an array of all grouped values
- $addToSet – array of unique values
We use $push for inventory reports: group by supplier and push product codes into an array.
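A supplier report along those lines might look like the sketch below. The products collection and its supplier_id, code and category fields are assumptions for illustration, so adapt them to your schema.

```javascript
// Assumed schema: products documents carry supplier_id, code and category.
const supplierReport = [
  { $group: {
      _id: "$supplier_id",
      product_codes: { $push: "$code" },      // every code, duplicates kept
      categories: { $addToSet: "$category" }, // each category only once
      product_count: { $sum: 1 }
  } }
];

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.products.aggregate(supplierReport).toArray());
}
```

$push keeps duplicates and $addToSet does not, which is why the two sit side by side here.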
How do you use $lookup for joining collections in a MongoDB pipeline?
$lookup is the equivalent of a SQL LEFT JOIN. It merges documents from another collection based on a common field. Beware: it's not free. Each $lookup adds latency, especially on large collections.
Example: enrich orders with customer data
Collection customers with _id and name. We want each order to include the customer name.
db.orders.aggregate([
  { $match: { date: { $gte: ISODate("2026-01-01"), $lt: ISODate("2026-02-01") } } },
  { $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer"
  } },
  { $unwind: "$customer" } // if you want a single document, not an array
])
Optimization: after $lookup, use $unwind only if necessary. Otherwise, keep the array and use $project to extract the first element. Keep in mind that $unwind drops orders with no matching customer unless you set preserveNullAndEmptyArrays. As for indexes, $lookup benefits from an index on the foreign field: _id on customers is already indexed by default. An index on customer_id in orders helps when you filter orders by customer, not for this join.
A common mistake: performing $lookup against a foreign field that has no index. We see it every day – the system times out. Add an index on that field, or a compound index if the join matches on several fields.
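The alternative to $unwind, keeping the array and picking its first element, can be sketched like this. The output field name customer_name is our choice, and the collections are the ones from the example.

```javascript
const ordersWithCustomerName = [
  { $match: { date: { $gte: new Date("2026-01-01T00:00:00Z"),
                      $lt: new Date("2026-02-01T00:00:00Z") } } },
  { $lookup: {
      from: "customers",
      localField: "customer_id",
      foreignField: "_id",
      as: "customer"
  } },
  { $project: {
      _id: 0,
      total: 1,
      date: 1,
      // "$customer.name" resolves to the array of names of the joined
      // documents; index 0 picks the first one. When nothing matched,
      // the field is left out of the result.
      customer_name: { $arrayElemAt: ["$customer.name", 0] }
  } }
];

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(ordersWithCustomerName).toArray());
}
```

Unlike a plain $unwind, this version keeps orders whose customer is missing. They simply come back without customer_name.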
How to optimize pipelines with $project in the MongoDB aggregation pipeline?
$project shapes the output document: selects, renames, computes new fields. It serves two purposes: reduce data volume passed downstream and create readable structures for the application.
Example: clean up results after a $lookup
After the join, we only want customer.name, total, date. No internal fields.
db.orders.aggregate([
  // ... previous stages ...
  { $project: {
      _id: 0,
      customer: "$customer.name",
      total: 1,
      date: 1
  } }
])
You can also use $addFields (or $set) to add fields without removing existing ones. We prefer $project for a clean final output, especially when sending data to a frontend.
Watch out for computed fields: a $project crowded with heavy expressions is hard to read and to debug. Compute them first with $addFields, then project.
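A minimal sketch of that two-step shape follows. The computed field month is a hypothetical example, built with $dateToString from the date field of the orders collection.

```javascript
const monthlyLines = [
  // Step 1: compute the derived field, leaving the document otherwise intact.
  { $addFields: {
      month: { $dateToString: { format: "%Y-%m", date: "$date" } }
  } },
  // Step 2: keep only what the consumer needs.
  { $project: { _id: 0, customer_id: 1, total: 1, month: 1 } }
];

// Only runs inside mongosh, where the global `db` exists.
if (typeof db !== "undefined") {
  printjson(db.orders.aggregate(monthlyLines).toArray());
}
```

Keeping the computation and the final shape in separate stages makes each one easier to change on its own.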
What to do next with the MongoDB aggregation pipeline
Here are three concrete actions to take today:
- Analyze an existing pipeline – Open the MongoDB Atlas profiler or use explain() to see if $match is the first stage and if it uses indexes.
- Create a grouping report – Take a real collection (e.g., orders, logs) and write a pipeline with $match + $group. Measure execution time.
- Integrate a $lookup – Join two collections you use often and verify performance with explain(). Add an index if needed.
Remember: the aggregation pipeline is powerful, but every stage costs. We, at Meteora Web, have optimized hundreds of pipelines for e-commerce clients, cutting times from minutes to seconds. A site is measured in revenue, not compliments. A slow pipeline loses customers and sales.
For a deeper dive into the entire NoSQL landscape, read our Pillar Guide on MongoDB and NoSQL Databases. And if you want to compare with Redis for messaging, check out Redis Pub/Sub.
Official reference: MongoDB Aggregation Pipeline Documentation.