The aggregation pipeline: $match, $group, $project

medium

Learn with your AI

Open this lesson in your favourite AI. It'll walk you through the why, explain the demo, and quiz you on the try-it list.

Open in Claude Open in ChatGPT

Why this matters

The aggregation pipeline is MongoDB's answer to SQL GROUP BY, WHERE, and SELECT — but composable as a sequence of stages. Each stage transforms the stream of documents:

match filters,

group buckets and computes accumulators, $project reshapes. Running aggregations server-side instead of pulling documents into application code is often a 100× throughput improvement on analytical queries, because you move computation to where the data lives rather than shuffling gigabytes across the network.

Demo

Count orders by status and compute total revenue per status using

match,

group, and $project.

package main

import (
	"context"; "fmt"
	"go.mongodb.org/mongo-driver/bson"
	"go.mongodb.org/mongo-driver/mongo"
	"go.mongodb.org/mongo-driver/mongo/options"
)

func main() {
	client, _ := mongo.Connect(context.TODO(), options.Client().ApplyURI("mongodb://localhost:27017"))
	col := client.Database("demo").Collection("orders")

	pipeline := bson.A{
		bson.D{{"$match", bson.D{{"amount", bson.D{{"$gt", 0}}}}}},
		bson.D{{"$group", bson.D{
			{"_id", "$status"},
			{"count", bson.D{{"$sum", 1}}},
			{"totalRevenue", bson.D{{"$sum", "$amount"}}},
		}}},
		bson.D{{"$project", bson.D{
			{"status", "$_id"},
			{"count", 1},
			{"totalRevenue", bson.D{{"$round", bson.A{"$totalRevenue", 2}}}},
			{"_id", 0},
		}}},
		bson.D{{"$sort", bson.D{{"totalRevenue", -1}}}},
	}

	cursor, _ := col.Aggregate(context.TODO(), pipeline)
	var results []bson.M
	cursor.All(context.TODO(), &results)
	for _, r := range results { fmt.Println(r) }
}

Run: go run main.go

Try it yourself

Add a $match stage before $group to filter only orders from the last 30 days using a createdAt date field. Verify the revenue totals change by inserting a few old-dated orders.

Add an $addFields stage after $group that computes avgOrderValue as totalRevenue / count. Confirm the arithmetic is correct against manually computed values.

Run the pipeline with and without the leading $match stage on a collection of 10,000 documents. Use explain('executionStats') to compare nReturned and totalDocsExamined at the $group stage.

Prompt your AI

Use these three in order. Each builds on the one before.

1. Basics & terminology

Explain the MongoDB aggregation pipeline concept. What is a 'stage', how do stages connect, and what are the three most commonly used stages?

2. Why it works (the mechanism)

When I put $match before $group vs after $group, how does MongoDB's query planner treat these differently? Which is almost always faster and why?

3. Advanced — application & what's next

I need to compute a 7-day rolling average of daily revenue. Design an aggregation pipeline that produces one document per day with the rolling average over the prior 7 days.