
Beyond the API: DocumentDB vs. Aurora PostgreSQL for JSON Workloads

11 min read · By Giovanni Martinez
Tags: aws · documentdb · aurora · postgresql · mongodb · json · database internals · architecture · performance · dba

In my last post, we dug into the deep internal trade-offs of running JSON workloads on PostgreSQL vs. MongoDB. We looked at how PostgreSQL's MVCC and TOAST architectures create hidden write amplification, and how MongoDB's WiredTiger engine handles documents differently with a copy-on-write B-Tree design.

But what happens when you decide you do not want to manage those database servers yourself? You migrate to AWS, open the managed services menu, and start comparing Amazon DocumentDB with Amazon Aurora PostgreSQL.

Here is the plot twist most architecture discussions miss: at this point, you are no longer comparing PostgreSQL heap internals directly against MongoDB WiredTiger internals. You are comparing two different database compute layers built on AWS's decoupled, log-centric, multi-AZ distributed storage model.

That shift completely changes the "Postgres vs. Mongo" math for JSON workloads.

The Shared Foundation: The Database as a Log-Centric System

To understand both DocumentDB and Aurora PostgreSQL, you first need to understand Aurora's storage architecture.

In traditional single-instance database deployments, the database engine is responsible for page management and writes both data pages and WAL to network-attached block storage. The compute node and storage node are tightly coupled from a write path perspective.

Aurora changed that model.

In Aurora's architecture, the compute layer primarily emits redo records to a distributed storage subsystem spanning three Availability Zones. The storage service is responsible for durable replication and page materialization across its internal fleet. This substantially reduces the amount of page-oriented write work the compute node must manage directly.

Data is replicated as six copies across three AZs, continuously backed up to S3, and storage scales automatically.

The key insight for this comparison: Amazon DocumentDB and Aurora PostgreSQL both use this same architectural pattern of decoupled compute and distributed, log-oriented storage.

Amazon DocumentDB Unmasked: Compatibility Layer, Not MongoDB Internals

DocumentDB is often described informally as "managed MongoDB on AWS." That description is directionally useful for app teams, but technically misleading — and the distinction matters at the architecture level.

Amazon DocumentDB is not MongoDB. It does not run MongoDB's codebase. It does not use MongoDB's WiredTiger storage engine. It does not contain any MongoDB SSPL-licensed code. It is a fully proprietary AWS-built database engine that implements a subset of the MongoDB wire protocol and API surface. The "with MongoDB compatibility" qualifier in the product name is doing a lot of heavy lifting — it means your MongoDB drivers and application code can connect and issue operations, but the engine executing those operations is fundamentally different under the hood.

When your application sends a MongoDB-style operation to DocumentDB, the service accepts the wire protocol request at its compute layer and executes it through DocumentDB's own engine and storage path built on AWS's distributed storage architecture. The query planner is different. The index internals are different. The replication model is different. What you get is API-level compatibility — not behavioral equivalence.

As of early 2026, DocumentDB supports compatibility modes for MongoDB 3.6, 4.0, 5.0, and the recently launched 8.0 — which brings a new query planner (Planner v3), collation, views, and additional aggregation stages. However, even with 8.0, the compatibility story has real gaps that matter at the architecture level.

The Trade-offs of Compatibility by Emulation

The decoupled storage model brings real operational advantages:

  • You do not manage MongoDB replica set internals directly.
  • Storage growth is decoupled from traditional node-local disk constraints.
  • Read scaling is straightforward via replicas that share the underlying distributed storage model.
  • Backups, durability, and failover characteristics inherit AWS-managed behavior.

But compatibility-by-emulation comes with engineering consequences: API compatibility is not the same as internal engine equivalence or full ecosystem parity.

Here is where things get concrete. These are real compatibility gaps that have bitten teams mid-migration:

$lookup with correlated subqueries

DocumentDB (5.0 and earlier) supports equality-based $lookup joins and uncorrelated subqueries, but does not support correlated subqueries — the $lookup variant where you use let and pipeline together with $expr to reference parent fields inside the child pipeline. This is a common MongoDB pattern for filtered joins. Your aggregation will fail at runtime with:

MongoServerError: Aggregation stage not supported:
  '$lookup on multiple join conditions and uncorrelated subquery'

If your application relies on correlated $lookup pipelines for cross-collection queries, this is a migration-blocking gap that forces you to restructure the aggregation into multiple application-level round trips or flatten the data model.
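To make the shape of the problem concrete, here is a minimal sketch in Python with pymongo of the correlated $lookup pattern described above. The collection names (customers, orders) and field names are illustrative placeholders, not taken from any particular workload:

from pymongo import MongoClient
from pymongo.errors import OperationFailure

client = MongoClient("mongodb://localhost:27017")  # or your DocumentDB cluster endpoint
db = client["shop"]

# Correlated $lookup: `let` binds a parent field, and the child pipeline
# references it through $expr using the $$ variable syntax.
pipeline = [
    {"$lookup": {
        "from": "orders",
        "let": {"cust_id": "$_id"},
        "pipeline": [
            {"$match": {"$expr": {"$and": [
                {"$eq": ["$customer_id", "$$cust_id"]},
                {"$eq": ["$status", "shipped"]},
            ]}}},
            {"$project": {"_id": 0, "total": 1}},
        ],
        "as": "shipped_orders",
    }},
]

try:
    customers_with_orders = list(db.customers.aggregate(pipeline))
except OperationFailure as exc:
    # On DocumentDB 5.0 and earlier, this is where the error shown above surfaces.
    print(f"Aggregation rejected: {exc}")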

Missing or version-gated aggregation stages

On DocumentDB 5.0, several commonly used aggregation stages are missing entirely: $facet, $unionWith, $graphLookup, $setWindowFields, and $merge. DocumentDB 8.0 adds some of these (notably $merge, $bucket, $facet, and $set/$unset), but $setWindowFields and $graphLookup remain unavailable. If your analytics pipelines use window functions via $setWindowFields — a feature MongoDB introduced in 5.0 — you will not find a DocumentDB equivalent.
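For reference, this is roughly what such a window pipeline looks like. A hedged sketch with made-up collection and field names (orders, customer_id, order_date, amount), using pymongo:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# Running total per customer via $setWindowFields (MongoDB 5.0+). There is no
# DocumentDB equivalent, so this logic has to move client-side or into a
# separate analytics layer.
pipeline = [
    {"$setWindowFields": {
        "partitionBy": "$customer_id",
        "sortBy": {"order_date": 1},
        "output": {
            "running_total": {
                "$sum": "$amount",
                "window": {"documents": ["unbounded", "current"]},
            },
        },
    }},
]

running_totals = list(db.orders.aggregate(pipeline))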

Index behavior differences

DocumentDB does not support partial indexes or case-insensitive indexes, and it cannot leverage indexes for queries that use certain operators. For teams relying on partial indexes to reduce index size on sparse data or conditional queries, this means either bloated indexes or redesigned query patterns.
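As a point of reference, this is the kind of index definition that goes away. A sketch with pymongo, using placeholder collection and field names:

from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# Partial index: only documents with status == "active" are indexed, keeping
# the index small on sparse data. MongoDB honors partialFilterExpression;
# per the gap above, DocumentDB does not give you the same behavior.
db.orders.create_index(
    [("customer_id", ASCENDING)],
    partialFilterExpression={"status": "active"},
)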

Ecosystem tooling assumptions

Tools that assume native MongoDB internals — such as mongodump/mongorestore beyond version 100.6.1, or change stream consumers expecting MongoDB-native latency characteristics — may behave differently or fail. Client-side field-level encryption and queryable encryption are not supported.

The bottom line: Run the AWS DocumentDB compatibility tool against your actual workload in staging. Pay special attention to aggregation pipelines with $lookup subqueries and any stage not listed in the supported APIs documentation. Do not assume that "MongoDB-compatible" means "drop-in replacement."


Aurora PostgreSQL: Does It Fix the MVCC / TOAST Problem?

If you read my previous post, you know that PostgreSQL's MVCC and TOAST mechanisms cause heavy write amplification for large, frequently updated JSON documents. So if you migrate to Aurora PostgreSQL — where the compute node sends redo records to the distributed storage layer rather than writing full pages over a network-attached volume — does that fix the Postgres JSON penalty?

The short answer: No. But the bottleneck shifts.

Aurora vastly reduces page-level network I/O, but the Aurora PostgreSQL compute node still runs the standard PostgreSQL engine in memory. That means:

  • MVCC still happens. When you update a jsonb document, PostgreSQL still creates a new tuple version and leaves a dead tuple behind in the heap.
  • TOAST still happens. Documents over 2KB are still compressed and chunked into separate TOAST tables.
  • HOT updates are still blocked on jsonb. Expression indexes on jsonb fields prevent PostgreSQL from applying Heap-Only Tuple optimization, meaning every update — even a single key change — forces a new index entry across all indexes on that table. This is a write amplification multiplier that compounds with the number of indexes.
  • VACUUM is still required. You still have to tune autovacuum aggressively to clean up dead tuples (the sketch after this list shows a quick way to check a live table). In Aurora, unbounded bloat also means unbounded storage cost — you pay per GB on the Aurora volume.
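To see how hard these effects are hitting a given table, pg_stat_user_tables already tracks the relevant counters. A small diagnostic sketch with psycopg2; the connection string and the events table name are placeholders:

import psycopg2

conn = psycopg2.connect("dbname=app user=app host=localhost")  # your Aurora endpoint here

with conn, conn.cursor() as cur:
    cur.execute("""
        SELECT relname, n_tup_upd, n_tup_hot_upd, n_dead_tup
        FROM pg_stat_user_tables
        WHERE relname = %s
    """, ("events",))
    relname, updates, hot_updates, dead_tuples = cur.fetchone()
    # A low HOT ratio on a jsonb-heavy table usually means indexes on jsonb
    # expressions are forcing full index maintenance on every update.
    hot_ratio = hot_updates / max(updates, 1)
    print(f"{relname}: {updates} updates, HOT ratio {hot_ratio:.1%}, {dead_tuples} dead tuples")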

The most critical Aurora-specific consideration is I/O cost.

In standard Aurora configurations, you are billed per I/O request at $0.20 per million operations. Because updating a TOASTed jsonb document causes write amplification (MVCC new tuple + TOAST rewrite + index entries), a write-heavy JSON workload can generate I/O charges that aren't obvious until your bill arrives.

The practical tip: If you are running a heavy JSON mutation workload on Aurora PostgreSQL, evaluate Aurora I/O-Optimized. This cluster configuration eliminates per-request I/O charges in exchange for a higher baseline on compute and storage pricing. AWS's guidance is that if I/O spend exceeds 25% of your total Aurora database bill, I/O-Optimized likely saves money — and for workloads dominated by large jsonb updates with multiple indexes, you can easily clear that threshold. Teams have reported 30–40% cost reductions on I/O-intensive workloads after switching.
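The break-even math is worth doing explicitly. A back-of-the-envelope sketch; every number below is a placeholder to be replaced with your own figures from Cost Explorer:

# Aurora Standard bills I/O at $0.20 per million requests; I/O-Optimized folds
# that into higher compute and storage rates. All figures here are invented.
monthly_io_requests = 40_000_000_000                  # billed read + write I/Os
io_cost = monthly_io_requests / 1_000_000 * 0.20      # about $8,000 in this example
compute_and_storage_cost = 18_000.00                  # instances + GB-month storage, USD

total_bill = io_cost + compute_and_storage_cost
io_share = io_cost / total_bill                       # about 31% in this made-up case

print(f"I/O cost ${io_cost:,.0f} is {io_share:.0%} of the bill")
# Above roughly 25%, pricing out the I/O-Optimized configuration usually pays off.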

A secondary option worth evaluating: Aurora Optimized Reads with NVMe-backed instance types (r6gd/r6id). This extends the buffer pool to local SSD, which helps with read-heavy JSON access patterns where TOAST decompression causes repeated storage round-trips.

Querying, Indexing, and the Compute Bottleneck

Because both services use the same underlying distributed storage model, performance differences are won and lost in the compute layer — specifically in the buffer cache and how efficiently each engine executes queries.

Aurora PostgreSQL (jsonb + GIN)

PostgreSQL's query optimizer has full native visibility into your tables and indexes. When you query a jsonb column using a GIN index, PostgreSQL knows exactly how to traverse the decompressed JSON structure. You can join a deeply nested JSON document against multiple normalized relational tables, filter with a CTE, and aggregate with window functions — and the query planner handles all of it natively with decades of optimization work behind it.

You also get partial indexes, expression indexes, and exclusion constraints on JSON data — tools that let you build highly targeted index structures that DocumentDB cannot replicate.
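For illustration, here is roughly what that index tooling looks like on a jsonb column. A sketch assuming a hypothetical events table with a jsonb column named doc:

import psycopg2

ddl = [
    # GIN index for containment queries such as: doc @> '{"type": "click"}'
    "CREATE INDEX IF NOT EXISTS events_doc_gin "
    "ON events USING gin (doc jsonb_path_ops)",
    # Expression index on a single extracted key
    "CREATE INDEX IF NOT EXISTS events_user_idx "
    "ON events ((doc->>'user_id'))",
    # Partial index covering only the hot subset of documents
    "CREATE INDEX IF NOT EXISTS events_active_user_idx "
    "ON events ((doc->>'user_id')) WHERE doc->>'status' = 'active'",
]

with psycopg2.connect("dbname=app user=app host=localhost") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        for statement in ddl:
            cur.execute(statement)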

Amazon DocumentDB

DocumentDB is highly efficient for standard document retrieval and basic filtering. For pure document access patterns — key lookups, single-collection queries with indexed predicates — performance is strong and the operational simplicity is real.

However, complex analytics are where the emulation model shows its limits. A multi-stage aggregation pipeline ($match, $unwind, $group, $lookup) requires the DocumentDB compute layer to pull data from the shared storage volume, hold it in memory, and execute pipeline stages iteratively. The $lookup operator in particular lacks the relational query optimizer that PostgreSQL uses to build efficient join execution plans — DocumentDB uses hash, sort-merge, or nested-loop algorithms but without the cost-based planner that selects between them intelligently based on table statistics.

For reporting or ad-hoc analytics that span multiple collections, this becomes a compute bottleneck at scale. And because $setWindowFields is unavailable, any window-function-style analytics require client-side post-processing or a separate analytics layer entirely.
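For a sense of scale, the kind of pipeline I mean is sketched below: filter, explode an array, aggregate, sort. Every stage runs in the compute instance's memory against data pulled from the shared volume. Collection and field names are placeholders:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# A typical reporting pipeline: filter shipped orders, explode line items,
# aggregate revenue per SKU, sort. Each stage executes in the compute layer.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$unwind": "$items"},
    {"$group": {
        "_id": "$items.sku",
        "units": {"$sum": "$items.qty"},
        "revenue": {"$sum": {"$multiply": ["$items.qty", "$items.unit_price"]}},
    }},
    {"$sort": {"revenue": -1}},
]

revenue_by_sku = list(db.orders.aggregate(pipeline))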

The Decision Framework

Since the underlying storage model is shared, your decision is primarily about your application architecture, team skill set, and where your workload is headed.

Choose Amazon DocumentDB if:

  • You are doing a lift-and-shift of an existing MongoDB application and want managed AWS infrastructure without rewriting your data access layer — but only after you have validated compatibility using AWS's tooling against your actual aggregation pipelines and index usage
  • Your data is heavily siloed into independent documents, cross-collection joins are rare, and your aggregation pipelines use only the supported subset
  • Schema flexibility and rapid iteration on document shape are more valuable than referential integrity — and you have a team experienced with MongoDB operational patterns

Choose Aurora PostgreSQL if:

  • You are building greenfield — the relational model with jsonb gives you the highest ceiling for complexity as requirements evolve, without having to migrate engines later
  • Your data model requires or will eventually require referential integrity, foreign keys, and complex JOIN operations
  • You need robust analytics: full SQL, window functions, GIN indexes, partial indexes, and expression indexes on JSON data
  • You want one engine that handles both structured and semi-structured data without a separate system

The case that often gets decided wrong: A team has a MongoDB application that uses $lookup with correlated subqueries, $facet for multi-faceted search, or $setWindowFields for analytical aggregations. They assume DocumentDB is the natural AWS landing zone because the API looks the same. In practice, they would be better served migrating to Aurora PostgreSQL with a jsonb-first schema, where the features they depend on have native equivalents that are more mature and more performant — SQL JOIN with a cost-based planner instead of $lookup emulation, FILTER clauses and window functions instead of $facet and $setWindowFields.
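To make those equivalents concrete, here is a hedged sketch of the SQL versions, again assuming a hypothetical events table with a jsonb doc column:

import psycopg2

# FILTER clauses: the multi-bucket counts you would otherwise express with $facet.
faceted_counts = """
    SELECT
        count(*) FILTER (WHERE doc->>'status' = 'shipped')  AS shipped,
        count(*) FILTER (WHERE doc->>'status' = 'returned') AS returned
    FROM events
"""

# A window function: the running total you would otherwise express with $setWindowFields.
running_total = """
    SELECT
        doc->>'customer_id'                      AS customer_id,
        (doc->>'amount')::numeric                AS amount,
        sum((doc->>'amount')::numeric) OVER (
            PARTITION BY doc->>'customer_id'
            ORDER BY (doc->>'order_date')::date
        )                                        AS running_total
    FROM events
"""

with psycopg2.connect("dbname=app user=app host=localhost") as conn:  # placeholder DSN
    with conn.cursor() as cur:
        cur.execute(faceted_counts)
        print(cur.fetchone())
        cur.execute(running_total)
        for row in cur.fetchmany(5):
            print(row)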

If your MongoDB usage is already pushing past simple document access into relational territory, the migration to Aurora PostgreSQL is an investment that pays back in query capability and long-term maintainability.

The Verdict

Amazon DocumentDB is a genuine feat of cloud engineering — decoupling a MongoDB-compatible wire protocol from its underlying storage internals enabled legacy application teams to scale on AWS without rearchitecting their data layer. The 8.0 release narrows the gap significantly. For the right workload — high-throughput document access with straightforward query patterns — it earns its place.

But for modern, data-intensive architectures — where data models inevitably become relational as a product matures — Aurora PostgreSQL remains the stronger long-term foundation. Tune your autovacuum, watch your TOAST tables, evaluate Aurora I/O-Optimized for write-heavy JSON workloads, and keep an eye on HOT update limitations if you are indexing jsonb fields aggressively. The operational overhead is real, but the query expressiveness and data integrity guarantees pay it back.

Have you hit unexpected I/O costs from TOAST write amplification on Aurora? Or run into DocumentDB compatibility gaps mid-migration? Drop a comment — I'd love to compare notes.


Coming up next: In the next post, I'll explore TimescaleDB — a PostgreSQL extension that layers a time-series-optimized storage and query engine directly on top of Postgres. For teams already invested in the PostgreSQL ecosystem who are dealing with time-series or event-driven JSON workloads, it occupies a different niche than either Aurora or DocumentDB — and understanding where it fits (and where it doesn't) matters for getting your architecture right.