Azure Cosmos DB

Cosmos DB is Microsoft’s globally distributed, multi-model NoSQL database. It stores JSON documents (and other data models) and targets single-digit-millisecond latency for point reads and writes, with data replicated to whichever Azure regions you enable. You get tunable consistency, automatic indexing, and a partition-based architecture that scales horizontally without redesigning your data layer.

The short version: if your application needs fast, flexible document storage that works globally, Cosmos DB is the managed option that removes most of the operational burden.

Resource Hierarchy

Cosmos DB organizes resources in a clear hierarchy. Understanding this structure matters because cost, throughput, and data distribution decisions happen at different levels.

```mermaid
graph TD
    A[Cosmos DB Account] --> B[Database 1]
    A --> C[Database 2]
    B --> D[Container A]
    B --> E[Container B]
    D --> F["Items (JSON documents)"]
    D --> G["Partition Key (e.g. /tenantId)"]
    E --> H["Items (JSON documents)"]
    E --> I["Partition Key (e.g. /userId)"]

    style A fill:#2d5aa0,color:#fff
    style B fill:#3a7bc8,color:#fff
    style C fill:#3a7bc8,color:#fff
    style D fill:#4a9e5c,color:#fff
    style E fill:#4a9e5c,color:#fff
```

Account - top-level resource. Defines the API model (NoSQL, MongoDB, etc.), regions, networking, and backup policy. One account can span multiple Azure regions.
Database - logical grouping for containers. Throughput can be provisioned at the database level (shared across containers) or at individual containers.
Container - where your data lives. This is the primary unit for throughput, indexing policy, and partition strategy. Container design is the most consequential architectural decision.
Item - a single JSON document (or row/node/entity depending on the API). Each item belongs to exactly one logical partition within its container.
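
To make the levels concrete, here is a minimal in-memory sketch of the hierarchy. It does not call Azure at all: the class names, the `throughput_source` helper, and the account, database, and container names are all hypothetical, chosen only to show how a container either carries its own throughput or falls back to the budget shared at the database level.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Container:
    name: str
    partition_key_path: str                  # e.g. "/tenantId"
    dedicated_ru: Optional[int] = None       # None means "use the database's shared throughput"
    items: list = field(default_factory=list)


@dataclass
class Database:
    name: str
    shared_ru: Optional[int] = None          # None means no database-level throughput
    containers: dict = field(default_factory=dict)

    def add(self, container: Container) -> Container:
        self.containers[container.name] = container
        return container


@dataclass
class Account:
    name: str
    api: str                                 # chosen at account creation
    regions: list = field(default_factory=list)
    databases: dict = field(default_factory=dict)


def throughput_source(db: Database, container_name: str) -> str:
    """Report which level of the hierarchy supplies a container's RU/s."""
    container = db.containers[container_name]
    if container.dedicated_ru is not None:
        return f"container ({container.dedicated_ru} RU/s dedicated)"
    if db.shared_ru is not None:
        return f"database ({db.shared_ru} RU/s shared)"
    return "none configured"


# Hypothetical names and numbers, for illustration only.
account = Account("contoso-cosmos", api="NoSQL", regions=["westeurope", "eastus"])
db = Database("app", shared_ru=1000)
account.databases[db.name] = db
db.add(Container("profiles", "/userId"))                      # shares the database budget
db.add(Container("orders", "/tenantId", dedicated_ru=4000))   # has its own budget

print(throughput_source(db, "profiles"))
print(throughput_source(db, "orders"))
```

The point of the model is only that cost and throughput decisions attach to different levels: the account fixes the API and regions, while RU/s is resolved per container, either directly or through its database.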

Partition Keys

The partition key is the single most important design choice in Cosmos DB. It determines how your data is distributed across physical partitions, which directly affects performance, cost, and query efficiency.

What it is: A property path in your documents (like /userId or /tenantId) that Cosmos DB uses to place items into logical partitions. Items with the same partition key value live together and can be read in a single, efficient operation.

Why it matters:

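Queries scoped to a single partition key value are routed to one partition and stay cheap. A lookup by both item id and partition key value is a point read, the cheapest operation of all.
Queries that span multiple partition keys become fan-out queries, which consume more RUs and take longer.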
Write-heavy workloads need partition keys that distribute traffic evenly to avoid hot partitions.

How to choose:

Good partition keys	Why they work
`userId`	High cardinality, natural access pattern for user-centric apps
`tenantId`	Groups all tenant data together, works well for multi-tenant SaaS
`orderId`	Each order is independent, distributes writes evenly
`deviceId`	IoT scenarios where each device writes its own telemetry
`sessionId`	Session state accessed as a unit

Common mistakes:

Choosing a low-cardinality key (like status or country) creates hot partitions.
Choosing a key that does not align with your read patterns forces expensive cross-partition queries.
Using a timestamp as the partition key concentrates all recent writes on one partition.
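
The effect of cardinality on write distribution can be seen in a small simulation. This is a toy model, not how the service actually hashes keys: `PHYSICAL_PARTITIONS`, `partition_for`, and `hottest_share` are hypothetical names, and SHA-256 simply stands in for the internal hash function.

```python
import hashlib
from collections import Counter

PHYSICAL_PARTITIONS = 8  # illustrative; the service manages the real number


def partition_for(key_value: str) -> int:
    """Map a partition key value to a partition with a stable hash (toy stand-in)."""
    digest = hashlib.sha256(key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PHYSICAL_PARTITIONS


def hottest_share(key_values) -> float:
    """Fraction of all writes that land on the busiest partition."""
    counts = Counter(partition_for(v) for v in key_values)
    return max(counts.values()) / sum(counts.values())


WRITES = 10_000
by_user = [f"user-{i % 5000}" for i in range(WRITES)]                        # high cardinality
by_status = [("active", "pending", "closed")[i % 3] for i in range(WRITES)]  # low cardinality
by_day = ["2024-06-01"] * WRITES                                             # coarse timestamp

user_share = hottest_share(by_user)
status_share = hottest_share(by_status)
day_share = hottest_share(by_day)

print(f"userId hottest-partition share: {user_share:.2f}")
print(f"status hottest-partition share: {status_share:.2f}")
print(f"day    hottest-partition share: {day_share:.2f}")
```

With eight partitions, an even spread would put roughly one eighth of the writes on each. The user-keyed workload lands close to that, the three-value status key can never use more than three partitions, and the single-day key sends every write to one partition.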

When no single property works well, Cosmos DB supports hierarchical partition keys (e.g., /tenantId + /userId) to get both grouping and distribution.

Request Units (RU/s)

Cosmos DB measures throughput in Request Units per second (RU/s). Every operation (read, write, query, index update) costs some number of RUs, depending on item size, indexing, and query complexity. Think of RUs as a currency: you provision a per-second budget, and your operations spend against it.

A single point read of a 1 KB item costs 1 RU. Everything else is relative to that baseline.

Operation	Approximate cost
Point read (1 KB)	1 RU
Write (1 KB)	~5-6 RU
Query (returns 1 item, in-partition)	~3 RU
Cross-partition query	Varies, significantly more
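
As a rough sizing aid, the approximate costs in the table can be turned into a back-of-the-envelope estimate. This is a sketch under stated assumptions: items are about 1 KB, the per-operation constants are the table's approximations (with 5.5 RU as the midpoint for writes), and the function names and the rounding step of 100 RU/s are illustrative choices. Real costs should be measured from the request charge the service reports for each operation.

```python
import math

# Approximate per-operation costs for ~1 KB items, taken from the table.
POINT_READ_RU = 1.0
WRITE_RU = 5.5        # midpoint of the ~5-6 RU range
QUERY_RU = 3.0        # small in-partition query


def estimate_ru_per_second(reads: float, writes: float, queries: float) -> float:
    """Sum the RU cost of a steady per-second operation mix."""
    return reads * POINT_READ_RU + writes * WRITE_RU + queries * QUERY_RU


def round_up_to_step(ru: float, step: int = 100) -> int:
    """Round up to a multiple of `step` (assumed provisioning increment)."""
    return int(math.ceil(ru / step) * step)


# Hypothetical workload: 500 point reads, 100 writes, 50 small queries per second.
needed = estimate_ru_per_second(reads=500, writes=100, queries=50)
provisioned = round_up_to_step(needed)

print(f"estimated demand: {needed:.0f} RU/s, provision at least {provisioned} RU/s")
```

Note how writes dominate: 100 writes per second cost more RUs than 500 point reads, which is why write-heavy workloads are the ones most sensitive to item size and indexing policy.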

Two capacity models:

Provisioned throughput - you set a fixed RU/s budget, or enable autoscale so it varies between 10% of a configured maximum and that maximum. Good for predictable workloads. You pay for provisioned capacity whether you use it or not.
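Serverless - pay per RU consumed, no provisioning. Good for dev/test, bursty workloads, or low-traffic applications. Throughput per container is capped (5,000 RU/s is the commonly cited figure; the limit has changed over time, so check the current service limits).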

Consistency Levels

Cosmos DB offers five consistency levels, from strongest to weakest. This is a spectrum, not a binary choice. Each level trades off read freshness against latency and availability.

Level	Guarantee	Best for
Strong	Reads always return the most recent committed write. Linearizable.	Financial transactions, inventory systems where correctness is critical
Bounded Staleness	Reads lag behind writes by at most K versions or T seconds	Globally distributed apps, typically with a single write region, that need near-strong consistency with a bounded lag
Session	Within a session, reads see that session’s own writes (read-your-writes)	Most applications. The practical default.
Consistent Prefix	Reads never see out-of-order writes, but may lag	Scenarios where ordering matters more than immediacy
Eventual	No ordering or freshness guarantee. Lowest latency.	High-throughput reads where slight staleness is acceptable

Session consistency is the default and the right choice for most applications. It gives you read-your-writes within a session while keeping latency low and allowing multi-region distribution.
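
The read-your-writes guarantee can be illustrated with a toy model of a primary and a lagging secondary replica. This is a sketch of the idea only, not the service's protocol: the `Replica` and `Session` classes are hypothetical, and the integer `token` is a simplified stand-in for the session token that the SDKs track on the client's behalf.

```python
class Replica:
    """A replica that applies writes in order but may lag behind the primary."""

    def __init__(self):
        self.lsn = 0          # highest log sequence number applied so far
        self.data = {}

    def apply(self, lsn, key, value):
        self.data[key] = value
        self.lsn = lsn


class Session:
    """Client-side session that remembers the LSN of its latest write."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary
        self.token = 0        # hypothetical stand-in for a session token

    def write(self, key, value):
        lsn = self.primary.lsn + 1
        self.primary.apply(lsn, key, value)
        self.token = lsn      # later reads must reflect at least this write
        return lsn

    def read(self, key):
        # Prefer the secondary, but only if it has caught up to our own writes.
        replica = self.secondary if self.secondary.lsn >= self.token else self.primary
        return replica.data.get(key)


primary, secondary = Replica(), Replica()
session = Session(primary, secondary)

lsn = session.write("cart", ["book"])
stale_eventual_read = secondary.data.get("cart")   # bypasses the token: may miss the write
session_read = session.read("cart")                # token forces a sufficiently fresh replica

secondary.apply(lsn, "cart", ["book"])             # replication catches up
caught_up_read = session.read("cart")              # the secondary can now serve the read

print(stale_eventual_read, session_read, caught_up_read)
```

A reader without the token (the eventual-style read here) can still see the old state, which is the trade-off the table describes: session consistency constrains only what the writing session observes, not what every other client sees.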

Change Feed

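The change feed is an ordered, persistent stream of changes to items in a container. Creates and updates appear in the order they occurred within each logical partition; ordering across partitions is not guaranteed, and in the default latest-version mode several rapid updates to one item may surface only as its most recent version.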

What you can do with it:

Trigger downstream processing when documents change (event-driven patterns).
Materialize views or projections in other containers or services.
Build real-time notification pipelines.
Replicate data to external systems.
Feed event sourcing architectures.

The change feed integrates directly with Azure Functions (via Cosmos DB trigger), or you can consume it with the change feed processor library for more control over checkpointing and error handling.

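Note: In its default (latest version) mode, the change feed captures creates and updates but not deletes. To surface a delete, use a soft-delete pattern: set a flag on the item (optionally with a TTL so it is removed later) and treat that update as the delete event. Cosmos DB also has an all versions and deletes mode; check its current availability and constraints before relying on it.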
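
The checkpointing and soft-delete ideas can be sketched without any Azure dependency. The following toy is an assumption-heavy illustration, not the change feed processor: `ToyContainer`, `read_changes`, `project`, and the `deleted` flag are hypothetical names, the container has a single partition, and it deliberately keeps only the latest version of each item in its log.

```python
from dataclasses import dataclass, field


@dataclass
class ToyContainer:
    """In-memory container whose change log keeps the latest version of each item."""
    items: dict = field(default_factory=dict)
    log: list = field(default_factory=list)   # (sequence number, item id)
    seq: int = 0

    def upsert(self, item: dict) -> None:
        self.seq += 1
        self.items[item["id"]] = dict(item)
        # Keep only the newest log entry per item, like a latest-version feed.
        self.log = [(s, i) for s, i in self.log if i != item["id"]]
        self.log.append((self.seq, item["id"]))

    def read_changes(self, checkpoint: int):
        """Return (changed items, new checkpoint) for everything after `checkpoint`."""
        changes = [self.items[i] for s, i in self.log if s > checkpoint]
        return changes, self.seq


def project(view: dict, changes: list) -> None:
    """Maintain a materialized view, honoring the soft-delete flag."""
    for doc in changes:
        if doc.get("deleted"):
            view.pop(doc["id"], None)
        else:
            view[doc["id"]] = doc["name"]


container, view, checkpoint = ToyContainer(), {}, 0

container.upsert({"id": "1", "name": "Ada"})
container.upsert({"id": "2", "name": "Grace"})
container.upsert({"id": "1", "name": "Ada L."})                  # second update to item 1
changes, checkpoint = container.read_changes(checkpoint)
project(view, changes)

container.upsert({"id": "2", "name": "Grace", "deleted": True})  # soft delete
changes_after, checkpoint = container.read_changes(checkpoint)
project(view, changes_after)

print(view)
```

The consumer persists only the checkpoint between runs, so a restart resumes where it left off. Because the first batch contains one entry per item rather than one per write, the projection logic should be idempotent and should not assume it sees every intermediate version.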

Multi-Model APIs

Cosmos DB supports multiple wire protocols through different API options. You choose the API when creating an account, and the choice applies to that account for its lifetime.

API	Wire protocol	Use when
NoSQL	Cosmos DB native (SQL-like query syntax)	New projects, full feature access, best integration with Azure
MongoDB	MongoDB wire protocol	Migrating from MongoDB, using MongoDB drivers and tools
Cassandra	CQL (Cassandra Query Language)	Migrating from Apache Cassandra
Gremlin	Gremlin (Apache TinkerPop)	Graph traversal workloads
Table	Azure Table Storage protocol	Simple key-value, migrating from Table Storage

The NoSQL API gives you the richest feature set and the most direct access to Cosmos DB capabilities. The compatibility APIs are primarily useful for lift-and-shift migrations.

Vector Search

Cosmos DB supports vector indexing and search natively within the NoSQL API. This is useful for AI and RAG (Retrieval-Augmented Generation) scenarios where you want to store embeddings alongside your application data rather than maintaining a separate vector database.

You can define vector policies on containers, index embedding fields, and run vector similarity searches (cosine, dot product, Euclidean) directly in your queries. This keeps your vectors co-located with the documents they describe, simplifying architecture for AI-enhanced applications.
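
For readers unfamiliar with the three distance functions, here is a minimal pure-Python sketch of what they compute, plus a brute-force nearest-neighbour lookup. It is only an illustration of the math: in Cosmos DB the comparison runs server-side against a vector index, and the document ids, embeddings, and the `top_k` helper here are hypothetical.

```python
import math


def dot(a, b):
    """Dot product: larger means more aligned (and sensitive to vector length)."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1.0 means same direction."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, docs, k=2):
    """Brute-force nearest neighbours by cosine similarity (highest first)."""
    scored = [(cosine_similarity(query, d["embedding"]), d["id"]) for d in docs]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


# Tiny 3-dimensional embeddings; real embeddings have hundreds or thousands of dimensions.
docs = [
    {"id": "faq-billing", "embedding": [0.9, 0.1, 0.0]},
    {"id": "faq-login",   "embedding": [0.1, 0.9, 0.1]},
    {"id": "faq-invoice", "embedding": [0.8, 0.2, 0.1]},
]
query = [1.0, 0.0, 0.0]
nearest = top_k(query, docs)

print(nearest)
```

Brute force scans every document, which is fine for a handful of items but is exactly the cost a vector index avoids. Note also that the metrics order results differently: cosine and dot product rank higher scores first, while Euclidean distance ranks lower values first.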

Common Use Cases

Real-time web and mobile apps - user profiles, session state, content feeds with low-latency global reads.
IoT and telemetry - high-throughput device data ingestion with partition-per-device patterns.
E-commerce - product catalogs, shopping carts, order tracking with flexible schemas.
Gaming - player state, leaderboards, session data across regions.
Content management - flexible document schemas that evolve without migrations.
AI/RAG applications - storing documents alongside vector embeddings for similarity search.
Multi-tenant SaaS - tenant-isolated data with partition-based separation.

In Entra-Adjacent Systems

Cosmos DB shows up in systems built around Microsoft Entra when those systems need durable application state that does not belong inside Entra or Microsoft Graph. Common patterns include:

Storing workflow checkpoints, reconciliation records, and onboarding progress for identity automation.
Tracking sync state between Graph and downstream target systems.
Coordinating job ownership and retry metadata across distributed workers.
Using the change feed to trigger downstream processing when identity-related state changes.

In these cases, Cosmos DB acts as the application state layer while Entra and Graph remain the identity source of truth.
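
One way to coordinate job ownership across workers is optimistic concurrency: each worker reads the job document, then writes its claim only if the document has not changed since the read. The sketch below models that with an in-memory store. Everything in it is an assumption for illustration: `ToyStore`, `replace_if_match`, `try_claim`, the `etag`, `owner`, and `attempts` fields, and the job id are hypothetical, and the version tag merely plays the role that an item's ETag and a conditional write would play in Cosmos DB.

```python
import uuid


class ConflictError(Exception):
    """Raised when a conditional write loses a race."""


class ToyStore:
    """In-memory store with version tags and conditional replace."""

    def __init__(self):
        self._docs = {}

    def create(self, doc: dict) -> dict:
        stored = {**doc, "etag": uuid.uuid4().hex}   # fresh tag on every write
        self._docs[doc["id"]] = stored
        return dict(stored)

    def read(self, doc_id: str) -> dict:
        return dict(self._docs[doc_id])

    def replace_if_match(self, doc: dict, etag: str) -> dict:
        if self._docs[doc["id"]]["etag"] != etag:
            raise ConflictError(doc["id"])
        return self.create(doc)


def try_claim(store: ToyStore, job_id: str, worker: str) -> bool:
    """Claim an unowned job; return False if someone else got there first."""
    job = store.read(job_id)
    if job["owner"] is not None:
        return False
    job["owner"], job["attempts"] = worker, job["attempts"] + 1
    try:
        store.replace_if_match(job, etag=job["etag"])
        return True
    except ConflictError:
        return False


store = ToyStore()
store.create({"id": "sync-42", "owner": None, "attempts": 0})

stale = store.read("sync-42")                    # worker-b reads before worker-a claims
first = try_claim(store, "sync-42", "worker-a")
second = try_claim(store, "sync-42", "worker-b")

try:
    # worker-b tries to write using its stale copy and loses the race.
    store.replace_if_match({**stale, "owner": "worker-b"}, etag=stale["etag"])
    stale_write_rejected = False
except ConflictError:
    stale_write_rejected = True

print(first, second, stale_write_rejected)
```

The retry counter lives in the same document as the owner, so a claim and its attempt count change together in a single conditional write rather than in two separate steps.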