Cosmos DB for Identity State
Cosmos DB is a common fit when an Entra-adjacent system needs durable application state that does not belong inside Entra itself. That usually means workflow checkpoints, reconciliation records, onboarding progress, retry metadata, lease ownership, or other coordination data that has to survive across handlers and workers.
The value is not just that Cosmos DB stores JSON. The important part is that it gives you predictable low-latency reads and writes at scale, explicit partitioning, tunable consistency, and a change feed that can drive downstream processing.
The Core Building Blocks
- Account - the top-level Cosmos DB resource that defines the service instance, API model, regions, and overall capacity model.
- Database - a logical grouping for containers, often used to separate systems or major application areas.
- Container - the main unit where documents live. Container design matters far more than database count in day-to-day architecture.
- Item - a JSON document stored in a container.
- Partition key - the property that determines how items are distributed and which items can be read cheaply together.
- RU/s - request units per second, the throughput model that prices and limits reads, writes, queries, and indexing work.
- Consistency - the read model that balances freshness against latency and cross-region behavior.
- Change feed - an ordered stream of item changes within a container, often used to trigger downstream handlers.
Why Identity Workloads Need Separate State
Entra and Graph tell you the identity truth that Microsoft owns. They do not model your application’s surrounding workflow state. Builders usually need somewhere else to keep things like:
- last successful Graph checkpoint,
- pending onboarding stages,
- reconciliation mismatches for review,
- job ownership and retry counts,
- target-system correlation identifiers,
- durable progress across multiple handlers.
Cosmos DB is attractive here because the data is usually document-shaped, accessed frequently by identifiers, and shared across functions or workers.
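As a concrete sketch, a Graph polling checkpoint might be stored as a small JSON document like the one below. The field names and the `connectorId` partition key are assumptions for illustration, not a fixed schema:

```python
import json
from datetime import datetime, timezone

# Illustrative checkpoint document for a Graph polling job.
# Field names are assumptions for this sketch, not a standard schema.
checkpoint = {
    "id": "graph-users-delta",    # item id, unique within its partition
    "connectorId": "entra-users",  # hypothetical partition key: unit of ownership
    "deltaLink": "https://graph.microsoft.com/v1.0/users/delta?$deltatoken=...",
    "lastSuccessUtc": datetime.now(timezone.utc).isoformat(),
    "retryCount": 0,
}

# Cosmos DB stores items as JSON, so the document must round-trip cleanly.
assert json.loads(json.dumps(checkpoint))["id"] == "graph-users-delta"
```

Because every handler that needs the checkpoint knows both the partition key and the id, it can fetch this document with a cheap point read rather than a query.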
Partition Key Design Matters Early
Partition-key choice is the main design decision here: made well, it keeps the system cheap and smooth; made poorly, it makes the system noisy and expensive later, and it is hard to change after the fact.
Good partition keys usually group the items that are read and written together, such as:
- tenant or customer identifier,
- workflow instance identifier,
- target system identifier,
- job family or connector scope.
Weak partition keys usually create one of two failures:
- Hot partitions because too much traffic lands on one key.
- Expensive fan-out queries because related data is spread across too many keys.
For identity workloads, a partition key that matches the unit of operational ownership is usually better than a key chosen only for theoretical distribution.
RU/s And Cost Pressure
RU/s is the capacity budget for the container. Every point read, query, write, and upsert consumes it, and so does the indexing work each write triggers.
That matters for Entra-adjacent systems because they often look light at first, then quietly become chatty:
- periodic Graph polling writes checkpoints,
- reconciliation jobs upsert status rows,
- worker fleets compete for ownership records,
- dashboards issue repeated queries,
- change-feed processors create secondary writes.
If the data model forces lots of cross-partition queries or large document rewrites, costs rise quickly. Keep documents focused, access patterns deliberate, and indexing aligned with the queries you actually run.
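A back-of-envelope RU/s estimate makes the "quietly chatty" pattern visible. The per-operation RU charges below are rough illustrative assumptions, not published prices; real charges depend on item size, indexing, and query shape:

```python
# (operations per second, assumed RU charge per operation)
ops = {
    "checkpoint_write":      (2.0, 10.0),
    "status_upsert":         (5.0, 10.0),
    "point_read":            (20.0, 1.0),
    "cross_partition_query": (1.0, 50.0),
}

# Steady-state RU/s the container must be provisioned for.
required_rus = sum(rate * ru for rate, ru in ops.values())
assert required_rus == 140.0

# Replacing the fan-out query with a single point read shrinks the budget:
without_fanout = required_rus - 1.0 * 50.0 + 1.0 * 1.0
assert without_fanout == 91.0
```

Even at these toy numbers, one cross-partition query per second costs more than twenty point reads, which is why access patterns and partition design dominate the bill.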
Consistency Trade-Offs
Cosmos DB offers multiple consistency levels. The right choice depends on whether the system cares more about immediate freshness or global latency behavior.
For many identity-oriented workflows:
- Session consistency is a practical default because a writer can read its own updates without forcing the strongest global behavior.
- Stronger consistency may be justified for narrowly scoped coordination cases, but it increases constraints and can reduce flexibility.
- Weaker consistency can be acceptable for analytics-adjacent or eventually reconciled views where slight staleness is not operationally dangerous.
The important point is to choose consistency based on the workflow contract, not by habit.
Change Feed As A Coordination Tool
The change feed lets downstream processors react whenever items change in a container. That makes it useful for patterns like:
- a Graph polling job writes a checkpointed result,
- a change-feed processor notices the new record,
- downstream handlers emit commands, create artifacts, or update analytics views.
That pattern can work well, but it is not a free replacement for a broker. If you need explicit dead-lettering, delivery coordination, or workflow commands, Service Bus is usually the clearer fit. Use the change feed when reacting to state changes is the natural model.
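The processing loop behind that pattern can be sketched as an ordered scan with a continuation point, so a restarted processor resumes where it left off. The feed contents and `_lsn` ordering field here are illustrative assumptions:

```python
# Toy change-feed loop: handle item changes in order and remember a
# continuation point so a restarted processor does not reprocess them.
feed = [
    {"_lsn": 1, "id": "graph-users-delta", "status": "checkpointed"},
    {"_lsn": 2, "id": "wf-100/stage-2",    "status": "pending"},
    {"_lsn": 3, "id": "wf-100/stage-2",    "status": "done"},
]

def process(feed, continuation=0):
    handled = []
    for change in feed:
        if change["_lsn"] <= continuation:
            continue             # already handled before the restart
        handled.append(change["id"])   # stand-in for the downstream handler
        continuation = change["_lsn"]
    return handled, continuation

handled, cont = process(feed)
assert handled == ["graph-users-delta", "wf-100/stage-2", "wf-100/stage-2"]

# After a restart with the saved continuation, nothing is reprocessed:
assert process(feed, cont) == ([], 3)
```

Note what the sketch does not give you: there is no dead-letter path and no per-message delivery coordination, which is exactly where a broker like Service Bus earns its place.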
Practical Framing
Use Cosmos DB when the hard problem is durable application state around Entra-backed workflows. Avoid turning it into a generic dumping ground for logs, binary artifacts, or queue semantics it was not meant to replace.