State and Artifacts
Scenario Setup
Consider a tenant-onboarding workflow for a new enterprise customer. The system needs to track which identity setup steps have completed, which downstream targets are pending, and which reports must be delivered to operators after each run.
The core design question is not just “where do I store data?” It is “which data is workflow state and which data is an artifact?”
Why this pattern exists
This pattern exists because builders often blur workflow state and artifact payloads together until both become harder to manage.
- Workflow state needs frequent reads, updates, and correlation.
- Artifacts are usually larger, file-shaped, append-only, or shared with humans and downstream tools.
Separating those roles keeps the workflow store queryable and keeps the artifact store cheap and explicit.
Main lesson: the storage boundary
For tenant onboarding or access-review export flows, use the stores differently:
- Put workflow state in Cosmos DB when the system needs to read and update progress by tenant, job, or workflow instance.
- Put artifacts in Azure Storage when the output is a CSV, JSON export, attachment bundle, audit snapshot, or any larger file-like payload.
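The boundary can be made concrete with a small routing helper. This is an illustrative sketch only: the size threshold, type names, and the idea of deciding at runtime are assumptions for demonstration (in practice the split is a design-time decision, not a runtime branch).

```python
from dataclasses import dataclass

# Assumed cutoff for illustration: payloads larger than this are treated as
# file-shaped artifacts rather than queryable workflow state.
ARTIFACT_SIZE_THRESHOLD = 64 * 1024

@dataclass
class StorageDecision:
    store: str   # "cosmos" for workflow state, "blob" for artifacts
    reason: str

def choose_store(payload: bytes, *, is_queryable_state: bool) -> StorageDecision:
    """Route workflow state toward Cosmos DB and file-like output toward Azure Storage."""
    if is_queryable_state and len(payload) < ARTIFACT_SIZE_THRESHOLD:
        return StorageDecision("cosmos", "small, read and updated by tenant or job ID")
    return StorageDecision("blob", "large or file-shaped, append-only output")
```

The heuristic encodes the two bullets above: frequent reads and updates keyed by tenant or job favor the document store; anything file-shaped goes to blob storage.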
Example split
In a tenant onboarding scenario, Cosmos DB should hold things like:
- tenant ID and onboarding case ID,
- current step and last successful checkpoint,
- retry count and failure summary,
- downstream correlation IDs,
- ownership and review metadata.
Azure Storage should hold things like:
- the generated onboarding manifest,
- exported entitlement reports,
- raw API snapshots needed for auditing,
- operator-facing files that are downloaded or shared outside the workflow runtime.
That division keeps the application state compact and operationally meaningful while letting artifacts grow independently.
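The split can be sketched as two small builders, one per store. The field names, the partition-key choice, and the container layout here are illustrative assumptions, not a prescribed schema:

```python
from datetime import datetime, timezone

def onboarding_state_doc(tenant_id: str, case_id: str, step: str,
                         retry_count: int = 0) -> dict:
    """Compact workflow-state document destined for Cosmos DB."""
    return {
        "id": f"{tenant_id}:{case_id}",   # point-readable by tenant + case
        "tenantId": tenant_id,            # natural partition-key candidate
        "caseId": case_id,
        "currentStep": step,
        "lastCheckpoint": datetime.now(timezone.utc).isoformat(),
        "retryCount": retry_count,
        "failureSummary": None,
    }

def artifact_blob_path(tenant_id: str, case_id: str, artifact_name: str) -> str:
    """Deterministic blob path for a file-shaped artifact in Azure Storage."""
    return f"onboarding-artifacts/{tenant_id}/{case_id}/{artifact_name}"
```

Keeping the state document this small is the point: everything large or file-shaped lives behind the blob path, and the document only carries what queries and retries need.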
How the pattern fits the wider stack
This storage split usually sits underneath another compute or messaging pattern:
- Azure Functions for Identity Workloads may create the records and artifacts.
- Service Bus for Workflows may coordinate the next step.
- Microsoft Graph Control Plane remains the source for the identity objects being processed.
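When Service Bus coordinates the next step, the message should carry correlation IDs and a pointer to the artifact, never the artifact bytes themselves (the claim-check pattern). A minimal sketch of such a message body, with assumed field names:

```python
import json

def next_step_message(tenant_id: str, case_id: str,
                      completed_step: str, artifact_path: str) -> str:
    """Service Bus message body linking workflow state to its artifact.

    The consumer reads progress from Cosmos DB by tenantId/caseId and
    fetches the artifact from Azure Storage via the blob path.
    """
    return json.dumps({
        "tenantId": tenant_id,
        "caseId": case_id,
        "completedStep": completed_step,
        "artifactBlobPath": artifact_path,  # pointer, not payload
    })
```

Passing a pointer keeps messages small and sidesteps broker size limits, while the correlation IDs let the next stage find the state record without parsing the artifact.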
If the flow is driven by provisioning or sync systems, link to the specialized Entra topics rather than duplicating their semantics.
Trade-offs
This pattern is simple, but it does require discipline.
- Putting exports into Cosmos DB inflates RU consumption, bloats document shape, and degrades query performance.
- Putting workflow state only inside blobs makes coordination, retries, and reporting harder.
- Using one storage service for both jobs usually hides the real access pattern until scale or failure pressure exposes it.
The storage boundary is one of the main lessons because it is easy to get wrong early and expensive to unwind later.
When not to use it
Do not use this split mechanically when the workload is tiny or the records are truly lightweight.
It is also the wrong fit when:
- a simple table or queue is enough,
- there are no meaningful artifacts beyond the current state record,
- the workload is primarily analytics, not workflow execution.
The point is not to add more services by default. The point is to separate state from artifacts once the workflow shape justifies it.