State and Artifacts

What you will build

A storage architecture that cleanly separates operational workflow state from file-shaped artifacts. This is a foundational pattern that sits underneath most of the other quickstarts in this series.

Scenario

Your workflow needs to track two kinds of data that look similar but behave differently:

Operational state: progress markers, checkpoints, retry counts, correlation IDs, entity status. Data you query and update frequently during processing.
File artifacts: reports, exports, raw snapshots, manifests, staged data bundles. Data you write once and read back later, often by humans or external tools.

Putting both in the same store works until it does not. The access patterns, size profiles, and cost models are different enough that splitting them early saves real pain later.

Examples where this split matters:

Multi-step order fulfillment: track step completion in one store, write shipping labels and invoices to another
Batch processing pipelines: maintain job progress and error state separately from the processed output files
Data export workflows: checkpoint which records have been exported while writing the actual export files elsewhere
Compliance reporting: track report generation status while storing the generated PDF or CSV artifacts

The storage boundary

```mermaid
flowchart TB
    subgraph workflow["Your Workflow"]
        compute[Compute: Functions, Workers, etc.]
    end

    subgraph state["Operational State"]
        cosmos[(Cosmos DB)]
        note1["Query by entity, job, status
        Update frequently
        Small documents
        Cost: request units"]
    end

    subgraph artifacts["File Artifacts"]
        storage[(Azure Storage)]
        note2["Write once, read later
        Any size
        Downloaded by tools/operators
        Cost: storage capacity"]
    end

    compute -->|read/write progress| cosmos
    compute -->|write exports, reports| storage
    compute -->|read checkpoint| cosmos
```

The core principle: Cosmos DB for state you query and update frequently, Azure Storage for files and larger payloads.
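The principle can be sketched as one checkpointed export step. This is a minimal sketch, assuming hypothetical in-memory stand-ins (`StateStore`, `BlobStore`) in place of real Cosmos DB and Blob Storage clients. The document fields (`checkpoint`, `status`, `lastArtifact`) and the `exports/{job_id}/batch-NNNNNN.csv` path layout are illustrative, not part of any SDK. The step writes the artifact first and advances the checkpoint second, so a crash between the two leaves a batch that can be re-run instead of state that points at a file that does not exist.

```python
import csv
import io
from typing import Dict, List


class StateStore:
    """Hypothetical stand-in for a Cosmos DB container holding small JSON documents."""

    def __init__(self) -> None:
        self.docs: Dict[str, dict] = {}

    def read(self, doc_id: str) -> dict:
        # Return a copy so callers must upsert to persist changes, as with a real store.
        if doc_id in self.docs:
            return dict(self.docs[doc_id])
        return {"id": doc_id, "checkpoint": 0, "status": "pending"}

    def upsert(self, doc: dict) -> None:
        self.docs[doc["id"]] = dict(doc)


class BlobStore:
    """Hypothetical stand-in for a Blob Storage container: bytes addressed by path."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def upload(self, path: str, data: bytes) -> None:
        # Overwrites any existing blob at the same path.
        self.blobs[path] = data


def export_next_batch(job_id: str, records: List[dict], batch_size: int,
                      state: StateStore, blobs: BlobStore) -> str:
    """Export the records after the stored checkpoint; return the artifact path ('' when done)."""
    doc = state.read(job_id)
    start = doc["checkpoint"]
    batch = records[start:start + batch_size]
    if not batch:
        doc["status"] = "completed"
        state.upsert(doc)
        return ""

    # 1. Write the file-shaped artifact to blob storage. The path is derived from the
    #    checkpoint, so a retry after a crash overwrites the same blob.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(batch[0].keys()))
    writer.writeheader()
    writer.writerows(batch)
    path = f"exports/{job_id}/batch-{start:06d}.csv"
    blobs.upload(path, buf.getvalue().encode("utf-8"))

    # 2. Only after the artifact exists, advance the checkpoint and store a pointer to it.
    doc["checkpoint"] = start + len(batch)
    doc["status"] = "running"
    doc["lastArtifact"] = path
    state.upsert(doc)
    return path
```

Calling `export_next_batch` repeatedly walks through the records in batches until it returns an empty string and marks the job `completed`. With the real Blob Storage SDK, the overwrite-on-retry behavior corresponds to calling `BlobClient.upload_blob` with `overwrite=True`.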

What goes in Cosmos DB

Cosmos DB is a document database optimized for low-latency reads and writes on structured data. Use it for:

Workflow progress. Which step is this job on? When did it last succeed? What is the current status?
Retry metadata. How many times has this item been attempted? What was the last error? When should it be retried?
Correlation IDs. Linking a work item to its upstream trigger, downstream target, and any intermediate steps.
Checkpoints. The last-processed position in a data source, so the next run picks up where the previous one left off.
Entity state. Current status of each entity being processed, queryable by partition key.

These documents are small (typically under 10 KB), updated frequently, and queried by known keys or indexed properties.
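A state document of this kind might look like the following. This is a minimal sketch: the field names (`partitionKey`, `correlationId`, `attempts`, `lastError`, `nextRetryAt`) and the exponential backoff policy are illustrative assumptions, not a required schema. The helper records a failed attempt, and the size function confirms that the serialized document stays far below the 2 MB Cosmos DB item limit.

```python
import json
from datetime import datetime, timedelta, timezone

MAX_ITEM_BYTES = 2 * 1024 * 1024    # Cosmos DB maximum item size (2 MB)
BASE_DELAY = timedelta(seconds=30)  # hypothetical backoff base


def new_work_item(item_id: str, job_id: str, correlation_id: str) -> dict:
    """Create a small, queryable state document for one work item."""
    return {
        "id": item_id,
        "partitionKey": job_id,           # "all items in job X" stays within one partition
        "correlationId": correlation_id,  # links back to the upstream trigger
        "status": "pending",
        "step": "extract",
        "attempts": 0,
        "lastError": None,
        "nextRetryAt": None,
    }


def record_failure(doc: dict, error: str, now: datetime, max_attempts: int = 5) -> dict:
    """Return an updated copy of doc with retry metadata for one failed attempt."""
    updated = dict(doc)
    updated["attempts"] = doc["attempts"] + 1
    updated["lastError"] = error[:500]  # keep messages short; full logs belong elsewhere
    if updated["attempts"] >= max_attempts:
        updated["status"] = "failed"
        updated["nextRetryAt"] = None
    else:
        # Exponential backoff: 30 s, 60 s, 120 s, ...
        delay = BASE_DELAY * (2 ** (updated["attempts"] - 1))
        updated["status"] = "retrying"
        updated["nextRetryAt"] = (now + delay).isoformat()
    return updated


def item_size_bytes(doc: dict) -> int:
    """Approximate the stored size as UTF-8 encoded JSON."""
    return len(json.dumps(doc).encode("utf-8"))
```

Every field here is something a worker or operator filters or updates on, which is what makes the document a good fit for Cosmos DB rather than a file.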

What goes in Azure Storage

Azure Storage (Blob Storage specifically) is optimized for storing files of any size at low cost. Use it for:

Exported files. CSVs, JSON exports, PDFs, spreadsheets generated by your workflow.
Reports. Compliance reports, audit snapshots, summary documents.
Raw payloads. API response dumps, data snapshots preserved for debugging or audit.
Staged data. Intermediate files passed between pipeline stages or handed to external systems.
Operator-facing downloads. Anything a human or external tool retrieves by URL or path.

These objects are written once (or appended), rarely updated in place, and accessed by path rather than queried by properties.
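Because artifacts are accessed by path rather than by properties, a predictable naming convention does much of the work an index would. The layout below (`{kind}/{job_id}/{yyyy}/{mm}/{dd}/{name}`) and the set of allowed prefixes are a hypothetical convention, not an Azure requirement. Blob Storage treats `/` as part of the blob name, and tools typically display it as a virtual folder hierarchy.

```python
from datetime import datetime, timezone

ALLOWED_KINDS = {"exports", "reports", "raw", "staged"}  # hypothetical top-level prefixes


def artifact_path(kind: str, job_id: str, name: str, created: datetime) -> str:
    """Build a deterministic blob name for a workflow artifact."""
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unknown artifact kind: {kind}")
    if "/" in job_id or "/" in name:
        raise ValueError("job_id and name must not contain '/'")
    # Normalize to UTC so the date segments do not depend on the worker's time zone.
    created = created.astimezone(timezone.utc)
    return f"{kind}/{job_id}/{created:%Y/%m/%d}/{name}"
```

For example, an `orders.csv` export for `job-42` created on 5 March 2024 lands at `exports/job-42/2024/03/05/orders.csv`. Storing that path in the state document, rather than the file contents, is what links the two stores.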

Why the split matters

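Cost. Cosmos DB bills for throughput in request units (RUs) as well as for the storage it consumes, and the RU charge for a write grows with document size. Storing large payloads there burns RUs on every write and pays a much higher per-gigabyte rate than Blob Storage, which charges pennies per gigabyte per month. A 1.5 MB report that just fits under the document size limit is already far more expensive to keep in Cosmos DB than in Blob Storage; a 50 MB export does not fit at all.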

Query patterns. Cosmos DB excels at “give me the status of job X” or “find all failed items in partition Y.” Azure Storage excels at “here is the path; download the file.” Forcing blob-shaped data into Cosmos DB means you cannot query it meaningfully. Forcing structured state into opaque blob files means you cannot query it at all without downloading and parsing.

Document size limits. Cosmos DB documents have a 2 MB size limit. Large exports, reports, or raw payloads will not fit. A single block blob in Azure Storage can be roughly 190 TiB.
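One common way to enforce this boundary in code is the claim-check pattern: payloads above a threshold go to Blob Storage, and the state document keeps only a reference. This is a minimal sketch, assuming a hypothetical `upload` callable and a hypothetical 64 KB inline threshold chosen well below the 2 MB item limit so that state documents stay cheap to read and write.

```python
import hashlib
import json
from typing import Callable

MAX_ITEM_BYTES = 2 * 1024 * 1024  # Cosmos DB item size limit
INLINE_THRESHOLD = 64 * 1024      # hypothetical: keep inline payloads small


def attach_payload(doc: dict, payload: dict, blob_path: str,
                   upload: Callable[[str, bytes], None]) -> dict:
    """Embed small payloads inline; offload large ones to blob storage and keep a reference."""
    body = json.dumps(payload).encode("utf-8")
    updated = dict(doc)
    if len(body) <= INLINE_THRESHOLD:
        updated["payload"] = payload
    else:
        upload(blob_path, body)
        updated["payloadRef"] = {
            "path": blob_path,
            "bytes": len(body),
            "sha256": hashlib.sha256(body).hexdigest(),  # lets readers verify the blob
        }
    # Guard: the state document itself must always fit in a single item.
    if len(json.dumps(updated).encode("utf-8")) > MAX_ITEM_BYTES:
        raise ValueError("state document exceeds the item size limit")
    return updated
```

Readers that find `payloadRef` instead of `payload` download the blob by path, which keeps the state store small and fast regardless of payload size.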

Anti-patterns

Embedding large blobs in Cosmos DB. Storing base64-encoded files, large JSON payloads, or binary data in Cosmos DB documents. This wastes RUs, hits size limits, and makes the state store slower and more expensive for its actual job.

Losing structure in opaque Storage files. Putting workflow state in blob files as line-delimited JSON or CSV. Now you cannot query “which jobs are stuck?” without downloading and parsing files. Workflow state belongs in a queryable store.
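With state in Cosmos DB, "which jobs are stuck?" becomes a single query instead of a download-and-parse job. This is a minimal sketch, assuming state documents with a hypothetical `status` field and a hypothetical 30-minute staleness threshold. It filters on `_ts`, the Cosmos DB system property holding the last-modified time in epoch seconds. The query uses Cosmos DB NoSQL syntax with named parameters, and `is_stuck` mirrors the same predicate in plain Python so the logic can be checked without an account.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

STALE_AFTER = timedelta(minutes=30)  # hypothetical threshold for "stuck"

STUCK_QUERY = (
    "SELECT c.id, c.partitionKey, c.status, c._ts FROM c "
    "WHERE c.status = @status AND c._ts < @cutoff"
)


def stuck_query(now: datetime) -> Tuple[str, List[dict]]:
    """Return query text and parameters for items still 'running' after STALE_AFTER."""
    cutoff = int((now - STALE_AFTER).timestamp())  # _ts is in epoch seconds
    params = [
        {"name": "@status", "value": "running"},
        {"name": "@cutoff", "value": cutoff},
    ]
    return STUCK_QUERY, params


def is_stuck(doc: dict, now: datetime) -> bool:
    """In-memory mirror of the query predicate, useful for tests."""
    return doc["status"] == "running" and doc["_ts"] < int((now - STALE_AFTER).timestamp())
```

With the `azure-cosmos` Python SDK, the query text and parameters can be passed to `ContainerProxy.query_items(query=..., parameters=..., enable_cross_partition_query=True)`, since stuck items may sit in any partition.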

Single-store simplicity trap. Using one store for everything because “it’s simpler.” It is simpler until large export writes consume the throughput your state queries depend on, or every state update means rewriting a file, or you hit the document size limit on a large report. The split is cheap to implement early and expensive to retrofit later.

How this fits the wider stack

This storage split sits underneath the other patterns in this series:

Graph-Driven Automation uses Cosmos DB for checkpoints and Storage for export artifacts.
Reliable Worker uses Cosmos DB for processing state and idempotency checks.
Hybrid Worker uses cloud-side state and artifact stores to keep the VM from becoming a data silo.
Event Stream to Data Explorer is the exception: it handles analytics data, not workflow state.

The compute and messaging layers (Functions, Service Bus) orchestrate the work. The storage layer persists both the workflow state and the artifacts the work produces.

When not to use this split

Do not introduce both stores when the workload does not justify the complexity:

Tiny workloads. If you have a handful of records and no file artifacts, a single Cosmos DB container or even Azure Table Storage is fine.
No real artifacts. If your workflow only tracks state and never produces files, you do not need Azure Storage for this purpose.
Pure analytics. If the data is event streams for analysis rather than workflow execution, the right split is Event Hubs plus Data Explorer, not Cosmos DB plus Storage.

The point is not to add services by default. The point is to separate state from artifacts once the workflow shape justifies it, because doing it later is significantly harder.