Reliable Worker with Service Bus

What you will build

A reliable worker pattern where a producer creates work items, Service Bus delivers each one at least once, and a worker handles them with explicit retry and failure semantics. Combined with idempotent handlers, this gives each item the effect of being processed once. The pattern applies to any workload that needs guaranteed processing with failure handling.

Scenario

You have discrete work items that must be processed reliably. Each item represents a task with consequences: provisioning a resource, processing an order, syncing data to an external system, running a compliance check. If processing fails, the item should be retried. If it keeps failing, it should be set aside for operator inspection rather than lost or silently dropped.

Examples:

Order processing: validate payment, reserve inventory, confirm shipment
Provisioning requests: create accounts, assign permissions, configure services
Async task queues: PDF generation, report compilation, data migration batches
Graph-driven workflows: react to group changes by updating downstream access lists

Why Service Bus

Service Bus is a message broker built for work items, not event streams. The distinction matters:

Message ownership. When a worker receives a message, it holds a lock. No other worker can process that message until the lock expires or the worker releases it.
Settlement. The worker explicitly completes, abandons, or dead-letters each message. The broker does not assume success.
Retries. If a worker abandons a message (or the lock expires), the message becomes available again. The broker tracks delivery count.
Dead-lettering. After exceeding the maximum delivery count, messages move to a dead-letter sub-queue. Operators inspect them to understand what went wrong.
Sessions. Related messages can be grouped so they are processed in order by the same worker instance.

This is fundamentally different from Event Hubs, which gives you a partitioned event log for high-throughput streaming but does not provide per-message ownership or settlement.
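To make the ownership and settlement semantics above concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Azure SDK: the `InMemoryQueue` class, its method names, and the `max_delivery_count=3` value are assumptions chosen for illustration, and lock expiry and sessions are left out.

```python
import itertools
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    message_id: str
    body: str
    delivery_count: int = 0


class InMemoryQueue:
    """Toy stand-in for a broker queue (hypothetical, not the Azure SDK)."""

    def __init__(self, max_delivery_count=3):
        self.max_delivery_count = max_delivery_count
        self.available = deque()
        self.locked = {}        # lock token -> message held by a worker
        self.dead_letter = []   # messages set aside for operators
        self._tokens = itertools.count(1)

    def send(self, message):
        self.available.append(message)

    def receive(self):
        """Hand out the next message under a lock, or None if empty."""
        if not self.available:
            return None
        message = self.available.popleft()
        token = next(self._tokens)
        self.locked[token] = message  # no other receiver can see it now
        return token, message

    def complete(self, token):
        """Settle as done: the message is removed for good."""
        del self.locked[token]

    def abandon(self, token):
        """Release the lock, count the attempt, redeliver or dead-letter."""
        message = self.locked.pop(token)
        message.delivery_count += 1
        if message.delivery_count >= self.max_delivery_count:
            self.dead_letter.append(message)
        else:
            self.available.append(message)


queue = InMemoryQueue(max_delivery_count=3)
queue.send(Message("order-1", "reserve inventory"))

attempts = 0
while (delivery := queue.receive()) is not None:
    token, message = delivery
    attempts += 1
    queue.abandon(token)  # simulate a handler that always fails
```

In this run the handler fails every time, so the message is delivered three times and then lands in `queue.dead_letter` instead of being lost. Nothing is removed from the queue unless `complete` is called, which is the point of explicit settlement.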

Message lifecycle

sequenceDiagram
    autonumber
    participant Producer as Producer
    participant Queue as Service Bus Queue
    participant Worker as Worker
    participant DLQ as Dead-Letter Queue
    participant Ops as Operator

    Producer->>Queue: Send work item
    Queue->>Worker: Deliver message (locked)
    
    alt Processing succeeds
        Worker->>Worker: Process work item
        Worker->>Queue: Complete message
    else Transient failure
        Worker->>Queue: Abandon message
        Note over Queue: Delivery count incremented
        Queue->>Worker: Redeliver message (locked)
        Worker->>Worker: Retry processing
        Worker->>Queue: Complete message
    else Repeated failure
        Worker->>Queue: Abandon message
        Note over Queue: Max delivery count exceeded
        Queue->>DLQ: Move to dead-letter queue
        Ops->>DLQ: Inspect failed message
        Note over Ops: Fix root cause, resubmit or discard
    end

The lifecycle has three outcomes:

Success. Worker processes the item and completes the message. It is removed from the queue.
Transient failure. Worker abandons the message. Service Bus increments the delivery count and releases the lock, so the message becomes available again right away. Another worker (or the same one) picks it up.
Persistent failure. After exceeding the max delivery count, the message moves to the dead-letter queue. An operator reviews it to determine whether the issue is a bad payload, a missing prerequisite, or a broken dependency.

Queues vs topics

Use a queue when one worker pipeline processes each message. This is the simple case: one producer, one consumer, one processing path.

Use a topic with subscriptions when multiple consumers each need their own copy of the message. Each subscription acts like an independent queue with its own filters, delivery count, and dead-letter handling.

Fan-out examples:

One subscription provisions the target system while another records an audit trail
One subscription handles the primary workflow while another sends operator notifications
Different subscriptions filter for different message types using subscription rules

This is workflow fan-out, not analytics streaming. If you need multiple readers on a high-volume event stream, use Event Hubs instead.
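The fan-out behaviour can be sketched with a small in-memory model. This is an illustration under assumptions, not the Service Bus API: the `Topic` and `Subscription` classes are hypothetical, and the filter is modelled as a plain Python predicate rather than a real subscription rule.

```python
import copy


class Subscription:
    """One consumer's private queue on a topic (toy model)."""

    def __init__(self, name, accepts=None):
        self.name = name
        # Hypothetical filter: a predicate over the message dict.
        self.accepts = accepts or (lambda message: True)
        self.inbox = []


class Topic:
    def __init__(self):
        self.subscriptions = []

    def subscribe(self, name, accepts=None):
        subscription = Subscription(name, accepts)
        self.subscriptions.append(subscription)
        return subscription

    def publish(self, message):
        # Every matching subscription gets its own copy of the message.
        for subscription in self.subscriptions:
            if subscription.accepts(message):
                subscription.inbox.append(copy.deepcopy(message))


topic = Topic()
provisioning = topic.subscribe(
    "provisioning", accepts=lambda m: m["type"] == "provision"
)
audit = topic.subscribe("audit")  # no filter: receives everything

topic.publish({"type": "provision", "id": "req-1"})
topic.publish({"type": "notify", "id": "req-2"})
```

Here the audit subscription sees both messages while the provisioning subscription sees only the one it filters for. Because each inbox holds its own copy, a failure in one consumer does not affect the other.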

Error handling model

Keep the failure model simple and explicit:

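Transient failures (network timeouts, rate limits, temporary unavailability) should result in the worker abandoning the message for retry. An abandoned message becomes available again immediately; Service Bus does not apply a backoff on its own. If the dependency needs time to recover, the worker has to add the delay itself, for example by waiting briefly before abandoning (within the lock duration) or by scheduling a copy of the message for later delivery.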

Persistent failures (bad payload, missing prerequisite, authorization error) will not succeed on retry. Dead-letter them explicitly when the worker can recognize them, or otherwise let them exhaust the delivery count and land in the dead-letter queue. Retrying an unfixable problem just wastes compute.

Dead-letter inspection is an operational task, not an afterthought. Build tooling or alerts around the dead-letter queue so operators notice failed messages promptly. Common causes include schema changes, expired credentials, and dependency outages.
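One way to keep the failure model explicit is to map exception types to settlement actions in a single place. The sketch below is an assumption-laden illustration: the `settle` function and the two exception groupings are hypothetical, and the returned strings stand in for the real complete, abandon, and dead-letter calls. It dead-letters recognizable persistent failures right away; letting them exhaust the delivery count is the alternative.

```python
# Hypothetical groupings; a real worker would list its own error types.
TRANSIENT = (TimeoutError, ConnectionError)
PERSISTENT = (ValueError, PermissionError)


def settle(handler, message):
    """Run the handler and return (action, reason) for the broker."""
    try:
        handler(message)
    except PERSISTENT as exc:
        # Retrying cannot help: set the message aside for an operator.
        return ("dead-letter", type(exc).__name__)
    except TRANSIENT as exc:
        # Likely to succeed later: release the message for redelivery.
        return ("abandon", type(exc).__name__)
    except Exception as exc:
        # Unknown failure: retry, bounded by the max delivery count.
        return ("abandon", type(exc).__name__)
    return ("complete", None)


def flaky(message):
    raise TimeoutError("downstream timed out")


def bad_payload(message):
    raise ValueError("missing order id")


def ok(message):
    return None


outcomes = [settle(h, {"id": "m-1"}) for h in (flaky, bad_payload, ok)]
```

Returning the exception name as the reason gives operators something to group by when they inspect the dead-letter queue.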

Idempotent handlers are essential

Because messages can be delivered more than once (after abandonment, lock expiration, or infrastructure recovery), your handler must be idempotent. Processing the same message twice should produce the same result as processing it once.

Strategies:

Use a unique message ID or correlation ID to detect duplicates before applying side effects
Design downstream writes as upserts rather than inserts
Store processing results in Cosmos DB keyed by the message ID so you can check before re-processing
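The duplicate-detection strategy can be sketched as a wrapper that records results by message ID. This is a minimal illustration, not a production implementation: the `IdempotentHandler` class is hypothetical, and a plain dictionary stands in for the durable store. A real store would also need an atomic conditional write, since the check-then-write here is not safe across concurrent workers.

```python
class IdempotentHandler:
    """Skip the side effect when a message ID was already processed."""

    def __init__(self, process):
        self.process = process
        self.results = {}  # stand-in for a durable store keyed by message ID

    def handle(self, message_id, payload):
        if message_id in self.results:
            # Redelivery: return the recorded result, do no new work.
            return self.results[message_id]
        result = self.process(payload)
        self.results[message_id] = result
        return result


calls = []


def reserve_inventory(payload):
    calls.append(payload)  # the side effect we must not repeat
    return f"reserved:{payload}"


handler = IdempotentHandler(reserve_inventory)
first = handler.handle("msg-42", "sku-7")
second = handler.handle("msg-42", "sku-7")  # simulated redelivery
```

Both calls return the same result, but `reserve_inventory` runs only once, so the redelivery has no additional effect.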

When not to use this pattern

This pattern is wrong when:

You need high-throughput streaming. If you are processing millions of events per second and consumers read independently, use Event Hubs. Service Bus is not a streaming platform.
Simple buffering is enough. If you just need a cheap queue with basic retry and the workload does not need sessions, dead-lettering, or topic fan-out, Azure Queue Storage is simpler and cheaper.
The data is for analytics. If the goal is to query and aggregate event data rather than process individual items, use Azure Data Explorer.
The work is fire-and-forget. If failed messages can be safely dropped and you do not need delivery guarantees, a simpler approach works.

See Service Bus for deeper coverage of sessions, message scheduling, and advanced broker patterns.