Reliable Worker with Service Bus
What you will build
A reliable worker pattern where a producer creates work items, Service Bus delivers each one at least once, and a worker handles them with proper retry and failure semantics. Because redelivery is possible, the worker's handler is idempotent, so each item takes effect only once. This pattern applies to any workload that needs guaranteed processing with failure handling.
Scenario
You have discrete work items that must be processed reliably. Each item represents a task with consequences: provisioning a resource, processing an order, syncing data to an external system, running a compliance check. If processing fails, the item should be retried. If it keeps failing, it should be set aside for operator inspection rather than lost or silently dropped.
Examples:
- Order processing: validate payment, reserve inventory, confirm shipment
- Provisioning requests: create accounts, assign permissions, configure services
- Async task queues: PDF generation, report compilation, data migration batches
- Graph-driven workflows: react to group changes by updating downstream access lists
Why Service Bus
Service Bus is a message broker built for work items, not event streams. The distinction matters:
- Message ownership. When a worker receives a message, it holds a lock. No other worker can process that message until the lock expires or the worker releases it.
- Settlement. The worker explicitly completes, abandons, or dead-letters each message. The broker does not assume success.
- Retries. If a worker abandons a message (or the lock expires), the message becomes available again. The broker tracks delivery count.
- Dead-lettering. After exceeding the maximum delivery count, messages move to a dead-letter sub-queue. Operators inspect them to understand what went wrong.
- Sessions. Related messages can be grouped so they are processed in order by the same worker instance.
This is fundamentally different from Event Hubs, which gives you a partitioned event log for high-throughput streaming but does not provide per-message ownership or settlement.
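To make message ownership and settlement concrete, here is a minimal in-memory sketch. `ToyQueue`, `Message`, the 30-second lock duration, and the explicit `now` argument are all assumptions made for illustration; none of this is the Azure SDK, and a real queue tracks locks on the broker side.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Message:
    body: str
    delivery_count: int = 0
    locked_until: Optional[float] = None  # None means "not locked"


@dataclass
class ToyQueue:
    """In-memory stand-in for a peek-lock queue. Hypothetical; not the Azure SDK."""
    lock_seconds: float = 30.0
    messages: List[Message] = field(default_factory=list)

    def send(self, body: str) -> None:
        self.messages.append(Message(body))

    def receive(self, now: float) -> Optional[Message]:
        # Hand out the first message that is unlocked or whose lock has expired.
        for msg in self.messages:
            if msg.locked_until is None or msg.locked_until <= now:
                msg.locked_until = now + self.lock_seconds
                msg.delivery_count += 1
                return msg
        return None

    def complete(self, msg: Message) -> None:
        # Settlement: only an explicit complete removes the message.
        self.messages.remove(msg)


queue = ToyQueue(lock_seconds=30.0)
queue.send("provision-account-42")

first = queue.receive(now=0.0)     # worker A takes the lock
blocked = queue.receive(now=10.0)  # worker B gets nothing while the lock is held
retry = queue.receive(now=31.0)    # lock expired without settlement: redelivered
queue.complete(retry)
```

The second delivery happens only because the first receiver never settled the message, which is the behavior an append-only event log does not offer.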
Message lifecycle
sequenceDiagram
autonumber
participant Producer as Producer
participant Queue as Service Bus Queue
participant Worker as Worker
participant DLQ as Dead-Letter Queue
participant Ops as Operator
Producer->>Queue: Send work item
Queue->>Worker: Deliver message (locked)
alt Processing succeeds
Worker->>Worker: Process work item
Worker->>Queue: Complete message
else Transient failure
Worker->>Queue: Abandon message
Note over Queue: Delivery count incremented
Queue->>Worker: Redeliver message (locked)
Worker->>Worker: Retry processing
Worker->>Queue: Complete message
else Repeated failure
Worker->>Queue: Abandon message
Note over Queue: Max delivery count exceeded
Queue->>DLQ: Move to dead-letter queue
Ops->>DLQ: Inspect failed message
Note over Ops: Fix root cause, resubmit or discard
end
The lifecycle has three outcomes:
- Success. Worker processes the item and completes the message. It is removed from the queue.
- Transient failure. Worker abandons the message. Service Bus increments the delivery count and makes the message available again immediately. Another worker (or the same one) picks it up.
- Persistent failure. After exceeding the max delivery count, the message moves to the dead-letter queue. An operator reviews it to determine whether the issue is a bad payload, a missing prerequisite, or a broken dependency.
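The three outcomes can be traced with a small simulation. This is a sketch under stated assumptions: `run_worker`, `TransientError`, and the maximum delivery count of 3 are hypothetical names and values, and the toy queue re-appends an abandoned message at the tail, which is a simplification of how a real broker orders redelivery.

```python
from collections import deque

MAX_DELIVERY_COUNT = 3  # assumed value for this sketch


class TransientError(Exception):
    """Hypothetical marker for a failure that is worth retrying."""


def run_worker(bodies, handler, max_delivery_count=MAX_DELIVERY_COUNT):
    """Drain a toy queue and return (completed, dead_lettered) bodies."""
    queue = deque({"body": b, "delivery_count": 0} for b in bodies)
    completed, dead_lettered = [], []
    while queue:
        msg = queue.popleft()
        msg["delivery_count"] += 1  # every delivery is counted
        try:
            handler(msg["body"])
            completed.append(msg["body"])  # complete: removed from the queue
        except TransientError:
            if msg["delivery_count"] >= max_delivery_count:
                dead_lettered.append(msg["body"])  # moved to the dead-letter queue
            else:
                queue.append(msg)  # abandon: available for redelivery
    return completed, dead_lettered


attempts = {}


def flaky_handler(body):
    attempts[body] = attempts.get(body, 0) + 1
    if body == "always-fails":
        raise TransientError(body)
    if body == "fails-once" and attempts[body] == 1:
        raise TransientError(body)


completed, dead_lettered = run_worker(
    ["ok", "fails-once", "always-fails"], flaky_handler
)
```

Here `ok` completes on the first delivery, `fails-once` completes on the second, and `always-fails` is dead-lettered after its third delivery.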
Queues vs topics
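Use a queue when one worker pipeline processes each message. This is the simple case: one producer, one processing path, and one logical consumer, which may still be several worker instances competing for messages from the same queue.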
Use a topic with subscriptions when multiple consumers each need their own copy of the message. Each subscription acts like an independent queue with its own filters, delivery count, and dead-letter handling.
Fan-out examples:
- One subscription provisions the target system while another records an audit trail
- One subscription handles the primary workflow while another sends operator notifications
- Different subscriptions filter for different message types using subscription rules
This is workflow fan-out, not analytics streaming. If you need multiple readers on a high-volume event stream, use Event Hubs instead.
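A toy model of fan-out may help show what "its own copy" means. `ToyTopic`, the subscription names, and the `type` field are hypothetical; real subscription rules are configured on the broker rather than written as Python callables.

```python
from typing import Callable, Dict, List, Tuple

Message = Dict[str, str]
Rule = Callable[[Message], bool]


class ToyTopic:
    """In-memory stand-in for a topic. Hypothetical; not the Azure SDK."""

    def __init__(self) -> None:
        self._subscriptions: Dict[str, Tuple[Rule, List[Message]]] = {}

    def add_subscription(self, name: str, rule: Rule = lambda message: True) -> None:
        # The default rule accepts every message.
        self._subscriptions[name] = (rule, [])

    def publish(self, message: Message) -> None:
        for rule, inbox in self._subscriptions.values():
            if rule(message):
                inbox.append(dict(message))  # each subscription gets its own copy

    def inbox(self, name: str) -> List[Message]:
        return self._subscriptions[name][1]


topic = ToyTopic()
topic.add_subscription("audit")  # records everything
topic.add_subscription("provisioning", rule=lambda m: m["type"] == "provision")

topic.publish({"type": "provision", "target": "account-42"})
topic.publish({"type": "notify", "target": "ops-channel"})
```

The audit subscription ends up with both messages and the provisioning subscription with one, and each copy can be settled or dead-lettered without affecting the other.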
Error handling model
Keep the failure model simple and explicit:
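Transient failures (network timeouts, rate limits, temporary unavailability) should result in the worker abandoning the message for retry. An abandoned message becomes available again immediately, and a message that is never settled becomes available when its lock expires. Service Bus does not apply backoff between redeliveries, so if a dependency needs time to recover, the worker has to introduce the delay itself, for example by waiting before it abandons the message.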
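Persistent failures (bad payload, missing prerequisite, authorization error) should end up in the dead-letter queue. When the worker can recognize such a failure, it can dead-letter the message explicitly instead of abandoning it; otherwise the message exhausts its delivery count and the broker moves it there. Retrying a problem that cannot fix itself just wastes compute.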
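One way to keep this distinction explicit is to map exception types to a settlement decision in a single place. The sketch below is an assumption about how a worker might be organized: the `Settlement` enum, the `settle` function, and the choice of which built-in exceptions count as transient or persistent are all illustrative.

```python
from enum import Enum


class Settlement(Enum):
    COMPLETE = "complete"
    ABANDON = "abandon"
    DEAD_LETTER = "dead_letter"


# Assumed classification; adjust to the exceptions your dependencies raise.
TRANSIENT = (TimeoutError, ConnectionError)
PERSISTENT = (ValueError, PermissionError)


def settle(handler, body):
    """Run the handler and decide how the message should be settled."""
    try:
        handler(body)
    except TRANSIENT:
        return Settlement.ABANDON      # retry later
    except PERSISTENT:
        return Settlement.DEAD_LETTER  # retrying cannot help
    except Exception:
        return Settlement.ABANDON      # unknown: let the delivery count decide
    return Settlement.COMPLETE


def succeeds(body):
    return None


def times_out(body):
    raise TimeoutError("dependency did not answer")


def bad_payload(body):
    raise ValueError("missing required field")


outcomes = [settle(h, "item-1") for h in (succeeds, times_out, bad_payload)]
```

Unknown exceptions are abandoned rather than dead-lettered here, so an unclassified failure still gets its retries before the broker sets the message aside.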
Dead-letter inspection is an operational task, not an afterthought. Build tooling or alerts around the dead-letter queue so operators notice failed messages promptly. Common causes include schema changes, expired credentials, and dependency outages.
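As a starting point for such tooling, the sketch below groups dead-lettered messages by reason and flags reasons that cross a threshold. It assumes the messages have already been read from the dead-letter queue into plain dictionaries; the `dead_letter_reason` key, the `SchemaValidationFailed` reason, the `summarize_dead_letters` function, and the threshold are hypothetical.

```python
from collections import Counter


def summarize_dead_letters(messages, alert_threshold=5):
    """Count dead-lettered messages per reason and list reasons worth an alert."""
    counts = Counter(m.get("dead_letter_reason", "unknown") for m in messages)
    alerts = sorted(reason for reason, n in counts.items() if n >= alert_threshold)
    return counts, alerts


dead_letters = [
    {"message_id": "a1", "dead_letter_reason": "MaxDeliveryCountExceeded"},
    {"message_id": "a2", "dead_letter_reason": "MaxDeliveryCountExceeded"},
    {"message_id": "a3", "dead_letter_reason": "SchemaValidationFailed"},
    {"message_id": "a4"},  # no reason recorded
]

counts, alerts = summarize_dead_letters(dead_letters, alert_threshold=2)
```

A sudden spike in a single reason usually points to one root cause, such as a schema change or an expired credential, rather than many unrelated bad messages.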
Idempotent handlers are essential
Because messages can be delivered more than once (after abandonment, lock expiration, or infrastructure recovery), your handler must be idempotent. Processing the same message twice should produce the same result as processing it once.
Strategies:
- Use a unique message ID or correlation ID to detect duplicates before applying side effects
- Design downstream writes as upserts rather than inserts
- Store processing results in Cosmos DB keyed by the message ID so you can check before re-processing
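The check-before-processing strategy can be sketched with a dictionary standing in for the result store. `handle_once`, `reserve`, and the message ID are hypothetical, and the check-then-write step here is not atomic; a real store would need a conditional write so that two workers cannot both pass the check.

```python
from typing import Any, Callable, Dict


def handle_once(
    store: Dict[str, Any],
    message_id: str,
    payload: Dict[str, Any],
    apply: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Apply the side effect only if this message ID has not been seen."""
    if message_id in store:
        return store[message_id]  # duplicate delivery: return the saved result
    result = apply(payload)
    store[message_id] = result    # record the result keyed by message ID
    return result


inventory = {"widget": 10}


def reserve(payload):
    # The side effect that must not run twice for the same message.
    inventory[payload["sku"]] -= payload["quantity"]
    return {"reserved": payload["quantity"], "remaining": inventory[payload["sku"]]}


store: Dict[str, Any] = {}
order = {"sku": "widget", "quantity": 3}

first = handle_once(store, "msg-001", order, reserve)
second = handle_once(store, "msg-001", order, reserve)  # simulated redelivery
```

The second call returns the stored result without touching the inventory, so a redelivered message leaves the system in the same state as a single delivery.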
When not to use this pattern
This pattern is wrong when:
- You need high-throughput streaming. If you are processing millions of events per second and consumers read independently, use Event Hubs. Service Bus is not a streaming platform.
- Simple buffering is enough. If you just need a cheap queue with basic retry and the workload does not need sessions, dead-lettering, or topic fan-out, Azure Queue Storage is simpler and cheaper.
- The data is for analytics. If the goal is to query and aggregate event data rather than process individual items, use Azure Data Explorer.
- The work is fire-and-forget. If failed messages can be safely dropped and you do not need delivery guarantees, a simpler approach works.
See Service Bus for deeper coverage of sessions, message scheduling, and advanced broker patterns.