Messaging Patterns
Every platform eventually needs to move work between components. The decision that matters most is not which messaging service to pick in the abstract. It is understanding what kind of message the system carries, how failures should surface, and what recovery looks like when things break.
This applies to any Azure-hosted platform: commerce systems, IoT backends, SaaS integration layers, and identity-adjacent services like those in the Entra ecosystem.
The Core Decision: Command or Event?
The cleanest selection rule starts with what the message means.
- Choose Service Bus when the message means “someone must complete this work.”
- Choose Event Hubs when the message means “here is another event in the stream.”
That difference sounds small, but it changes almost everything downstream.
Service Bus assumes the system cares about ownership, settlement, retries, and inspection of failed work. Event Hubs assumes the system cares about ingesting high-volume data, letting multiple consumers read independently, and replaying retained history when needed.
```mermaid
flowchart TD
    A[New messaging need] --> B{What does the<br/>message represent?}
    B -->|A command or<br/>work item| C{Expected<br/>throughput?}
    B -->|A stream event<br/>or signal| D{Volume and<br/>consumer count?}
    B -->|Simple job<br/>queue| E[Queue Storage]
    C -->|Needs retries,<br/>dead-lettering,<br/>sessions| F[Service Bus]
    C -->|Simple FIFO<br/>with low volume| G{Need broker<br/>features?}
    G -->|Yes| F
    G -->|No| E
    D -->|High volume,<br/>multiple readers,<br/>replay needed| H[Event Hubs]
    D -->|Low volume,<br/>single consumer| I{Need retention<br/>and replay?}
    I -->|Yes| H
    I -->|No| F
    style F fill:#4a9,stroke:#333
    style H fill:#49a,stroke:#333
    style E fill:#a94,stroke:#333
```
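The same decision path can be written as a small function, which makes the rule easy to review and test. This is a minimal sketch: the `Workload` fields and the `choose_service` name are hypothetical, invented here for illustration, and the logic only mirrors the flowchart above rather than any official guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of a messaging need (field names are illustrative)."""
    is_command: bool                     # True: "someone must complete this work"
    needs_broker_features: bool = False  # retries, dead-lettering, sessions, topics
    high_volume: bool = False
    multiple_readers: bool = False
    needs_replay: bool = False


def choose_service(w: Workload) -> str:
    """Return the service the flowchart's selection rule points to."""
    if w.is_command:
        # Work items: broker features decide between Service Bus and Queue Storage.
        return "Service Bus" if w.needs_broker_features else "Queue Storage"
    # Stream events: volume, fan-out, or replay needs push toward Event Hubs.
    if w.high_volume or w.multiple_readers or w.needs_replay:
        return "Event Hubs"
    # Low-volume, single-consumer signals without replay fit a brokered queue.
    return "Service Bus"


if __name__ == "__main__":
    order_step = Workload(is_command=True, needs_broker_features=True)
    telemetry = Workload(is_command=False, high_volume=True, multiple_readers=True)
    thumbnails = Workload(is_command=True)
    for name, w in [("order step", order_step), ("telemetry", telemetry), ("thumbnails", thumbnails)]:
        print(name, "->", choose_service(w))
```

Real decisions involve more inputs (cost, team experience, existing infrastructure), so treat this as a way to make the first-pass rule explicit, not as a complete selector.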
Selection Criteria That Actually Matter
Use the pressure points below rather than product branding.
| Decision pressure | Event Hubs | Service Bus | Queue Storage |
|---|---|---|---|
| Primary fit | Large event streams | Durable workflow coordination | Simple job queues |
| Ordering model | Within a partition only | Per queue; entity-level via sessions | Best-effort only; no FIFO guarantee |
| Replay | Built in via retention and offsets | Not a replay system | Not a replay system |
| Throughput profile | Very high ingest, parallel consumers | Lower throughput, richer broker behavior | Moderate, simple polling |
| Delivery semantics | At-least-once with client-managed progress | Broker-managed locks, settlement, retries, dead-lettering | At-least-once with visibility timeout |
| Multiple consumers | Native via consumer groups | Fan-out via topics and subscriptions | Single consumer per message |
| Sessions | Not the model | First-class correlated ordering | Not supported |
| Dead-lettering | No built-in equivalent | First-class | No built-in dead-letter queue; consumers check the dequeue count and move poison messages themselves |
| Idempotency expectation | Consumers must handle duplicates and replays | Consumers still need idempotency | Consumers must handle duplicates |
None of the three services removes the need for idempotent consumers. They fail differently, but each can deliver the same logical message more than once.
Workload Examples
Order processing workflow
A commerce system receives an order, validates payment, reserves inventory, and notifies fulfillment. Each step has an owner, and failure should surface clearly.
Choose Service Bus.
Why:
- the unit of work has an owner at each stage,
- repeated failure should land in a dead-letter queue for operator review,
- related steps for the same order may need session ordering,
- operators need a clear remediation path when a step fails.
IoT telemetry stream
Thousands of devices emit temperature readings every few seconds. Multiple downstream systems (alerting, dashboards, long-term storage) consume the same stream independently.
Choose Event Hubs.
Why:
- the stream feeds analytics, alerting, and archival simultaneously,
- replay matters for backfill or investigating sensor anomalies,
- no single consumer should own or acknowledge an event permanently.
Background job queue
A web application queues thumbnail generation requests. Volume is moderate, processing is simple, and the system does not need broker features.
Choose Queue Storage.
Why:
- the pattern is simple dequeue-process-delete,
- no need for topics, subscriptions, or sessions,
- cost is significantly lower for simple workloads.
Mixed workload: commands plus audit stream
Some systems carry both. A reliable command flow tells a worker to act, and a separate event stream records what happened for diagnostics or compliance.
Use both, but keep the contracts separate:
- Service Bus carries the command or workflow step.
- Event Hubs carries the resulting telemetry or evidence stream.
The common mistake is forcing one service to do both jobs and then rebuilding the missing behavior in application code.
Ordering Is Narrower Than People Expect
Ordering guarantees are often overstated during design reviews.
With Event Hubs, ordering exists only within a partition. If two related events land in different partitions, the stream does not promise cross-partition order. That is acceptable for many analytics workloads, but it breaks systems that assume a single global sequence.
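With Service Bus, ordering is still not automatic. A plain queue hands messages out roughly first-in, first-out, but competing consumers, retries, and lock expiry mean they can complete out of order. Sessions keep related messages together and deliver them to one receiver at a time. But ordering only helps if the session key actually matches the workflow boundary you care about.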
If the requirement is “all updates for customer X must stay in order,” the design is not done until the partition key or session key reflects customer X consistently.
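A short simulation shows why the key matters. The sketch below routes events to partitions by hashing a key, so every event for one customer lands in the same partition in arrival order, while events for different customers have no ordering relationship. The SHA-256 hash and the `partition_for` and `route` helpers are assumptions made for illustration; the hashing Event Hubs performs internally is not reproduced here.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash (illustrative, not the service's algorithm)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events: list[tuple[str, str]], partition_count: int) -> dict[int, list[str]]:
    """Append each (key, payload) to the partition chosen by its key, preserving arrival order."""
    partitions: dict[int, list[str]] = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append(payload)
    return dict(partitions)


if __name__ == "__main__":
    events = [
        ("customer-X", "x1"),
        ("customer-Y", "y1"),
        ("customer-X", "x2"),
        ("customer-X", "x3"),
    ]
    for partition, payloads in sorted(route(events, partition_count=4).items()):
        print(partition, payloads)
```

If the producer keyed these events by something other than the customer (a request ID, for example), updates for customer X could spread across partitions and the per-customer order would be lost.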
Replay, Recovery, and Historical Reprocessing
Event Hubs is the clear choice when replay is a real operational tool. A new consumer can start from an earlier offset, an investigation can reread retained data, and a bug fix can reprocess historical events while the retention window covers them.
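The replay model can be illustrated with an in-memory stand-in for a single partition. This is a sketch under simplifying assumptions: `RetainedLog` is a hypothetical class, and retention is modeled as a maximum event count, whereas real retention is time-based.

```python
class RetainedLog:
    """In-memory stand-in for one partition: append-only, read by offset, trimmed by retention."""

    def __init__(self, max_retained: int) -> None:
        self._events: list[str] = []
        self._first_offset = 0  # offset of the oldest event still retained
        self._max_retained = max_retained

    def append(self, event: str) -> int:
        """Store an event and return its offset."""
        self._events.append(event)
        if len(self._events) > self._max_retained:
            # Retention expired for the oldest event; it can no longer be replayed.
            self._events.pop(0)
            self._first_offset += 1
        return self._first_offset + len(self._events) - 1

    def read_from(self, offset: int) -> list[tuple[int, str]]:
        """Return (offset, event) pairs starting at offset, clamped to what is retained."""
        start = max(offset, self._first_offset)
        index = start - self._first_offset
        return [(start + i, e) for i, e in enumerate(self._events[index:])]


if __name__ == "__main__":
    log = RetainedLog(max_retained=3)
    for i in range(5):
        log.append(f"reading-{i}")
    # A new consumer asks for everything from offset 0 but only gets what is retained.
    print(log.read_from(0))
```

Reading does not remove anything, which is why several consumers can each keep their own position, and why a consumer that falls behind the retention window silently loses the oldest events.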
Service Bus uses a different recovery model:
- retrying the locked message,
- moving the message to the dead-letter queue,
- fixing the payload or dependency and resubmitting,
- recreating a command from a state store or source system.
That is not a weakness. Workflow systems usually need controlled re-execution, not stream replay.
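The recovery steps listed above can be sketched with a tiny in-memory broker. `TinyBroker`, `Message`, and `resubmit` are hypothetical names used only for this illustration; the code imitates the retry, dead-letter, and resubmit flow and does not use or represent the Service Bus SDK.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable


@dataclass
class Message:
    body: str
    delivery_count: int = 0


class TinyBroker:
    """In-memory sketch of retry plus dead-lettering; not the Service Bus SDK."""

    def __init__(self, max_delivery_count: int = 3) -> None:
        self.queue: deque[Message] = deque()
        self.dead_letter: list[Message] = []
        self.max_delivery_count = max_delivery_count

    def send(self, body: str) -> None:
        self.queue.append(Message(body))

    def resubmit(self, index: int, fixed_body: str) -> None:
        """Operator remediation: remove a dead-lettered message and send a corrected copy."""
        self.dead_letter.pop(index)
        self.send(fixed_body)

    def drain(self, handler: Callable[[str], None]) -> None:
        """Deliver until the queue is empty, retrying failures up to the limit."""
        while self.queue:
            msg = self.queue.popleft()
            msg.delivery_count += 1
            try:
                handler(msg.body)  # success settles (completes) the message
            except Exception:
                if msg.delivery_count >= self.max_delivery_count:
                    self.dead_letter.append(msg)  # park for operator review
                else:
                    self.queue.append(msg)  # abandon: make it available again
```

The point of the sketch is that a failed command ends up somewhere visible with its delivery count attached, and re-execution is a deliberate act (`resubmit`) rather than a rewind of the whole stream.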
Throughput and Cost Pressure
Both services become expensive when the wrong workload shape lands on them.
Event Hubs cost pressure usually comes from:
- underestimating partition needs,
- one hot partition receiving most traffic,
- retaining large streams longer than the investigation model needs,
- pushing workflow commands through a service optimized for ingest.
Service Bus cost pressure usually comes from:
- routing telemetry through broker features designed for workflow coordination,
- serializing too much work through a narrow session key,
- letting dead-letter volume grow without fixing root causes,
- using topics and subscriptions where a simple queue would do.
The practical rule: high event volume pushes toward Event Hubs; failure handling and workflow ownership push toward Service Bus; simple job queues push toward Queue Storage.
Failure Modes and Mitigation
Event Hubs failure patterns
| Failure | Impact | Mitigation |
|---|---|---|
| Checkpoint mistakes | Duplicate processing or consumption gaps | Checkpoint deliberately; test restart behavior |
| Hot partitions | Lag concentrates on one key | Choose partition keys that reflect both scale and ordering needs |
| Slow consumers | Fall behind; unread events are lost once retention expires | Monitor consumer lag; scale consumers independently |
| Schema drift | Breaks downstream parsing across all consumers at once | Version event payloads; use schema registry |
| Simulating broker semantics | Teams build custom retry, poison-event, ownership logic | Move delivery-sensitive work to Service Bus |
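The checkpoint row is easiest to understand by simulating a restart. In the sketch below, progress is saved only every few events, so a crash between checkpoints causes the events processed since the last checkpoint to be delivered again. The `consume` function and its parameters are hypothetical and model only the bookkeeping, not a real checkpoint store.

```python
from typing import Optional


def consume(events: list[str], checkpoint: int, crash_after: Optional[int],
            checkpoint_every: int, seen: list[str]) -> int:
    """Process events from `checkpoint`, saving progress every `checkpoint_every` events.

    Returns the last saved checkpoint (the offset of the next event to read).
    `crash_after` simulates a restart after that many events were processed.
    """
    processed = 0
    for offset in range(checkpoint, len(events)):
        seen.append(events[offset])  # the side effect happens first
        processed += 1
        if processed % checkpoint_every == 0:
            checkpoint = offset + 1  # progress is saved only after processing
        if crash_after is not None and processed == crash_after:
            return checkpoint  # work since the last checkpoint will be replayed
    return len(events)  # a clean finish checkpoints the final position


if __name__ == "__main__":
    events = ["e0", "e1", "e2", "e3", "e4"]
    seen: list[str] = []
    saved = consume(events, 0, crash_after=3, checkpoint_every=2, seen=seen)
    consume(events, saved, crash_after=None, checkpoint_every=2, seen=seen)
    print(seen)  # "e2" appears twice: processed before the crash, replayed after it
```

Checkpointing before the side effect would flip the failure mode from duplicates to gaps, which is usually worse. Either way, the consumer needs to tolerate seeing an event more than once.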
Service Bus failure patterns
| Failure | Impact | Mitigation |
|---|---|---|
| Poison messages | Repeatedly fail, block useful work | Configure max delivery count; monitor dead-letter queues |
| Non-idempotent handlers | Duplicate side effects during retry | Persist progress outside the message; check before acting |
| Bad session key | Serializes unrelated work or breaks ordering | Match session keys to the actual workflow boundary |
| Long-running handlers | Lock loss triggers noisy redelivery | Keep handlers short; offload state to external stores |
| Subscription sprawl | Topic fan-out becomes hard to reason about | Audit subscriptions; remove unused ones |
Idempotency Is Mandatory in Both Models
This is the part teams skip until production says otherwise.
With Event Hubs, duplicates happen because consumers restart, replay, or recover from checkpoint ambiguity. With Service Bus, duplicates happen because messages are retried, redelivered after lock loss, or resubmitted after remediation. In both cases, the consumer must detect whether the side effect already happened.
That usually means storing enough state outside the transport to answer questions like:
- did order X already reach fulfillment step Y,
- did we already send the notification for event Z,
- did we already emit the downstream command for this checkpoint.
If the answer lives only in memory or only in the transport, recovery will be fragile.
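One common way to keep that state outside the transport is a table of processed message IDs with a uniqueness constraint. The sketch below uses SQLite from the Python standard library; the table name, the `handle_once` helper, and the in-memory database are illustrative assumptions, and a real system would use a durable store shared by all consumer instances.

```python
import sqlite3
from typing import Callable


def open_store() -> sqlite3.Connection:
    """Create the dedup table; an in-memory database stands in for a durable store."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed (message_id TEXT PRIMARY KEY)")
    return conn


def handle_once(conn: sqlite3.Connection, message_id: str,
                side_effect: Callable[[], None]) -> bool:
    """Run side_effect at most once per recorded message_id. Returns True if it ran."""
    with conn:  # one transaction: commit on success, roll back on exception
        try:
            conn.execute("INSERT INTO processed (message_id) VALUES (?)", (message_id,))
        except sqlite3.IntegrityError:
            return False  # duplicate delivery: the side effect already happened
        side_effect()  # if this raises, the insert is rolled back and a retry can run
    return True


if __name__ == "__main__":
    store = open_store()
    notify = lambda: print("notification sent")
    print(handle_once(store, "order-X:step-Y", notify))  # first delivery runs the handler
    print(handle_once(store, "order-X:step-Y", notify))  # redelivery is skipped
```

This closes the duplicate window only as far as the store and the side effect share a transaction. When the side effect is an external call, a crash between the call and the commit can still repeat it, so the downstream operation should ideally accept the same message ID as its own idempotency key.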
Entra Context
In Entra-adjacent systems, these patterns appear in familiar shapes. Provisioning workflows (user onboarding, entitlement changes, connector remediation) are Service Bus candidates because each step has an owner and failure needs operator visibility. Audit feeds, sign-in telemetry, and reconciliation results are Event Hubs candidates because they are high-volume streams consumed by multiple downstream systems. The selection logic is the same as any platform workload; the domain objects just happen to be identities and entitlements instead of orders and devices.
Practical Recommendation
Start with Service Bus for delivery-sensitive workflows and Event Hubs for high-volume event streams. Use Queue Storage for simple background jobs that do not need broker features. Only mix services when the workload clearly contains both a command path and an analytics path.
Do not hide the difference behind a generic messaging abstraction. The services are opinionated because the workloads are different. Let the contract stay visible in the design.