Messaging Patterns

Every platform eventually needs to move work between components. The decision that matters most is not which messaging service to pick in the abstract. It is understanding what kind of message the system carries, how failures should surface, and what recovery looks like when things break.

This applies to any Azure-hosted platform: commerce systems, IoT backends, SaaS integration layers, and identity-adjacent services like those in the Entra ecosystem.

The Core Decision: Command or Event?

The cleanest selection rule starts with what the message means.

Choose Service Bus when the message means “someone must complete this work.”
Choose Event Hubs when the message means “here is another event in the stream.”

That difference sounds small, but it changes almost everything downstream.

Service Bus assumes the system cares about ownership, settlement, retries, and inspection of failed work. Event Hubs assumes the system cares about ingesting high-volume data, letting multiple consumers read independently, and replaying retained history when needed.

flowchart TD
    A[New messaging need] --> B{What does the<br/>message represent?}
    B -->|A command or<br/>work item| C{Delivery<br/>requirements?}
    B -->|A stream event<br/>or signal| D{Volume and<br/>consumer count?}
    B -->|Simple job<br/>queue| E[Queue Storage]

    C -->|Needs retries,<br/>dead-lettering,<br/>sessions| F[Service Bus]
    C -->|Simple queue<br/>with low volume| G{Need broker<br/>features?}
    G -->|Yes| F
    G -->|No| E

    D -->|High volume,<br/>multiple readers,<br/>replay needed| H[Event Hubs]
    D -->|Low volume,<br/>single consumer| I{Need retention<br/>and replay?}
    I -->|Yes| H
    I -->|No| F

    style F fill:#4a9,stroke:#333
    style H fill:#49a,stroke:#333
    style E fill:#a94,stroke:#333
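
The branches of the flowchart can also be written down as a small function, which is sometimes easier to review than a diagram. This is a minimal sketch: the `MessagingNeed` fields and the `choose_service` name are illustrative assumptions, not part of any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessagingNeed:
    """Answers to the flowchart questions (field names are illustrative)."""
    kind: str                               # "command", "stream", or "simple_job"
    needs_broker_features: bool = False     # retries, dead-lettering, sessions
    high_volume_multi_reader: bool = False  # many events, several readers
    needs_replay: bool = False              # retention and replay required


def choose_service(need: MessagingNeed) -> str:
    """Walk the same branches as the flowchart and return a service name."""
    if need.kind == "simple_job":
        return "Queue Storage"
    if need.kind == "command":
        return "Service Bus" if need.needs_broker_features else "Queue Storage"
    if need.kind == "stream":
        if need.high_volume_multi_reader or need.needs_replay:
            return "Event Hubs"
        return "Service Bus"
    raise ValueError(f"unknown message kind: {need.kind!r}")
```

For example, `choose_service(MessagingNeed("stream", needs_replay=True))` returns `"Event Hubs"` even at low volume, which mirrors the retention branch of the diagram.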

Selection Criteria That Actually Matter

Use the pressure points below rather than product branding.

Decision pressure	Event Hubs	Service Bus	Queue Storage
Primary fit	Large event streams	Durable workflow coordination	Simple job queues
Ordering model	Within a partition only	Per queue; strict per-key order via sessions	Best-effort only; no FIFO guarantee
Replay	Built in via retention and offsets	Not a replay system	Not a replay system
Throughput profile	Very high ingest, parallel consumers	Lower throughput, richer broker behavior	Moderate, simple polling
Delivery semantics	At-least-once with client-managed progress	Broker-managed locks, settlement, retries, dead-lettering	At-least-once with visibility timeout
Multiple consumers	Native via consumer groups	Fan-out via topics and subscriptions	Single consumer per message
Sessions	Not the model	First-class correlated ordering	Not supported
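Dead-lettering	No built-in equivalent	First-class	No built-in equivalent; consumer checks the dequeue count and moves poison messages itself (the Azure Functions queue trigger automates this)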
Idempotency expectation	Consumers must handle duplicates and replays	Consumers still need idempotency	Consumers must handle duplicates

None of these services removes the need for idempotent consumers. They fail differently, but all of them can deliver the same logical message more than once.

Workload Examples

Order processing workflow

A commerce system receives an order, validates payment, reserves inventory, and notifies fulfillment. Each step has an owner, and failure should surface clearly.

Choose Service Bus.

Why:

the unit of work has an owner at each stage,
repeated failure should land in a dead-letter queue for operator review,
related steps for the same order may need session ordering,
operators need a clear remediation path when a step fails.

IoT telemetry stream

Thousands of devices emit temperature readings every few seconds. Multiple downstream systems (alerting, dashboards, long-term storage) consume the same stream independently.

Choose Event Hubs.

Why:

the stream feeds analytics, alerting, and archival simultaneously,
replay matters for backfill or investigating sensor anomalies,
no single consumer should own or acknowledge an event permanently.

Background job queue

A web application queues thumbnail generation requests. Volume is moderate, processing is simple, and the system does not need broker features.

Choose Queue Storage.

Why:

the pattern is simple dequeue-process-delete,
no need for topics, subscriptions, or sessions,
cost is significantly lower for simple workloads.
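
The dequeue-process-delete pattern and its visibility timeout can be modeled in a few lines. The sketch below is an in-memory stand-in, not the Azure Storage SDK; `SimpleQueue`, `QueuedMessage`, and the explicit `now` parameter are assumptions made so the timing is easy to follow.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity comparison, so delete() removes this exact message
class QueuedMessage:
    body: str
    visible_at: float = 0.0   # message is hidden from receivers until this time
    dequeue_count: int = 0    # how many times it has been handed out


@dataclass
class SimpleQueue:
    """In-memory model of a visibility-timeout queue (not the Azure SDK)."""
    messages: list = field(default_factory=list)

    def enqueue(self, body: str) -> None:
        self.messages.append(QueuedMessage(body))

    def receive(self, now: float, visibility_timeout: float):
        """Return the first visible message and hide it for the timeout."""
        for msg in self.messages:
            if msg.visible_at <= now:
                msg.visible_at = now + visibility_timeout
                msg.dequeue_count += 1
                return msg
        return None

    def delete(self, msg: QueuedMessage) -> None:
        """Called only after processing succeeded."""
        self.messages.remove(msg)


queue = SimpleQueue()
queue.enqueue("thumbnail:42")
first = queue.receive(now=0.0, visibility_timeout=30.0)    # a worker takes it
hidden = queue.receive(now=10.0, visibility_timeout=30.0)  # still hidden
retry = queue.receive(now=31.0, visibility_timeout=30.0)   # worker never deleted it
queue.delete(retry)
```

The second delivery of the same message is why even this simple pattern needs an idempotent handler, and `dequeue_count` is the value a consumer would inspect to decide when a message counts as poison.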

Mixed workload: commands plus audit stream

Some systems carry both. A reliable command flow tells a worker to act, and a separate event stream records what happened for diagnostics or compliance.

Use both, but keep the contracts separate:

Service Bus carries the command or workflow step.
Event Hubs carries the resulting telemetry or evidence stream.

The common mistake is forcing one service to do both jobs and then rebuilding the missing behavior in application code.
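
One way to keep the two contracts visibly separate is to give them separate types. In this sketch, `ReserveInventory`, `InventoryReserved`, and `handle_reserve` are hypothetical names, and a plain list stands in for the event stream.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReserveInventory:
    """Command: addressed to one owner, must be completed or dead-lettered."""
    order_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class InventoryReserved:
    """Event: a fact appended to the stream, readable by any consumer."""
    order_id: str
    sku: str
    quantity: int


def handle_reserve(command: ReserveInventory, audit_stream: list) -> None:
    """Do the work for the command, then record what happened as an event."""
    # ... reserve stock in the inventory system here (omitted in this sketch) ...
    audit_stream.append(
        InventoryReserved(command.order_id, command.sku, command.quantity)
    )


audit_stream: list = []   # stands in for the Event Hubs side
handle_reserve(ReserveInventory("order-1", "sku-9", 2), audit_stream)
```

The command type never appears on the stream and the event type never appears on the queue, so neither transport is asked to do the other's job.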

Ordering Is Narrower Than People Expect

Ordering guarantees are often overstated during design reviews.

With Event Hubs, ordering exists only within a partition. If two related events land in different partitions, the stream does not promise cross-partition order. That is acceptable for many analytics workloads, but it breaks systems that assume a single global sequence.

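With Service Bus, ordering is still not automatic. A plain queue hands out messages roughly in arrival order, but competing consumers, retries, and expired locks can reorder processing. Sessions keep related messages together and let one receiver handle them in sequence. Even then, ordering only helps if the session key actually matches the workflow boundary you care about.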

If the requirement is “all updates for customer X must stay in order,” the design is not done until the partition key or session key reflects customer X consistently.
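
The effect of a consistent key can be shown with a small simulation. The hash below is an assumption chosen for illustration and is not the hash Event Hubs uses; the point is only that a stable key-to-partition mapping keeps one customer's updates in one ordered partition.

```python
import hashlib


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, partition_count: int) -> dict:
    """Append each (key, payload) event to the partition chosen by its key."""
    partitions = {p: [] for p in range(partition_count)}
    for key, payload in events:
        partitions[partition_for(key, partition_count)].append((key, payload))
    return partitions


events = [
    ("customer-x", "created"),
    ("customer-y", "created"),
    ("customer-x", "updated"),
    ("customer-x", "closed"),
]
partitions = route(events, partition_count=4)

# Every customer-x event lands in the same partition, in send order.
x_partition = partitions[partition_for("customer-x", 4)]
x_updates = [payload for key, payload in x_partition if key == "customer-x"]
```

If some producers keyed on customer and others on order, the same customer's updates could land in different partitions and the in-order property would be lost.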

Replay, Recovery, and Historical Reprocessing

Event Hubs is the clear choice when replay is a real operational tool. A new consumer can start from an earlier offset, an investigation can reread retained data, and a bug fix can reprocess historical events while the retention window covers them.
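
Replay against a retention window can be sketched with an append-only log. This model is a simplification: `RetainedLog` is a hypothetical class, and it expresses retention as an event count, whereas a real stream would normally express it as time.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedLog:
    """In-memory model of one partition: append-only with a retention limit."""
    retention: int          # how many recent events are kept (simplification)
    first_offset: int = 0   # offset of the oldest retained event
    events: list = field(default_factory=list)

    def append(self, event: str) -> int:
        """Store the event and return its offset."""
        self.events.append(event)
        if len(self.events) > self.retention:
            self.events.pop(0)        # oldest event ages out
            self.first_offset += 1
        return self.first_offset + len(self.events) - 1

    def read_from(self, offset: int) -> list:
        """Return (offset, event) pairs from `offset`, clamped to retention."""
        start = max(offset, self.first_offset)
        end = self.first_offset + len(self.events)
        return [(o, self.events[o - self.first_offset]) for o in range(start, end)]


log = RetainedLog(retention=3)
for reading in ["t=20", "t=21", "t=22", "t=23"]:
    log.append(reading)

# A new consumer asks for offset 0, but that event has already aged out.
replayed = log.read_from(0)
```

Reading does not remove anything, so a second consumer can call `read_from` with its own offset and see the same retained history.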

Service Bus uses a different recovery model:

retrying the locked message,
moving the message to the dead-letter queue,
fixing the payload or dependency and resubmitting,
recreating a command from a state store or source system.

That is not a weakness. Workflow systems usually need controlled re-execution, not stream replay.
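
The retry-then-dead-letter path listed above can be modeled as follows. This is a sketch of the behavior, not Service Bus client code: `process_with_dead_letter` and its parameters are assumptions, and a real broker tracks delivery counts and locks for you.

```python
from collections import deque


def process_with_dead_letter(messages, handler, max_delivery_count: int):
    """Deliver each message up to max_delivery_count times, then dead-letter it."""
    queue = deque((msg, 0) for msg in messages)
    completed, dead_lettered = [], []
    while queue:
        msg, deliveries = queue.popleft()
        deliveries += 1
        try:
            handler(msg)
            completed.append(msg)                      # settle: complete
        except Exception as exc:
            if deliveries >= max_delivery_count:
                dead_lettered.append((msg, str(exc)))  # keep the failure reason
            else:
                queue.append((msg, deliveries))        # abandon: redeliver later
    return completed, dead_lettered


def handler(msg: dict) -> None:
    """Hypothetical handler that rejects payloads without a SKU."""
    if "sku" not in msg:
        raise ValueError("missing sku")


completed, dead_lettered = process_with_dead_letter(
    [{"order": 1, "sku": "a"}, {"order": 2}], handler, max_delivery_count=3
)
```

The malformed message ends up in `dead_lettered` together with its failure reason, which is the information an operator needs before fixing the payload and resubmitting.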

Throughput and Cost Pressure

Both services become expensive when the wrong workload shape lands on them.

Event Hubs cost pressure usually comes from:

underestimating partition needs,
one hot partition receiving most traffic,
retaining large streams longer than the investigation model needs,
pushing workflow commands through a service optimized for ingest.

Service Bus cost pressure usually comes from:

routing telemetry through broker features designed for workflow coordination,
serializing too much work through a narrow session key,
letting dead-letter volume grow without fixing root causes,
using topics and subscriptions where a simple queue would do.

The practical rule: high event volume pushes toward Event Hubs; failure handling and workflow ownership push toward Service Bus; simple job queues push toward Queue Storage.

Failure Modes and Mitigation

Event Hubs failure patterns

Failure	Impact	Mitigation
Checkpoint mistakes	Duplicate processing or consumption gaps	Checkpoint deliberately; test restart behavior
Hot partitions	Lag concentrates on one key	Choose partition keys that reflect both scale and ordering needs
Slow consumers	Fall behind; unread events are lost for good once retention expires	Monitor consumer lag; scale consumers independently
Schema drift	Breaks downstream parsing across all consumers at once	Version event payloads; use schema registry
Simulating broker semantics	Teams build custom retry, poison-event, ownership logic	Move delivery-sensitive work to Service Bus
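
Consumer lag and hot partitions from the table above are easy to check once per-partition sequence numbers and checkpoints are available. The sketch assumes those numbers have already been collected into plain dictionaries; the function names and the threshold of three times the median are arbitrary illustrative choices.

```python
from statistics import median


def partition_lag(last_enqueued: dict, checkpoints: dict) -> dict:
    """Events not yet processed per partition: latest sequence minus checkpoint."""
    return {p: seq - checkpoints.get(p, 0) for p, seq in last_enqueued.items()}


def hot_partitions(lag: dict, factor: float = 3.0) -> list:
    """Partitions whose lag exceeds `factor` times the median lag."""
    typical = median(lag.values())
    return sorted(p for p, value in lag.items() if value > factor * typical)


last_enqueued = {0: 1000, 1: 1040, 2: 9000, 3: 980}   # latest sequence numbers
checkpoints = {0: 950, 1: 1000, 2: 2000, 3: 940}      # last processed positions

lag = partition_lag(last_enqueued, checkpoints)
hot = hot_partitions(lag)
```

Here partition 2 carries almost all of the backlog, which usually points at a skewed partition key rather than at an undersized consumer fleet.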

Service Bus failure patterns

Failure	Impact	Mitigation
Poison messages	Repeatedly fail, block useful work	Configure max delivery count; monitor dead-letter queues
Non-idempotent handlers	Duplicate side effects during retry	Persist progress outside the message; check before acting
Bad session key	Serializes unrelated work or breaks ordering	Match session keys to the actual workflow boundary
Long-running handlers	Lock loss triggers noisy redelivery	Keep handlers short or renew the lock; offload state to external stores
Subscription sprawl	Topic fan-out becomes hard to reason about	Audit subscriptions; remove unused ones

Idempotency Is Mandatory in Both Models

This is the part teams skip until production says otherwise.

With Event Hubs, duplicates happen because consumers restart, replay, or recover from checkpoint ambiguity. With Service Bus, duplicates happen because messages are retried, redelivered after lock loss, or resubmitted after remediation. In both cases, the consumer must detect whether the side effect already happened.

That usually means storing enough state outside the transport to answer questions like:

did order X already reach fulfillment step Y,
did we already send the notification for event Z,
did we already emit the downstream command for this checkpoint.

If the answer lives only in memory or only in the transport, recovery will be fragile.
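
A minimal version of that check looks like the sketch below. `ProcessedStore` is a hypothetical in-memory stand-in for a durable store, and the `order_id:step` key format is an assumption; a production design would also need to decide how the store write and the side effect are kept consistent, for example with a single transaction.

```python
class ProcessedStore:
    """Stands in for a durable store such as a database table."""

    def __init__(self):
        self._done = set()

    def mark_if_new(self, key: str) -> bool:
        """Record the key; return False if it was already recorded."""
        if key in self._done:
            return False
        self._done.add(key)
        return True


def handle_once(store: ProcessedStore, message: dict, side_effect) -> bool:
    """Run side_effect only the first time this (order, step) pair is seen."""
    key = f"{message['order_id']}:{message['step']}"
    if not store.mark_if_new(key):
        return False          # duplicate delivery: skip the side effect
    side_effect(message)
    return True


sent = []                     # records each time the side effect actually ran
store = ProcessedStore()
message = {"order_id": "X", "step": "fulfillment"}

# The transport delivers the same logical message three times.
results = [handle_once(store, message, sent.append) for _ in range(3)]
```

Only the first delivery triggers the side effect. The later two are recognized from state held outside the transport, which is exactly what survives a restart or a replay.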

Entra Context

In Entra-adjacent systems, these patterns appear in familiar shapes. Provisioning workflows (user onboarding, entitlement changes, connector remediation) are Service Bus candidates because each step has an owner and failure needs operator visibility. Audit feeds, sign-in telemetry, and reconciliation results are Event Hubs candidates because they are high-volume streams consumed by multiple downstream systems. The selection logic is the same as any platform workload; the domain objects just happen to be identities and entitlements instead of orders and devices.

Practical Recommendation

Start with Service Bus for delivery-sensitive workflows and Event Hubs for high-volume event streams. Use Queue Storage for simple background jobs that do not need broker features. Only mix services when the workload clearly contains both a command path and an analytics path.

Do not hide the difference behind a generic messaging abstraction. The services are opinionated because the workloads are different. Let the contract stay visible in the design.