Messaging Patterns

Every platform eventually needs to move work between components. The decision that matters most is not which messaging service to pick in the abstract. It is understanding what kind of message the system carries, how failures should surface, and what recovery looks like when things break.

This applies to any Azure-hosted platform: commerce systems, IoT backends, SaaS integration layers, and identity-adjacent services like those in the Entra ecosystem.

The Core Decision: Command or Event?

The cleanest selection rule starts with what the message means.

  • Choose Service Bus when the message means “someone must complete this work.”
  • Choose Event Hubs when the message means “here is another event in the stream.”

That difference sounds small, but it changes almost everything downstream.

Service Bus assumes the system cares about ownership, settlement, retries, and inspection of failed work. Event Hubs assumes the system cares about ingesting high-volume data, letting multiple consumers read independently, and replaying retained history when needed.

flowchart TD
    A[New messaging need] --> B{What does the<br/>message represent?}
    B -->|A command or<br/>work item| C{Expected<br/>throughput?}
    B -->|A stream event<br/>or signal| D{Volume and<br/>consumer count?}
    B -->|Simple job<br/>queue| E[Queue Storage]

    C -->|Needs retries,<br/>dead-lettering,<br/>sessions| F[Service Bus]
    C -->|Simple queue<br/>with low volume| G{Need broker<br/>features?}
    G -->|Yes| F
    G -->|No| E

    D -->|High volume,<br/>multiple readers,<br/>replay needed| H[Event Hubs]
    D -->|Low volume,<br/>single consumer| I{Need retention<br/>and replay?}
    I -->|Yes| H
    I -->|No| F

    style F fill:#4a9,stroke:#333
    style H fill:#49a,stroke:#333
    style E fill:#a94,stroke:#333
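
The branches of the flowchart can also be read as a small function. The sketch below is only an illustration of that decision tree: the `Workload` fields and the `choose_service` name are hypothetical and not part of any Azure SDK.

```python
from dataclasses import dataclass
from enum import Enum


class Service(Enum):
    SERVICE_BUS = "Service Bus"
    EVENT_HUBS = "Event Hubs"
    QUEUE_STORAGE = "Queue Storage"


@dataclass(frozen=True)
class Workload:
    # Hypothetical fields chosen for this sketch.
    kind: str                               # "command", "stream", or "simple_job"
    needs_broker_features: bool = False     # retries, dead-lettering, sessions
    high_volume_multi_reader: bool = False  # many events, several readers
    needs_replay: bool = False              # retention and replay required


def choose_service(w: Workload) -> Service:
    """Follow the same branches as the flowchart."""
    if w.kind == "simple_job":
        return Service.QUEUE_STORAGE
    if w.kind == "command":
        # Commands stay on Service Bus unless no broker feature is needed.
        if w.needs_broker_features:
            return Service.SERVICE_BUS
        return Service.QUEUE_STORAGE
    if w.kind == "stream":
        if w.high_volume_multi_reader or w.needs_replay:
            return Service.EVENT_HUBS
        return Service.SERVICE_BUS
    raise ValueError(f"unknown workload kind: {w.kind!r}")


order_step = choose_service(Workload("command", needs_broker_features=True))
telemetry = choose_service(Workload("stream", high_volume_multi_reader=True))
thumbnails = choose_service(Workload("simple_job"))
```

Writing the rule down this way makes the inputs explicit: the answer depends on what the message means and which guarantees it needs, not on the product name.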

Selection Criteria That Actually Matter

Use the pressure points below rather than product branding.

| Decision pressure | Event Hubs | Service Bus | Queue Storage |
| --- | --- | --- | --- |
| Primary fit | Large event streams | Durable workflow coordination | Simple job queues |
| Ordering model | Within a partition only | Best-effort per queue; guaranteed within a session | No ordering guarantee |
| Replay | Built in via retention and offsets | Not a replay system | Not a replay system |
| Throughput profile | Very high ingest, parallel consumers | Lower throughput, richer broker behavior | Moderate, simple polling |
| Delivery semantics | At-least-once with client-managed progress | Broker-managed locks, settlement, retries, dead-lettering | At-least-once with visibility timeout |
| Multiple consumers | Native via consumer groups | Fan-out via topics and subscriptions | Competing consumers; one reader per message |
| Sessions | Not the model | First-class correlated ordering | Not supported |
| Dead-lettering | No built-in equivalent | First-class | Not built in; the consumer or its host moves messages to a poison queue after N failed dequeues |
| Idempotency expectation | Consumers must handle duplicates and replays | Consumers still need idempotency | Consumers must handle duplicates |

None of these services removes the need for idempotent consumers. They fail differently, but each can deliver the same logical message more than once.

Workload Examples

Order processing workflow

A commerce system receives an order, validates payment, reserves inventory, and notifies fulfillment. Each step has an owner, and failure should surface clearly.

Choose Service Bus.

Why:

  • the unit of work has an owner at each stage,
  • repeated failure should land in a dead-letter queue for operator review,
  • related steps for the same order may need session ordering,
  • operators need a clear remediation path when a step fails.

IoT telemetry stream

Thousands of devices emit temperature readings every few seconds. Multiple downstream systems (alerting, dashboards, long-term storage) consume the same stream independently.

Choose Event Hubs.

Why:

  • the stream feeds analytics, alerting, and archival simultaneously,
  • replay matters for backfill or investigating sensor anomalies,
  • no single consumer should own or acknowledge an event permanently.

Background job queue

A web application queues thumbnail generation requests. Volume is moderate, processing is simple, and the system does not need broker features.

Choose Queue Storage.

Why:

  • the pattern is simple dequeue-process-delete,
  • no need for topics, subscriptions, or sessions,
  • cost is significantly lower for simple workloads.

Mixed workload: commands plus audit stream

Some systems carry both. A reliable command flow tells a worker to act, and a separate event stream records what happened for diagnostics or compliance.

Use both, but keep the contracts separate:

  • Service Bus carries the command or workflow step.
  • Event Hubs carries the resulting telemetry or evidence stream.

The common mistake is forcing one service to do both jobs and then rebuilding the missing behavior in application code.
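
One way to keep the two contracts visibly separate is to give them separate types. The sketch below is an assumption-laden illustration: the class names, fields, and the `handle` function are hypothetical, and no transport is involved.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ReserveInventoryCommand:
    """Command contract: one owner must complete it (the Service Bus side)."""
    message_id: str
    order_id: str
    sku: str
    quantity: int


@dataclass(frozen=True)
class InventoryReservedEvent:
    """Event contract: a fact appended to a stream (the Event Hubs side)."""
    event_id: str
    order_id: str
    sku: str
    quantity: int
    occurred_at: str


def handle(command: ReserveInventoryCommand) -> InventoryReservedEvent:
    """Do the work a command asks for, then describe what happened."""
    # The business logic (reserving stock) would run here.
    return InventoryReservedEvent(
        event_id=str(uuid.uuid4()),
        order_id=command.order_id,
        sku=command.sku,
        quantity=command.quantity,
        occurred_at=datetime.now(timezone.utc).isoformat(),
    )


command = ReserveInventoryCommand("msg-1", "order-42", "sku-7", 2)
event = handle(command)
```

The command carries a message identity that settlement and retries can refer to; the event carries a timestamp and its own identity for downstream readers. Neither type needs the other's fields.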

Ordering Is Narrower Than People Expect

Ordering guarantees are often overstated during design reviews.

With Event Hubs, ordering exists only within a partition. If two related events land in different partitions, the stream does not promise cross-partition order. That is acceptable for many analytics workloads, but it breaks systems that assume a single global sequence.

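With Service Bus, ordering is still not automatic. A queue hands messages out roughly in arrival order, but competing consumers, retries, and lock expiry can reorder the actual processing. Sessions keep related messages together and deliver them to one receiver at a time. Even then, ordering only helps if the session key actually matches the workflow boundary you care about.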

If the requirement is “all updates for customer X must stay in order,” the design is not done until the partition key or session key reflects customer X consistently.
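
The effect of the key choice can be shown with a small in-memory model. This is a sketch only: `partition_for` uses a SHA-256 based hash as a stand-in, which is an assumption and not the hashing scheme the real service uses, and the event fields are hypothetical.

```python
import hashlib
from collections import defaultdict


def partition_for(key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def route(events, partition_count, key_of):
    """Append each event to the partition chosen by its key."""
    partitions = defaultdict(list)
    for event in events:
        partitions[partition_for(key_of(event), partition_count)].append(event)
    return partitions


def partitions_holding(partitions, customer):
    """Return the set of partitions that contain events for one customer."""
    return {
        p for p, evts in partitions.items()
        if any(e["customer"] == customer for e in evts)
    }


events = [
    {"customer": "X", "seq": 1, "event_id": "a"},
    {"customer": "Y", "seq": 1, "event_id": "b"},
    {"customer": "X", "seq": 2, "event_id": "c"},
    {"customer": "X", "seq": 3, "event_id": "d"},
]

# Keyed by customer: every event for X lands in the same partition, in order.
by_customer = route(events, 4, key_of=lambda e: e["customer"])
x_partition = next(iter(partitions_holding(by_customer, "X")))
x_seqs = [e["seq"] for e in by_customer[x_partition] if e["customer"] == "X"]

# Keyed by event id: events for X may spread across partitions,
# and nothing orders them relative to each other once they do.
by_event_id = route(events, 4, key_of=lambda e: e["event_id"])
```

The same reasoning applies to a Service Bus session key: whatever value is used as the key defines the only boundary inside which order is preserved.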

Replay, Recovery, and Historical Reprocessing

Event Hubs is the clear choice when replay is a real operational tool. A new consumer can start from an earlier offset, an investigation can reread retained data, and a bug fix can reprocess historical events while the retention window covers them.
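
A minimal in-memory model of one partition makes the replay behavior concrete. `PartitionLog` and its methods are hypothetical names for this sketch, not the Event Hubs SDK; it only shows how offsets and retention interact.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionLog:
    """Append-only log with a retention boundary (stand-in for one partition)."""
    first_offset: int = 0
    events: list = field(default_factory=list)

    def append(self, event) -> int:
        """Store an event and return the offset it was written at."""
        self.events.append(event)
        return self.first_offset + len(self.events) - 1

    def expire_before(self, offset: int) -> None:
        """Simulate retention: drop everything older than `offset`."""
        drop = max(0, min(offset - self.first_offset, len(self.events)))
        del self.events[:drop]
        self.first_offset += drop

    def read_from(self, offset: int):
        """Yield (offset, event) pairs, clamped to what is still retained."""
        start = max(offset, self.first_offset)
        for i in range(start - self.first_offset, len(self.events)):
            yield self.first_offset + i, self.events[i]


log = PartitionLog()
for reading in (20.1, 20.4, 35.0, 20.2):
    log.append(reading)

# A new consumer, or a bug-fix rerun, starts from an earlier offset.
full_replay = list(log.read_from(0))

# Once retention has moved on, the same request returns less history.
log.expire_before(2)
late_replay = list(log.read_from(0))
```

Reading never removes anything, so each consumer only needs to remember its own offset. The retention boundary is the real limit on how far back a replay can go.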

Service Bus uses a different recovery model:

  • retrying the locked message,
  • moving the message to the dead-letter queue,
  • fixing the payload or dependency and resubmitting,
  • recreating a command from a state store or source system.

That is not a weakness. Workflow systems usually need controlled re-execution, not stream replay.
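
The retry, dead-letter, and resubmit steps listed above can be sketched with an in-memory queue. `BrokeredQueue` is a hypothetical stand-in, not the Service Bus client: it simplifies locking away and puts an abandoned message at the back of the queue, which the real broker does not promise.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: dict
    delivery_count: int = 0


class BrokeredQueue:
    """Toy queue with a delivery-count limit and a dead-letter list."""

    def __init__(self, max_delivery_count: int = 3):
        self.max_delivery_count = max_delivery_count
        self.active = deque()
        self.dead_letter = []

    def send(self, body: dict) -> None:
        self.active.append(Message(body))

    def process_one(self, handler) -> bool:
        """Deliver one message; return False when nothing is waiting."""
        if not self.active:
            return False
        msg = self.active.popleft()
        msg.delivery_count += 1
        try:
            handler(msg.body)  # returning normally stands for "complete"
        except Exception:
            if msg.delivery_count >= self.max_delivery_count:
                self.dead_letter.append(msg)  # parked for operator review
            else:
                self.active.append(msg)  # "abandon": retried later
        return True

    def resubmit_dead_letters(self, fix) -> None:
        """Apply a fix to each parked payload and send it as a new message."""
        for msg in self.dead_letter:
            self.send(fix(msg.body))
        self.dead_letter.clear()


processed = []


def charge(body: dict) -> None:
    if body["amount"] is None:
        raise ValueError("missing amount")
    processed.append(body)


queue = BrokeredQueue(max_delivery_count=3)
queue.send({"order_id": "o1", "amount": None})
while queue.process_one(charge):
    pass
dead_after_retries = [(m.body["order_id"], m.delivery_count) for m in queue.dead_letter]

# An operator fixes the payload and resubmits it as fresh work.
queue.resubmit_dead_letters(lambda body: {**body, "amount": 10})
while queue.process_one(charge):
    pass
```

The failed message is never lost and never blocks the queue forever: it is retried a bounded number of times, parked where someone can inspect it, and re-entered as new work once the cause is fixed.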

Throughput and Cost Pressure

Both services become expensive when the wrong workload shape lands on them.

Event Hubs cost pressure usually comes from:

  • underestimating partition needs,
  • one hot partition receiving most traffic,
  • retaining large streams longer than the investigation model needs,
  • pushing workflow commands through a service optimized for ingest.

Service Bus cost pressure usually comes from:

  • routing telemetry through broker features designed for workflow coordination,
  • serializing too much work through a narrow session key,
  • letting dead-letter volume grow without fixing root causes,
  • using topics and subscriptions where a simple queue would do.

The practical rule: high event volume pushes toward Event Hubs; failure handling and workflow ownership push toward Service Bus; simple job queues push toward Queue Storage.

Failure Modes and Mitigation

Event Hubs failure patterns

| Failure | Impact | Mitigation |
| --- | --- | --- |
| Checkpoint mistakes | Duplicate processing or consumption gaps | Checkpoint deliberately; test restart behavior |
| Hot partitions | Lag concentrates on one key | Choose partition keys that reflect both scale and ordering needs |
| Slow consumers | Fall behind; lose replay window when retention expires | Monitor consumer lag; scale consumers independently |
| Schema drift | Breaks downstream parsing across all consumers at once | Version event payloads; use schema registry |
| Simulating broker semantics | Teams build custom retry, poison-event, ownership logic | Move delivery-sensitive work to Service Bus |
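
Consumer lag and hot partitions are both visible in the gap between what has been written and what has been checkpointed. The sketch below assumes those two numbers are already available per partition; the function names, the sample figures, and the "three times the median" threshold are all hypothetical choices for illustration.

```python
from statistics import median


def partition_lag(last_enqueued: dict, checkpoints: dict) -> dict:
    """Lag per partition: last written sequence number minus the checkpoint."""
    return {p: last_enqueued[p] - checkpoints.get(p, 0) for p in last_enqueued}


def hot_partitions(lag: dict, factor: float = 3.0) -> list:
    """Partitions whose lag exceeds `factor` times the median lag."""
    threshold = factor * median(lag.values())
    return sorted(p for p, value in lag.items() if value > threshold)


# Made-up sample numbers for four partitions.
last_enqueued = {"0": 1000, "1": 1020, "2": 9800, "3": 990}
checkpoints = {"0": 990, "1": 1000, "2": 4000, "3": 985}

lag = partition_lag(last_enqueued, checkpoints)
hot = hot_partitions(lag)
```

A single partition far above the rest usually points at the partition key rather than at consumer capacity, so adding consumers alone is unlikely to help.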

Service Bus failure patterns

| Failure | Impact | Mitigation |
| --- | --- | --- |
| Poison messages | Repeatedly fail, block useful work | Configure max delivery count; monitor dead-letter queues |
| Non-idempotent handlers | Duplicate side effects during retry | Persist progress outside the message; check before acting |
| Bad session key | Serializes unrelated work or breaks ordering | Match session keys to the actual workflow boundary |
| Long-running handlers | Lock loss triggers noisy redelivery | Keep handlers short; offload state to external stores |
| Subscription sprawl | Topic fan-out becomes hard to reason about | Audit subscriptions; remove unused ones |

Idempotency Is Mandatory in Both Models

This is the part teams skip until production says otherwise.

With Event Hubs, duplicates happen because consumers restart, replay, or recover from checkpoint ambiguity. With Service Bus, duplicates happen because messages are retried, redelivered after lock loss, or resubmitted after remediation. In both cases, the consumer must detect whether the side effect already happened.

That usually means storing enough state outside the transport to answer questions like:

  • did order X already reach fulfillment step Y,
  • did we already send the notification for event Z,
  • did we already emit the downstream command for this checkpoint.

If the answer lives only in memory or only in the transport, recovery will be fragile.
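
A minimal sketch of that external state, using SQLite from the Python standard library as the store. The table layout and the `run_once` helper are hypothetical, and the sketch assumes the side effect can be tied to the same transaction as the marker row; a side effect in another system would need its own idempotency key.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store that records which (order, step) pairs are done."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed ("
        " order_id TEXT NOT NULL,"
        " step TEXT NOT NULL,"
        " PRIMARY KEY (order_id, step))"
    )
    return conn


def run_once(conn: sqlite3.Connection, order_id: str, step: str, side_effect) -> bool:
    """Run side_effect at most once per (order_id, step); True if it ran."""
    with conn:  # commits on success, rolls back if side_effect raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed (order_id, step) VALUES (?, ?)",
            (order_id, step),
        )
        if cur.rowcount == 0:
            return False  # marker already present: duplicate delivery
        side_effect()  # a failure here undoes the marker, so a retry can run
        return True


conn = open_store()
sent = []

first = run_once(conn, "order-42", "fulfillment", lambda: sent.append("order-42"))
second = run_once(conn, "order-42", "fulfillment", lambda: sent.append("order-42"))
```

The key is the business identity of the work (order and step), not a transport-level identifier, so the check still holds when the message is resubmitted as a new message or replayed from an earlier offset.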

Entra Context

In Entra-adjacent systems, these patterns appear in familiar shapes. Provisioning workflows (user onboarding, entitlement changes, connector remediation) are Service Bus candidates because each step has an owner and failure needs operator visibility. Audit feeds, sign-in telemetry, and reconciliation results are Event Hubs candidates because they are high-volume streams consumed by multiple downstream systems. The selection logic is the same as any platform workload; the domain objects just happen to be identities and entitlements instead of orders and devices.

Practical Recommendation

Start with Service Bus for delivery-sensitive workflows and Event Hubs for high-volume event streams. Use Queue Storage for simple background jobs that do not need broker features. Only mix services when the workload clearly contains both a command path and an analytics path.

Do not hide the difference behind a generic messaging abstraction. The services are opinionated because the workloads are different. Let the contract stay visible in the design.