Azure Event Hubs

Event Hubs is Azure’s managed real-time data streaming platform. It ingests millions of events per second from any source and delivers them to multiple consumers independently. If you need an append-only distributed log in the cloud with Apache Kafka compatibility, this is the service.

Core mental model: Event Hubs is a distributed commit log. Producers append events, and consumers read from the log at their own pace, from any position. Events are not deleted after consumption; they remain available for the configured retention period.

This is fundamentally different from a message queue. In a queue, a message is consumed and removed. In a streaming log, events persist and multiple consumers can each maintain their own position in the stream.
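
To make the contrast concrete, here is a toy in-memory model in plain Python. It is not the Event Hubs SDK; `ToyLog` and `ToyQueue` are hypothetical names used only to show that reading from a log leaves events in place, while receiving from a queue removes them.

```python
from collections import deque


class ToyLog:
    """Append-only log: reads never remove events."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset):
        return self._events[offset:]


class ToyQueue:
    """Queue: receiving a message removes it."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


log = ToyLog()
queue = ToyQueue()
for item in ["a", "b", "c"]:
    log.append(item)
    queue.send(item)

# Two independent readers see the same events, each from its own position.
reader_one = log.read_from(0)
reader_two = log.read_from(1)

# Two receivers compete: each message is handed out once.
first = queue.receive()
second = queue.receive()
```

After this runs, the log still holds all three events, while the queue has only `"c"` left.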

Key Concepts

Namespace

The top-level resource that contains one or more Event Hubs. A namespace provides a DNS endpoint, access policies, and network configuration. Think of it as the hosting container for your streaming infrastructure.

Event Hub (Topic)

A named stream within a namespace. Analogous to a Kafka topic. Each Event Hub holds an ordered sequence of events split across partitions. You typically create one Event Hub per event type or data stream.

Partitions

Partitions are the parallel lanes of a stream. Each partition is an independent, ordered sequence of events.

  • Ordering: Events within a single partition are strictly ordered. There is no ordering guarantee across partitions.
  • Scaling: More partitions allow more consumers to read in parallel. You set the partition count when you create the Event Hub (up to 32 on Standard, up to 100 on Premium, up to 2,000 on Dedicated).
  • Assignment: Producers can target a specific partition, use a partition key (events with the same key always go to the same partition), or let Event Hubs distribute round-robin.
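
The two automatic assignment modes can be sketched in a few lines. This is an illustration only: the hash that Event Hubs applies to partition keys is internal to the service, so SHA-256 here is an assumption, and `partition_for_key` and `round_robin` are hypothetical helpers.

```python
import hashlib
from itertools import count


def partition_for_key(partition_key: str, partition_count: int) -> int:
    """Map a key to a partition with a stable hash (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


def round_robin(partition_count: int):
    """Yield partition ids in rotation for events sent without a key."""
    for i in count():
        yield i % partition_count


PARTITIONS = 4

# The same key always maps to the same partition.
same_key = {partition_for_key("device-42", PARTITIONS) for _ in range(10)}

# Keyless events rotate through the partitions.
rotation = round_robin(PARTITIONS)
first_five = [next(rotation) for _ in range(5)]
```

Because a key pins its events to one partition, all events for that key stay in order relative to each other. A skewed key distribution can leave some partitions much busier than others.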

Consumer Groups

A consumer group is a named view of the entire Event Hub stream. Each consumer group tracks its own position independently, so multiple applications can read the same events without interfering with each other.

For example, one consumer group feeds real-time alerting, another feeds long-term analytics, and a third handles enrichment. All three read from the same Event Hub.

Checkpointing

Consumers track their read position (offset) in each partition. This is called checkpointing. If a consumer restarts, it resumes from its last checkpoint rather than replaying everything. Checkpoints are typically stored in Azure Blob Storage.
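
A minimal sketch of the resume-from-checkpoint behavior, assuming an in-memory dictionary in place of a durable store. `ToyCheckpointStore` and `process` are hypothetical names, not SDK types.

```python
class ToyCheckpointStore:
    """Stand-in for a durable store such as a blob container."""

    def __init__(self):
        self._offsets = {}

    def save(self, group, partition, next_offset):
        self._offsets[(group, partition)] = next_offset

    def load(self, group, partition):
        # A group with no checkpoint starts at the beginning of the partition.
        return self._offsets.get((group, partition), 0)


def process(partition_events, store, group, partition, limit=None):
    """Handle events from the last checkpoint onward, checkpointing each one."""
    start = store.load(group, partition)
    end = len(partition_events)
    if limit is not None:
        end = min(start + limit, end)
    handled = []
    for offset in range(start, end):
        handled.append(partition_events[offset])
        store.save(group, partition, offset + 1)
    return handled


events = ["e0", "e1", "e2", "e3", "e4"]
store = ToyCheckpointStore()

# The consumer handles three events, then "crashes".
before_restart = process(events, store, "analytics", 0, limit=3)

# After a restart it resumes at the checkpoint rather than at offset 0.
after_restart = process(events, store, "analytics", 0)

# A different consumer group has its own checkpoints and sees everything.
other_group = process(events, store, "alerting", 0)
```

The sketch checkpoints after every event for simplicity. Checkpointing less often reduces writes to the store, at the cost of reprocessing the events handled since the last checkpoint after a restart, so handlers should tolerate duplicates.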

Retention

Events are retained for a configurable period (1-7 days on Standard, up to 90 days on Premium). During retention, any consumer can reread or replay events from any position. After retention expires, events are removed.
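
The effect of a retention window on what can be replayed can be sketched as a filter over enqueue times. The 7-day window, the event shape, and `apply_retention` are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(events, retention, now):
    """Keep only events enqueued within the retention window ending at `now`."""
    cutoff = now - retention
    return [e for e in events if e["enqueued"] >= cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
events = [
    {"seq": 0, "enqueued": now - timedelta(days=9)},
    {"seq": 1, "enqueued": now - timedelta(days=3)},
    {"seq": 2, "enqueued": now - timedelta(hours=1)},
]

# With a 7-day window, the 9-day-old event is no longer replayable.
retained = apply_retention(events, timedelta(days=7), now)
replayable = [e["seq"] for e in retained]
```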

Architecture

```mermaid
graph LR
    P1[Producer 1] --> EH
    P2[Producer 2] --> EH
    P3[Producer 3] --> EH

    subgraph EH["Event Hub"]
        PA[Partition 0]
        PB[Partition 1]
        PC[Partition 2]
    end

    subgraph CG1["Consumer Group: Analytics"]
        C1A[Consumer A]
        C1B[Consumer B]
    end

    subgraph CG2["Consumer Group: Alerting"]
        C2A[Consumer C]
    end

    PA --> C1A
    PB --> C1B
    PC --> C1A
    PA --> C2A
    PB --> C2A
    PC --> C2A

    style EH fill:#2d5aa0,color:#fff
    style CG1 fill:#4a9e5c,color:#fff
    style CG2 fill:#c4762b,color:#fff
```

Producers send events into the Event Hub. Events land in partitions (by partition key or round-robin). Each consumer group reads all partitions independently, with individual consumers within a group splitting the partition load.
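
The way consumers in one group split the partitions can be sketched as a static round-robin assignment. Real clients balance partition ownership dynamically as consumers join and leave, so treat `assign_partitions` as a hypothetical simplification that reproduces the layout in the diagram.

```python
def assign_partitions(partition_ids, consumers):
    """Spread partitions over the consumers of one group, round-robin."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partition_ids):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2]

# Analytics has two consumers, so they share the three partitions.
analytics = assign_partitions(partitions, ["consumer-a", "consumer-b"])

# Alerting has one consumer, so it reads every partition itself.
alerting = assign_partitions(partitions, ["consumer-c"])
```

Each group is assigned independently, which is why every partition appears once in `analytics` and once in `alerting`.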

Event Capture

Capture automatically archives every event to Azure Blob Storage or Azure Data Lake Storage in Avro format. No code required. You configure it on the Event Hub, and it writes batched files at your chosen time or size interval.

This is useful for:

  • Long-term retention beyond the Event Hub’s retention window.
  • Feeding batch analytics and data lake pipelines.
  • Compliance and audit trail requirements.
  • Replaying historical data into new processing systems.
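
The time-or-size window that decides when Capture cuts a new file can be sketched as follows. This is a toy: it only evaluates the limits when an event arrives, timestamps are plain seconds, and `batch_by_window` with its parameters is a hypothetical helper, not a Capture setting.

```python
def batch_by_window(events, max_seconds, max_bytes):
    """Group (timestamp, payload) pairs, closing a batch on whichever limit hits first."""
    batches, current, size, window_start = [], [], 0, None
    for timestamp, payload in events:
        window_full = current and (
            timestamp - window_start >= max_seconds
            or size + len(payload) > max_bytes
        )
        if window_full:
            batches.append(current)
            current, size = [], 0
        if not current:
            window_start = timestamp  # a new window opens with this event
        current.append(payload)
        size += len(payload)
    if current:
        batches.append(current)
    return batches


stream = [
    (0, b"aaaa"),
    (10, b"bbbb"),
    (20, b"cccc"),   # would exceed the size limit, so a new batch starts
    (400, b"dddd"),  # exceeds the time limit, so a new batch starts
    (410, b"eeee"),
]
files = batch_by_window(stream, max_seconds=300, max_bytes=10)
```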

Schema Registry

Event Hubs includes a Schema Registry for managing Avro and JSON schemas. Producers and consumers reference schemas by ID rather than embedding the full schema in every event, which reduces payload size and enforces contract compatibility across teams.

The Schema Registry supports compatibility modes (backward, forward, full) to prevent breaking changes to event schemas.
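
A toy model of the two ideas above: events carry a schema ID instead of the schema itself, and a compatibility check rejects a breaking change. The registry class, the JSON envelope, and the backward-compatibility rule (every added field needs a default) are simplified assumptions. Real Avro schema resolution has more rules, and the real service defines its own wire format.

```python
import json
import uuid


class ToySchemaRegistry:
    """In-memory stand-in for a schema registry."""

    def __init__(self):
        self._by_id = {}

    def register(self, schema: dict) -> str:
        schema_id = uuid.uuid4().hex
        self._by_id[schema_id] = schema
        return schema_id

    def get(self, schema_id: str) -> dict:
        return self._by_id[schema_id]


def is_backward_compatible(old: dict, new: dict) -> bool:
    """A new reader can read old data if every added field has a default."""
    old_names = {field["name"] for field in old["fields"]}
    return all(
        field["name"] in old_names or "default" in field
        for field in new["fields"]
    )


user_field = {"name": "user", "type": "string"}
v1 = {"name": "SignIn", "fields": [user_field]}
v2_ok = {"name": "SignIn", "fields": [
    user_field, {"name": "region", "type": "string", "default": "unknown"}]}
v2_bad = {"name": "SignIn", "fields": [
    user_field, {"name": "region", "type": "string"}]}

registry = ToySchemaRegistry()
schema_id = registry.register(v1)

# The event carries only the schema ID plus the body, not the full schema.
event = json.dumps({"schema_id": schema_id, "body": {"user": "alice"}})

# The consumer resolves the schema from the ID found in the event.
resolved = registry.get(json.loads(event)["schema_id"])
```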

Tiers

| Tier | Key characteristics |
| --- | --- |
| Standard | Shared infrastructure, up to 32 partitions, 1-7 day retention, up to 20 consumer groups |
| Premium | Dedicated compute, up to 100 partitions, up to 90 day retention, dynamic partition scaling |
| Dedicated | Single-tenant clusters, highest throughput, up to 2,000 partitions, custom retention |

Standard is fine for most workloads. Premium adds isolation and longer retention. Dedicated is for extreme throughput or compliance requirements.

Kafka Compatibility

Event Hubs exposes a Kafka-compatible endpoint. Existing Kafka producers and consumers can connect by changing the bootstrap server configuration and authentication, with no code changes to the Kafka client logic. This makes Event Hubs a managed alternative to running your own Kafka cluster.

Supported Kafka client versions: 1.0 and later.
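
As a sketch of what "changing the bootstrap server configuration and authentication" looks like, the helper below builds the settings for a Kafka client that authenticates with a connection string. The key names follow librdkafka-style clients; Java clients express the same credentials through `sasl.jaas.config`. The function name and the namespace value are hypothetical, and the connection string is deliberately left elided.

```python
def event_hubs_kafka_config(namespace: str, connection_string: str) -> dict:
    """Build Kafka client settings that point at an Event Hubs namespace."""
    return {
        # The Kafka endpoint listens on port 9093 of the namespace host.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # The username is the literal string "$ConnectionString".
        "sasl.username": "$ConnectionString",
        # The password is the namespace (or Event Hub) connection string.
        "sasl.password": connection_string,
    }


config = event_hubs_kafka_config(
    "my-namespace",  # hypothetical namespace name
    "Endpoint=sb://my-namespace.servicebus.windows.net/;"
    "SharedAccessKeyName=...;SharedAccessKey=...",
)
```

The resulting dictionary would be passed to the Kafka producer or consumer constructor in place of the settings for a self-managed cluster. The Kafka topic name corresponds to the Event Hub name.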

Common Use Cases

  • IoT telemetry - ingesting millions of device events per second with partition-per-device-type patterns.
  • Log aggregation - centralizing application and infrastructure logs from distributed systems.
  • Clickstream analytics - capturing user interaction events for real-time and batch analysis.
  • Financial transactions - high-throughput transaction streams with strict partition ordering.
  • Kafka migration - moving from self-managed Kafka to a managed service without rewriting producers/consumers.
  • Real-time dashboards - feeding live metrics and KPIs to monitoring systems.
  • Event sourcing - storing domain events as an immutable append-only log.

Event Hubs vs Service Bus

These two services solve different problems. Choosing the wrong one creates friction.

| Aspect | Event Hubs | Service Bus |
| --- | --- | --- |
| Mental model | Append-only distributed log | Reliable message broker with delivery guarantees |
| Message lifecycle | Events persist for retention period; consumers read independently | Messages are consumed and removed (or dead-lettered) |
| Ordering | Per-partition ordering | Per-session or FIFO queue ordering |
| Consumer model | Multiple consumer groups each get the full stream | Competing consumers; each message processed once |
| Replay | Yes, any consumer can reread from any offset | No; once consumed, messages are gone |
| Dead-letter queue | No | Yes |
| Best for | High-throughput streaming, telemetry, analytics | Workflow commands, task queues, reliable delivery |

Rule of thumb: If the message means “here is an event that happened,” use Event Hubs. If it means “someone must do this work,” use Service Bus.

In Entra-Adjacent Systems

Event Hubs appears in identity-related architectures when systems need to process high-volume event streams:

  • Streaming audit log feeds from Entra ID diagnostic settings.
  • Ingesting sign-in and provisioning telemetry into analytics pipelines (often into Azure Data Explorer).
  • Capturing large reconciliation result streams for multiple downstream processors.
  • Forwarding operational telemetry from identity automation workers.

In these cases, Event Hubs provides the streaming backbone while other services (Functions, Data Explorer, Storage) handle the processing and storage.