Azure Event Hubs
Event Hubs is Azure’s managed real-time data streaming platform. It ingests millions of events per second from any source and delivers them to multiple consumers independently. If you need an append-only distributed log in the cloud with Apache Kafka compatibility, this is the service.
Core mental model: Event Hubs is a distributed commit log. Producers append events, and consumers read from the log at their own pace, from any position. Events are not deleted after consumption; they remain available for the configured retention period.
This is fundamentally different from a message queue. In a queue, a message is consumed and removed. In a streaming log, events persist and multiple consumers can each maintain their own position in the stream.
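The difference can be sketched with a toy in-memory log. This is a minimal illustration in plain Python, not the Event Hubs SDK; `ToyLog` and its methods are hypothetical names introduced only for this example.

```python
class ToyLog:
    """A toy append-only log: reading never removes events."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Append an event and return its offset (its index in the log)."""
        self._events.append(event)
        return len(self._events) - 1

    def read_from(self, offset, max_count=10):
        """Return up to max_count events starting at offset, without deleting them."""
        return self._events[offset:offset + max_count]


log = ToyLog()
for reading in ("temp=20", "temp=21", "temp=22"):
    log.append(reading)

# Two independent readers, each holding its own position in the same log.
alerting_position = 0
analytics_position = 2

alerting_batch = log.read_from(alerting_position)
analytics_batch = log.read_from(analytics_position)
replayed_batch = log.read_from(0)  # reading again still returns every event
```

A queue would have handed each message to one reader and then dropped it. Here both readers see the data they ask for, and a replay from offset 0 still works.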
Key Concepts
Namespace
The top-level resource that contains one or more Event Hubs. A namespace provides a DNS endpoint, access policies, and network configuration. Think of it as the hosting container for your streaming infrastructure.
Event Hub (Topic)
A named stream within a namespace. Analogous to a Kafka topic. Each Event Hub holds an ordered sequence of events split across partitions. You typically create one Event Hub per event type or data stream.
Partitions
Partitions are the parallel lanes of a stream. Each partition is an independent, ordered sequence of events.
- Ordering: Events within a single partition are strictly ordered. There is no ordering guarantee across partitions.
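- Scaling: More partitions allow more consumers to read in parallel. You set the partition count when you create the Event Hub (1-32 on Standard, up to 100 on Premium, and higher limits on Dedicated). On Standard the count is fixed after creation; Premium and Dedicated allow you to increase it later.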
- Assignment: Producers can target a specific partition, use a partition key (events with the same key always go to the same partition), or let Event Hubs distribute round-robin.
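The two automatic assignment modes can be sketched as follows. This is an illustration only: the hash below is an assumption chosen for the example and is not the hash Event Hubs uses internally, and `PARTITION_COUNT`, `partition_for_key`, and `next_round_robin_partition` are hypothetical names.

```python
import hashlib
from itertools import count

PARTITION_COUNT = 4  # assumed value for the example


def partition_for_key(partition_key: str, partition_count: int = PARTITION_COUNT) -> int:
    """Map a key to a partition with a stable hash (illustrative, not the service's hash)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


_round_robin = count()


def next_round_robin_partition(partition_count: int = PARTITION_COUNT) -> int:
    """Pick the next partition in rotation for events sent without a key."""
    return next(_round_robin) % partition_count


# The same key always maps to the same partition, which preserves per-key ordering.
first = partition_for_key("device-42")
second = partition_for_key("device-42")

# Events without a key are spread evenly across partitions.
rotation = [next_round_robin_partition() for _ in range(5)]
```

The practical consequence is that a partition key is the tool for ordering: choose a key (device ID, tenant ID, account ID) whose events must stay in sequence relative to each other.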
Consumer Groups
A consumer group is a named view of the entire Event Hub stream. Each consumer group tracks its own position independently, so multiple applications can read the same events without interfering with each other.
For example, one consumer group feeds real-time alerting, another feeds long-term analytics, and a third handles enrichment. All three read from the same Event Hub.
Checkpointing
Consumers track their read position (offset) in each partition. This is called checkpointing. If a consumer restarts, it resumes from its last checkpoint rather than replaying everything. Checkpoints are typically stored in Azure Blob Storage.
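A minimal sketch of the resume-from-checkpoint behavior, using an in-memory dictionary in place of durable storage. `InMemoryCheckpointStore` and `process_batch` are hypothetical names for this example; a real consumer would typically rely on the checkpoint store that ships with the SDK.

```python
class InMemoryCheckpointStore:
    """Hypothetical stand-in for a durable store such as a blob container."""

    def __init__(self):
        self._offsets = {}

    def save(self, consumer_group, partition_id, next_offset):
        """Record the offset of the next event to read."""
        self._offsets[(consumer_group, partition_id)] = next_offset

    def load(self, consumer_group, partition_id):
        """Return the saved offset, or 0 when no checkpoint exists yet."""
        return self._offsets.get((consumer_group, partition_id), 0)


def process_batch(partition_events, store, consumer_group, partition_id, batch_size):
    """Read one batch from the last checkpoint, then record the new position."""
    start = store.load(consumer_group, partition_id)
    batch = partition_events[start:start + batch_size]
    # ... handle the events in `batch` here ...
    store.save(consumer_group, partition_id, start + len(batch))
    return batch


partition_events = ["e0", "e1", "e2", "e3", "e4"]
store = InMemoryCheckpointStore()

first_run = process_batch(partition_events, store, "alerting", "0", batch_size=3)
# Simulated restart: the next call knows only what the store remembers.
second_run = process_batch(partition_events, store, "alerting", "0", batch_size=3)
```

Because the checkpoint is written after processing, a crash between the two steps replays the last batch on restart. This gives at-least-once delivery, so event handlers should be idempotent.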
Retention
Events are retained for a configurable period (1-7 days on Standard, up to 90 days on Premium). During retention, any consumer can reread or replay events from any position. After retention expires, events are removed, whether or not any consumer has read them.
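The key point is that removal depends on event age, not on consumption. A rough sketch of that rule, assuming each event carries an enqueue timestamp (`prune_expired` is a hypothetical helper, and the service's actual cleanup timing is not modeled here):

```python
from datetime import datetime, timedelta, timezone


def prune_expired(events, retention, now):
    """Keep only events whose enqueue time is still inside the retention window."""
    cutoff = now - retention
    return [(enqueued_at, body) for enqueued_at, body in events if enqueued_at >= cutoff]


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
events = [
    (now - timedelta(days=9), "old"),
    (now - timedelta(days=3), "recent"),
    (now - timedelta(hours=1), "fresh"),
]

# With a 7-day window, the 9-day-old event is gone regardless of who has read it.
retained = prune_expired(events, retention=timedelta(days=7), now=now)
retained_bodies = [body for _, body in retained]
```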
Architecture
graph LR
P1[Producer 1] --> EH
P2[Producer 2] --> EH
P3[Producer 3] --> EH
subgraph EH["Event Hub"]
PA[Partition 0]
PB[Partition 1]
PC[Partition 2]
end
subgraph CG1["Consumer Group: Analytics"]
C1A[Consumer A]
C1B[Consumer B]
end
subgraph CG2["Consumer Group: Alerting"]
C2A[Consumer C]
end
PA --> C1A
PB --> C1B
PC --> C1A
PA --> C2A
PB --> C2A
PC --> C2A
style EH fill:#2d5aa0,color:#fff
style CG1 fill:#4a9e5c,color:#fff
style CG2 fill:#c4762b,color:#fff
Producers send events into the Event Hub. Events land in partitions (by partition key or round-robin). Each consumer group reads all partitions independently, with individual consumers within a group splitting the partition load.
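The partition split shown in the diagram can be reproduced with a simple static assignment. This is only a sketch: `assign_partitions` is a hypothetical helper, and real consumer libraries generally balance partition ownership dynamically as consumers join and leave.

```python
def assign_partitions(partition_ids, consumers):
    """Spread partitions across one group's consumers in round-robin order."""
    assignment = {consumer: [] for consumer in consumers}
    for index, partition_id in enumerate(partition_ids):
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition_id)
    return assignment


partitions = ["0", "1", "2"]

# Each consumer group covers every partition on its own.
analytics = assign_partitions(partitions, ["Consumer A", "Consumer B"])
alerting = assign_partitions(partitions, ["Consumer C"])
```

Within a group, each partition has one owner at a time, so running more consumers than partitions in the same group leaves the extra consumers idle.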
Event Capture
Capture automatically archives every event to Azure Blob Storage or Azure Data Lake Storage in Avro format. No code required. You configure it on the Event Hub, and it writes batched files at your chosen time or size interval.
This is useful for:
- Long-term retention beyond the Event Hub’s retention window.
- Feeding batch analytics and data lake pipelines.
- Compliance and audit trail requirements.
- Replaying historical data into new processing systems.
Schema Registry
Event Hubs includes a Schema Registry for managing Avro and JSON schemas. Producers and consumers reference schemas by ID rather than embedding the full schema in every event, which reduces payload size and enforces contract compatibility across teams.
The Schema Registry supports compatibility modes (for example backward and forward) that reject new schema versions which would break existing producers or consumers.
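The payload saving from referencing a schema by ID can be illustrated with a toy registry. This is a sketch under stated assumptions: `ToySchemaRegistry`, the ID format, and the JSON envelope are invented for the example and do not reflect the wire format or API of the Azure Schema Registry.

```python
import hashlib
import json


class ToySchemaRegistry:
    """Hypothetical registry: stores each schema once and hands back a short ID."""

    def __init__(self):
        self._schemas = {}

    def register(self, schema: dict) -> str:
        """Store the schema and return an ID derived from its canonical JSON form."""
        canonical = json.dumps(schema, sort_keys=True)
        schema_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        self._schemas[schema_id] = schema
        return schema_id

    def get(self, schema_id: str) -> dict:
        """Look up a schema by the ID carried in an event."""
        return self._schemas[schema_id]


schema = {
    "type": "record",
    "name": "SignIn",
    "fields": [
        {"name": "user", "type": "string"},
        {"name": "success", "type": "boolean"},
    ],
}
payload = {"user": "alice", "success": True}

registry = ToySchemaRegistry()
schema_id = registry.register(schema)

# Compare an event that embeds the schema with one that only references it.
embedded_event = json.dumps({"schema": schema, "payload": payload})
referenced_event = json.dumps({"schema_id": schema_id, "payload": payload})
```

The consumer resolves `schema_id` through the registry once and can cache the result, so the per-event overhead stays at the size of the ID.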
Tiers
| Tier | Key characteristics |
|---|---|
| Standard | Shared infrastructure, up to 32 partitions, 1-7 day retention, up to 20 consumer groups |
| Premium | Reserved compute with resource isolation, up to 100 partitions per Event Hub, up to 90 day retention, dynamic partition scaling |
| Dedicated | Single-tenant clusters, highest throughput, up to 1,024 partitions per Event Hub (2,000 per capacity unit), up to 90 day retention |
Standard is fine for most workloads. Premium adds isolation and longer retention. Dedicated is for extreme throughput or compliance requirements.
Kafka Compatibility
Event Hubs exposes a Kafka-compatible endpoint. Existing Kafka producers and consumers can connect by changing the bootstrap server configuration and authentication, with no code changes to the Kafka client logic. This makes Event Hubs a managed alternative to running your own Kafka cluster.
Supported Kafka client versions: 1.0 and later.
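The configuration change usually amounts to a handful of client properties. The sketch below builds them as a dictionary using librdkafka-style property names (as used by clients such as confluent-kafka); Java clients express the same credentials through `sasl.jaas.config` instead. The helper name, the namespace `my-namespace`, and the placeholder connection string are assumptions for illustration.

```python
def kafka_client_config(namespace: str, connection_string: str) -> dict:
    """Build client settings for the Kafka endpoint of an Event Hubs namespace."""
    return {
        # The Kafka endpoint listens on port 9093 of the namespace host name.
        "bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "PLAIN",
        # With connection-string authentication, the username is this literal value.
        "sasl.username": "$ConnectionString",
        "sasl.password": connection_string,
    }


config = kafka_client_config("my-namespace", "<namespace connection string>")
```

The Event Hub name takes the place of the Kafka topic name, so producer and consumer code that already targets a topic can stay as it is.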
Common Use Cases
- IoT telemetry - ingesting millions of device events per second with partition-per-device-type patterns.
- Log aggregation - centralizing application and infrastructure logs from distributed systems.
- Clickstream analytics - capturing user interaction events for real-time and batch analysis.
- Financial transactions - high-throughput transaction streams with strict partition ordering.
- Kafka migration - moving from self-managed Kafka to a managed service without rewriting producers/consumers.
- Real-time dashboards - feeding live metrics and KPIs to monitoring systems.
- Event sourcing - storing domain events as an immutable append-only log.
Event Hubs vs Service Bus
These two services solve different problems. Choosing the wrong one creates friction.
| | Event Hubs | Service Bus |
|---|---|---|
| Mental model | Append-only distributed log | Reliable message broker with delivery guarantees |
| Message lifecycle | Events persist for retention period; consumers read independently | Messages are consumed and removed (or dead-lettered) |
| Ordering | Per-partition ordering | Per-session or FIFO queue ordering |
| Consumer model | Multiple consumer groups each get the full stream | Competing consumers; each message processed once |
| Replay | Yes, any consumer can reread from any offset | No; once consumed, messages are gone |
| Dead-letter queue | No | Yes |
| Best for | High-throughput streaming, telemetry, analytics | Workflow commands, task queues, reliable delivery |
Rule of thumb: If the message means “here is an event that happened,” use Event Hubs. If it means “someone must do this work,” use Service Bus.
In Entra-Adjacent Systems
Event Hubs appears in identity-related architectures when systems need to process high-volume event streams:
- Streaming audit log feeds from Entra ID diagnostic settings.
- Ingesting sign-in and provisioning telemetry into analytics pipelines (often into Azure Data Explorer).
- Capturing large reconciliation result streams for multiple downstream processors.
- Forwarding operational telemetry from identity automation workers.
In these cases, Event Hubs provides the streaming backbone while other services (Functions, Data Explorer, Storage) handle the processing and storage.