Event Stream to Data Explorer

What you will build

A streaming analytics pipeline that accepts high-volume events from multiple producers, ingests them through Event Hubs, and makes them queryable in Azure Data Explorer using KQL. This pattern separates event transport from event analysis.

Scenario

You have multiple systems producing events at high volume: application telemetry, service health signals, audit logs, operational metrics. You need to query this data interactively, find patterns, investigate incidents, and build operational dashboards. The events are not work items that need processing; they are observations that need analysis.

Examples:

Application telemetry from multiple microservices feeding a central analytics cluster
Audit log events from Graph-driven automations, provisioning systems, and worker processes
Service health events across regions and environments for SLA tracking
User activity signals aggregated for usage analytics and anomaly detection

The key lesson

Streaming ingestion and analytics exploration are different jobs. Conflating them leads to systems that are bad at both.

Event Hubs is the streaming layer. Its job is accepting events at scale, partitioning them, retaining them for a configurable window, and letting multiple consumers read independently.
Azure Data Explorer is the analytics layer. Its job is storing the accumulated event data and letting you query it fast with KQL.

Keeping those roles separate prevents your analytics cluster from pretending to be a message broker, and prevents your streaming hub from pretending to be a query engine.

Architecture

```mermaid
flowchart LR
    subgraph producers["Event Producers"]
        app[Application telemetry]
        svc[Service health events]
        audit[Audit logs]
        ops[Operational metrics]
    end

    subgraph transport["Streaming Transport"]
        hubs[Event Hubs]
    end

    subgraph analytics["Analytics Layer"]
        adx[Azure Data Explorer]
        kql[KQL Queries]
        dash[Dashboards]
    end

    app --> hubs
    svc --> hubs
    audit --> hubs
    ops --> hubs
    hubs --> adx
    adx --> kql
    adx --> dash
```

Each layer has a distinct responsibility:

Event Hubs handles throughput, partitioning, retention, and independent consumer groups. Producers write events and forget them.
Azure Data Explorer and KQL handle ingestion from Event Hubs, storage, indexing, and interactive queries. Operators ask their questions here.

What Event Hubs gives you

Event Hubs is a partitioned event log, not a message queue. That distinction matters:

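Throughput. It can ingest millions of events per second, depending on the tier and the capacity you provision. Producers do not wait for consumers.
Retention. Events stay available for a configurable window (1 day on Basic, up to 7 days on Standard, up to 90 days on Premium and Dedicated; Capture can archive events to Azure Storage for longer). Late consumers can catch up.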
Independent consumers. Multiple consumer groups read the same stream without interfering. Your analytics pipeline and your alerting pipeline can consume the same events independently.
Partitioning. Events are distributed across partitions for parallel processing. Events that share a partition key land in the same partition, which keeps related events together and in order.
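To make the log semantics concrete, here is a toy, in-memory model in Python. It is a sketch, not the Event Hubs SDK: the `PartitionedLog` class and its methods are hypothetical, and MD5 stands in for whatever hash the service uses internally. It shows three properties from the list above: events with the same partition key land in the same partition in order, each consumer group keeps its own offsets, and reading removes nothing. (In the real Python SDK, `azure-eventhub`, the corresponding knob is the `partition_key` argument to `EventHubProducerClient.create_batch`.)

```python
import hashlib
from collections import defaultdict


class PartitionedLog:
    """Toy in-memory partitioned log; illustrates Event Hubs concepts, not its API."""

    def __init__(self, partition_count: int) -> None:
        self.partitions: list[list[dict]] = [[] for _ in range(partition_count)]
        # consumer group -> partition index -> next offset that group will read
        self.offsets: defaultdict = defaultdict(lambda: defaultdict(int))

    def partition_for(self, key: str) -> int:
        # Stable hash: the same key always maps to the same partition.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % len(self.partitions)

    def append(self, key: str, event: dict) -> tuple[int, int]:
        # Producers append and return immediately; no consumer is involved.
        partition = self.partition_for(key)
        self.partitions[partition].append(event)
        return partition, len(self.partitions[partition]) - 1

    def read(self, group: str, partition: int, max_events: int = 100) -> list[dict]:
        # Each group advances only its own offset; events stay in the log.
        start = self.offsets[group][partition]
        batch = self.partitions[partition][start:start + max_events]
        self.offsets[group][partition] = start + len(batch)
        return batch


log = PartitionedLog(partition_count=4)
for seq in range(3):
    log.append("tenant-a", {"tenant": "tenant-a", "seq": seq})

p = log.partition_for("tenant-a")
analytics = log.read("analytics", p)  # first consumer group
alerting = log.read("alerting", p)    # second group sees the same events
```

Both groups read all three `tenant-a` events in order, and a second `read` by the analytics group returns nothing new until more events arrive. Real retention would also trim old events by age, which this sketch leaves out.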

What Event Hubs does not give you: message settlement, retries, dead-lettering, or per-message ownership. It is a stream, not a workflow broker.
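The missing settlement is easiest to see from the consumer's side. In this hypothetical sketch the only state is a checkpoint (an offset) that the consumer owns. A failed event does not go back to the stream or to a dead-letter queue, and replaying means rewinding the checkpoint. Any retry or poison-event handling is the consumer's own code, which is exactly the work a broker such as Service Bus would otherwise do for you.

```python
from typing import Callable


def consume(events: list[str], checkpoint: int,
            handle: Callable[[str], None]) -> tuple[int, list[str]]:
    """Process events after `checkpoint`; return the new checkpoint and any failures."""
    failed: list[str] = []
    for event in events[checkpoint:]:
        try:
            handle(event)
        except ValueError:
            # No abandon, no redelivery, no dead-letter queue: the stream does not
            # know this failed. The consumer must record it itself if it cares.
            failed.append(event)
        checkpoint += 1  # progress is just a number the consumer stores
    return checkpoint, failed


processed: list[str] = []


def handle(event: str) -> None:
    # Hypothetical handler: events whose name starts with "bad" fail.
    if event.startswith("bad"):
        raise ValueError(event)
    processed.append(event)


stream = ["ok-1", "bad-2", "ok-3"]
checkpoint, failed = consume(stream, 0, handle)
# Replay is just rewinding the checkpoint; the events are still in the stream.
replay_checkpoint, _ = consume(stream, 0, handle)
```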

What Data Explorer gives you

Data Explorer is an analytics engine optimized for time-series and log-style data:

Fast ingestion. Native integration with Event Hubs means events flow in continuously without custom glue code.
KQL. A query language built for exploration: filtering, aggregating, joining, and visualizing large datasets interactively.
Columnar storage. Efficient compression and indexing for the kinds of queries analysts actually run (time ranges, grouping, percentiles, pattern matching).
Retention policies. Automatic data lifecycle management. Old data ages out without manual cleanup.
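A rough model of how retention and columnar storage fit together, assuming data is held in shards (Data Explorer calls them extents) that are stored column by column and aged out whole once they pass the retention period. The `Extent` class and the helper functions are hypothetical illustrations, not Data Explorer APIs. The real engine measures retention from ingestion time and removes expired data in the background rather than on demand.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Extent:
    """Hypothetical stand-in for a data shard: one ingestion batch, stored by column."""
    ingested_at: datetime
    columns: dict[str, list] = field(default_factory=dict)


def apply_retention(extents: list[Extent], now: datetime,
                    retention: timedelta) -> list[Extent]:
    """Drop whole extents whose ingestion time is older than the retention period."""
    cutoff = now - retention
    return [e for e in extents if e.ingested_at >= cutoff]


def read_column(extents: list[Extent], name: str) -> list:
    """A query touching one column reads only that column's values from each extent."""
    return [value for e in extents for value in e.columns.get(name, [])]


now = datetime(2024, 6, 30, tzinfo=timezone.utc)
extents = [
    Extent(now - timedelta(days=40), {"service": ["api", "web"], "latency_ms": [120, 80]}),
    Extent(now - timedelta(days=2), {"service": ["api"], "latency_ms": [95]}),
]
kept = apply_retention(extents, now, retention=timedelta(days=30))
latencies = read_column(kept, "latency_ms")
```

With a 30-day period, the 40-day-old extent disappears as a unit and the latency query only ever reads the `latency_ms` column of what remains.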

Typical queries look like:

Which service had the highest error rate in the last hour?
What is the 95th percentile latency by region over the past week?
Which tenant is generating anomalous event volume?
How did deployment X affect error patterns compared to the previous day?
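In Data Explorer each of these is a short KQL query built on `summarize`, for example with `countif()` for error counts or `percentile()` for latency. To show the logic those queries express, here is the same computation over an in-memory list in Python. The event fields (`timestamp`, `service`, `status`, `region`, `latency_ms`) and the rule that a status of 500 or above counts as an error are assumptions for illustration. The percentile here is exact nearest-rank; KQL's `percentile()` returns an estimate, so results on real data can differ slightly.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def error_rate_by_service(events: list[dict], now: datetime,
                          window: timedelta = timedelta(hours=1)) -> dict[str, float]:
    """Share of events with status >= 500 per service, within the time window."""
    totals: dict[str, int] = defaultdict(int)
    errors: dict[str, int] = defaultdict(int)
    for e in events:
        if e["timestamp"] >= now - window:
            totals[e["service"]] += 1
            if e["status"] >= 500:
                errors[e["service"]] += 1
    return {svc: errors[svc] / totals[svc] for svc in totals}


def percentile(values: list[float], pct: float) -> float:
    """Exact nearest-rank percentile (KQL's percentile() is an estimate)."""
    ordered = sorted(values)
    # Multiply before dividing so whole-number ranks stay exact in floating point.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p95_latency_by_region(events: list[dict]) -> dict[str, float]:
    """95th percentile of latency_ms, grouped by region."""
    by_region: dict[str, list[float]] = defaultdict(list)
    for e in events:
        by_region[e["region"]].append(e["latency_ms"])
    return {region: percentile(vals, 95) for region, vals in by_region.items()}
```

The value of Data Explorer is that it runs this kind of grouping over billions of rows interactively, reading only the columns and time range each query needs.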

Trade-offs

This pattern gives you replayable ingestion and strong analytics capability, but it does not give you everything:

No workflow semantics. Event Hubs does not track whether a specific consumer processed a specific event; each consumer records its own progress with checkpoints. If you need per-message settlement with retries and dead-lettering, use Service Bus.
Not a transactional store. Data Explorer is append-optimized. It is not the right place for state you update frequently. Use Cosmos DB for that.
Pipeline complexity. You are running two services (Event Hubs and Data Explorer) instead of one. That overhead pays off at scale but is hard to justify at small volumes.
Ingestion latency. Data Explorer batches queued ingestion for efficiency, so expect seconds to minutes of latency, not sub-second; streaming ingestion can shorten this where it is enabled. For real-time alerting on individual events, consider a separate consumer, in its own consumer group, reading directly from Event Hubs.
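The latency trade-off comes from batching. This toy batcher, a sketch rather than Data Explorer's implementation, seals a batch when it reaches `max_items` or when its oldest event has waited `max_age_seconds`, whichever comes first. Both thresholds are hypothetical values, not the service defaults.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Batcher:
    """Toy ingestion batcher. Thresholds are illustrative, not Data Explorer defaults."""
    max_items: int
    max_age_seconds: float
    pending: list[Any] = field(default_factory=list)
    opened_at: Optional[float] = None
    # Each sealed batch is (time it was sealed, events in it).
    sealed: list[tuple[float, list[Any]]] = field(default_factory=list)

    def add(self, event: Any, now: float) -> None:
        if self.opened_at is None:
            self.opened_at = now  # the oldest pending event sets the batch's age
        self.pending.append(event)
        self._maybe_seal(now)

    def tick(self, now: float) -> None:
        """Called periodically so a quiet batch still seals once it is old enough."""
        self._maybe_seal(now)

    def _maybe_seal(self, now: float) -> None:
        if not self.pending or self.opened_at is None:
            return
        full = len(self.pending) >= self.max_items
        old = now - self.opened_at >= self.max_age_seconds
        if full or old:
            self.sealed.append((now, self.pending))
            self.pending = []
            self.opened_at = None


b = Batcher(max_items=3, max_age_seconds=60)
b.add("e0", now=0)
b.add("e1", now=1)
b.tick(now=30)  # nothing yet: 2 items, oldest is 30 s old
b.tick(now=60)  # age limit hit: e0 waited 60 s before its batch sealed
for i, t in enumerate([61, 62, 63]):
    b.add(f"burst-{i}", now=t)  # third add fills the batch and seals it at once
```

At low volume the age limit dominates, so an event can sit for the full window before it becomes queryable; at high volume batches fill and seal quickly. That is why a per-event alerting path belongs on its own Event Hubs consumer rather than behind the analytics ingestion.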

When not to use this pattern

This pattern is wrong when:

The volume is low. If you are processing dozens or hundreds of events per day, a log container in Cosmos DB, or even Application Insights, is simpler and cheaper. The streaming pipeline adds cost and complexity that small volumes do not justify.
Each event is a work item. If events represent tasks that must be completed, retried, and dead-lettered on failure, you need Service Bus, not a stream.
You need current entity state. If the question is “what is the current status of entity X?” rather than “what happened across all entities?”, you need a state store, not an analytics cluster. See State and Artifacts.
Per-entity workflows. If each event triggers a multi-step process for a specific entity, that is orchestration work, not analytics. Use a worker pattern with Service Bus.
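The difference between current state and analytics shows up in the access pattern. Answering “what is the status of entity X?” from an event history means scanning events to find the latest one per entity (in KQL, typically `summarize arg_max(Timestamp, *) by EntityId`, with hypothetical column names), while a state store answers it with a single keyed read. Both helpers below are hypothetical sketches with an assumed event shape (`entity_id`, `timestamp`, `status`).

```python
from typing import Optional


def latest_status_from_events(events: list[dict]) -> dict[str, str]:
    """Current status per entity, derived by scanning the full event history."""
    latest: dict[str, dict] = {}
    for e in events:  # touches every event, for every entity
        current = latest.get(e["entity_id"])
        if current is None or e["timestamp"] > current["timestamp"]:
            latest[e["entity_id"]] = e
    return {entity: e["status"] for entity, e in latest.items()}


class StateStore:
    """Hypothetical keyed store: one write per change, one point read per question."""

    def __init__(self) -> None:
        self._status: dict[str, str] = {}

    def upsert(self, entity_id: str, status: str) -> None:
        self._status[entity_id] = status

    def get(self, entity_id: str) -> Optional[str]:
        return self._status.get(entity_id)


history = [
    {"entity_id": "job-1", "timestamp": 1, "status": "queued"},
    {"entity_id": "job-2", "timestamp": 2, "status": "queued"},
    {"entity_id": "job-1", "timestamp": 3, "status": "running"},
    {"entity_id": "job-1", "timestamp": 5, "status": "succeeded"},
]
store = StateStore()
for e in history:  # history is already in timestamp order here
    store.upsert(e["entity_id"], e["status"])
```

Both give the same answer, but the scan grows with the whole history while the keyed read does not, which is the practical reason to keep current state in a state store.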