Azure Data Explorer and KQL

Azure Data Explorer (ADX) is a managed big data analytics platform built for near-real-time interactive exploration of large datasets. It excels at log analytics, time-series analysis, and ad hoc investigation over billions of records.

You already use the engine even if you have never provisioned an ADX cluster. The same Kusto engine powers:

  • Azure Monitor Logs (Log Analytics workspaces)
  • Application Insights
  • Microsoft Sentinel (SIEM)
  • Microsoft Defender for Endpoint
  • Microsoft Fabric Real-Time Intelligence

A standalone ADX cluster gives you full control over ingestion, retention, caching, and query performance when those managed surfaces are not enough.

KQL: Kusto Query Language

KQL is the query language for Azure Data Explorer and all services built on the Kusto engine. It uses a pipe-based syntax, where each operator transforms the output of the previous one. It resembles SQL in spirit but is optimized for slicing large event and time-series datasets.

Basic Structure

TableName
| where TimeGenerated > ago(1h)
| where Level == "Error"
| summarize Count = count() by AppName
| order by Count desc

Each line applies a transformation: filter rows, aggregate, sort, project columns. The pipe model makes queries easy to build incrementally during investigation.

Essential Operators

// Filter rows
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != "0"

// Aggregate
Requests
| summarize AvgDuration = avg(DurationMs), Count = count() by bin(TimeGenerated, 5m)

// Join two tables
SecurityEvents
| join kind=inner (UserInfo) on $left.UserId == $right.Id

// Render a time chart
PerformanceCounters
| summarize avg(CounterValue) by bin(TimeGenerated, 1m), CounterName
| render timechart

Time Series Functions

KQL has built-in functions for time-series analysis:

  • make-series: create regular time-series arrays from irregular event data
  • series_decompose: separate trend, seasonal, and residual components
  • series_decompose_anomalies: detect anomalies in time-series data
  • series_periods_detect: find seasonal patterns automatically
  • series_fill_*: interpolate missing values
Requests
| make-series RequestCount = count() on TimeGenerated from ago(7d) to now() step 1h
| extend anomalies = series_decompose_anomalies(RequestCount)
| render anomalychart

Key Concepts

Architecture

graph LR
    subgraph Ingestion
        EH[Event Hubs]
        IOT[IoT Hub]
        BL[Blob Storage]
        SDK[SDKs / REST API]
    end

    subgraph ADX Cluster
        DB1[Database A]
        DB2[Database B]
        DB1 --> T1[Table 1]
        DB1 --> T2[Table 2]
        DB2 --> T3[Table 3]
    end

    subgraph Query Consumers
        DA[Dashboards]
        NB[Notebooks]
        APP[Applications]
        AL[Alerts]
    end

    EH --> DB1
    IOT --> DB1
    BL --> DB2
    SDK --> DB2
    T1 --> DA
    T2 --> NB
    T3 --> APP
    T2 --> AL
  • Cluster: the compute and storage unit. Choose the SKU based on query concurrency and data volume.
  • Database: a logical grouping of tables within a cluster. Permissions are scoped at the database level.
  • Table: where data lives. Schema is defined explicitly or inferred during ingestion.

Ingestion

ADX supports two ingestion modes:

  • Queued ingestion (default): data is batched, optimized, and committed in bulk. Higher throughput, slight delay (seconds to minutes).
  • Streaming ingestion: data is available for query within seconds of arrival. Lower throughput, higher cost per operation.

Supported data formats: JSON, CSV, Avro, Parquet, ORC, TSV, and others. ADX handles schema mapping at ingestion time.

Common ingestion sources: Event Hubs, IoT Hub, Blob Storage (Event Grid triggers), Kafka, Logstash, SDKs (C#, Python, Java, Node.js), and REST API.

Materialized Views

Materialized views store pre-aggregated results of a query and update automatically as new data arrives. They are useful for dashboards and reports that repeatedly compute the same aggregation.

.create materialized-view ErrorSummary on AppEvents {
    AppEvents
    | where Level == "Error"
    | summarize ErrorCount = count() by AppName, bin(TimeGenerated, 1h)
}

Update Policies

Update policies run a query on newly ingested data and store the results in a different table. They work like triggers: transform, filter, or enrich data at ingestion time without a separate pipeline.

Common Use Cases

  • Log analytics: centralize application, infrastructure, and security logs for interactive investigation. Query across terabytes in seconds.
  • IoT telemetry: ingest high-volume sensor data, detect anomalies, and visualize trends with built-in time-series functions.
  • Security analytics (SIEM / threat hunting): correlate events across data sources, hunt for indicators of compromise, build detection rules. This is exactly what Microsoft Sentinel does on top of the Kusto engine.
  • Time-series analysis: financial data, operational metrics, performance counters. Native support for seasonality detection, forecasting, and anomaly identification.
  • Exploratory analytics: investigate unknown datasets interactively. The pipe-based query model makes it easy to refine questions progressively.

When NOT to Use ADX

ADX is built for analytical queries over large event datasets. It is not the right tool for everything:

WorkloadBetter Fit
Star-schema analytics, BI reportingMicrosoft Fabric / Synapse Analytics
Transactional reads/writes (CRUD)Azure SQL, Cosmos DB
Low-latency key-value lookupsCosmos DB, Azure Cache for Redis
Relational data with complex joinsAzure SQL, PostgreSQL
Small datasets (< 1 GB)Simpler tools; ADX overhead is not justified

In Entra-Adjacent Systems

Azure Data Explorer and KQL surface frequently in identity and security operations:

  • Entra sign-in and audit logs: routed to a Log Analytics workspace (Kusto engine), queried with KQL. The same query syntax works whether you use Azure Monitor or a standalone ADX cluster.
  • Microsoft Sentinel: built on ADX. Threat detection rules, hunting queries, and incident investigation all use KQL against ingested identity signals.
  • Conditional Access analysis: KQL queries over sign-in logs help identify policy gaps, risky access patterns, and authentication failures at scale.
  • Provisioning diagnostics: correlating provisioning logs across connectors and tenants benefits from ADX’s ability to join and aggregate large event volumes.

Practical Guidance

Use KQL whenever you interact with Azure Monitor Logs, Sentinel, or Application Insights. The query skills transfer directly to standalone ADX clusters. Start with where and summarize to filter and aggregate, then build toward join, make-series, and render as your investigations grow more complex. If you find yourself writing the same aggregation repeatedly, consider a materialized view. If you need data transformation at ingestion time, use update policies instead of building a separate pipeline.