Azure Data Explorer and KQL
Azure Data Explorer (ADX) is a managed big data analytics platform built for near-real-time interactive exploration of large datasets. It excels at log analytics, time-series analysis, and ad hoc investigation over billions of records.
You already use the engine even if you have never provisioned an ADX cluster. The same Kusto engine powers:
- Azure Monitor Logs (Log Analytics workspaces)
- Application Insights
- Microsoft Sentinel (SIEM)
- Microsoft Defender for Endpoint
- Microsoft Fabric Real-Time Intelligence
A standalone ADX cluster gives you full control over ingestion, retention, caching, and query performance when those managed surfaces are not enough.
KQL: Kusto Query Language
KQL is the query language for Azure Data Explorer and all services built on the Kusto engine. It uses a pipe-based syntax, where each operator transforms the output of the previous one. It resembles SQL in spirit but is optimized for slicing large event and time-series datasets.
Basic Structure
TableName
| where TimeGenerated > ago(1h)
| where Level == "Error"
| summarize Count = count() by AppName
| order by Count desc
Each line applies a transformation: filter rows, aggregate, sort, project columns. The pipe model makes queries easy to build incrementally during investigation.
Essential Operators
// Filter rows
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType != "0"
// Aggregate
Requests
| summarize AvgDuration = avg(DurationMs), Count = count() by bin(TimeGenerated, 5m)
// Join two tables
SecurityEvents
| join kind=inner (UserInfo) on $left.UserId == $right.Id
// Render a time chart
PerformanceCounters
| summarize avg(CounterValue) by bin(TimeGenerated, 1m), CounterName
| render timechart
Time Series Functions
KQL has built-in functions for time-series analysis:
make-series: create regular time-series arrays from irregular event dataseries_decompose: separate trend, seasonal, and residual componentsseries_decompose_anomalies: detect anomalies in time-series dataseries_periods_detect: find seasonal patterns automaticallyseries_fill_*: interpolate missing values
Requests
| make-series RequestCount = count() on TimeGenerated from ago(7d) to now() step 1h
| extend anomalies = series_decompose_anomalies(RequestCount)
| render anomalychart
Key Concepts
Architecture
graph LR
subgraph Ingestion
EH[Event Hubs]
IOT[IoT Hub]
BL[Blob Storage]
SDK[SDKs / REST API]
end
subgraph ADX Cluster
DB1[Database A]
DB2[Database B]
DB1 --> T1[Table 1]
DB1 --> T2[Table 2]
DB2 --> T3[Table 3]
end
subgraph Query Consumers
DA[Dashboards]
NB[Notebooks]
APP[Applications]
AL[Alerts]
end
EH --> DB1
IOT --> DB1
BL --> DB2
SDK --> DB2
T1 --> DA
T2 --> NB
T3 --> APP
T2 --> AL
- Cluster: the compute and storage unit. Choose the SKU based on query concurrency and data volume.
- Database: a logical grouping of tables within a cluster. Permissions are scoped at the database level.
- Table: where data lives. Schema is defined explicitly or inferred during ingestion.
Ingestion
ADX supports two ingestion modes:
- Queued ingestion (default): data is batched, optimized, and committed in bulk. Higher throughput, slight delay (seconds to minutes).
- Streaming ingestion: data is available for query within seconds of arrival. Lower throughput, higher cost per operation.
Supported data formats: JSON, CSV, Avro, Parquet, ORC, TSV, and others. ADX handles schema mapping at ingestion time.
Common ingestion sources: Event Hubs, IoT Hub, Blob Storage (Event Grid triggers), Kafka, Logstash, SDKs (C#, Python, Java, Node.js), and REST API.
Materialized Views
Materialized views store pre-aggregated results of a query and update automatically as new data arrives. They are useful for dashboards and reports that repeatedly compute the same aggregation.
.create materialized-view ErrorSummary on AppEvents {
AppEvents
| where Level == "Error"
| summarize ErrorCount = count() by AppName, bin(TimeGenerated, 1h)
}
Update Policies
Update policies run a query on newly ingested data and store the results in a different table. They work like triggers: transform, filter, or enrich data at ingestion time without a separate pipeline.
Common Use Cases
- Log analytics: centralize application, infrastructure, and security logs for interactive investigation. Query across terabytes in seconds.
- IoT telemetry: ingest high-volume sensor data, detect anomalies, and visualize trends with built-in time-series functions.
- Security analytics (SIEM / threat hunting): correlate events across data sources, hunt for indicators of compromise, build detection rules. This is exactly what Microsoft Sentinel does on top of the Kusto engine.
- Time-series analysis: financial data, operational metrics, performance counters. Native support for seasonality detection, forecasting, and anomaly identification.
- Exploratory analytics: investigate unknown datasets interactively. The pipe-based query model makes it easy to refine questions progressively.
When NOT to Use ADX
ADX is built for analytical queries over large event datasets. It is not the right tool for everything:
| Workload | Better Fit |
|---|---|
| Star-schema analytics, BI reporting | Microsoft Fabric / Synapse Analytics |
| Transactional reads/writes (CRUD) | Azure SQL, Cosmos DB |
| Low-latency key-value lookups | Cosmos DB, Azure Cache for Redis |
| Relational data with complex joins | Azure SQL, PostgreSQL |
| Small datasets (< 1 GB) | Simpler tools; ADX overhead is not justified |
In Entra-Adjacent Systems
Azure Data Explorer and KQL surface frequently in identity and security operations:
- Entra sign-in and audit logs: routed to a Log Analytics workspace (Kusto engine), queried with KQL. The same query syntax works whether you use Azure Monitor or a standalone ADX cluster.
- Microsoft Sentinel: built on ADX. Threat detection rules, hunting queries, and incident investigation all use KQL against ingested identity signals.
- Conditional Access analysis: KQL queries over sign-in logs help identify policy gaps, risky access patterns, and authentication failures at scale.
- Provisioning diagnostics: correlating provisioning logs across connectors and tenants benefits from ADX’s ability to join and aggregate large event volumes.
Practical Guidance
Use KQL whenever you interact with Azure Monitor Logs, Sentinel, or Application Insights. The query skills transfer directly to standalone ADX clusters. Start with where and summarize to filter and aggregate, then build toward join, make-series, and render as your investigations grow more complex. If you find yourself writing the same aggregation repeatedly, consider a materialized view. If you need data transformation at ingestion time, use update policies instead of building a separate pipeline.