KQL Series – some DevOps things: Provisioning using Azure CLI

I can’t really write about provisioning anything in Azure without mentioning Azure CLI.
My last two posts were about using terraform and bicep.

Here we will be using the Azure CLI.

Step 1: Install and configure the Azure CLI To use the Azure CLI, you first need to install it on your local machine. The Azure CLI can be installed on Windows, macOS, or Linux, and detailed instructions can be found in the Azure documentation.

Once the Azure CLI is installed, you need to log in using the az login command. This command will prompt you to enter your Azure credentials to authenticate with the Azure portal.

Step 2: Create an Azure Data Explorer cluster To create an ADX cluster, you can use the az kusto cluster create command (part of the kusto extension for the Azure CLI). Here’s an example command that creates an ADX cluster named “myadxcluster” in an existing resource group named “myresourcegroup”, in the “East US” region, with a Standard_D13_v2 SKU and two nodes:

az kusto cluster create --cluster-name myadxcluster --resource-group myresourcegroup --location eastus --sku name="Standard_D13_v2" capacity=2 tier="Standard"

This command will create an ADX cluster with the specified name, location, SKU, and node capacity. You can customize these settings to fit your needs.

Step 3: Create an Azure Data Explorer database After creating an ADX cluster, you can create a database within the cluster using the az kusto database create command.

Here’s an example command that creates a database named “myadxdatabase” within the “myadxcluster” cluster:

az kusto database create --cluster-name myadxcluster --database-name myadxdatabase --resource-group myresourcegroup --read-write-database location=eastus

This command will create a new database with the specified name within the ADX cluster.

Step 4: Configure data ingestion Once you have created a database, you can configure ongoing data ingestion by creating a data connection, either in the Azure Data Explorer web UI or with the Azure CLI. With the CLI you use the az kusto data-connection commands – for example, az kusto data-connection event-hub create connects an Event Hub to a target table.

Here’s an example command – the resource IDs below are placeholders, and the target table and its ingestion mapping need to exist already:

az kusto data-connection event-hub create --cluster-name myadxcluster --resource-group myresourcegroup --database-name myadxdatabase --data-connection-name mydataconnection --consumer-group '$Default' --event-hub-resource-id /subscriptions/<subscriptionId>/resourceGroups/myresourcegroup/providers/Microsoft.EventHub/namespaces/myeventhubns/eventhubs/myeventhub --table-name mytable --data-format csv --mapping-rule-name mycsvmapping

This command creates a data connection named “mydataconnection” that continuously ingests CSV events from the Event Hub into the “mytable” table, using the specified consumer group and ingestion mapping. For one-off files such as a single CSV, it’s usually easier to use the ingestion wizard in the web UI or a KQL .ingest command than the CLI.

Step 5: Verify your deployment Once you have completed the above steps, you can verify your Azure Data Explorer deployment by running queries and analyzing data in the ADX cluster. You can use the Azure Data Explorer web UI, or tools like Azure Data Studio (with its Kusto extension), which provide a graphical interface for querying and analyzing data in ADX.
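For example, once you have ingested some data you could run a couple of quick checks from the query editor (the table name below is just a placeholder for whatever you ingest later):

// List the databases on the cluster
.show databases

// Peek at a few rows of a table you have ingested into
MyTable
| take 10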

Provisioning Azure Data Explorer using the Azure CLI is a pretty simple and straightforward process.

Yip.

KQL Series – some DevOps things: Provisioning using bicep

So in my last post I wrote about how to provision Azure Data Explorer using terraform.

In this post I will use bicep to provision Azure Data Explorer.

Steps:

Step 1: Set up your environment Before you can begin using Bicep to provision Azure Data Explorer, you need to set up your environment. This involves installing the Azure CLI and Bicep. You’ll also need to create an Azure account and set up authentication.

Step 2: Define your infrastructure as code Once you have your environment set up, you can begin defining your infrastructure as code using Bicep. This involves writing code that defines the resources you want to provision, such as Azure Data Explorer clusters, databases, and data ingestion rules.

Here’s an example of a Bicep file that provisions an Azure Data Explorer cluster and database:

param resourceGroupName string
param location string
param clusterName string
param capacity int

resource cluster 'Microsoft.Kusto/clusters@2022-02-01' = {
  name: clusterName
  location: location
  sku: {
    name: 'Standard_D13_v2'
    tier: 'Standard'
    capacity: capacity
  }
}

resource db 'Microsoft.Kusto/clusters/databases@2022-02-01' = {
  name: 'my-kusto-database'
  parent: cluster
  location: location
  kind: 'ReadWrite'
}

In this code, we declare four parameters: the resource group name, the location, the cluster name, and the cluster capacity. We then define an Azure Data Explorer cluster using the Microsoft.Kusto/clusters resource, specifying the name, location, and SKU (name, tier, and capacity). Finally, we define a read-write database using the Microsoft.Kusto/clusters/databases resource, specifying its name, location, kind, and parent cluster.

Step 3: Deploy your infrastructure. Now that you have defined your infrastructure as code, you can deploy it using the Azure CLI. First, run the az login command to authenticate with Azure. Then, run the following commands to create a new resource group, build the Bicep file, and deploy the infrastructure:

az group create --name my-resource-group --location westus2
az deployment group create --resource-group my-resource-group --template-file main.bicep --parameters resourceGroupName=my-resource-group location=westus2 clusterName=my-kusto-cluster capacity=2

This will create a new resource group, build the Bicep file, and deploy the Azure Data Explorer cluster and database.
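If you want to double-check the deployment from the CLI, something like the following should return the new cluster (this assumes the kusto extension for the Azure CLI is installed):

az kusto cluster show --cluster-name my-kusto-cluster --resource-group my-resource-group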

Step 4: Test and monitor your deployment Once your infrastructure is deployed, you should test and monitor it to ensure it is working as expected. This may involve running queries on your Azure Data Explorer cluster, monitoring data ingestion rates, and analyzing performance metrics.

Using Bicep to provision Azure Data Explorer offers many benefits, including faster deployment times, greater reliability, and improved scalability. By automating the provisioning process, you can focus on more important tasks, such as analyzing your data and gaining insights into your business.

Bicep is a powerful tool that can simplify the process of provisioning Azure Data Explorer. By following the steps outlined in this blog post, you can quickly and easily set up an Azure Data Explorer cluster.

Yip.

KQL Series – some DevOps things: Provisioning using terraform

If you’ve read my blogs before or seen me speak you’ll know that I love the DevOps.

Provisioning Azure Data Explorer can be a complex task, involving multiple steps and configurations. However, with the help of DevOps methodologies like infrastructure as code, we can spin up an Azure Data Explorer cluster in no time.

I like to use terraform, an open-source infrastructure as code tool with which we can automate the entire provisioning process, making it faster, more reliable, and less error-prone.

In this blog post, I discuss how to use Terraform to provision Azure Data Explorer, step-by-step.

Step 1: Set up your environment Before you can begin using Terraform to provision Azure Data Explorer, you need to set up your environment. This involves installing the Azure CLI, Terraform, and other necessary tools. You’ll also need to create an Azure account and set up authentication.

Step 2: Define your infrastructure as code Once you have your environment set up, you can begin defining your infrastructure as code using Terraform. This involves writing code that defines the resources you want to provision, such as Azure Data Explorer clusters, databases, and data ingestion rules.

Step 3: Initialize your Terraform project After defining your infrastructure as code, you need to initialize your Terraform project by running the ‘terraform init’ command. This will download any necessary plugins and modules and prepare your project for deployment.

Step 4: Deploy your infrastructure Now that your Terraform project is initialized, you can deploy your infrastructure by running the ‘terraform apply’ command. This will provision all the resources defined in your code and configure them according to your specifications.

Step 5: Test and monitor your deployment Once your infrastructure is deployed, you should test and monitor it to ensure it is working as expected. This may involve running queries on your Azure Data Explorer cluster, monitoring data ingestion rates, and analyzing performance metrics.

provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "my-resource-group"
  location = "West US 2"
}

resource "azurerm_kusto_cluster" "example" {
  name                = "my-kusto-cluster"
  location            = azurerm_resource_group.example.location
  resource_group_name = azurerm_resource_group.example.name
  sku                 = "D13_v2"
  capacity            = 2
}

resource "azurerm_kusto_database" "example" {
  name                = "my-kusto-database"
  resource_group_name = azurerm_resource_group.example.name
  cluster_name        = azurerm_kusto_cluster.example.name
}

In this code, we first declare the Azure provider and define a resource group. We then define an Azure Data Explorer cluster using the azurerm_kusto_cluster resource, specifying the name, location, resource group, SKU, and capacity. Finally, we define a database using the azurerm_kusto_database resource, specifying the name, resource group, and the name of the cluster it belongs to.

Once you have this code in a .tf file, you can use the Terraform CLI to initialize your project, authenticate with Azure, and deploy your infrastructure.
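In practice that workflow looks something like this (run az login first so the azurerm provider can authenticate against your subscription):

terraform init
terraform plan
terraform apply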

Using Terraform to provision Azure Data Explorer offers many benefits, including faster deployment times, greater reliability, and improved scalability. By automating the provisioning process, you can focus on more important tasks, such as analyzing your data and gaining insights into your business.

Terraform is a powerful tool that can simplify the process of provisioning Azure Data Explorer. By following the steps outlined in this blog post, you can quickly and easily set up a scalable and efficient data analytics solution that can handle even the largest data volumes.

Yip.

KQL Series – ingesting data into Azure Data Explorer from IoT Hub

This blog post is rare because I am dedicating it to my good mate Bryn Lewis, who spoke recently at a community event I ran (the first free community event in Christchurch since June 2019 – the #makeStuffGo conference):
https://makestuffgo.moosh.co.nz/

Bryn did a session on IoT and it got me thinking – how could I ingest data into Azure Data Explorer from IoT Hub…?

So first of all what is this IoT thing:

The Internet of Things (IoT) has revolutionized the way we interact with the world around us. From smart homes to industrial automation, IoT devices generate an enormous amount of data. To make sense of this data, we need powerful tools for analysis and visualization. Azure Data Explorer is a cloud-based analytics service that can help you store and analyze large volumes of diverse data in real-time. In this blog post, we’ll explore how to ingest data from IoT Hub into Azure Data Explorer.

What is IoT Hub?

Azure IoT Hub is a cloud-based service that enables you to connect, monitor, and manage IoT devices. IoT Hub provides a secure and scalable platform for IoT device management and data collection. It can handle millions of connected devices and billions of messages per day. IoT Hub supports a range of protocols, including MQTT, HTTPS, and AMQP, allowing you to connect a wide variety of devices.

Why Ingest Data from IoT Hub into Azure Data Explorer?

IoT Hub can collect and store data from millions of IoT devices, but analyzing this data in real-time can be challenging. Azure Data Explorer provides a powerful platform for real-time data analysis, allowing you to gain insights into your IoT data quickly. By ingesting data from IoT Hub into Azure Data Explorer, you can:

  • Analyze data in real-time: Azure Data Explorer can process data in real-time, allowing you to gain insights into your IoT data as it is generated.
  • Store data at scale: Azure Data Explorer is a cloud-based service that can store and analyze large volumes of data. You can store data from IoT Hub in Azure Data Explorer and analyze it at any scale.
  • Simplify data analysis: Azure Data Explorer provides a range of powerful analytical tools that can help you gain insights into your IoT data quickly. You can use these tools to identify patterns and anomalies in your data, detect trends over time, and more.

To ingest data from IoT Hub into Azure Data Explorer, we can follow these steps:

  1. Create an IoT Hub: We can create an IoT Hub in the Azure portal or using the Azure CLI. Once we’ve created an IoT Hub, we can connect our IoT devices to it.
  2. Use the built-in Event Hub-compatible endpoint: Azure Data Explorer ingests from Event Hub-compatible endpoints, and IoT Hub exposes one out of the box, so there is nothing extra to create here – the data connection we set up later will read from it.
  3. Configure IoT devices to send data to IoT Hub: You need to configure your IoT devices to send data to IoT Hub. You can use any of the protocols supported by IoT Hub, including MQTT, HTTPS, and AMQP.
  4. Create an Azure Data Explorer cluster: You can create an Azure Data Explorer cluster in the Azure portal or using the Azure CLI.
  5. Create a database and table in Azure Data Explorer: You can create a database and table using the Azure Data Explorer Web UI, Azure CLI, or Azure PowerShell. The table should have a structure that matches the data you want to ingest from IoT Hub.
  6. Create a consumer group on the IoT Hub: Azure Data Explorer should read from its own consumer group on the IoT Hub’s built-in endpoint, which you can add in the portal or with the Azure CLI (see the example after this list).
  7. Create an ingestion mapping in Azure Data Explorer: The mapping tells Azure Data Explorer how to map the incoming device data to the columns of the target table. You can create it with a KQL control command, the Web UI, or Azure PowerShell.
  8. Create an IoT Hub data connection and start ingesting: You can create the data connection using the Azure Data Explorer Web UI, Azure CLI, or Azure PowerShell. Once it is created, Azure Data Explorer automatically ingests data from IoT Hub and stores it in the specified table.
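For example, one way to create that dedicated consumer group is with the Azure CLI (the hub and consumer group names are placeholders):

az iot hub consumer-group create --hub-name MyIotHub --name adx-consumer-group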

Here is an example PowerShell script, using the Az.Kusto module, that creates an IoT Hub data connection in Azure Data Explorer – the piece that tells ADX to pull data from the IoT Hub’s built-in endpoint:

# Set variables (replace these with your own values)
$resourceGroupName      = "MyResourceGroup"
$clusterName            = "MyDataExplorerCluster"
$databaseName           = "MyDatabase"
$tableName              = "MyTable"
$dataConnectionName     = "MyIotHubConnection"
$iotHubResourceId       = "/subscriptions/<subscriptionId>/resourceGroups/MyResourceGroup/providers/Microsoft.Devices/IotHubs/MyIotHub"
$sharedAccessPolicyName = "iothubowner"
$consumerGroupName      = "adx-consumer-group"
$mappingRuleName        = "MyJsonMapping"
$location               = "eastus"

# Create the IoT Hub data connection in Azure Data Explorer
# (the consumer group, target table and ingestion mapping must already exist)
New-AzKustoDataConnection `
  -ResourceGroupName $resourceGroupName `
  -ClusterName $clusterName `
  -DatabaseName $databaseName `
  -DataConnectionName $dataConnectionName `
  -Location $location `
  -Kind "IotHub" `
  -IotHubResourceId $iotHubResourceId `
  -SharedAccessPolicyName $sharedAccessPolicyName `
  -ConsumerGroup $consumerGroupName `
  -TableName $tableName `
  -MappingRuleName $mappingRuleName `
  -DataFormat "JSON"

This script uses the New-AzKustoDataConnection cmdlet from the Az.Kusto module to create the IoT Hub data connection.

Make sure to replace the variable values with your own values before running the script. You will need the names of your resource group, Azure Data Explorer cluster, database, table, ingestion mapping, and consumer group, plus the full resource ID of your IoT Hub and the name of a shared access policy on it (for example iothubowner).
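One thing to note: the ingestion mapping referenced by -MappingRuleName has to exist on the target table before the connection starts ingesting. A minimal sketch of creating a JSON ingestion mapping with a KQL control command (the column names and JSON paths are placeholders for whatever your devices actually send):

.create table MyTable ingestion json mapping 'MyJsonMapping'
'[{"column":"Timestamp","path":"$.timestamp","datatype":"datetime"},{"column":"DeviceId","path":"$.deviceId","datatype":"string"},{"column":"Value","path":"$.value","datatype":"real"}]'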

You can learn a whole heap more here:

https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-iot-hub

KQL Series – ingesting data via Event Grid into Azure Data Explorer

This blog post is about using event grid to ingest data into Azure Data Explorer and was a method I had to use with a client.

It was awesome because it forced me to write some C# code for an Azure function – so be nice and don’t judge the code. Your code is always better than mine….

To ingest data from Event Grid into Azure Data Explorer, we can follow these steps:

  1. Create an Event Grid subscription: We can create an Event Grid subscription for the events we want to ingest into Azure Data Explorer. When an event is published to the Event Grid topic, Event Grid sends a notification to our subscription.
  2. Create an Azure Function: We can create an Azure Function that triggers when an event is published to the Event Grid topic. The function will receive the event data as input.
  3. Prepare the event data for ingestion: In the Azure Function, we can prepare the event data for ingestion into Azure Data Explorer. This may include parsing the event data and transforming it into a format that can be ingested by Azure Data Explorer.
  4. Ingest the event data into Azure Data Explorer: Using the Azure Data Explorer .NET SDK, we can ingest the event data into Azure Data Explorer. We can use the SDK to create a table and ingest the data into that table.

Here’s an example Azure Function that ingests Event Grid data into Azure Data Explorer:

using System;
using System.Threading.Tasks;
using Microsoft.Azure.EventGrid.Models;
using Microsoft.Extensions.Logging;
using Kusto.Data;
using Kusto.Data.Common;
using Kusto.Data.Net.Client;

public static async Task Run(EventGridEvent eventGridEvent, ILogger log)
{
    // Get the payload from the event (assumes the event data carries a "Value" property)
    dynamic eventData = eventGridEvent.Data;

    // Prepare the data for ingestion into Azure Data Explorer
    var timestamp = DateTime.UtcNow;
    var value = eventData.Value;

    // Build the connection, authenticating with an Azure AD application key
    var kustoConnectionStringBuilder = new KustoConnectionStringBuilder("https://<clustername>.<region>.kusto.windows.net", "<databasename>")
        .WithAadApplicationKeyAuthentication("<applicationId>", "<applicationKey>", "<tenantId>");

    // Control (dot) commands need an admin provider rather than a query provider
    using (var kustoClient = KustoClientFactory.CreateCslAdminProvider(kustoConnectionStringBuilder))
    {
        // .set-or-append creates the table on first use and appends the row inline
        var command = $".set-or-append <tablename> <| print Timestamp=datetime({timestamp:o}), Value=todouble({value})";
        await kustoClient.ExecuteControlCommandAsync("<databasename>", command);
    }

    log.LogInformation($"Data ingested into Azure Data Explorer: {timestamp} {value}");
}

Replace <clustername>, <region>, <databasename>, <applicationId>, <applicationKey>, <tenantId>, and <tablename> with your own values.

In my example I am using an Azure AD application key for authentication. Alternatively, you could use Azure AD user authentication or a managed identity for authentication.

You can find out more information here:

https://learn.microsoft.com/en-us/azure/data-explorer/ingest-data-event-grid?tabs=adx

Yip.

KQL Series – ingesting data into Azure Data Explorer using csv files

In my previous blog post, KQL Series – ingesting data using Azure Stream Analytics, I talked about one of the easier ways to ingest data – which is via CSV files.

Here is how you can do it:

  1. Create a table in Azure Data Explorer: You can use the Azure Data Explorer Web UI, Azure CLI, or Azure PowerShell to create a table. The table should have the same structure as the CSV file you want to ingest.
  2. Upload the CSV file to Azure Blob Storage: You can use the Azure portal or Azure Storage Explorer to upload the CSV file to Azure Blob Storage.
  3. Create a data connection in Azure Data Explorer: You can use the Azure Data Explorer Web UI or Azure PowerShell to create a data connection to the Azure Blob Storage container where the CSV file is located.
  4. Define an ingestion mapping: You can use the Azure Data Explorer Web UI or Azure PowerShell to define an ingestion mapping that maps the columns in the CSV file to the columns in the Azure Data Explorer table.
  5. Ingest the data: You can use the Azure Data Explorer Web UI or Azure PowerShell to start the ingestion process. The ingestion process will read the CSV file from Azure Blob Storage, apply the ingestion mapping, and write the data to the Azure Data Explorer table.

Below is an example of ingesting a CSV file from Azure Blob Storage using KQL control commands – you can run these from the Azure Data Explorer Web UI (or any Kusto client) against your cluster and database:

// Create the destination table (skip this if the table already exists)
.create table <tablename> (Timestamp:datetime, Value:real)

// Ingest the CSV file straight from Azure Blob Storage
// (append a SAS token with read permissions to the blob URL)
.ingest into table <tablename> (
    h'https://<storageaccountname>.blob.core.windows.net/<containername>/<filename>.csv?<SAS-token>'
)
with (format='csv', ignoreFirstRecord=true)

Here the CSV columns are assumed to line up with the table schema; if they don’t, you can define a CSV ingestion mapping and reference it via the ingestionMappingReference ingestion property.

Replace <tablename>, <storageaccountname>, <containername>, <filename>, and the SAS token with your own values.

Pretty easy right?
Find more details here:

https://learn.microsoft.com/bs-latn-ba/azure/data-explorer/ingest-sample-data?tabs=ingestion-wizard

#Yip.

KQL Series – ingesting data using Azure Stream Analytics

This blog post is about another method of ingesting data into Azure Data Explorer.

Azure Stream Analytics is a cloud-based stream processing service that allows us to ingest and process real-time data from various sources. We can use Azure Stream Analytics to ingest data into Azure Data Explorer in real-time. Here’s how:

  1. Create an Azure Data Explorer Table: The first step is to create a table in Azure Data Explorer that will receive the real-time data.
  2. Create an Azure Stream Analytics Job: The next step is to create an Azure Stream Analytics job that will ingest the data and send it to the Azure Data Explorer table. You will need to specify the input source of the real-time data and the output destination of the data in Azure Data Explorer.
  3. Define a Query: In the Azure Stream Analytics job, you will need to define a query that transforms the real-time data and sends it to Azure Data Explorer.
  4. Start the Azure Stream Analytics Job: Once you have defined the query, you can start the Azure Stream Analytics job. The job will ingest the real-time data and send it to Azure Data Explorer.
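For step 1 above, the destination table just needs a schema that matches the output of your Stream Analytics query. A minimal sketch in KQL (the table and column names are placeholders):

.create table StreamingTelemetry (DeviceId:string, EventTime:datetime, Temperature:real)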

Azure Stream Analytics provides a user-friendly interface that allows us to monitor the job and troubleshoot any issues.

This has to be one of the easiest ways (outside of ingesting csv) to get data into Azure Data Explorer for us to play around with.

#Yip.

KQL Series – ingesting data with Azure Data Factory

Background:

Azure Data Explorer is a powerful analytics service that allows us to quickly ingest, store, and analyze large volumes of data from various sources. Azure Data Factory is a cloud-based data integration service that allows us to create data pipelines that can move and transform data from various sources to various destinations.

This blog post is about how, by integrating Azure Data Explorer with Azure Data Factory, we can easily ingest and process data from various sources into Azure Data Explorer.

Here’s how we can integrate Azure Data Explorer with Azure Data Factory:

  1. Create a Data Factory: The first step is to create an Azure Data Factory in your Azure subscription.
  2. Create a Linked Service: The next step is to create a Linked Service that connects to your Azure Data Explorer instance. This Linked Service will contain the connection details for your Azure Data Explorer instance.
  3. Create a Dataset: Once you have created the Linked Service, you need to create a Dataset that specifies the location and format of your source data.
  4. Create a Pipeline: The final step is to create a Pipeline that specifies the flow of data from the source to the Azure Data Explorer instance. The Pipeline contains the activities that will transform and move the data.
  5. Add the Azure Data Explorer Sink: Within the Pipeline, you need to add an Azure Data Explorer Sink that will specify the destination of the data in Azure Data Explorer.
  6. Configure the Sink: You will need to configure the Azure Data Explorer Sink with the table name, database name, and cluster name.
  7. Run the Pipeline: Once you have configured the Pipeline, you can execute it to start ingesting data into Azure Data Explorer. Azure Data Factory provides a visual interface that allows you to monitor the progress of your Pipeline and troubleshoot any issues.

By integrating Azure Data Explorer with Azure Data Factory, we can easily ingest and process data from various sources into Azure Data Explorer. This integration allows us to build scalable and flexible data integration pipelines that can handle a wide variety of data sources and destinations.

#Yip.

KQL Series – high level of data ingestion- setup

In this blog post let us stop for a second and see where we are in this whole process of creating an Azure Data Explorer cluster and ingesting data.
High level summary below:

What is Azure Data Explorer:

It is a fast and scalable data analytics service that can be used to ingest, store, and analyze large volumes of data from various sources. Here are the steps to ingest data using Azure Data Explorer:

  1. Create a database and a table: The first step is to create a database and a table in Azure Data Explorer where the data will be stored. You can create a database and a table using Azure Portal, Azure PowerShell, or Azure CLI.
  2. Prepare the data for ingestion: Before ingesting the data into Azure Data Explorer, you need to prepare the data. This includes cleaning and formatting the data in a way that is compatible with Azure Data Explorer.
  3. Choose a data ingestion method: Azure Data Explorer supports several data ingestion methods, including Azure Data Factory, Azure Stream Analytics, Event Hubs, and more. Choose the method that best suits your needs.
  4. Ingest the data: Once you have chosen the data ingestion method, you can start ingesting the data into Azure Data Explorer. The data will be automatically indexed and stored in the table you created in step 1.
  5. Verify the data ingestion: After the data is ingested, you should verify that it was successfully ingested and is available for analysis. You can use Kusto Query Language (KQL) to query the data and perform analytics.
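For example, steps 1 and 5 might look like this in KQL (the table name and schema are placeholders):

// Step 1: create a table to hold the ingested data
.create table SampleTelemetry (Timestamp:datetime, DeviceId:string, Value:real)

// Step 5: verify the ingestion with a quick query
SampleTelemetry
| summarize IngestedRows = count(), LastRecord = max(Timestamp)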

In summary, to ingest data using Azure Data Explorer, we need to create a database and a table, prepare the data, choose a data ingestion method, ingest the data, and verify the data ingestion.

KQL Series – overview of ingesting data into our ADX cluster

In the previous blog post we created a database in our Azure Data Explorer (ADX) cluster.

In this blog post we will discuss how we can ingest data into that database as part of this process:

There are several ways to ingest data. We will be using the cluster we have built, but I will also cover a FREE option where you can use Azure Data Explorer and familiarise yourself with it – you only need a Microsoft account or an Azure Active Directory user ID; no Azure subscription or credit card is needed. More on that later.

Data ingestion is the process used to load data records from one or more sources into a table in Azure Data Explorer. Once ingested, the data becomes available for query.

The diagram below shows the end-to-end flow for working in Azure Data Explorer and shows different ingestion methods:

The Azure Data Explorer data management service, which manages the data ingestion, implements the following process:

  • Azure Data Explorer pulls data from our declared external source and reads requests from a pending Azure queue.
  • Data is batched or streamed to the Data Manager.
  • Batch data flowing to the same database and table is optimised for fast and efficient ingestion.
  • Azure Data Explorer validates the initial data and converts the data format if required.
  • Further data manipulation includes matching schema, organising, indexing, encoding and data compression.
  • Data is persisted in storage according to the set retention policy.
  • The Data Manager then commits the data ingest into the engine, where we can now query it.

Supported data formats, properties, and permissions

  • Supported data formats: The data formats that Azure Data Explorer can understand and ingest natively (for example Parquet, JSON)
  • Ingestion properties: The properties that affect how the data will be ingested (for example, tagging, mapping, creation time).
  • Permissions: To ingest data, the process requires database ingestor level permissions. Other actions, such as query, may require database admin, database user, or table admin permissions.
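For the permissions point above, granting ingest rights is a one-line control command – a sketch, assuming an Azure AD application is doing the ingestion (the application and tenant IDs are placeholders):

.add database MyDatabase ingestors ('aadapp=<applicationId>;<tenantId>')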

We have 2 modes of ingestion:

BATCH INGESTION:

This is where we batch up our data and optimise it for high throughput. Of the two methods this is the faster one and is what you will typically use for data ingestion. We set our ingestion properties for how our data is batched, and then small batches of data are merged and optimised for fast query results.

By default, the maximum batching value is 5 minutes, 1000 items, or a total size of 1 GB. The data size limit for a batch ingestion command is 6 GB.

More details can be found here: Ingestion Batching Policy
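If you need to tune those defaults for a specific table, the batching policy is set with a control command – a sketch with placeholder values (here batches are sealed after 2 minutes, 500 items, or 1 GB of raw data):

.alter table MyTable policy ingestionbatching
'{"MaximumBatchingTimeSpan":"00:02:00","MaximumNumberOfItems":500,"MaximumRawDataSizeMB":1024}'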

STREAMING INGESTION:

This is where our data ingestion is from a streaming source and is ongoing. It gives us near real-time latency for the small sets of data we land in our table(s). Data is initially ingested to row store and then moved to column store extents.
You can also ingest streaming data using data pipelines or one of the Azure Data Explorer client libraries:

https://learn.microsoft.com/en-us/azure/data-explorer/kusto/api/client-libraries
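Streaming ingestion also has to be enabled on the cluster itself, and then on the table (or database) through its streaming ingestion policy – a minimal sketch with a placeholder table name:

.alter table MyTable policy streamingingestion enable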

For a list of data connectors, see Data connectors overview.

Architecture of Azure Data Explorer ingestion:

Using managed pipelines for ingestion:

There are a number of managed pipelines that we can use within Azure for data ingestion, including Event Hubs, IoT Hub, Event Grid, Azure Data Factory and Azure Stream Analytics – several of which I cover in the ingestion posts in this series.

Using connectors and plugins for ingesting data:

Azure Data Explorer also has a range of connectors and plugins that can push data into a cluster – for example the Kafka connector and the Logstash plugin.

Using SDKs to programmatically ingest data:

We have a number of SDKs that we can use for both query and data ingestion.

You can check out these SDKs and open-source projects – there are client libraries for .NET, Python, Java, Node.js and Go, among others.

What we will look at next is the tools that we can use to ingest our data.