Leveraging LLMs for Efficient Log Query Writing in Cybersecurity - with Prompt Examples

In the high-stakes world of cybersecurity, efficient log management is not just a convenience—it's a critical component of threat detection and incident response. Security analysts spend countless hours sifting through logs, writing complex queries to extract meaningful information from vast oceans of data. Yet for many professionals, these queries are a source of significant frustration.

As one security analyst put it: "Writing log queries sucks, being able to do a plain English ask instead of writing some proprietary or complex logic statement would make that whole aspect significantly quicker and easier."

This is where Large Language Models (LLMs) are changing the game. By transforming natural language requests into sophisticated log queries, these AI tools are streamlining workflows and saving security teams valuable time in their battle against cyber threats.

The Pain Points of Traditional Log Query Writing

If you've worked in cybersecurity, you're likely familiar with the challenges of traditional log query approaches:

1. Complexity of Query Languages

Each log analysis platform comes with its own query language—whether it's Kusto Query Language (KQL) for Azure, Elasticsearch Query DSL, or Splunk's Search Processing Language. Mastering these specialized languages requires significant time and mental bandwidth that could be better spent on actual security analysis.

2. Time-Consuming Process

In the midst of incident response, every minute counts. The manual process of crafting precise queries can be painstakingly slow, delaying critical security actions when they're needed most.

3. Skill Barriers

Not every security professional is equally proficient in query languages. This creates knowledge silos and bottlenecks where less experienced team members must wait for query experts to help them extract needed information.

4. Configuration and Monitoring Gaps

As noted in Dynatrace's analysis, manual tagging and configuration often lead to monitoring blind spots, making it challenging to identify and correlate relevant logs when needed.

The frustration is real and widespread. One cybersecurity professional on Reddit expressed: "When you finally get a working query it feels amazing, other times it's frustrating as hell. I think there's also a limit to the number of results your query can show, so also a little limited IMO."

How LLMs Transform Log Query Writing

Large Language Models offer a solution to these challenges by serving as an intelligent intermediary between human analysts and complex query languages. Security professionals can articulate their needs in plain English, and the LLM translates this into the appropriate query syntax.

As one cybersecurity practitioner noted: "When I need to script something, using LLMs saves me a ridiculous amount of time. I code fairly well so I can sanity check the output and modify when required."

Let's explore how to leverage this powerful capability effectively.

Step-by-Step Guide to Using LLMs for Log Query Generation

1. Understand Your Log Structure

Before you can effectively use an LLM to generate queries, you need a clear understanding of your log environment:

Identify key log sources: EDR solutions, network devices, authentication systems, AppSec tools
Know your schema: Understand the fields available in your logs (timestamps, user IDs, IP addresses, event types)
Understand your query language: Whether it's KQL, Elasticsearch, Splunk SPL, or another language

This foundational knowledge enables you to provide the LLM with the context it needs to generate accurate queries.

2. Define Your Query Objectives

Be specific about what you're trying to accomplish:

Are you conducting threat hunting across multiple data sources?
Investigating a potential security incident?
Running compliance checks for PII exposure?
Performing malware analysis or script deobfuscation?

Clear objectives lead to more precise queries.

3. Construct an Effective Prompt

The prompt does most of the work, so give it the details the model cannot guess. It should include:

The query language being used
Log structure information
The specific question you're trying to answer

Sample Prompt Template:

I need to write a [QUERY LANGUAGE] query for our [LOG SOURCE/PLATFORM].
Our log schema includes the following fields: [LIST KEY FIELDS AND DATA TYPES].
I want to query for [SPECIFIC SECURITY SCENARIO].
Please generate a query that will [EXPECTED OUTCOME].
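If the same template is reused often, it can be filled in programmatically. The following is a minimal sketch, not a prescribed tool: the function name build_prompt and its parameters are hypothetical, and the wording simply mirrors the four template lines above.

```python
def build_prompt(query_language, platform, fields, scenario, outcome):
    """Fill the four-line prompt template with concrete values.

    `fields` maps field name -> data type, e.g. {"EventID": "integer"}.
    """
    field_list = ", ".join(f"{name} ({dtype})" for name, dtype in fields.items())
    return (
        f"I need to write a {query_language} query for our {platform}.\n"
        f"Our log schema includes the following fields: {field_list}.\n"
        f"I want to query for {scenario}.\n"
        f"Please generate a query that will {outcome}."
    )


# Illustrative values only; substitute your own platform and schema.
prompt = build_prompt(
    query_language="KQL",
    platform="Microsoft Sentinel workspace",
    fields={"EventID": "integer", "Account": "string", "TimeGenerated": "datetime"},
    scenario="potential brute force attacks",
    outcome="identify accounts with more than 5 failed logins in a 10-minute window",
)
print(prompt)
```

Keeping the schema in a dictionary means the field list in the prompt cannot drift from the list you maintain elsewhere.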

Example Prompt:

I need to write a KQL query for our Microsoft Sentinel instance.
Our SecurityEvent table includes fields like EventID (integer), Account (string),
Computer (string), Activity (string), and TimeGenerated (datetime).
I want to query for potential brute force attacks.
Please generate a query that will identify accounts with more than 5 failed
login attempts within a 10-minute window in the last 24 hours.

4. Utilize the LLM

Input your crafted prompt into an LLM like ChatGPT, Claude, or a similar tool. The model will analyze your request and generate a corresponding query based on the information provided.

For the example above, the LLM might generate something like:

SecurityEvent
| where TimeGenerated > ago(24h)
| where EventID == 4625 // Failed login attempt
| summarize FailedAttempts = count() by Account, bin(TimeGenerated, 10m)
| where FailedAttempts > 5
| order by TimeGenerated asc
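To check what a query like this actually counts, it can help to reproduce the logic on a handful of hand-made events. The sketch below is an assumption-laden reference, not part of any platform: it mimics bin(TimeGenerated, 10m) by rounding timestamps down to fixed 10-minute buckets, and the sample events and function names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
EPOCH = datetime(1970, 1, 1)


def window_start(ts):
    """Round a timestamp down to the start of its fixed 10-minute bucket."""
    return ts - ((ts - EPOCH) % WINDOW)


def failed_logins_over_threshold(events, threshold=5):
    """events: iterable of (timestamp, event_id, account) tuples."""
    counts = Counter(
        (account, window_start(ts))
        for ts, event_id, account in events
        if event_id == 4625  # failed logon
    )
    return {key: n for key, n in counts.items() if n > threshold}


# Hypothetical sample: six failures for "alice" in one bucket, two for "bob".
base = datetime(2024, 1, 1, 12, 0)
sample = [(base + timedelta(minutes=i), 4625, "alice") for i in range(6)]
sample += [(base + timedelta(minutes=i), 4625, "bob") for i in range(2)]
sample.append((base, 4624, "alice"))  # a successful logon, ignored

result = failed_logins_over_threshold(sample)
print(result)
```

One design point this makes visible: the buckets are fixed, so a burst that straddles a bucket boundary (say three failures at 12:08 and three at 12:11) is split across two buckets and stays under the threshold. If that matters, ask the LLM for a sliding-window variant.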

5. Review and Test the Generated Query

Never blindly trust the output from an LLM. As one security professional cautioned: "LLMs still hallucinate and struggle with complex security logic. They're solid for automating the boring stuff like compliance checks and report summaries, but I wouldn't trust them with critical security decisions."

Always:

Validate table and field names
Check the logic for accuracy
Test the query with a limited dataset first
Ensure it meets performance expectations
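The first check in that list, validating table and field names, can be partly automated. The sketch below is a rough heuristic under stated assumptions: the schema sets and the keyword list are hypothetical and incomplete, and the function uses simple regular expressions rather than a real KQL parser, so treat a clean result as a hint, not proof.

```python
import re

# Assumed schema and keyword list; replace with the real columns of your table.
KNOWN_TABLES = {"SecurityEvent"}
KNOWN_FIELDS = {"EventID", "Account", "Computer", "Activity", "TimeGenerated"}
KQL_WORDS = {
    "where", "summarize", "by", "order", "sort", "asc", "desc", "project",
    "extend", "count", "bin", "ago", "and", "or", "not", "has", "contains", "in",
}


def unknown_identifiers(query):
    """Return identifiers in a query that are not known tables, fields, or keywords."""
    text = re.sub(r"//.*", "", query)                 # drop line comments
    text = re.sub(r"\"[^\"]*\"|'[^']*'", "", text)    # drop string literals
    text = re.sub(r"\b\d+[a-z]*\b", "", text)         # drop numbers and timespans like 10m
    names = set(re.findall(r"[A-Za-z_]\w*", text))
    # Names the query defines itself (left side of a single "=") are not schema fields.
    defined = set(re.findall(r"([A-Za-z_]\w*)\s*=(?!=)", text))
    return sorted(names - KNOWN_TABLES - KNOWN_FIELDS - KQL_WORDS - defined)


good = """SecurityEvent
| where TimeGenerated > ago(24h)
| where EventID == 4625 // Failed login attempt
| summarize FailedAttempts = count() by Account, bin(TimeGenerated, 10m)
| where FailedAttempts > 5
| order by TimeGenerated asc"""

# Hypothetical output containing a field that is not in the assumed schema.
suspect = 'SecurityEvent | where LogonResult == "Failure" | project Account'

print(unknown_identifiers(good))
print(unknown_identifiers(suspect))
```

Anything the function reports is worth looking up in the platform's schema browser before the query is run.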

6. Refine Your Prompt Based on Results

If the generated query doesn't meet your needs, iterate on your prompt by:

Providing more context about your log structure
Clarifying your objective
Including sample log entries
Specifying performance requirements

Remember that prompt engineering is an iterative process that improves with practice.

Best Practices for Ensuring Query Accuracy

Provide Comprehensive Context

The quality of your LLM-generated query directly correlates with the context you provide. Include:

Database or Log Platform Details: Specify whether you're querying Azure Log Analytics, Elasticsearch, Splunk, etc.
Schema Information: Provide table names, field names, and data types
Sample Data: When possible, include a few anonymized log entries to help the LLM understand data formats
Time Range Requirements: Clarify if you need data from a specific timeframe
Performance Considerations: Mention if query efficiency is critical due to large data volumes
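Sample entries should be masked before they go into a prompt. Here is a minimal sketch of that step, assuming email addresses and IPv4 addresses are the only sensitive values; real logs usually contain more (usernames, hostnames, tokens), and the patterns and function name are illustrative rather than exhaustive.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def anonymize(line):
    """Mask email addresses and IPv4 addresses in one log line."""
    line = EMAIL.sub("<email>", line)
    return IPV4.sub("<ip>", line)


# Hypothetical log line using reserved example values.
sample = "login failed for jane.doe@example.com from 203.0.113.7 port 22"
masked = anonymize(sample)
print(masked)
```

Replacing every value with the same placeholder hides which entries belong to the same user or host. If the LLM needs that relationship to understand the data, a consistent pseudonym per value is an alternative.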

Implement Human-in-the-Loop Verification

While LLMs can dramatically accelerate query writing, human oversight remains essential:

Syntax Verification: Check for proper syntax before executing queries
Logic Validation: Ensure the query logic aligns with your security objectives
Efficiency Review: Look for opportunities to optimize resource-intensive operations
Security Implications: Consider whether the query might expose sensitive data

As the practitioner quoted earlier put it: "I code fairly well so I can sanity check the output and modify when required." Reviewing the output before it runs keeps the time savings without handing control to the model.

Iterate and Learn

Developing expertise in LLM-assisted query writing is an iterative process:

Save Successful Prompts: Create a library of effective prompts for common query types
Document Patterns: Note which prompt structures yield the best results
Share Knowledge: Exchange effective prompts with team members to build collective expertise
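A prompt library does not need special tooling. As one possible sketch, assuming a single shared JSON file is acceptable for your team, the hypothetical helpers below store prompts by name together with a note on what worked.

```python
import json
import tempfile
from pathlib import Path


def save_prompt(library_path, name, prompt, notes=""):
    """Add or update one named prompt in a JSON file and return the library."""
    path = Path(library_path)
    library = json.loads(path.read_text()) if path.exists() else {}
    library[name] = {"prompt": prompt, "notes": notes}
    path.write_text(json.dumps(library, indent=2))
    return library


def load_prompt(library_path, name):
    """Return the stored prompt text for `name`."""
    return json.loads(Path(library_path).read_text())[name]["prompt"]


# Demonstration in a temporary directory; the file name is an assumption.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "prompts.json"
    save_prompt(lib, "brute-force-kql", "I need to write a KQL query for failed logins.",
                notes="works best when the schema is included")
    restored = load_prompt(lib, "brute-force-kql")

print(restored)
```

A plain file like this can live in the team's existing version control, which also gives a history of how each prompt evolved.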

Address Common Pitfalls

Be aware of these common challenges when using LLMs for log queries:

Hallucinations: LLMs may occasionally invent field names or functions that don't exist in your environment
Outdated Knowledge: Models may not be familiar with the latest versions of query languages
Complex Join Operations: Multi-table joins and complex correlations may require additional guidance
Performance Optimization: LLMs may prioritize readability over query performance

Sample Prompts for Different Security Scenarios

1. Basic Threat Hunting

I need a KQL query for Microsoft Sentinel to identify unusual PowerShell command executions.
Our logs are in the SecurityEvent table with fields including EventID (int),
Account (string), CommandLine (string), and TimeGenerated (datetime).
I want to find PowerShell commands that use encoding or obfuscation techniques
commonly associated with malicious activity.

2. Compliance Monitoring for PII Exposure

Generate a query for Elasticsearch that will help me identify potential PII exposure
in our application logs. Our logs are stored in the 'app-logs' index with fields
including 'message' (text), 'level' (keyword), 'timestamp' (date), and 'service' (keyword).
I need to find instances where credit card numbers, social security numbers,
or email addresses might appear in log messages.

3. Malware Analysis

I need a Splunk query to identify potential lateral movement after a malware infection.
Our logs are in the 'windows_events' index with fields including 'EventCode',
'SourceIP', 'DestinationIP', 'User', and '_time'.
The infected host was 192.168.1.15, and I want to see all network connections
initiated from this host to other internal systems over the past 48 hours.

4. User Behavior Analytics

Create a query for Azure Log Analytics using KQL to establish a baseline of normal
authentication patterns for our executive users. We have the SigninLogs table with
fields including UserPrincipalName, IPAddress, Location, ClientAppUsed, and TimeGenerated.
I want to identify authentication attempts that deviate from each user's normal
patterns based on time of day, location, and device used.

Providing Log Structure Context to LLMs

One of the most critical aspects of generating accurate queries is helping the LLM understand your log structure. Here are effective ways to communicate this information:

1. Table/Index Schema Description

I'm working with our Azure Log Analytics workspace. The SecurityEvent table
has the following schema:
- TimeGenerated (datetime): When the event occurred
- EventID (int): Windows Security Event ID
- Account (string): User account involved
- Computer (string): Host where the event occurred
- Activity (string): Description of the activity
- SourceIP (string): Source IP address if available

2. Sample Log Entries

Here are two sample log entries from our Apache access logs:
192.168.1.105 - - [10/Oct/2023:13:55:36 -0700] "GET /login.php HTTP/1.1" 200 2326
192.168.1.210 - - [10/Oct/2023:13:57:12 -0700] "POST /login.php HTTP/1.1" 401 1842

I need a query that will find all failed login attempts (HTTP status 401)
from the same IP address occurring more than 10 times in a 5-minute window.
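When the log source is unstructured text like this, it can help to spell out the fields for the LLM as well. The sketch below parses one of the sample lines into named fields, assuming the Common Log Format layout shown above; the regular expression and the field names are illustrative choices, not an official parser.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status bytes
LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return the fields of one access-log line as a dict, or None if it does not match."""
    match = LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["time"] = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["status"] = int(fields["status"])
    return fields


entry = parse_line(
    '192.168.1.210 - - [10/Oct/2023:13:57:12 -0700] "POST /login.php HTTP/1.1" 401 1842'
)
print(entry)
```

Listing the parsed field names and types in the prompt alongside the raw lines leaves less for the model to guess.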

3. Query Language Specifications

I'm using Elasticsearch Query DSL for this query. Please include proper
syntax for date ranges, aggregations, and filters. I specifically need to
use the 'terms' aggregation to group results.

4. Previous Query Examples

Here's a similar query I've used before:
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| summarize count() by UserPrincipalName, bin(TimeGenerated, 1h)

But now I need to modify it to also include the device information and location.

Real-World Examples: From Natural Language to Query

Let's walk through complete examples showing how an LLM transforms a natural language request into a functional query.

Example 1: Detecting Unusual Admin Activity

Prompt:

I need a KQL query for our Microsoft Sentinel environment to identify unusual administrative
actions. We use the Azure Activity log with fields like Caller, OperationName, ResourceType,
ResultType, and TimeGenerated. I want to find any operations performed by admin users
(role contains 'Admin') on critical resources (ResourceType contains 'Microsoft.Compute/virtualMachines'
or 'Microsoft.Network/networkSecurityGroups') outside of normal business hours (8am-6pm local time)
in the past week.

LLM-Generated Query:

AzureActivity
| where TimeGenerated > ago(7d)
| where isnotempty(Caller)
// Field names follow the schema given in the prompt; verify them in your workspace
| where Caller has "Admin"
| where ResourceType contains "Microsoft.Compute/virtualMachines" or
        ResourceType contains "Microsoft.Network/networkSecurityGroups"
// TimeGenerated is stored in UTC, so these hours are UTC, not local time
| extend hour = datetime_part("hour", TimeGenerated)
| where hour < 8 or hour >= 18
| project TimeGenerated, Caller, OperationName, ResourceType, ResultType
| sort by TimeGenerated desc
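This example also shows why review matters. The hour filter in the generated query works on TimeGenerated, which Azure Monitor stores in UTC, while the prompt asked for local business hours. The sketch below illustrates the difference in Python, assuming a fixed UTC-7 offset with no daylight-saving handling; the offset and function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the team's local time is UTC-7 with no daylight-saving handling.
LOCAL_TZ = timezone(timedelta(hours=-7))


def outside_business_hours(utc_time, start_hour=8, end_hour=18):
    """True if a UTC timestamp falls outside 08:00-18:00 in LOCAL_TZ."""
    local_hour = utc_time.astimezone(LOCAL_TZ).hour
    return local_hour < start_hour or local_hour >= end_hour


midday = datetime(2024, 3, 5, 20, 0, tzinfo=timezone.utc)  # 13:00 local time
early = datetime(2024, 3, 5, 14, 0, tzinfo=timezone.utc)   # 07:00 local time

print(outside_business_hours(midday))
print(outside_business_hours(early))
```

A filter on the raw UTC hour would flag the first event (hour 20) even though it happened at 13:00 local time, and miss the second (hour 14), which happened at 07:00. Stating the time zone in the prompt, or asking the LLM to shift the timestamp before extracting the hour, avoids both errors.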

Example 2: Identifying Data Exfiltration Attempts

Prompt:

Create a Splunk query to detect potential data exfiltration attempts. Our firewall
logs are in the index 'firewall' with fields including src_ip, dest_ip, dest_port,
bytes_out, action, and _time. I want to identify internal hosts (src_ip starts with 10.0.0. or 192.168.)
that are transferring unusually large amounts of data (more than 100MB) to external
destinations (dest_ip doesn't start with 10.0.0. or 192.168.) over a 1-hour period,
specifically looking at HTTPS traffic (dest_port=443).

LLM-Generated Query:

index=firewall (src_ip=10.0.0.* OR src_ip=192.168.*) NOT (dest_ip=10.0.0.* OR dest_ip=192.168.*)
dest_port=443 action=allowed
| bucket _time span=1h
| stats sum(bytes_out) as total_bytes_out by _time, src_ip
| where total_bytes_out > 100000000
| sort -total_bytes_out
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(_time)
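As with the earlier example, the logic can be cross-checked on a few synthetic records before the query runs against production data. The following is a hedged reference sketch: the record layout, sample values, and function names are hypothetical, and the internal ranges are written as the networks that the wildcards 10.0.0.* and 192.168.* correspond to.

```python
from collections import defaultdict
from datetime import datetime
from ipaddress import ip_address, ip_network

# The wildcards in the query correspond to these two networks.
INTERNAL_NETS = [ip_network("10.0.0.0/24"), ip_network("192.168.0.0/16")]
THRESHOLD_BYTES = 100_000_000  # 100 MB (decimal), the same value as in the query


def is_internal(ip):
    addr = ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)


def exfil_candidates(records):
    """records: iterable of (time, src_ip, dest_ip, dest_port, bytes_out, action)."""
    totals = defaultdict(int)
    for time, src, dest, port, bytes_out, action in records:
        if is_internal(src) and not is_internal(dest) and port == 443 and action == "allowed":
            hour = time.replace(minute=0, second=0, microsecond=0)
            totals[(hour, src)] += bytes_out
    return {key: total for key, total in totals.items() if total > THRESHOLD_BYTES}


# Synthetic records: two external uploads in the same hour, one internal transfer.
t = datetime(2024, 1, 1, 9, 5)
records = [
    (t, "10.0.0.5", "203.0.113.9", 443, 60_000_000, "allowed"),
    (t.replace(minute=40), "10.0.0.5", "203.0.113.9", 443, 50_000_000, "allowed"),
    (t, "10.0.0.6", "192.168.1.2", 443, 500_000_000, "allowed"),
]
flagged = exfil_candidates(records)
print(flagged)
```

Writing the ranges out this way makes one limitation of the prompt visible: 10.0.0.* covers only a single /24, so hosts elsewhere in 10.0.0.0/8 would be treated as external unless the prompt names the full internal ranges.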

Balancing Automation with Security Concerns

While LLMs offer significant efficiency gains, security professionals must remain vigilant about their limitations:

Data Privacy: Never share sensitive logs containing PII, credentials, or proprietary information with public LLM services
Verification: Always review generated queries before execution in production environments
Governance: Establish clear guidelines for LLM use in security operations
Threat Model: Consider the security implications of your LLM workflow

As one cybersecurity professional cautioned: "You shouldn't give it ANY PII or other sensitive information, but if you give one a specific problem to solve they can save so much time."

Conclusion: The Future of Log Query Writing

Large Language Models are transforming how security professionals approach log analysis. By bridging the gap between plain English and complex query languages, these AI tools help security teams:

Accelerate threat hunting and incident response
Democratize access to log data across varying skill levels
Reduce the cognitive load of writing complex queries
Free up analyst time for higher-value security work

While LLMs aren't perfect—they require human oversight and have limitations around complex security logic—they represent a significant advancement in security workflow automation. As one Reddit user summarized: "LLMs can save so much time long-term, but they still need human oversight."

By following the best practices outlined in this article, security professionals can harness the power of generative AI to transform their log analysis workflows while maintaining the human judgment essential to effective cybersecurity.

The next time you find yourself staring at a blank query editor, wondering how to express a complex security question in KQL or another query language, consider leveraging an LLM. With the right prompting techniques and verification processes, you might find yourself saying what many security professionals already have: that using LLMs for query generation "saves me a ridiculous amount of time."

Further Resources

Microsoft Documentation on KQL - Complete reference for Kusto Query Language
Microsoft Documentation on Query Optimization - Best practices for optimizing log queries
Log Analytics Without Writing Queries - Exploring modern approaches to log analysis
Reddit Discussion on LLMs in Cybersecurity - Practitioner perspectives on AI applications

Remember that effective log query writing is both an art and a science—LLMs can accelerate the process, but your expertise and judgment remain the most valuable assets in your security arsenal.
