Leveraging LLMs for Efficient Log Query Writing in Cybersecurity - with Prompt Examples


In the high-stakes world of cybersecurity, efficient log management is not just a convenience—it's a critical component of threat detection and incident response. Security analysts spend countless hours sifting through logs, writing complex queries to extract meaningful information from vast oceans of data. Yet for many professionals, these queries are a source of significant frustration.
As one security analyst put it: "Writing log queries sucks, being able to do a plain English ask instead of writing some proprietary or complex logic statement would make that whole aspect significantly quicker and easier."
This is where Large Language Models (LLMs) are changing the game. By transforming natural language requests into sophisticated log queries, these AI tools are streamlining workflows and saving security teams valuable time in their battle against cyber threats.
The Pain Points of Traditional Log Query Writing
If you've worked in cybersecurity, you're likely familiar with the challenges of traditional log query approaches:
1. Complexity of Query Languages
Each log analysis platform comes with its own query language, whether it's Kusto Query Language (KQL) for Azure, Elasticsearch Query DSL, or Splunk's Search Processing Language (SPL). Mastering these specialized languages requires significant time and mental bandwidth that could be better spent on actual security analysis.
2. Time-Consuming Process
In the midst of incident response, every minute counts. The manual process of crafting precise queries can be painstakingly slow, delaying critical security actions when they're needed most.
3. Skill Barriers
Not every security professional is equally proficient in query languages. This creates knowledge silos and bottlenecks where less experienced team members must wait for query experts to help them extract needed information.
4. Configuration and Monitoring Gaps
As noted in Dynatrace's analysis, manual tagging and configuration often lead to monitoring blind spots, making it challenging to identify and correlate relevant logs when needed.
The frustration is real and widespread. One cybersecurity professional on Reddit expressed: "When you finally get a working query it feels amazing, other times it's frustrating as hell. I think there's also a limit to the number of results your query can show, so also a little limited IMO."
How LLMs Transform Log Query Writing
Large Language Models offer a solution to these challenges by serving as an intelligent intermediary between human analysts and complex query languages. Security professionals can articulate their needs in plain English, and the LLM translates this into the appropriate query syntax.
As one cybersecurity practitioner noted: "When I need to script something, using LLMs saves me a ridiculous amount of time. I code fairly well so I can sanity check the output and modify when required."
Let's explore how to leverage this powerful capability effectively.
Step-by-Step Guide to Using LLMs for Log Query Generation
1. Understand Your Log Structure
Before you can effectively use an LLM to generate queries, you need a clear understanding of your log environment:
- Identify key log sources: EDR solutions, network devices, authentication systems, AppSec tools
- Know your schema: Understand the fields available in your logs (timestamps, user IDs, IP addresses, event types)
- Understand your query language: Whether it's KQL, Elasticsearch Query DSL, Splunk SPL, or another language
This foundational knowledge enables you to provide the LLM with the context it needs to generate accurate queries.
2. Define Your Query Objectives
Be specific about what you're trying to accomplish:
- Are you conducting threat hunting across multiple data sources?
- Investigating a potential security incident?
- Running compliance checks for PII exposure?
- Performing malware analysis or script deobfuscation?
Clear objectives lead to more precise queries.
3. Construct an Effective Prompt
This is where the magic happens. Your prompt should include:
- The query language being used
- Log structure information
- The specific question you're trying to answer
Sample Prompt Template:
I need to write a [QUERY LANGUAGE] query for our [LOG SOURCE/PLATFORM].
Our log schema includes the following fields: [LIST KEY FIELDS AND DATA TYPES].
I want to query for [SPECIFIC SECURITY SCENARIO].
Please generate a query that will [EXPECTED OUTCOME].
Example Prompt:
I need to write a KQL query for our Microsoft Sentinel instance.
Our SecurityEvent table includes fields like EventID (integer), Account (string),
Computer (string), Activity (string), and TimeGenerated (datetime).
I want to query for potential brute force attacks.
Please generate a query that will identify accounts with more than 5 failed
login attempts within a 10-minute window in the last 24 hours.
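The template can also be filled programmatically so that every analyst sends the same structure. The following is a minimal sketch, not a prescribed tool: the function name `build_query_prompt` and its parameters are hypothetical, and the wording simply mirrors the template above.

```python
def build_query_prompt(language, platform, table, fields, scenario, outcome):
    """Fill the four-part prompt template with environment-specific details.

    `fields` maps field name -> data type, e.g. {"EventID": "int"}.
    """
    field_list = ", ".join(f"{name} ({dtype})" for name, dtype in fields.items())
    return (
        f"I need to write a {language} query for our {platform}.\n"
        f"Our {table} table includes the following fields: {field_list}.\n"
        f"I want to query for {scenario}.\n"
        f"Please generate a query that will {outcome}."
    )


prompt = build_query_prompt(
    language="KQL",
    platform="Microsoft Sentinel workspace",
    table="SecurityEvent",
    fields={"EventID": "int", "Account": "string", "TimeGenerated": "datetime"},
    scenario="potential brute force attacks",
    outcome=("identify accounts with more than 5 failed login attempts "
             "within a 10-minute window in the last 24 hours"),
)
print(prompt)
```

Keeping the schema in a dictionary makes it easy to paste the real field list from your platform instead of typing it from memory.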
4. Utilize the LLM
Input your crafted prompt into an LLM like ChatGPT, Claude, or a similar tool. The model will analyze your request and generate a corresponding query based on the information provided.
For the example above, the LLM might generate something like the following, which counts failures in fixed 10-minute buckets:
SecurityEvent
| where TimeGenerated > ago(24h)
| where EventID == 4625 // Failed login attempt
| summarize FailedAttempts = count() by Account, bin(TimeGenerated, 10m)
| where FailedAttempts > 5
| order by TimeGenerated asc
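One detail worth checking in a query like this is that bin() groups events into fixed buckets, so a burst of failures that straddles a bucket boundary is split across two counts. The Python sketch below illustrates the difference between fixed buckets and a sliding window on six made-up timestamps; the function names are hypothetical and the data is invented for illustration.

```python
from datetime import datetime, timedelta


def max_in_fixed_bins(times, width):
    """Largest event count in any fixed bucket, similar to bin(TimeGenerated, 10m)."""
    counts = {}
    epoch = datetime(1970, 1, 1)
    for t in times:
        bucket = (t - epoch) // width  # integer index of the bucket containing t
        counts[bucket] = counts.get(bucket, 0) + 1
    return max(counts.values(), default=0)


def max_in_sliding_window(times, width):
    """Largest event count in any window of length `width` that starts at an event."""
    ordered = sorted(times)
    best = 0
    start = 0
    for end, t in enumerate(ordered):
        while t - ordered[start] >= width:
            start += 1
        best = max(best, end - start + 1)
    return best


# Six failed logins, one per minute, from 12:07 to 12:12 (invented data).
times = [datetime(2024, 1, 1, 12, 7) + timedelta(minutes=i) for i in range(6)]
width = timedelta(minutes=10)
print(max_in_fixed_bins(times, width))      # 3: the burst is split at 12:10
print(max_in_sliding_window(times, width))  # 6: all six fall within 10 minutes
```

With a threshold of "more than 5", the fixed-bucket count would miss this burst while the sliding count would catch it, which is the kind of logic gap the review step is meant to find.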
5. Review and Test the Generated Query
Never blindly trust the output from an LLM. As one security professional cautioned: "LLMs still hallucinate and struggle with complex security logic. They're solid for automating the boring stuff like compliance checks and report summaries, but I wouldn't trust them with critical security decisions."
Always:
- Validate table and field names
- Check the logic for accuracy
- Test the query with a limited dataset first
- Ensure it meets performance expectations
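The first item on this checklist can be partly automated. The sketch below flags identifiers in a generated KQL query that are not in your known schema. It is a rough heuristic under stated assumptions, not a KQL parser: `KQL_WORDS` is a small, deliberately incomplete allowlist chosen for this example, and the function name is hypothetical.

```python
import re

# Incomplete allowlist of KQL operators and functions (an assumption for this sketch).
KQL_WORDS = {
    "where", "summarize", "by", "order", "sort", "asc", "desc", "project",
    "extend", "and", "or", "not", "has", "contains", "in", "count", "bin",
    "ago", "isnotempty", "datetime_part", "take", "top", "join", "on", "kind",
}


def unknown_identifiers(query, known_names):
    """Return identifiers that are not known names, KQL words, or aliases."""
    text = re.sub(r"//[^\n]*", "", query)   # drop // comments
    text = re.sub(r'"[^"]*"', "", text)     # drop double-quoted strings
    text = re.sub(r"'[^']*'", "", text)     # drop single-quoted strings
    # Names introduced with a single `=` (e.g. in summarize or extend) are aliases.
    aliases = set(re.findall(r"\b([A-Za-z_]\w*)\s*=(?!=)", text))
    tokens = set(re.findall(r"\b[A-Za-z_]\w*\b", text))
    return sorted(tokens - set(known_names) - KQL_WORDS - aliases)


known = {"SecurityEvent", "TimeGenerated", "EventID", "Account", "Computer"}
suspect = 'SecurityEvent | where LogonResult == "Failed" | summarize count() by AccountName'
print(unknown_identifiers(suspect, known))  # ['AccountName', 'LogonResult']
```

Anything the function reports is either a hallucinated field, a KQL function missing from the allowlist, or a gap in your schema list, so treat the output as a prompt for manual review rather than a verdict.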
6. Refine Your Prompt Based on Results
If the generated query doesn't meet your needs, iterate on your prompt by:
- Providing more context about your log structure
- Clarifying your objective
- Including sample log entries
- Specifying performance requirements
Remember that prompt engineering is an iterative process that improves with practice.
Best Practices for Ensuring Query Accuracy
Provide Comprehensive Context
The quality of your LLM-generated query directly correlates with the context you provide. Include:
- Database or Log Platform Details: Specify whether you're querying Azure Log Analytics, Elasticsearch, Splunk, etc.
- Schema Information: Provide table names, field names, and data types
- Sample Data: When possible, include a few anonymized log entries to help the LLM understand data formats
- Time Range Requirements: Clarify if you need data from a specific timeframe
- Performance Considerations: Mention if query efficiency is critical due to large data volumes
Implement Human-in-the-Loop Verification
While LLMs can dramatically accelerate query writing, human oversight remains essential:
- Syntax Verification: Check for proper syntax before executing queries
- Logic Validation: Ensure the query logic aligns with your security objectives
- Efficiency Review: Look for opportunities to optimize resource-intensive operations
- Security Implications: Consider whether the query might expose sensitive data
This is the habit described by the practitioner quoted earlier, who can "sanity check the output and modify when required." The approach balances efficiency with security.
Iterate and Learn
Developing expertise in LLM-assisted query writing is an iterative process:
- Save Successful Prompts: Create a library of effective prompts for common query types
- Document Patterns: Note which prompt structures yield the best results
- Share Knowledge: Exchange effective prompts with team members to build collective expertise
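A prompt library does not need special tooling. As one possible approach, the sketch below keeps named templates in a JSON file that a team could store in version control. The file layout and the function names `save_prompt` and `load_prompt` are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def save_prompt(library_path, name, template, notes=""):
    """Add or update a named prompt template in a JSON library file."""
    path = Path(library_path)
    library = json.loads(path.read_text()) if path.exists() else {}
    library[name] = {"template": template, "notes": notes}
    path.write_text(json.dumps(library, indent=2))


def load_prompt(library_path, name, **values):
    """Load a template by name and fill its {placeholders}."""
    library = json.loads(Path(library_path).read_text())
    return library[name]["template"].format(**values)


# Usage with a temporary file; a real library would live in a shared repository.
with tempfile.TemporaryDirectory() as tmp:
    lib = Path(tmp) / "prompts.json"
    save_prompt(
        lib,
        "brute_force_kql",
        "I need a KQL query for the {table} table that finds accounts with "
        "more than {threshold} failed logins in {window}.",
        notes="Worked best when the schema was included.",
    )
    rendered = load_prompt(lib, "brute_force_kql",
                           table="SecurityEvent", threshold=5, window="10 minutes")
print(rendered)
```

The `notes` field is a place to record which prompt structures worked, which covers the "Document Patterns" item in the same file.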
Address Common Pitfalls
Be aware of these common challenges when using LLMs for log queries:
- Hallucinations: LLMs may occasionally invent field names or functions that don't exist in your environment
- Outdated Knowledge: Models may not be familiar with the latest versions of query languages
- Complex Join Operations: Multi-table joins and complex correlations may require additional guidance
- Performance Optimization: LLMs may prioritize readability over query performance
Sample Prompts for Different Security Scenarios
1. Basic Threat Hunting
I need a KQL query for Microsoft Sentinel to identify unusual PowerShell command executions.
Our logs are in the SecurityEvent table with fields including EventID (int),
Account (string), CommandLine (string), and TimeGenerated (datetime).
I want to find PowerShell commands that use encoding or obfuscation techniques
commonly associated with malicious activity.
2. Compliance Monitoring for PII Exposure
Generate a query for Elasticsearch that will help me identify potential PII exposure
in our application logs. Our logs are stored in the 'app-logs' index with fields
including 'message' (text), 'level' (keyword), 'timestamp' (date), and 'service' (keyword).
I need to find instances where credit card numbers, social security numbers,
or email addresses might appear in log messages.
3. Malware Analysis
I need a Splunk query to identify potential lateral movement after a malware infection.
Our logs are in the 'windows_events' index with fields including 'EventCode',
'SourceIP', 'DestinationIP', 'User', and '_time'.
The infected host was 192.168.1.15, and I want to see all network connections
initiated from this host to other internal systems over the past 48 hours.
4. User Behavior Analytics
Create a query for Azure Log Analytics using KQL to establish a baseline of normal
authentication patterns for our executive users. We have the SigninLogs table with
fields including UserPrincipalName, IPAddress, Location, ClientAppUsed, and TimeGenerated.
I want to identify authentication attempts that deviate from each user's normal
patterns based on time of day, location, and device used.
Providing Log Structure Context to LLMs
One of the most critical aspects of generating accurate queries is helping the LLM understand your log structure. Here are effective ways to communicate this information:
1. Table/Index Schema Description
I'm working with our Azure Log Analytics workspace. The SecurityEvent table
has the following schema:
- TimeGenerated (datetime): When the event occurred
- EventID (int): Windows Security Event ID
- Account (string): User account involved
- Computer (string): Host where the event occurred
- Activity (string): Description of the activity
- IpAddress (string): Source IP address if available
2. Sample Log Entries
Here are two sample log entries from our Apache access logs:
192.168.1.105 - - [10/Oct/2023:13:55:36 -0700] "GET /login.php HTTP/1.1" 200 2326
192.168.1.210 - - [10/Oct/2023:13:57:12 -0700] "POST /login.php HTTP/1.1" 401 1842
I need a query that will find all failed login attempts (HTTP status 401)
from the same IP address occurring more than 10 times in a 5-minute window.
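Sample entries like these are also useful for checking whatever query the LLM returns. The sketch below is one way to compute the expected answer independently on a small sample, so the generated query's results can be compared against it. It assumes the Apache log layout shown above and uses fixed 5-minute buckets; the regular expression and function name are written for this example only.

```python
import re
from collections import Counter
from datetime import datetime

# Matches: ip, two ignored fields, [timestamp], "request", status code.
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "[^"]*" (\d{3}) ')


def failed_logins_per_window(lines, window_minutes=5):
    """Count HTTP 401 responses per (ip, start of fixed window)."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if not m or m.group(3) != "401":
            continue
        ts = datetime.strptime(m.group(2), "%d/%b/%Y:%H:%M:%S %z")
        window_start = ts.replace(minute=ts.minute - ts.minute % window_minutes, second=0)
        counts[(m.group(1), window_start)] += 1
    return counts


sample = [
    '192.168.1.105 - - [10/Oct/2023:13:55:36 -0700] "GET /login.php HTTP/1.1" 200 2326',
    '192.168.1.210 - - [10/Oct/2023:13:57:12 -0700] "POST /login.php HTTP/1.1" 401 1842',
]
counts = failed_logins_per_window(sample)
# Keep only (ip, window) pairs above the threshold from the prompt.
offenders = {key: n for key, n in counts.items() if n > 10}
print(counts, offenders)
```

If the generated query and this reference count disagree on the same sample, one of them has a logic error worth tracking down before the query goes near production data.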
3. Query Language Specifications
I'm using Elasticsearch Query DSL for this query. Please include proper
syntax for date ranges, aggregations, and filters. I specifically need to
use the 'terms' aggregation to group results.
4. Previous Query Examples
Here's a similar query I've used before:
SigninLogs
| where TimeGenerated > ago(24h)
| where ResultType == "0"
| summarize count() by UserPrincipalName, bin(TimeGenerated, 1h)
But now I need to modify it to also include the device information and location.
Real-World Examples: From Natural Language to Query
Let's walk through complete examples showing how an LLM transforms a natural language request into a functional query.
Example 1: Detecting Unusual Admin Activity
Prompt:
I need a KQL query for our Microsoft Sentinel environment to identify unusual administrative
actions. We use the Azure Activity log with fields like Caller, OperationName, ResourceType,
ResultType, and TimeGenerated. I want to find any operations performed by admin users
(role contains 'Admin') on critical resources (ResourceType contains 'Microsoft.Compute/virtualMachines'
or 'Microsoft.Network/networkSecurityGroups') outside of normal business hours (8am-6pm local time)
in the past week.
LLM-Generated Query:
AzureActivity
| where TimeGenerated > ago(7d)
| where isnotempty(Caller)
| where Caller has "Admin" // Rough proxy: the stated schema has no role field
| where ResourceType contains "Microsoft.Compute/virtualMachines" or
ResourceType contains "Microsoft.Network/networkSecurityGroups"
| extend hour = datetime_part("hour", TimeGenerated) // TimeGenerated is UTC, not local time
| where hour < 8 or hour >= 18
| project TimeGenerated, Caller, OperationName, ResourceType, ResultType
| sort by TimeGenerated desc
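The prompt asks for local business hours, but Log Analytics stores TimeGenerated in UTC, so an hour filter applied directly to TimeGenerated tests UTC hours. The Python sketch below shows how much that matters. It assumes a fixed UTC-7 offset purely for illustration; a real deployment would need proper time zone and daylight-saving handling, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offset for illustration only; real zones need DST handling.
LOCAL = timezone(timedelta(hours=-7))


def is_off_hours(utc_time, local_tz=LOCAL, start_hour=8, end_hour=18):
    """True if the event falls outside start_hour..end_hour in local time."""
    local_hour = utc_time.astimezone(local_tz).hour
    return local_hour < start_hour or local_hour >= end_hour


afternoon = datetime(2024, 3, 5, 20, 0, tzinfo=timezone.utc)  # 13:00 local
night = datetime(2024, 3, 5, 9, 0, tzinfo=timezone.utc)       # 02:00 local

print(is_off_hours(afternoon))  # False, although the UTC hour (20) is >= 18
print(is_off_hours(night))      # True, although the UTC hour (9) looks like business hours
```

Both cases would be classified the wrong way by a UTC-only check, so the generated query needs a time zone adjustment before its results can be trusted.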
Example 2: Identifying Data Exfiltration Attempts
Prompt:
Create a Splunk query to detect potential data exfiltration attempts. Our firewall
logs are in the index 'firewall' with fields including src_ip, dest_ip, dest_port,
bytes_out, action, and _time. I want to identify internal hosts (src_ip starts with 10.0.0. or 192.168.)
that are transferring unusually large amounts of data (more than 100MB) to external
destinations (dest_ip doesn't start with 10.0.0. or 192.168.) over a 1-hour period,
specifically looking at HTTPS traffic (dest_port=443).
LLM-Generated Query:
index=firewall (src_ip=10.0.0.* OR src_ip=192.168.*) NOT (dest_ip=10.0.0.* OR dest_ip=192.168.*)
dest_port=443 action=allowed
| bucket _time span=1h
| stats sum(bytes_out) as total_bytes_out by _time, src_ip
| where total_bytes_out > 100000000
| sort -total_bytes_out
| convert timeformat="%Y-%m-%d %H:%M:%S" ctime(_time)
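Before running a query like this against production data, it helps to replay its logic on a handful of hand-made events where the right answer is obvious. The sketch below mirrors the filters and the hourly sum in Python. The event dictionaries are invented test data, the function name is hypothetical, and the hourly bucketing only approximates what `bucket _time span=1h` does in Splunk.

```python
from collections import defaultdict

INTERNAL_PREFIXES = ("10.0.0.", "192.168.")
THRESHOLD_BYTES = 100_000_000  # 100 MB in decimal units, as in the query


def exfil_candidates(events):
    """Sum bytes_out per (hour, src_ip) for allowed internal-to-external HTTPS traffic."""
    totals = defaultdict(int)
    for e in events:
        if (e["src_ip"].startswith(INTERNAL_PREFIXES)
                and not e["dest_ip"].startswith(INTERNAL_PREFIXES)
                and e["dest_port"] == 443
                and e["action"] == "allowed"):
            hour = e["_time"] - e["_time"] % 3600  # start of the hour, epoch seconds
            totals[(hour, e["src_ip"])] += e["bytes_out"]
    return {key: total for key, total in totals.items() if total > THRESHOLD_BYTES}


events = [
    {"src_ip": "10.0.0.5", "dest_ip": "8.8.8.8", "dest_port": 443,
     "bytes_out": 60_000_000, "action": "allowed", "_time": 1700000000},
    {"src_ip": "10.0.0.5", "dest_ip": "8.8.8.8", "dest_port": 443,
     "bytes_out": 60_000_000, "action": "allowed", "_time": 1700000100},
    {"src_ip": "10.0.0.6", "dest_ip": "192.168.1.9", "dest_port": 443,
     "bytes_out": 200_000_000, "action": "allowed", "_time": 1700000000},
    {"src_ip": "10.0.0.7", "dest_ip": "8.8.4.4", "dest_port": 80,
     "bytes_out": 200_000_000, "action": "allowed", "_time": 1700000000},
]
print(exfil_candidates(events))
```

Only the first host should be reported: its two transfers add up to 120 MB in one hour, while the other events are internal-to-internal or not on port 443.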
Balancing Automation with Security Concerns
While LLMs offer significant efficiency gains, security professionals must remain vigilant about their limitations:
- Data Privacy: Never share sensitive logs containing PII, credentials, or proprietary information with public LLM services
- Verification: Always review generated queries before execution in production environments
- Governance: Establish clear guidelines for LLM use in security operations
- Threat Model: Consider the security implications of your LLM workflow
As one cybersecurity professional cautioned: "You shouldn't give it ANY PII or other sensitive information, but if you give one a specific problem to solve they can save so much time."
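Sample log lines can be scrubbed before they go into a prompt. The sketch below masks email addresses and IPv4 addresses with simple regular expressions. It is a starting point under stated assumptions, not a complete PII filter: the two patterns are illustrative, they will miss usernames, tokens, hostnames, and IPv6 addresses, and redacted samples should still be reviewed by a person.

```python
import re

# Illustrative patterns only; they do not cover every form of PII.
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(line):
    """Replace email addresses and IPv4 addresses with placeholders."""
    line = EMAIL.sub("<EMAIL>", line)
    return IPV4.sub("<IP>", line)


raw = ('192.168.1.210 - - [10/Oct/2023:13:57:12 -0700] '
       '"POST /login.php?user=alice@example.com HTTP/1.1" 401 1842')
print(redact(raw))
```

The placeholders keep the line's structure intact, so the LLM can still infer the log format without seeing the original values.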
Conclusion: The Future of Log Query Writing
Large Language Models are transforming how security professionals approach log analysis. By bridging the gap between plain English and complex query languages, these AI tools help security teams:
- Accelerate threat hunting and incident response
- Democratize access to log data across varying skill levels
- Reduce the cognitive load of writing complex queries
- Free up analyst time for higher-value security work
While LLMs aren't perfect—they require human oversight and have limitations around complex security logic—they represent a significant advancement in security workflow automation. As one Reddit user summarized: "LLMs can save so much time long-term, but they still need human oversight."
By following the best practices outlined in this article, security professionals can harness the power of generative AI to transform their log analysis workflows while maintaining the human judgment essential to effective cybersecurity.
The next time you find yourself staring at a blank query editor, wondering how to express a complex security question in KQL or another query language, consider leveraging an LLM. With the right prompting techniques and verification processes, you might find yourself saying what many security professionals already have: that using LLMs for query generation "saves me a ridiculous amount of time."
Further Resources
- Microsoft Documentation on KQL - Complete reference for Kusto Query Language
- Microsoft Documentation on Query Optimization - Best practices for optimizing log queries
- Log Analytics Without Writing Queries - Exploring modern approaches to log analysis
- Reddit Discussion on LLMs in Cybersecurity - Practitioner perspectives on AI applications
Remember that effective log query writing is both an art and a science—LLMs can accelerate the process, but your expertise and judgment remain the most valuable assets in your security arsenal.