In this Beginner’s Guide to Regular Expressions in Splunk article, we will learn how to unleash the power of pattern matching in your Splunk searches. A regular expression (regex) is a pattern used to search text for matches in your data. Regex is a powerful filtering tool that enables advanced pattern matching, and incorporating it into Splunk searches lets users apply these operations to existing data sources for deeper data analysis. Because Splunk search is a flexible place to test regex, this article covers the basics of regex syntax, how to apply regex in searches, and how to create in-search field extractions. By following along with these examples, any Splunker can acquire the tools needed to expand their data analysis capabilities.
The Basics of Regular Expressions
Regular expression syntax consists of combinations of characters and symbols that describe patterns to match in text. The table below describes common regex symbols, with examples of text that each one matches and does not match.
Symbol / Characters | Description | Examples |
---|---|---|
Literal characters | Characters as they read in text. | `abc` **matches** `abc`, **not** `def` |
`.` | Any single character. | `b.d` **matches** `bcd`, **not** `bd`, `bde` |
`\w` | Any word character: a letter, digit, or underscore. | `a\wc` **matches** `abc`, **not** `a c` |
`\W` | Any non-word character. | `a\Wc` **matches** `a c`, **not** `abc`, `def` |
`\d` | Any digit character. | `a\d` **matches** `a1`, `a2`, **not** `ab`, `ac` |
`\D` | Any non-digit character. | `a\D` **matches** `ab`, `ac`, **not** `a1`, `a2` |
`\s` | Any whitespace character. | `a\sc` **matches** `a c`, **not** `abc`, `adc` |
`\S` | Any non-whitespace character. | `a\Sc` **matches** `abc`, `a1c`, **not** `a c` |
`[...]` | A single character of those in brackets. | `a[bc]` **matches** `ab`, `ac`, **not** `ad`, `bc` |
`[^...]` | A single character other than those in brackets. | `a[^bc]` **matches** `ad`, **not** `ab`, `de` |
`[n1-n2]` | Range notation, allowing for alphanumeric matching of an alphabetic or numeric range. Case sensitivity applies. | `[a-z]` **matches** `a`, `b`, `z`, **not** `A`, `Z`; `[0-9]` **matches** `1`, `5`, `9`; `[a-zA-Z0-9]` **matches** `a`, `B`, `3` |
`*` | Zero or more of the preceding character or expression. | `abc*` **matches** `ab`, `abc`, `abcc`, **not** `ac`, `acd` |
`+` | One or more of the preceding character or expression. | `abc+` **matches** `abc`, `abcc`, **not** `ab`, `abd` |
`?` | Zero or one of the preceding character or expression. | `abc?` **matches** `abc`, `ab`, **not** `ac` |
`{n}` | Exactly `n` occurrences of the preceding character or expression. | `\d{3}` **matches** `123`, **not** `12`, `a23` |
`\|` | Create an "OR" expression that matches either the pattern before it or the pattern after it. | `a(b\|c)` **matches** `ab`, `ac`, **not** `ad` |
`\` | Escape special regex characters. | `ab\?` **matches** `ab?`, **not** `ab`, `ab\` |
`^` | Anchor the match to the beginning of the line. | `^bc` **matches** `bcd`, **not** `abc` |
`$` | Anchor the match to the end of the line. | `bc$` **matches** `abc`, **not** `bcd` |
`(...)` | Group characters together as a capture group. Groups are numbered in the order their opening parentheses appear (`\1` is the first group, `\2` the second) and are typically referenced in replacements or to isolate part of a match. | `(ab)cd` captures `ab`, accessible with `\1` |
`(?<group_name>...)` | Create a named capture group. Splunk uses named groups to name the fields created by field extraction regex. | `(?<letters>ab)cd` captures `ab` in a group named `letters` |
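If you want to experiment with these symbols outside of Splunk, the following sketch checks each example from the table with Python's standard `re` module. Python is used here only as a convenient test bench, which is an assumption of this sketch and not part of Splunk: Splunk's regex engine is PCRE, which agrees with Python's `re` on the simple ASCII examples below, but Python spells named groups `(?P<name>...)` instead of `(?<name>...)`. Because `re.search` looks for a match anywhere in the string, `^` and `$` are what anchor a pattern.

```python
import re

# (pattern, strings the table says match, strings the table says do not match)
TABLE_EXAMPLES = [
    (r"abc", ["abc"], ["def"]),
    (r"b.d", ["bcd"], ["bd", "bde"]),
    (r"a\wc", ["abc"], ["a c"]),
    (r"a\Wc", ["a c"], ["abc", "def"]),
    (r"a\d", ["a1", "a2"], ["ab", "ac"]),
    (r"a\D", ["ab", "ac"], ["a1", "a2"]),
    (r"a\sc", ["a c"], ["abc", "adc"]),
    (r"a\Sc", ["abc", "a1c"], ["a c"]),
    (r"a[bc]", ["ab", "ac"], ["ad", "bc"]),
    (r"a[^bc]", ["ad"], ["ab", "de"]),
    (r"[a-z]", ["a", "b", "z"], ["A", "Z"]),
    (r"abc*", ["ab", "abc", "abcc"], ["ac", "acd"]),
    (r"abc+", ["abc", "abcc"], ["ab", "abd"]),
    (r"abc?", ["abc", "ab"], ["ac"]),
    (r"\d{3}", ["123"], ["12", "a23"]),
    (r"a(b|c)", ["ab", "ac"], ["ad"]),
    (r"ab\?", ["ab?"], ["ab", "ab\\"]),
    (r"^bc", ["bcd"], ["abc"]),
    (r"bc$", ["abc"], ["bcd"]),
]


def check_examples(examples):
    """Return (pattern, text, expected_to_match) for every example that disagrees."""
    failures = []
    for pattern, matches, non_matches in examples:
        for text in matches:
            if re.search(pattern, text) is None:
                failures.append((pattern, text, True))
        for text in non_matches:
            if re.search(pattern, text) is not None:
                failures.append((pattern, text, False))
    return failures


if __name__ == "__main__":
    problems = check_examples(TABLE_EXAMPLES)
    print("all table examples behave as described" if not problems else problems)
```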
Regular Expression Examples with Splunk
There are many common patterns in data where regex can be used to identify values. Often, this is most beneficial when data has a consistent format, but the values of words and numbers change. Common examples are phone numbers, IP addresses, and timestamps. For each of these examples, many variations of regex can be applied using basic symbols.
Use Cases
Example 1: Phone Numbers
For this example, a 10-digit phone number is expected in data. A simple match of 10 digits can be accomplished with the following:
```
# Example data we want to find
1234567890
# Digit symbol
\d{10}
# Numeric range instead of the digit symbol
[0-9]{10}
```
Both of these regular expressions above will find the example data we are looking for. Now, let’s use this regular expression in a Splunk SPL search using the rex command.
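Before writing the search, here is a minimal Python sketch of the 10-digit patterns `\d{10}` and `[0-9]{10}`, again using the standard `re` module as a stand-in for Splunk's PCRE engine. The `first_match` helper is hypothetical and mimics rex's default behavior of keeping only the first match. It also shows a limitation worth knowing: neither pattern checks what surrounds the ten digits, so a longer run of digits still produces a 10-digit match.

```python
import re

# The digit-symbol pattern and its numeric-range equivalent.
PATTERNS = {
    "digit symbol": r"\d{10}",
    "numeric range": r"[0-9]{10}",
}


def first_match(pattern, text):
    """Return the first substring matching pattern, or None (like rex's default)."""
    match = re.search(pattern, text)
    return match.group() if match else None


for name, pattern in PATTERNS.items():
    # Both patterns find the example value.
    print(name, first_match(pattern, "call from 1234567890"))
    # Nothing limits what comes before or after the digits, so an 11-digit
    # run still yields a match made of its first ten digits.
    print(name, first_match(pattern, "call from 11234567890"))
```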
```
index="<index>" sourcetype="<sourcetype>"
| rex "(?<phone_number>\d{10})"
| where isnotnull(phone_number)
| stats count as call_count by phone_number
| sort - call_count
```
To use this search, replace `<index>` and `<sourcetype>` with values from your Splunk environment. The rex command searches the raw text (`_raw`) of each event for the first 10-digit number and stores it in a new field called `phone_number`. The search then keeps only the events that contain a match and presents, in a table, the number of events containing each phone number, sorted from most to least frequent. To capture every 10-digit number in an event rather than only the first, add `max_match=0` to the rex command.
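To make each stage of the search concrete, the sketch below mirrors the pipeline in Python over a few hypothetical raw events. It is an illustration only, not how Splunk executes SPL: `re.search` stands in for rex (which keeps only the first match by default), the `None` check stands in for `where isnotnull(...)`, and `collections.Counter` stands in for `stats count ... by` followed by `sort -`.

```python
import re
from collections import Counter

# Hypothetical raw events standing in for _raw in Splunk.
EVENTS = [
    "call placed to 5551234567 duration=30",
    "call placed to 5559876543 duration=12",
    "call placed to 5551234567 duration=45",
    "login from user alice",               # no phone number: dropped by the filter
    "transfer 5551234567 -> 5559876543",   # two numbers: only the first is kept
]

# Python spelling of the rex named group (?<phone_number>\d{10}).
PHONE = re.compile(r"(?P<phone_number>\d{10})")


def call_counts(events):
    counts = Counter()
    for raw in events:
        match = PHONE.search(raw)        # rex: first match only (default max_match=1)
        if match is None:                # where isnotnull(phone_number)
            continue
        counts[match.group("phone_number")] += 1  # stats count as call_count by phone_number
    return counts.most_common()          # sort - call_count


print(call_counts(EVENTS))
```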
Example 2: Phone Numbers with Hyphens
Phone numbers in data frequently include hyphens. The following regular expressions match the common phone number pattern of an area code, telephone prefix, and line number separated by hyphens.
```
# Example data
123-456-7890
# Digit symbol
\d{3}-\d{3}-\d{4}
# Multiple occurrences of the group containing 3 digits and a hyphen
(\d{3}-){2}\d{4}
```
Let’s now try the same search with a different regular expression.
```
index="<index>" sourcetype="<sourcetype>"
| rex "(?<phone_number>\d{3}-\d{3}-\d{4})"
| where isnotnull(phone_number)
| stats count as call_count by phone_number
| sort - call_count
```
Like the previous example, this search locates number sequences that look like phone numbers, this time with hyphens. Again, the rex command applies a regular expression to find the events in Splunk that contain these phone numbers.
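A short Python sketch (using `re` as a stand-in for Splunk's PCRE engine) confirms that both hyphenated patterns match the example and shows one behavior of the grouped form that matters for field extraction: a quantified group such as `(\d{3}-){2}` only remembers its last repetition.

```python
import re

EXAMPLE = "123-456-7890"

EXPLICIT = re.compile(r"\d{3}-\d{3}-\d{4}")
REPEATED_GROUP = re.compile(r"(\d{3}-){2}\d{4}")

explicit = EXPLICIT.search(EXAMPLE)
repeated = REPEATED_GROUP.search(EXAMPLE)

# Both patterns match the whole phone number.
print(explicit.group())   # 123-456-7890
print(repeated.group())   # 123-456-7890

# The repeated group keeps only its last repetition, so group 1 holds "456-",
# not "123-". Separate (named) groups are needed to keep every section.
print(repeated.group(1))  # 456-

# Neither pattern matches the unhyphenated form.
print(EXPLICIT.search("1234567890"))  # None
```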
Example 3: Phone Numbers in Multiple Formats
In an exploratory exercise, it may be unknown whether the data contains hyphens or is prefixed with a country code. By combining the OR symbol `|` with other frequency symbols such as `*`, regex can match all of the formats presented. The groups used to separate out sections of the data can also be used to identify each section with a name.
```
# Example data
123-456-7890
+1 123-456-7890
1234567890
+11234567890
# Matching all examples with |, *, and groups
((\+\d)|)\s*(\d{3}(-|)){2}(\d{4})
# Using groups to name all sections of the phone number
((?<country_code>\+\d)|)\s*(?<area_code>\d{3}(-|))(?<prefix>\d{3}(-|))(?<line_number>\d{4})
```
Using the named-group pattern with the rex command extracts a separate field for each section of every phone number it finds, allowing more complex searches that show granular details of the phone number data. Splunk users can also write additional searches focused on each field for presentation in a multi-panel dashboard.
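The Python sketch below applies the named-group pattern to each example format. The group names (`country_code`, `area_code`, `prefix`, `line_number`) are illustrative, and Python requires the `(?P<name>...)` spelling where Splunk's rex uses `(?<name>...)`. Here `groupdict()` plays the role of the fields rex would create; a group that did not take part in the match, such as a missing country code, comes back as `None`.

```python
import re

# Python spells named groups (?P<name>...); PCRE and Splunk use (?<name>...).
# Group names are illustrative. The empty alternatives "|)" make the
# country code and each hyphen optional.
PHONE = re.compile(
    r"((?P<country_code>\+\d)|)\s*"
    r"(?P<area_code>\d{3}(-|))"
    r"(?P<prefix>\d{3}(-|))"
    r"(?P<line_number>\d{4})"
)

EXAMPLES = ["123-456-7890", "+1 123-456-7890", "1234567890", "+11234567890"]

for text in EXAMPLES:
    fields = PHONE.search(text).groupdict()
    print(text, fields)
```

Because each optional hyphen sits inside its named group, `area_code` and `prefix` keep a trailing hyphen when one is present (for example `123-`); moving `(-|)` outside the named groups would store the digits only.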
Example 4: IP Addresses
In Splunk, IP addresses are often critical for data analysis and threat hunting. The principles from the previous example apply easily to IP addresses, accounting for the variable number of digits in each octet.
```
# Example data
10.11.12.123
172.16.0.45
3.86.250.13
# Match with frequency range for each octet
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}
# Use a group to match 3 octets that end with a period
(\d{1,3}\.){3}\d{1,3}
# Use | to match all octets in an abbreviated form
(\d{1,3}(\.|)){4}
```
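The following Python sketch (standard `re`, standing in for PCRE) compares the three IP patterns using `re.fullmatch`, which requires the whole string to match. All three accept the example addresses, but the abbreviated form also accepts a bare run of digits, because each `(\.|)` may match nothing. None of the patterns checks that an octet is 255 or less, so `999.999.999.999` passes as well.

```python
import re

IP_PATTERNS = {
    "explicit octets": r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}",
    "grouped octets": r"(\d{1,3}\.){3}\d{1,3}",
    "abbreviated": r"(\d{1,3}(\.|)){4}",
}

EXAMPLES = ["10.11.12.123", "172.16.0.45", "3.86.250.13"]


def matches(pattern, text):
    """True when the whole string matches the pattern."""
    return re.fullmatch(pattern, text) is not None


for name, pattern in IP_PATTERNS.items():
    # Example addresses, then a digit run with no periods at all.
    print(name, [matches(pattern, ip) for ip in EXAMPLES], matches(pattern, "12345678"))
```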
In the example below, a regex pattern for extracting IP addresses is used to identify IP addresses potentially involved in a brute-force attack.
```
index="authentication" action="blocked"
| rex "(?<src_ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})"
| stats count as login_failures by src_ip
| where login_failures > 10
| sort - login_failures
```
Example 5: Timestamps
Timestamps are a primary concern for Splunk administrators when onboarding data. In some cases, data may contain additional timestamps that are useful to extract as fields. Timestamp data will generally follow a consistent pattern across the events of a unique log source, and knowledge of this format can help apply accurate regex.
```
# Example data
Jan 01, 2024 10:14:42
# Using whitespace and character frequency as observed
\w{3}\s\d{2},\s\d{4}\s\d{2}:\d{2}:\d{2}
# Use the + symbol to match the month name if its length varies
\w+\s\d{2},\s\d{4}\s\d{2}:\d{2}:\d{2}
```
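A minimal Python sketch of the two timestamp patterns, using a hypothetical ticket comment. It also shows why the `+` version matters: on a full month name, the fixed-width `\w{3}` pattern still finds a match, but it starts partway through the word. Finally, `datetime.strptime` converts the extracted string with a format that mirrors the regex piece by piece; Splunk's `TIME_FORMAT` setting in `props.conf` uses the same strptime-style notation. The sketch assumes the default C (English) locale for the `%b` month abbreviation.

```python
import re
from datetime import datetime

TEXT = "Ticket resolved Jan 01, 2024 10:14:42 by analyst"

FIXED_MONTH = re.compile(r"\w{3}\s\d{2},\s\d{4}\s\d{2}:\d{2}:\d{2}")
VARIABLE_MONTH = re.compile(r"\w+\s\d{2},\s\d{4}\s\d{2}:\d{2}:\d{2}")

stamp = FIXED_MONTH.search(TEXT).group()
print(stamp)  # Jan 01, 2024 10:14:42

# With a full month name, the fixed-width pattern matches only "ary 01, ...",
# while the + version captures the whole month name.
print(FIXED_MONTH.search("January 01, 2024 10:14:42").group())     # ary 01, 2024 10:14:42
print(VARIABLE_MONTH.search("January 01, 2024 10:14:42").group())  # January 01, 2024 10:14:42

# %b = abbreviated month, %d = day, %Y = year, %H:%M:%S = time of day.
print(datetime.strptime(stamp, "%b %d, %Y %H:%M:%S"))  # 2024-01-01 10:14:42
```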
When working with time data in Splunk, do not use regex in search commands to extract event timestamps. Instead, a sound data onboarding process configures timestamp parsing in Splunk configuration files (such as props.conf), so that Splunk assigns event timestamps automatically at ingestion.
Using regex for timestamps can be helpful in Splunk search when raw data contains additional timestamps that provide useful context in reporting or dashboarding. The example SPL below builds a table of service ticket data in which the event timestamp (_time) comes from ticket creation, but the resolution time is also needed in the resulting table.
```
index="itsm"
| rex "(?<resolution_date>\w{3}\s\d{2},\s\d{4}\s\d{2}:\d{2}:\d{2})"
| table _time resolution_date analyst_comments
| rename _time as creation_time
```
Considerations
Some of the examples shown abbreviate regex by using more complex combinations of symbols. While regex optimization is beyond the scope of this article, readers should be aware that functional but complex regex may slow down text parsing. Additionally, a shorter pattern does not always perform better, and an abbreviated pattern can be less precise, matching text that a longer pattern would reject.
Filtering Searches with Regular Expressions
Regular Expressions in Splunk Search
For a regex beginner, using regex to search Splunk is a great way to explore data, create ad hoc field extractions, and test regex before applying it in administrative configurations. We will demonstrate how to apply the regex, rex, and erex SPL commands to enhance analytics and reporting capabilities.
The regex command filters results based on regular expression patterns rather than the standard field=value comparison. This is useful when multiple field values that share a common format are relevant results for a search. The regex command uses the following syntax, where `!=` can be used in place of `=` to keep only the results that do not match:

```
| regex <field>="<regex-expression>"
```
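As a rough illustration of what the regex command does, this Python sketch filters a list of hypothetical events, each a dict of already-extracted fields. The `regex_filter` function is hypothetical: `keep_matches=True` corresponds to `field="<regex>"` and `keep_matches=False` to `field!="<regex>"`. Like the regex command, it uses an unanchored search, so `^` and `$` are added where the whole value must match. The sketch assumes every event has the field being tested.

```python
import re

# Hypothetical events, each a dict of already-extracted fields.
EVENTS = [
    {"host": "web-01", "status": "200"},
    {"host": "web-02", "status": "503"},
    {"host": "db-01", "status": "200"},
]


def regex_filter(events, field, pattern, keep_matches=True):
    """Keep events whose field matches pattern (field=...) or, with
    keep_matches=False, events whose field does not match (field!=...)."""
    compiled = re.compile(pattern)
    return [
        event for event in events
        if (compiled.search(event[field]) is not None) == keep_matches
    ]


# | regex host="^web-\d+$"
print(regex_filter(EVENTS, "host", r"^web-\d+$"))
# | regex status!="^5\d\d$"
print(regex_filter(EVENTS, "status", r"^5\d\d$", keep_matches=False))
```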
You can find detailed examples in the Search Command of the Week article Using the regex Command.
In-Search Field Extractions
Splunk in-search field extractions allow for fields to be created and used in SPL that do not have permanent extractions configured in the Splunk environment. While a key part of data administration is configuring Splunk to extract these fields automatically, in-search extractions provide a mechanism both for testing regex to be added as a permanent extraction and for enabling any Splunk user to create fields they need to use in their data.
The erex Command
The erex command is an ideal starting point for a user who wants to extract fields in Splunk searches. This command requires no knowledge of regex to produce an output, because the only required input is a set of example strings from the raw events. These examples are values of the field that should be extracted, as seen in the syntax below:

```
| erex <field_name> examples="<example1>,<example2>"
```
For a regex beginner, the greatest benefit of erex is its output, which recommends a regex that can be used to configure permanent extractions. This output can be modified using the concepts in The Basics of Regular Expressions section to explore various regex use cases for Splunk field extraction. You can find detailed examples in the Search Command of the Week article, Using the erex Command.
The rex Command
The rex command gives beginners and experts alike a simple way to apply regex directly to raw event text in Splunk searches. Using the named groups seen in earlier examples, a single command can extract multiple fields. The optional field argument names the field to search; when it is omitted, rex searches _raw.

```
| rex field=<field> "<regex-expression>"
```
Conclusion
Regular expressions are a powerful tool for extracting specific strings in Splunk. Regex is a skill set that is quick to pick up, and learning it can take your Splunk skills to the next level. Plenty of freely available tutorials, classes, books, and videos can help you learn to use regular expressions.
In this article we covered:
1. The basics of regular expression syntax and how to use it
2. Several real-world examples of using regular expressions in your Splunk searches
3. Using regular expressions for in-search field extractions in Splunk
If you’d like more information about how to leverage regular expressions in your Splunk searches, reach out to our team of experts. We’re here to help!
You can also learn more about our Atlas platform. The Atlas platform by Kinney Group is a comprehensive solution that empowers organizations to optimize their Splunk environments. By leveraging automation, best practices, and a unified interface, Atlas simplifies the management, monitoring, and scaling of Splunk deployments. With Atlas, businesses can enhance the performance, security, and cost-efficiency of their Splunk infrastructure, enabling them to derive maximum value from their machine data and drive informed decision-making. Get started by running the Atlas Assessment, available for free on Splunkbase.