Splunk Search

Why is there a regex character limitation?

jadengoho
Builder

I have a very long regex (12,000 characters long). It consists of different hostname and IP address combinations.

Now when I run the regex, it shows: Regex: regular expression is too large.

 

error.png

From my testing, the regex can only accommodate 8190 characters.

In the image you can see that I used the letter "a" 8190 times, but if I add one more letter, it shows the error.

search.png

Can somebody explain to me why this is happening and how I can execute my regex properly?

 
 
 

 

 

 


richgalloway
SplunkTrust

For reasons known only to those who wrote the code, Splunk can't handle a regular expression longer than 8190 characters.  The workaround is to make the regex short enough to fit into 8190 characters.  Sometimes a single rex command can be split into multiple smaller rex commands.
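As a rough sketch of that splitting step, assuming the long regex is a flat top-level alternation (`a|b|c`) whose alternatives are available as a list of ASCII strings, the following Python packs them into groups that each stay within the 8190-character limit reported in this thread. The function name `chunk_alternatives` is hypothetical and is not part of Splunk.

```python
LIMIT = 8190  # character limit reported in this thread


def chunk_alternatives(alternatives, limit=LIMIT):
    """Greedily pack alternatives into '(a|b|...)' groups of at most `limit` characters."""
    chunks = []
    current = []
    length = 2  # the surrounding parentheses
    for alt in alternatives:
        if len(alt) + 2 > limit:
            raise ValueError("a single alternative does not fit within the limit")
        extra = len(alt) + (1 if current else 0)  # +1 for the '|' separator
        if current and length + extra > limit:
            # Current group is full: emit it and start a new one.
            chunks.append("(" + "|".join(current) + ")")
            current = []
            length = 2
            extra = len(alt)
        current.append(alt)
        length += extra
    if current:
        chunks.append("(" + "|".join(current) + ")")
    return chunks


if __name__ == "__main__":
    alts = ["cat.*dog", "rat.*count", "computer.*calculator", "computer.*device.*v2"]
    # A tiny limit is used here only to force a split on the short sample.
    for chunk in chunk_alternatives(alts, limit=25):
        print(len(chunk), chunk)
```

Each resulting group can then go into its own rex command or transforms stanza. The sketch does not handle alternatives that contain nested top-level pipes of their own.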


jadengoho
Builder

Hi @richgalloway 

We tried to shorten the regex from 14,000 to 11,000 characters.

Is there any limits configuration we can tweak to override this regex limitation?


isoutamo
SplunkTrust
Usually that kind of tweak can be done with parameters in limits.conf, but I could not find any suitable one for this.
@cpetterborg, do you have any idea about this?

Out of curiosity, how do you manage that regex? Usually even much shorter ones are already hard to update.

jadengoho
Builder

 

We have 20,000+ combinations of words/phrases that should be present in the logs for them to be routed to the proper index.

Example:

 

CAT should have DOG - routed to sample1 index
RAT should have COUNT - routed to sample1 index.

In transforms.conf:
REGEX = (cat.*dog|rat.*count|computer.*calculator|computer.*device.*v2)

https://goolge/sites/cat/page/dog
https://goolge/sites/rat/page/count
https://goolge/sites/computer/page/calculator
https://goolge/sites/computer/page/device/machine/v2

 

I've tried every way I can think of to compress the regex, but that is the best I can do.
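The sample regex above can be checked against the four sample URLs with a small Python script. This is only an illustrative sketch: Python's `re` module is not the PCRE engine that Splunk uses, although simple patterns like these should behave the same way in both. The helper name `matches` is hypothetical.

```python
import re

# Pattern copied from the transforms.conf example in this post.
PATTERN = re.compile(
    r"(cat.*dog|rat.*count|computer.*calculator|computer.*device.*v2)"
)

# Sample events copied from this post.
SAMPLES = [
    "https://goolge/sites/cat/page/dog",
    "https://goolge/sites/rat/page/count",
    "https://goolge/sites/computer/page/calculator",
    "https://goolge/sites/computer/page/device/machine/v2",
]


def matches(event, pattern=PATTERN):
    """Return True if the pattern is found anywhere in the event text."""
    return pattern.search(event) is not None


if __name__ == "__main__":
    for url in SAMPLES:
        print(matches(url), url)
```

The pattern is case-sensitive as written, so an event containing `CAT` and `DOG` in upper case would not match unless the regex is made case-insensitive, for example with a `(?i)` prefix.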

 


mtulett_splunk
Splunk Employee

In case this was never resolved, or for others who are interested, the solution here is to split the regex across multiple transforms stanzas so that each REGEX stays under 8190 characters, like so:

props.conf:

[my_sourcetype]
TRANSFORMS-index_routing = ruleset1, ruleset2

transforms.conf:

[ruleset1]
REGEX = (cat.*dog|rat.*count)
FORMAT = sample1
DEST_KEY = _MetaData:Index

[ruleset2]
REGEX = (computer.*calculator|computer.*device.*v2)
FORMAT = sample1
DEST_KEY = _MetaData:Index
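With thousands of combinations, writing these stanzas by hand is tedious. The following is a minimal Python sketch that renders stanzas in the same shape as the example above from a list of regexes that have already been split to fit the limit. The function names `render_transforms` and `render_props` are hypothetical; the stanza names, sourcetype, and index simply mirror the example.

```python
def render_transforms(regexes, index="sample1", prefix="ruleset"):
    """Return (stanza_names, transforms.conf text) with one stanza per regex."""
    names = [f"{prefix}{i}" for i in range(1, len(regexes) + 1)]
    stanzas = [
        f"[{name}]\n"
        f"REGEX = {regex}\n"
        f"FORMAT = {index}\n"
        f"DEST_KEY = _MetaData:Index\n"
        for name, regex in zip(names, regexes)
    ]
    return names, "\n".join(stanzas)


def render_props(sourcetype, names, class_name="index_routing"):
    """Return the props.conf stanza that references every transforms stanza."""
    return f"[{sourcetype}]\nTRANSFORMS-{class_name} = {', '.join(names)}\n"


if __name__ == "__main__":
    regexes = [
        "(cat.*dog|rat.*count)",
        "(computer.*calculator|computer.*device.*v2)",
    ]
    names, transforms_text = render_transforms(regexes)
    print(render_props("my_sourcetype", names))
    print(transforms_text)
```

The generated text would still need to be reviewed and placed into props.conf and transforms.conf manually.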

I would also argue that in this specific case a different approach should be used, as a regex this large will cause high CPU overhead during ingestion, especially if the source is high-volume.
