Splunk Search

Interpretation of integers

fabienpe
Explorer

I am wondering why the two following requests, when applied to exactly the same time range, return a different value:

index=<my_index> logid=0000000013 | stats count

index=<my_index> logid=13 | stats count

The first one returns many more results than the second. (The type indicated by Splunk for this field is "number" not "string".)


PickleRick
SplunkTrust

That's due to how Splunk searches its indexes. Unless the field is indexed and properly configured or you're searching with wildcards, Splunk will try to find the exact value you're searching for in its index files.

For example, I search my home environment for

index="winevents" EventCode=7040

A fairly simple search.

When I look into job inspect and get the job log I see how Splunk executes the search against its indexed data:

01-12-2024 18:06:20.986 INFO  UnifiedSearch [1919978 searchOrchestrator] - Expanded index search = (index="winevents" EventCode=7040)
01-12-2024 18:06:20.986 INFO  UnifiedSearch [1919978 searchOrchestrator] - base lispy: [ AND 7040 index::winevents ]

As you can see, Splunk didn't optimize the search (there wasn't much to optimize, since the search was very simple), but the resulting lispy search looks literally for the value "7040" combined with the metadata field index equal to winevents (index is actually treated a bit differently from other indexed fields, but that doesn't matter here). Only after finding the events that contain "7040" anywhere in their body does Splunk parse out the EventCode field from them and try to match that value (possibly numerically) against your argument.

 


isoutamo
SplunkTrust

Hi

This is quite an interesting finding 🙂

What do you have in the original log/event data? Is it "logid=13" or just 13, with the field extracted and named at search time or at ingestion time?

Does it matter how many zeroes you add as a prefix?

r. Ismo


fabienpe
Explorer

@PickleRick I have to admit that I do not fully understand your explanation. Note that I get more results with a search for 0000000013 than just 13.

@isoutamo Here are outputs of other queries, still for the very same time intervals:

index=<myindex> logid=13 | stats count
822,434
index=<myindex> logid="13" | stats count
0
index=<myindex> logid=0000000013 | stats count
3,183,571
index=<myindex> logid="0000000013" | stats count
8,183,571
index=<myindex> logid=013 | stats count
0
index=<myindex> | stats count by logid
0000000011             8,000
0000000013         3,183,571
0000000023           127,753
0419016384             5,154

I wanted to see which records match 13 and not 0000000013 with the following request

index=<myindex> logid=0000000013 AND logid!=13 | stats count

but the result is 0...


PickleRick
SplunkTrust

I know, it's a bit tricky 🙂

Because Splunk applies "schema on read" (mostly, apart from indexed fields), it searches for data differently than, for example, a typical RDBMS does.

Splunk first searches for the value and, having found it in a set of events, checks whether the location of that value fits the definition of a field.

For example if you have three events like this

field1=whatever field2=wherever field3=whenever
field4=otherwhatever field5=otherwherever field6=otherwhenever
field7=otherwhatever site=otherwherever field8=otherwhenever

Assuming you have your key=value definitions set, if you search for

site=otherwherever

Splunk will first choose the second and third events from its index (since the first one doesn't contain the string "otherwherever" anywhere), then parse both events into fields and decide that only the third one matches.

It can sometimes create interesting issues in unusual cases, especially those involving partial matches (and in your case "13" is indeed a partial match on "0000000013").
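
The two-step behaviour described above can be sketched in Python. This is only a toy model, not Splunk's implementation: the `tokenize`, `extract_fields` and `search` helpers are hypothetical names made up for illustration, and real segmentation and field extraction are more involved than these regular expressions.

```python
import re

# The three sample events from the post above.
EVENTS = [
    "field1=whatever field2=wherever field3=whenever",
    "field4=otherwhatever field5=otherwherever field6=otherwhenever",
    "field7=otherwhatever site=otherwherever field8=otherwhenever",
]

def tokenize(raw):
    """Split a raw event into terms (simplified: every alphanumeric run is a term)."""
    return set(re.findall(r"[A-Za-z0-9_]+", raw))

def extract_fields(raw):
    """Simplified search-time key=value extraction."""
    return dict(re.findall(r"(\w+)=(\w+)", raw))

def search(events, field, value):
    # Step 1: keep only events that contain the literal value as a term.
    candidates = [e for e in events if value in tokenize(e)]
    # Step 2: parse the candidates into fields and keep real field matches.
    matches = [e for e in candidates if extract_fields(e).get(field) == value]
    return candidates, matches

candidates, matches = search(EVENTS, "site", "otherwherever")
print(len(candidates))  # 2 (second and third event)
print(matches)          # only the third event
```

In this model the first event is never parsed at all, because it is discarded in step 1 before any field extraction takes place.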

isoutamo
SplunkTrust

Can you check how those logids look in your original data (outside of Splunk), or at least in the _raw field?

Just open an event and select "Event Actions -> Show Source".


fabienpe
Explorer

In a search for logid=13, I looked up the source of the returned events and they do have the logid information. Here is an extract:

eventtime=1701795599940432762 tz="+0100" logid="0000000013" type="traffic" subtype="forward" level="notice"
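
As a rough illustration, the extract above can be split into terms in Python. The tokenization rule here (every alphanumeric run is one term) is an assumption and much simpler than Splunk's real segmentation, and the `terms` helper is a hypothetical name. Under that assumption, the extract contains the term 0000000013 but no standalone term 13.

```python
import re

# The raw event extract quoted above.
RAW = ('eventtime=1701795599940432762 tz="+0100" logid="0000000013" '
       'type="traffic" subtype="forward" level="notice"')

def terms(raw):
    """Simplified tokenization: every alphanumeric run is one term."""
    return set(re.findall(r"[A-Za-z0-9_]+", raw))

t = terms(RAW)
print("0000000013" in t)  # True
print("13" in t)          # False
```

Since this is only an extract of the event, the rest of the raw text may still contain a bare 13 somewhere else.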

isoutamo
SplunkTrust

OK, in reality this logid is not a numeric field but a string, yet for some unknown reason Splunk converts it to a number. Maybe this is a bug and you should create a support case.

What happens if you try this?

index=<myindex> logid="0000000013" AND logid!="13" | stats count

If this doesn't help, I don't know how to tell Splunk in a search that this field should be kept as a string instead of being converted to a number.

fabienpe
Explorer

OK. Thanks for your help.

 

index=<myindex> logid="0000000013" AND logid!="13" | stats count

gives 3,183,571.
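
One possible reading of the counts in this thread can be sketched as a toy model in Python. Everything here is an assumption made for illustration: the three events, the `srcport` field, the helper names, and above all the rule that an unquoted all-digit value is compared numerically while a quoted value is compared as a string. That rule is inferred from the results above and is not confirmed Splunk behaviour.

```python
import re

# Hypothetical miniature data set (not from the thread): every event has
# logid="0000000013", but only the first also contains a standalone 13.
EVENTS = [
    'logid="0000000013" srcport=13 action="accept"',
    'logid="0000000013" srcport=443 action="accept"',
    'logid="0000000013" srcport=8080 action="deny"',
]

def terms(raw):
    """Simplified tokenization: every alphanumeric run is one term."""
    return set(re.findall(r"[A-Za-z0-9_]+", raw))

def fields(raw):
    """Simplified key=value extraction, with or without double quotes."""
    return dict(re.findall(r'(\w+)="?([^"\s]+)"?', raw))

def same(actual, wanted, quoted):
    # Assumption: unquoted all-digit values compare as numbers,
    # quoted values compare as strings.
    if not quoted and actual.isdigit() and wanted.isdigit():
        return int(actual) == int(wanted)
    return actual == wanted

def count(events, field, wanted, quoted=False, excluded=None):
    hits = 0
    for raw in events:
        if wanted not in terms(raw):      # step 1: literal term lookup
            continue
        actual = fields(raw).get(field)   # step 2: extract and compare
        if actual is None or not same(actual, wanted, quoted):
            continue
        if excluded is not None and same(actual, excluded, quoted):
            continue
        hits += 1
    return hits

print(count(EVENTS, "logid", "13"))                       # 1 (a subset)
print(count(EVENTS, "logid", "13", quoted=True))          # 0
print(count(EVENTS, "logid", "0000000013"))               # 3 (all)
print(count(EVENTS, "logid", "013"))                      # 0
print(count(EVENTS, "logid", "0000000013", excluded="13"))               # 0
print(count(EVENTS, "logid", "0000000013", quoted=True, excluded="13"))  # 3
```

The model reproduces the pattern reported above: logid=13 returns only a subset, logid="13" and logid=013 return nothing, and the exclusion returns nothing when unquoted but everything when quoted. It does not prove that Splunk works this way.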
