Splunk Search

Slow running custom search command

sumnerm
Path Finder

I have a requirement to provide histograms of performance through Splunk. Essentially we have a field (for example Page_Load_Time), and we need to find out how many entries for that field (on a particular search) fall into certain fixed categories - e.g. <200ms, 200ms-2s, etc.

To achieve this I've written a custom search command, splitbins:

import sys

import splunk.Intersplunk

def sortValueToBin(fieldValue, listOfBins):
    # Return the label of the first bin whose upper boundary is above the
    # value; values above every boundary land in the final, open-ended bin.
    binNumber = 1
    for binRoof in listOfBins:
        if fieldValue < float(binRoof):
            return "Bin-" + str(binNumber)
        binNumber += 1
    return "Bin-" + str(binNumber)

# argv[1] is the field to bin on; the remaining arguments are bin boundaries.
fieldToSplit = sys.argv[1]
listOfBins = sys.argv[2:]

# getOrganizedResults() returns the search results as a list of dicts.
events, dummyResults, dummySettings = splunk.Intersplunk.getOrganizedResults()

for event in events:
    # Check it's a number we're trying to split on; otherwise skip the event.
    try:
        fieldValue = float(event[fieldToSplit])
    except (KeyError, ValueError, TypeError):
        continue
    event["Bin_Number"] = sortValueToBin(fieldValue, listOfBins)

splunk.Intersplunk.outputResults(events)

This is then run in a search like this:

index="some_indexname" host="some_hostname" some_field="some_otherterm" | splitbins Page_Load_Time 200 2000 4000 8000 | chart count(Bin_Number) over some_other_field by Bin_Number | fields some_other_field Bin-1 Bin-2 bin-3 Bin-4 Bin-5

...and it works fine when the number of events returned by the initial search terms is in the thousands. However, as the number of events grows, two problems occur:

  1. Results stop being produced once the total number of events processed goes over 50,000
  2. The search is S-L-O-W. For example, 20 minutes for 250K events. If I feed the splitbins code a list of random results directly (see the sketch after this list), it can process hundreds of thousands of events in less than a second: so there is nothing innately slow about the splitbins code.
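
For reference, a minimal standalone harness along these lines reproduces that timing test outside of Splunk (this is an illustration, not the author's actual test; the random data and counts are made up, and sortValueToBin is the function from the script above):

import random
import time

# Hypothetical driver: exercise the binning logic directly, with no
# splunk.Intersplunk involvement, to show the Python itself is fast.
listOfBins = ["200", "2000", "4000", "8000"]
events = [{"Page_Load_Time": str(random.uniform(0, 10000))} for _ in range(250000)]

start = time.time()
for event in events:
    event["Bin_Number"] = sortValueToBin(float(event["Page_Load_Time"]), listOfBins)
print("Binned %d events in %.2f seconds" % (len(events), time.time() - start))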

I've tried raising everything in limits.conf that is set to 50000, with no change to the number of events processed. I've also tried adding a fields pipe after the initial search string to slim the search objects down earlier, and it is still slow.

Running v4.1.2 on Windows, with plenty of spare CPU and memory.

Any ideas?

Thanks

1 Solution

gkanapathy
Splunk Employee
Splunk Employee

Note that the bucket command (which is aliased as bin) probably does something like what you want:

index="some_indexname" host="some_hostname" some_field="some_otherterm" 
| bucket Page_Load_Time as Bin_Number span=1.6log2 
| chart count by Bin_Number

I would suggest that this custom search command is basically entirely unnecessary. Even if bucket doesn't give you the exact ranges you want, you can get the same effect with either a

rangemap field=Page_Load_Time bin1=0-199 bin2=200-1999 bin3=2000-3999 bin4=4000-7999 default=bin5 | rename range=Bin_Number 

command or a line of

eval Bin_Number=case(Page_Load_Time<200,"bin1",Page_Load_Time<2000,"bin2",...) 

instead.
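
For illustration, filled out for the boundaries in the original search (200, 2000, 4000, 8000), with 1==1 as an always-true catch-all for everything at or above the top boundary, that eval would read:

eval Bin_Number=case(Page_Load_Time<200,"bin1", Page_Load_Time<2000,"bin2", Page_Load_Time<4000,"bin3", Page_Load_Time<8000,"bin4", 1==1,"bin5")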

For reference, the 50k results limit would be avoided by making the search command "streaming" (see commands.conf.spec)
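
For illustration, a commands.conf stanza along these lines (assuming the script is saved as splitbins.py) would mark the command as streaming:

[splitbins]
filename = splitbins.py
streaming = true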


sumnerm
Path Finder

Raced off the three methods over 250K events: eval/case - 28s; splitbins (with streaming) - 7m 55s; splitbins (without streaming) - 29m 30s (and only 50K events processed); rangemap - seemingly forever (I got bored waiting; there may be some other issue).

Will retire my function and use eval/case.


sumnerm
Path Finder

Thanks. My custom search command ran a lot faster once streaming was set to true, though I haven't raced it against rangemap or eval yet. I'm happy, though, that eval & case is the way to go. If I get some time later in the week I'll race them off.

No idea on the context-switching constraints, other than just to blame Windows. The hardware has eight 64-bit processor cores and 16GB of memory, running very little other activity: virtually no monitor traffic, no software other than Splunk, and no other searches. Are there some performance risks with Splunk on non-*nix platforms?


gkanapathy
Splunk Employee
Splunk Employee

rangemap is a default external search command, so it does the same as yours, while eval runs in-process in Splunk. This indicates to me that either your Splunk config is launching too many external search processes, or that something in your OS/system is limiting communication or context-switching between splunkd and the external process.


sumnerm
Path Finder

Thanks for the advice.

The rangemap command worked but was very slow; however, using the case statement in eval not only works but is also fast.

Excellent!


Lowell
Super Champion

Can you post your commands.conf entry as well? Specifically, you could see different performance with streaming vs. not streaming...


dart
Splunk Employee
Splunk Employee

Hi,

Does the bucket command do what you need?

http://www.splunk.com/base/Documentation/4.1.4/SearchReference/Bucket

bucket field span=200 

If you need to aggregate some of those buckets into bigger ones, you could eval them together:

| stats ... | eval my_big_bucket= bucket_1 + bucket_2
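
As a sketch of that idea against the original search (the under_400ms name and the exact bucket labels like '0-200' are assumptions about what bucket span=200 produces here, not something confirmed in this thread):

index="some_indexname" host="some_hostname" some_field="some_otherterm"
| bucket Page_Load_Time span=200
| chart count over some_other_field by Page_Load_Time
| eval under_400ms='0-200'+'200-400'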

sumnerm
Path Finder

Thanks - great advice.

It certainly solves the speed and limits issues - but I seem to have problems getting the eval functions to work with the lesser-used buckets (beyond the first 9 + OTHER).

I shall keep reading and fiddling for a bit first before I come back for help.
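
(For reference, and as a guess at the cause rather than something confirmed in this thread: chart shows only the top series by default and rolls the rest into OTHER; its limit and useother options control that, e.g.:)

... | chart count over some_other_field by Page_Load_Time limit=0 useother=false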
