Splunk Search

How to improve the speed of Splunk search

qazwsxe
New Member

I want to retrieve hundreds of millions of events from billions, but each search takes more than an hour.
I just used the simplest search: index="test" name=jack. But it's very slow.

Then I checked memory and CPU usage. Each search uses only 200-300 MB of memory.
So I modified the max_mem_usage_mb, search_process_memory_usage_percentage_threshold, and search_process_memory_usage_threshold parameters in $SPLUNK_HOME/etc/apps/search/local/limits.conf, but they didn't seem to make a noticeable difference.
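
For reference, the change looked something like this (the values are just what I tried, not a recommendation):

[default]
max_mem_usage_mb = 8000

[search]
search_process_memory_usage_threshold = 8000
search_process_memory_usage_percentage_threshold = 50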
Is there any effective way to improve the speed of my search?
Thanks! 🙂

0 Karma
1 Solution

martin_mueller
SplunkTrust

You'll find Jack's company much faster if your search also specifies how to identify a company. What that looks like depends on your data, which you didn't share with us - knowing your data would help.

That could look like one of these:

index=foo sourcetype=company_register name=jack
index=foo category=employees name=jack
etc.

If you have an accelerated datamodel, it could look like this:

| tstats summariesonly=t values(your_model.company) as companies from datamodel=your_model where your_model.name=jack

To chain searches like that, you could build a dashboard with in-page drilldowns that steps through the tree you expect in your data.


0 Karma

codebuilder
Influencer

See my previous answer on tuning your search head cluster (SHC) for performance.

----
An upvote would be appreciated and Accept Solution if it helps!
0 Karma

qazwsxe
New Member

I tried, but it didn't work.
:(

0 Karma

qazwsxe
New Member

Thanks, I'll try more indexers.

0 Karma

woodcock
Esteemed Legend

You improve (the speed of) your search by DOING SOMETHING with your millions of events - piping them to MORE SPL by adding | other SPL here. What is your SPL now? Show us your sample events and a mockup of your desired FINAL OUTPUT. Make it work first, then worry about optimizing it.
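
For example (just a sketch - src_ip is a made-up field here, substitute whatever field actually matters in your events):

index="test" name=jack | stats count by src_ip | sort - count | head 20

That turns millions of raw events into a 20-row answer, and the heavy lifting happens on the indexers instead of streaming everything back to the search head.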

0 Karma

woodcock
Esteemed Legend

Splunk is not an ETL tool (it is a Needle-in-the-haystack tool, not a Forklift-the-haystack tool). There is no way to make it perform acceptably when the final output is millions of rows/events. It can process billions down to millions, and millions down to hundreds or maybe thousands, but that's it. You need to figure out what you really need to do: probably you don't need the millions as the final output, but rather as an input to some other calculation, which can probably also be done in Splunk. Otherwise, use another, more appropriate tool. Splunk is not a middle-of-the-pipeline tool; it is an end-of-the-pipeline tool that delivers the final conclusions.
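
To make that concrete, here is the shape I mean (a sketch - company and duration are hypothetical fields):

index="test" name=jack
| stats count as events avg(duration) as avg_duration by company
| outputcsv jack_summary

The billions get filtered down to millions on the indexers, stats reduces the millions to a handful of rows, and only that handful leaves Splunk (outputcsv writes to $SPLUNK_HOME/var/run/splunk/csv on the search head).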

0 Karma

qazwsxe
New Member

Because of the large amount of data, the events returned by a keyword search are highly correlated, so I end up extracting a lot of data. I can't add restrictions to discard part of the data, and that is what makes the search slow. I am eager to solve this problem.
Thanks 🙂

0 Karma

codebuilder
Influencer

Increase the resources available to Splunk at the search head level.
Modify the settings below (adjusting for your environment) in $SPLUNK_HOME/etc/system/local/limits.conf and restart the search head(s).

[default]
max_mem_usage_mb = 16000

[search]
# Total system-wide concurrent search limit is
# base_max_searches + max_searches_per_cpu x num_cpus.
# For example, with 14 CPUs: 6 + 16 x 14 = 230 concurrent searches.
base_max_searches = 6
max_searches_per_cpu = 16

[scheduler]
# Percent of total search concurrency reserved for scheduled searches:
# total concurrency x max_searches_perc = 230 x 80% = 184 scheduled searches.
# This default applies when no max_searches_perc.<n>.when clause (if any)
# below matches.
max_searches_perc = 80
----
An upvote would be appreciated and Accept Solution if it helps!
0 Karma

martin_mueller
SplunkTrust

The search currently provided, index=foo field=value, does not consume SH memory at all. It is purely CPU-bound on the indexers, which have to unzip the rawdata.

codebuilder
Influencer

All searches consume memory on the search head, assuming you are not running your query directly from the indexer UI.

The indexers perform the work of the query and then pass those results back to the SH for any additional parsing and display to the end user. Depending on the size of your search artifacts, this can produce tremendous resource consumption on both sides.

----
An upvote would be appreciated and Accept Solution if it helps!
0 Karma

rvany
Communicator

You may have a look at this slide deck, especially slide 27. It's a pure streaming-command search using some field extraction. That kind of search is done on the indexers, and it depends on uncompressing all the packed rawdata files, which - as we now know - are all on one disk => slow. (A way around that decompression is sketched below.)

Of course, displaying the events will take some small amount of memory, but really, that won't be the bottleneck in this scenario.
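
One way to sidestep the rawdata decompression entirely is tstats, which reads the tsidx files instead of the compressed journals. It only works against indexed fields (index, sourcetype, source, host, _time) or an accelerated datamodel, so the name=jack filter would need the datamodel route martin_mueller already mentioned. A sketch that counts events without ever touching rawdata:

| tstats count where index=test by sourcetype _time span=1h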

0 Karma

rvany
Communicator

just added the missing link...

0 Karma

martin_mueller
SplunkTrust

The search currently provided does not do any additional work on the SH; it's all map and no significant reduce.

In order to help this person we first need to understand their goals, not throw around tons of deep-dive tuning.

codebuilder
Influencer

In order to help this person we should also not provide inaccurate information, such as the idea that searches do not consume memory on a search head.

----
An upvote would be appreciated and Accept Solution if it helps!
0 Karma

martin_mueller
SplunkTrust

This search doesn't - there is no command running on the SH.
It'd be different if there were a high-cardinality stats, a transaction, etc.

While we're on the subject of accurate information, I wouldn't recommend recommending max_searches_per_cpu = 16. On a 14-core search head that allows 6 + 16 x 14 = 230 concurrent searches, each fanning out to every indexer - a good way to thrash your indexing tier.

qazwsxe
New Member

@adonio Could you help me?
Thanks 🙂

0 Karma

mbagali_splunk
Splunk Employee

@qazwsxe

For a faster search, you need to be specific: use source or sourcetype in your search, and use the time range picker so you only search the time range you actually need (see the example after the list below).

Splunk also has three search modes:

  • Fast Mode
  • Smart Mode
  • Verbose Mode
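
For example, a more selective version of your search might look like this (the sourcetype and time window are placeholders - substitute your own):

index="test" sourcetype=your_sourcetype name=jack earliest=-4h@h latest=now

Fast Mode then skips discovery of any fields the search doesn't explicitly use, which speeds things up further.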
0 Karma

qazwsxe
New Member

I also used sourcetype and specified Fast Mode, but the speed is still very slow. I don't know how to solve it.

0 Karma

martin_mueller
SplunkTrust

Be more specific in describing what goal you want your search to achieve. I doubt it's "list millions of events on screen" because there's no value in that.

qazwsxe
New Member

I want to extract hundreds of millions of events from billions with a simple keyword search, but the speed is too slow. No matter how much data is searched, CPU and memory usage don't change significantly. Is there something wrong with how I'm using it? I just want to speed up my search.

0 Karma

martin_mueller
SplunkTrust

Okay, what do you want to do with those hundreds of millions of events?

0 Karma

qazwsxe
New Member

Because of the large amount of data, the events returned by a keyword search are highly correlated, so I end up extracting a lot of data. I can't add restrictions to discard part of the data, and that is what makes the search slow. I am eager to solve this problem.
Thanks 🙂

0 Karma