I want to pull hundreds of millions of events out of billions, but each search takes more than an hour.
I'm using the simplest possible search: index="test" name=jack
But it's very slow.
I checked memory and CPU usage: each search uses only 200-300 MB of memory.
So I modified the max_mem_usage_mb, search_process_memory_usage_percentage_threshold, and search_process_memory_usage_threshold parameters in $SPLUNK_HOME/etc/apps/search/local/limits.conf, but they didn't seem to make a significant difference.
Is there an effective way to improve the speed of my search?
Thanks! 🙂
You'll find Jack's company much faster if your search also specifies how to find a company. What that looks like depends on your data, which you didn't share with us; knowing your data would help.
That could look like one of these:
index=foo sourcetype=company_register name=jack
index=foo category=employees name=jack
etc.
If you have an accelerated datamodel, it could look like this:
| tstats summariesonly=t values(your_model.company) as companies from datamodel=your_model where your_model.name=jack
To chain that, you could build a dashboard with in-page drilldowns that step through the tree you expect in your data.
See my previous answer on tuning your SHC for performance.
I tried, but it didn't work.
:(
Thanks, I'll try more indexers.
You improve (the speed of) your search by DOING SOMETHING with your millions of events: pipe them into more SPL by adding | <other SPL here>. What is your SPL now? Show us your sample events and a mockup of your desired FINAL OUTPUT. Make it work first, then worry about optimizing it.
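For example, instead of returning raw events, aggregate them on the indexers (a sketch; the grouping field source is just a stand-in for whatever field matters in your data):

index="test" name=jack | stats count by source

A stats like this ships only a small summary back to the search head instead of every matching raw event.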
Splunk is not an ETL tool (it is a needle-in-the-haystack tool, not a forklift-the-haystack tool). There is no way to make it perform acceptably when the final output is millions of rows/events. It can process billions down to millions, and millions down to hundreds or maybe thousands, but that's it. You need to figure out what you really need to do; you probably don't need the millions as the final output, but rather as input to some other calculation, which can probably be done in Splunk. Otherwise, use another, more appropriate tool. Splunk is not a part-of-the-pipeline tool, it is an end-of-the-pipeline tool, there to give the final conclusions.
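As a sketch of keeping that downstream calculation inside Splunk rather than exporting millions of rows (the grouping field company here is hypothetical; substitute whatever your calculation actually needs):

index="test" name=jack
| stats count as events dc(host) as hosts by company
| sort - events

The millions of matching events stay on the indexers; only the per-company summary reaches the search head.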
Because of the large amount of data, the events my keyword search returns are highly correlated, so I have to extract a lot of data; I can't add restrictions and discard part of it, and that is what makes the search slow. I am eager to solve this problem.
Thanks :)
Increase the resources available to Splunk at the search head level.
Modify the settings below (based on your environment) in $SPLUNK_HOME/etc/system/local/limits.conf and cycle the search head(s).
[defaults]
max_mem_usage_mb = 16000
[search]
# With 14 CPUs, the total system-wide number of concurrent searches this
# machine can handle is base_max_searches + max_searches_per_cpu x num_cpus
# = 6 + 16 x 14 = 230.
base_max_searches = 6
max_searches_per_cpu = 16
[scheduler]
# Percent of the total search concurrency available to the scheduler:
# total concurrency x max_searches_perc = 230 x 80% = 184 scheduled searches.
# Per-user default (needed only if different from the system default) when
# no max_searches_perc.<n>.when stanza (if any) below matches.
max_searches_perc = 80
The search currently provided, index=foo field=value, does not consume search-head memory at all. It is purely CPU-bound on the indexers, dominated by uncompressing the raw data.
All searches consume memory on the search head (assuming you are not running your query directly from an indexer's UI). The indexers perform the work of the query and then pass the results back to the SH for any additional processing and display to the end user. Depending on the size of your search artifacts, this can produce tremendous resource consumption on both sides.
You may have a look at this slide deck, especially slide 27. This is a pure streaming-command search using some field extraction. That kind of search runs on the indexers and depends on uncompressing all the packed raw data files, which, as we now know, are all on one disk => slow.
Of course, displaying the events will take some small amount of memory, but that won't be the bottleneck in this scenario.
just added the missing link...
The search currently provided does not do any additional work on the SH, it's all map and no significant reduce.
In order to help this person we first need to understand their goals, not throw around tons of deep-dive tuning.
In order to help this person we should also not provide inaccurate information, such as the idea that searches do not consume memory on a search head.
This search doesn't; there is no command running on the SH.
It would be different if there were a high-cardinality stats, a transaction, etc.
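For illustration, a search like this (session_id is a hypothetical field name) would push real work onto the search head, because transaction is a non-streaming command that must hold and correlate events in memory:

index=foo | transaction session_id maxspan=30m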
While you're defining what information to provide, I wouldn't recommend max_searches_per_cpu = 16. It's a good way to thrash your indexing tier.
@adonio♦ Could you help me ?
Thanks 🙂
@qazwsxe
For a faster search, you need to be specific: use source or sourcetype in your search, and use the time range picker to search only the time range you need.
Also, Splunk has the following search modes: Fast, Smart, and Verbose. Fast mode skips the non-essential field discovery the other modes perform.
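For example (a sketch; the sourcetype value is a placeholder for your own):

index="test" sourcetype=your_sourcetype name=jack earliest=-24h@h latest=now

Restricting the time range and sourcetype lets the indexers skip whole buckets instead of scanning everything.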
I also used sourcetype and specified Fast mode, but the speed is still very slow. I don't know how to solve it.
Be more specific in describing what goal you want your search to achieve. I doubt it's "list millions of events on screen" because there's no value in that.
I want to extract hundreds of millions of events from billions with a simple keyword search, but it is too slow. No matter how much data is searched, CPU and memory usage do not change significantly. Is there something wrong with my usage? I just want to speed up my search.
Okay, what do you want to do with those hundreds of millions of data?
Because of the large amount of data, the events my keyword search returns are highly correlated, so I have to extract a lot of data; I can't add restrictions and discard part of it, and that is what makes the search slow. I am eager to solve this problem.
Thanks :)