I want to retrieve hundreds of millions of events from billions of events, but each search takes more than an hour.
I'm using the simplest possible search: index="test" name=jack
But it's very slow.
Then I checked memory and CPU usage: each search uses only 200-300 MB of memory.
So I modified the max_mem_usage_mb, search_process_memory_usage_percentage_threshold, and search_process_memory_usage_threshold parameters in $SPLUNK_HOME/etc/apps/search/local/limits.conf, but they didn't seem to make a significant difference.
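Roughly, my local limits.conf now looks something like this (the values are just examples of what I tried, and the parameter-to-stanza mapping should be double-checked against limits.conf.spec):

```
# $SPLUNK_HOME/etc/apps/search/local/limits.conf (example values only)
[default]
max_mem_usage_mb = 2000

[search]
search_process_memory_usage_threshold = 8000
search_process_memory_usage_percentage_threshold = 50
```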
Is there any effective way to improve the speed of my search?
Thanks! 🙂
You'll find Jack's company much faster if you also specify how to find a company in your search. What that looks like depends on your data, which you didn't share with us - knowing your data would help.
That could look like one of these:
index=foo sourcetype=company_register name=jack
index=foo category=employees name=jack
etc.
If you have an accelerated datamodel, it could look like this:
| tstats summariesonly=t values(company) as companies from datamodel=your_model where your_model.name=jack
To chain that, you could build a dashboard with in-page drilldowns that step through the tree you expect in your data.
Let me know when you're ready to stop trolling and want to tell us what you want to achieve.
I just want to improve the speed of my search. I don't know what you mean!
@qazwsxe
Sometimes the simplest searches are also the most vague ones. For best performance, here are a few suggestions you can use to optimize your query.
Be as precise with your base search as you can. It reduces the scope of the search, which is a great help with larger indexes. So include everything you know about the events you want.
Adding a sourcetype after your index will help narrow the scope of searched events, e.g.:
`index="test" sourcetype=<yourST> name=jack`
Keep your time range as narrow as your requirement allows.
If you are working in an indexer cluster, make sure your index data is split across all indexers. This helps avoid putting the load of a search on a single indexer instance.
And lastly, do take a look at the job inspector for your searches and analyse where most of the time is spent. Those will be the areas to work on.
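Putting those suggestions together, a tightened version of your original search might look like this (the sourcetype and time range are placeholders you'd replace with values from your own environment):

```
index="test" sourcetype=<yourST> name=jack earliest=-24h@h latest=now
```

Every filter you add to the base search lets the indexers discard events before they ever reach the search head.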
Thanks.
You can save your query as a report and then accelerate it for the period you want. This can be an alternative that gives you faster results at search time, but be aware that it still causes extensive resource usage (only in the background, i.e. at off-times when you might not be searching).
You would also have to use a transforming command in your search to make it eligible for acceleration.
Check this for more details -
https://docs.splunk.com/Documentation/Splunk/7.3.0/Report/Acceleratereports
https://docs.splunk.com/Documentation/Splunk/7.3.0/Knowledge/Manageacceleratedsearchsummaries
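For example, a transforming version of the original search (the field `source` here is only an illustration - use whatever field you actually want to aggregate on) could be:

```
index="test" name=jack | stats count by source
```

The `stats` command is a transforming command, so a report saved from this search is eligible for acceleration.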
I will try it, thanks.
Even if the search conditions are accurate, the speed is slow. There's a lot of data, maybe billions of events, so the search is slow.
When the search command is executed, memory and CPU usage are minimal. Using the same data, ELK takes up a lot of memory and CPU, and it's much faster than Splunk.
So, what should I do?
Thanks.
So what's your environment (regarding indexer(s), search head(s), and storage)? And where are you examining the resource consumption?
I just created an index called test. I view CPU and memory usage through Task Manager.
My first question was: what is your environment? Do you run Splunk in a standalone installation? What kind of storage do you have attached?
My test environments are Windows 10 and Ubuntu 16. I run Splunk in a standalone installation. My storage is an SSD, and there's a lot of free space.
OK, this way we can get further. I should have asked earlier, but: are you using virtual machines? What are the characteristics of your servers/VMs regarding CPU and RAM? I assume you have one (i.e. "1") SSD in your environment?
I use physical machines. I'm not sure of the CPU information. It's a server, so its performance won't be weak. And the RAM is 240 GB. Finally, you're right, there is one SSD.
For the Linux machine you could use `cat /proc/cpuinfo` to get processor information. The interesting figures are the number of cores and the CPU speed. Regarding the one SSD: Splunk recommends 800 IOPS, which one SSD probably isn't able to deliver. So this may be your bottleneck.
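On most x86 Linux systems, those specific figures can be pulled straight out of /proc/cpuinfo, for example:

```shell
# Number of logical cores: /proc/cpuinfo has one "processor" stanza per core
grep -c '^processor' /proc/cpuinfo

# CPU model (on most Intel/AMD CPUs this includes the rated clock speed)
grep -m1 'model name' /proc/cpuinfo
```
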
In your first post you wrote "hundreds of millions of data from billions of data" - what is that in GB? What is that as an event count?
Still, it's not clear what you want to achieve with your query. As @martin_mueller pointed out, there's not much sense in just displaying "hundreds of millions of data". So you probably want to do something else with your data. It would be very helpful - if not essential - to get an idea of what you're planning to do.
@martin_mueller
@qazwsxe one of the best ways to index and monitor KPI information is to use the Metrics Index, available since version 7.x. Each release introduces significant new features for metrics indexing, so do explore the latest version and its features (for example, the current latest version, 7.3.0, introduces Metrics Rollups). https://docs.splunk.com/Documentation/Splunk/latest/Metrics/Overview
https://www.splunk.com/blog/2019/06/18/navigating-data-chaos-with-splunk-metrics-workspace.html
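As a sketch of what querying a metrics index looks like (the index name `my_metrics` and metric name `cpu.usage` are just assumptions for illustration):

```
| mstats avg("cpu.usage") WHERE index=my_metrics span=5m
```

Because metrics are stored as pre-structured measurements rather than raw events, `mstats` searches like this are typically far cheaper than scanning an event index.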