|
I'm running a search against about 1.2 million log records. Each record contains some geo tags and numeric values representing performance metrics. There are a total of about 45 key/values per record including the following:
The search query I'm running calculates a 90th percentile median performance value grouped by service id within a specific geographical region, service type and test ID. Here is an example query:
To my disappointment, this query takes about 5 minutes to run to completion on a fairly high-end dedicated server (quad-core X5570 2.93 GHz, 128 GB memory, RAID 0 15K SAS + SSD cache), and much longer on the new hosted Splunk Storm service. My question is whether this level of performance should be expected for this amount of data and this type of search query. Are there any optimizations that could be made at index or search time to improve performance? Is there a significant hit on performance when applying |
|
A search like that across that amount of data on that hardware should take something closer to 30 seconds on a single-instance Splunk system, even assuming your If you're way off from that, I would try a couple of things first:
Just for reference, when I run a slightly smaller (just over 1,000,000 events) and somewhat simpler search than yours on a three-plus-year-old laptop, it goes from about 130 seconds to about 70 (roughly doubling the speed) when I turn off field discovery, then down to 30 seconds (another doubling) in the "Advanced Charting" view (with preview still on), and down to 25 seconds on the command line without preview.
I don't actually see any obvious improvements that can be made to your query while keeping the same results. However, I would be curious as to how it runs if you try each of the following:
It would also be helpful to know the final number of results returned as well as the scan count. The information in the "Inspect Search Job" page (under the "Actions" menu on the timeline search view) would be useful too, though maybe a bit obscure.
One thing to note is that a single-instance Splunk is not able to take advantage of your hardware when running a single search. I would say that if you're trying to get this to run faster, you could probably run three, four, or even more Splunk instances in a distributed config on that same machine to better utilize it, but that setup takes a bit of work and knowledge to get right.
Finally, seeing a few lines of your data might indicate something, though if it's CDN access logs, that's unlikely.
UPDATE: Try this and see how it compares. Be sure to turn preview off:
Also, what I suspect is happening is that the
which should only look at the most recent 9999 events to compute the 90th percentile, rather than scanning all 1.2 million events. This should be a lot faster than the previous search, though the results will differ.
Oh, I forgot something important. The GUI and various settings in it can make a huge difference. I'm editing this in above.
(04 Nov '11, 14:07)
gkanapathy ♦
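To illustrate the trade-off in the suggestion above (a 90th percentile over only the most recent events instead of all of them), here is a minimal Python sketch. It is not Splunk code: the tuple layout, the `limit` parameter, and the nearest-rank percentile definition are assumptions made for the example and may not match how Splunk computes percentiles.

```python
from collections import defaultdict


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile for an integer pct (an assumed definition)."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def p90_by_group(events, limit=None):
    """events: iterable of (timestamp, group, value) tuples (hypothetical layout).

    If limit is given, only the `limit` most recent events are considered.
    """
    events = list(events)
    if limit is not None:
        events = sorted(events, key=lambda e: e[0], reverse=True)[:limit]
    groups = defaultdict(list)
    for _, group, value in events:
        groups[group].append(value)
    return {g: percentile_nearest_rank(v, 90) for g, v in groups.items()}


# Synthetic data: 100 events, alternating between two made-up service ids.
events = [(t, "svc-a" if t % 2 == 0 else "svc-b", t) for t in range(1, 101)]
full = p90_by_group(events)              # scans every event
recent = p90_by_group(events, limit=20)  # scans only the 20 newest events
```

The limited version touches far fewer events, but it answers a different question, so its per-group values will generally not match the full scan.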
Thanks. Here are the stats for the searches you recommended:
GUI with preview/field discovery on: ~5 minutes
GUI with preview/field discovery off without CLI as-is: 1:54
CLI without So, the big hit is for
(05 Nov '11, 07:39)
cloudharmony
I will update my answer with a suggestion on something to try to improve performance, but I do not know if it will help. (I believe it will help if you have a distributed/multi-indexer Splunk system, but I don't know about a single node.) As for pre-indexing specific fields, retrieval is not really the problem here, and there isn't currently anything that will help. If you need to do this over time, with new and larger data sets, however, you can and should use summary indexing to pre-compute results over subsets of the data, so that you can get the full results faster.
(07 Nov '11, 09:25)
gkanapathy ♦
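The idea behind the summary indexing suggestion above (pre-compute compact results over subsets of the data, then combine them at query time) can be sketched in Python. This is an illustration only, not how Splunk stores or merges summaries: the fixed-width histogram, `BIN_WIDTH`, and all function names are hypothetical, and the percentile it returns is an approximation bounded by the bin width.

```python
from collections import Counter

BIN_WIDTH = 10  # hypothetical bin size; smaller bins give finer estimates


def summarize(values):
    """Pre-compute a compact histogram for one subset (e.g. one day) of values."""
    return Counter(int(v // BIN_WIDTH) for v in values)


def merge(summaries):
    """Combine per-subset histograms without touching the raw values again."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


def approx_percentile(hist, pct):
    """Upper edge of the bin that contains the nearest-rank pct-th percentile."""
    n = sum(hist.values())
    rank = (pct * n + 99) // 100  # integer ceiling of pct * n / 100
    seen = 0
    for bin_index in sorted(hist):
        seen += hist[bin_index]
        if seen >= rank:
            return (bin_index + 1) * BIN_WIDTH
    raise ValueError("empty histogram")


# Two made-up daily subsets, summarized ahead of time.
day1 = summarize(range(0, 50))
day2 = summarize(range(50, 100))
estimate = approx_percentile(merge([day1, day2]), 90)
```

The query-time work then scales with the number of summaries and bins rather than with the number of raw events, at the cost of some precision in the percentile.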
Just updated again. Made a mistake. Basically, I forgot to remove
(07 Nov '11, 10:10)
gkanapathy ♦
|
|
Cloudharmony, there are a few things to consider that might perk up your result times:
Beyond the above, yes, you do take performance hits with various Splunk analytic commands, and there is some guidance in the Splunk docs to help with this. Sean |
