|
Are search-time fields slow? Can I rely on them to efficiently sort through my data? Are there significant differences between searching on fields created automatically from the text of my events and searching on fields that I configure manually? Are some types of extractions faster than others? |
|
Mostly, search-time fields have superior performance to parse-time (indexed) fields, regardless of whether they are explicitly configured. When running a search that includes a term such as

The Splunk search machinery presumes that

In all cases, the post-filtering is applied to the (hopefully) small set of events that actually contain the

Ideally, the index-based filtering is the most important factor in the speed of your search, but there are cases when search-time extraction must be applied to a large percentage of events. For example, if almost all of your events have the word |
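The two stages described in the answer above (index-based filtering first, then search-time extraction applied as a post-filter to whatever survives) can be pictured with a small toy model. This is only a sketch for intuition, not how Splunk is actually implemented: the sample events, the `status` field, and the `build_index` and `search` helpers are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical sample events; the "status" field and its values are invented.
EVENTS = [
    "2010-03-24 18:10:01 status=404 path=/missing",
    "2010-03-24 18:10:02 status=200 bytes=404",
    "2010-03-24 18:10:03 status=200 path=/index",
    "2010-03-24 18:10:04 status=404 path=/gone",
]


def build_index(events):
    """Map each token to the set of event positions that contain it."""
    index = defaultdict(set)
    for pos, event in enumerate(events):
        for token in re.findall(r"\w+", event):
            index[token].add(pos)
    return index


def search(events, index, field, value):
    """Return (candidate positions, matching events) for field=value."""
    # Stage 1: the index lookup keeps only events containing the value token.
    candidates = sorted(index.get(value, set()))
    # Stage 2: run the extraction regex on the candidates only, then post-filter.
    pattern = re.compile(rf"\b{re.escape(field)}=(\w+)")
    matches = []
    for pos in candidates:
        found = pattern.search(events[pos])
        if found and found.group(1) == value:
            matches.append(events[pos])
    return candidates, matches


index = build_index(EVENTS)
candidates, matches = search(EVENTS, index, "status", "404")
print(len(candidates), "candidates,", len(matches), "matches")
```

In this toy data the index lookup leaves three candidate events, and the regex post-filter discards the one where 404 is a byte count rather than a status. The more common the token is across events, the less the first stage helps and the more extraction work lands on the second.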
|
jrodman has a good answer. I just want to add a couple of examples of scenarios where indexed fields are the best answer:
These aren't your normal scenarios. Indexed fields are certainly a great option for these situations, but they require a lot more maintenance than the normal search-time field extraction setups. So whenever possible, go with field extractions over indexed fields. Also keep in mind that for the occasional field extraction, you can always use a

BTW, does anyone know if there is a way to profile the regexes within Splunk? Perhaps find out which ones are the most costly? (I think that's on topic here.)

To add color to your examples: Yes, and yes, but do evaluate whether these fields will be ones that narrow your dataset by several orders of magnitude. If yes, and you search on them often, the indexed field choice is probably worthwhile. If they are only likely to filter your search by 100 to 1 or so, it may not be worth it. At around 10 to 1, it is unlikely to be worth it. For your myclass/mypackage/myfunction example, it can be performant to just search on the three fields, unless these terms are quite common in other contexts.
(24 Mar '10, 18:10)
jrodman ♦
As for your trick for doing the extractions for only some searches, this should be less necessary as we get better at making the UI be less demanding of all fields. Certainly for scheduled searches and for command line searches in 4.0, you don't have to pay for fields you don't use. We don't have search profiling of any significant sort in 4.0, but it's currently under discussion (which makes me think it's 4.2-ish).
(24 Mar '10, 18:14)
jrodman ♦
|
