I'm running into some really slow performance searching on WMI sources. In this case I'm just trying to get some general counts, I don't even need most of the fields, but the job inspector is telling me that command.search.kv
is using probably some 90% of my overall search time. I took a look in the search.log
file and saw that splunk keep trying to disable all three of the wel-*
extractors but it apparently failed to do so.
So I started digging in. I looked at the default extractors and quickly came to the conclusion that I don't want any of them.
From the default system props.conf
:
[source::WMI...]
REPORT-MESSAGE = wel-message, wel-eq-kv, wel-col-kv
From the default system tranforms.conf
:
[wel-message]
REGEX = (?sm)(?<_pre_msg>.+)\nMessage=(?<Message>.+)
CLEAN_KEYS = false
[wel-eq-kv]
SOURCE_KEY = _pre_msg
DELIMS = "\n","="
MV_ADD = true
[wel-col-kv]
SOURCE_KEY = Message
REGEX = \n([^:\n\r]+):[ \t]++([^\n]*)
FORMAT = $1::$2
MV_ADD = true
Here's what I found, the wel-message
extractor has to run first, it extracts two fields which are used by the wel-eq-kv
and wel-col-kv
extractors which extract their own set of fields.
The wel-message
seems to basically just separate everything before "Message=" from everything after it. (Apparently "Message" is a multi-line field.) In my WMI events, I don't seem to have any "Message" entries at all, which I'm assuming is because I'm not using WMI to collect log files, I'm using WinEventLog for that purpose.)
So if wel-message
matches, then _pre_msg
is searched for "key=value" pairs using wel-eq-kv
, and then Message
is searched for "key:value" fields with the wel-col-kv
extractor.
At first glance it seemed odd that I didn't see a KV_MODE=none
props entry, since I figured that with 3 custom transforms stanzas all the fields would be covered, but their actually not. The reason for this is that in wel-message
(the first extractor), the _pre_msg
field is ONLY extracted if the Message
field is found, otherwise the regex doesn't match; therefore the auto key/value extraction is still needed based on the default setup.
...
So, does any one have any thoughts as to why these three extractors are singly handedly killing my source=WMI:*
performance when I don't have "Message=" in any of my events, and therefore most of these extractors have no nothing to do.
For anyone who cares, here is the search I'm using. It's running for the last whole day (timeframe name "Yesterday")
source=WMI:* | fields - eventtype | stats count as events, avg(linecount) as lpe, dc(Name) as names, by sourcetype, host | rename sourcetype as st, host as h | eval lpe=round(lpe,1)
Well, here's yet another option. I did a bunch more test and can confirm that the problem is completely contained within the wel-message
transformer.
I simply added the following to $SPLUNK_HOME/etc/system/local/transformer.conf
Note: There are two different options that both work rather well.
Option 1: If you like only making minimalistic changes, then this single character change will give you the most drastic speedup:
[wel-message]
REGEX = (?sm)^(?<_pre_msg>.+)\nMessage=(?<Message>.+)
The only change here is anchoring the beginning of _pre_msg
to the start of the string using "^". It appears that without this the regex engine wanders around aimlessly for a while to try to find the correct match (since the regex runs in "search" mode). Given the greedy nature of both .+
, the whole string should always be matched (that is, if "\nMessage=" exists within the string) but the PCRE must not be optimizing this case so well. (Side note, if I use the default regex using the "rex" command, I don't see this performance issue at all, which seems weird to me. Perhaps there is some difference in how PCRE is used between transform entries vs rex
commands... rather weird.)
Option 2: I think this is an even better regex.
[wel-message]
REGEX = (?sm)^[^\n]+\n(?<_pre_msg>.+?)\nMessage=(?<Message>.+)$
This anchors both ends of the regex to the first/last character of the text string (so the regex engine doesn't have any guess work in that regard). It also matches and disregards the timestamp portion of the event. Also _pre_msg
is no longer greedy, so at the first occurrence of "Message=" the first half of the regex can be done and we don't need to keep searching for later occurrences of "Message=" and then backtracking. (This should be slightly faster and the only time you would see an actual different in the output would be if your event had more than one "Message" fields; which seems unlikely.)
Either of these two options have shown to be considerably faster on my system then the default wmi-message
REGEX. I was able to extract a days worth of WMI events with the kv extractor only using 2 seconds of time, which is incredibly better than the original 40 seconds.
I'm hoping this will be address in future releases of splunk, but in the meantime you can speed up your own install with one of the entries show above.
I haven't quite parsed through all of this, but it looks like it is full of awesomeness!
Well, here's yet another option. I did a bunch more test and can confirm that the problem is completely contained within the wel-message
transformer.
I simply added the following to $SPLUNK_HOME/etc/system/local/transformer.conf
Note: There are two different options that both work rather well.
Option 1: If you like only making minimalistic changes, then this single character change will give you the most drastic speedup:
[wel-message]
REGEX = (?sm)^(?<_pre_msg>.+)\nMessage=(?<Message>.+)
The only change here is anchoring the beginning of _pre_msg
to the start of the string using "^". It appears that without this the regex engine wanders around aimlessly for a while to try to find the correct match (since the regex runs in "search" mode). Given the greedy nature of both .+
, the whole string should always be matched (that is, if "\nMessage=" exists within the string) but the PCRE must not be optimizing this case so well. (Side note, if I use the default regex using the "rex" command, I don't see this performance issue at all, which seems weird to me. Perhaps there is some difference in how PCRE is used between transform entries vs rex
commands... rather weird.)
Option 2: I think this is an even better regex.
[wel-message]
REGEX = (?sm)^[^\n]+\n(?<_pre_msg>.+?)\nMessage=(?<Message>.+)$
This anchors both ends of the regex to the first/last character of the text string (so the regex engine doesn't have any guess work in that regard). It also matches and disregards the timestamp portion of the event. Also _pre_msg
is no longer greedy, so at the first occurrence of "Message=" the first half of the regex can be done and we don't need to keep searching for later occurrences of "Message=" and then backtracking. (This should be slightly faster and the only time you would see an actual different in the output would be if your event had more than one "Message" fields; which seems unlikely.)
Either of these two options have shown to be considerably faster on my system then the default wmi-message
REGEX. I was able to extract a days worth of WMI events with the kv extractor only using 2 seconds of time, which is incredibly better than the original 40 seconds.
I'm hoping this will be address in future releases of splunk, but in the meantime you can speed up your own install with one of the entries show above.
I setup the following in my local system folder:
props.conf:
[source::WMI...]
KV_MODE = none
# Disable splunk's built in message extraction stuff
REPORT-MESSAGE =
REPORT-kveq = my_wmi-kv-eq
transforms.conf:
[my_wmi-kv-eq]
DELIMS = "\n","="
MV_ADD = true
I can now re-run the same search, and instead of command.search.kv
take 40+ seconds, it now takes just slightly over 2 seconds. (Now, if I could just stop typer
from running I could shave off another 8 seconds.)
Update: I just found out that if you go with the standard auto key/value extraction and no custom transformer sections, the command.search.kv
time drops to under a second. Which moves the bottleneck purely on typer
and rawdata
(I/O performance), right where it should be.