Solved: WMI kv field extraction is very slow... (How to m...

Lowell · ‎07-02-2010

I'm running into some really slow performance searching on WMI sources. In this case I'm just trying to get some general counts, I don't even need most of the fields, but the job inspector is telling me that command.search.kv is using probably some 90% of my overall search time. I took a look in the search.log file and saw that splunk keep trying to disable all three of the wel-* extractors but it apparently failed to do so.

So I started digging in. I looked at the default extractors and quickly came to the conclusion that I don't want any of them.

From the default system props.conf:

[source::WMI...]
REPORT-MESSAGE = wel-message, wel-eq-kv, wel-col-kv

From the default system tranforms.conf:

[wel-message]
REGEX = (?sm)(?<_pre_msg>.+)\nMessage=(?<Message>.+)
CLEAN_KEYS = false

[wel-eq-kv]
SOURCE_KEY = _pre_msg
DELIMS     = "\n","="
MV_ADD     = true

[wel-col-kv]
SOURCE_KEY = Message
REGEX      = \n([^:\n\r]+):[ \t]++([^\n]*)
FORMAT     = $1::$2
MV_ADD     = true

Here's what I found, the wel-message extractor has to run first, it extracts two fields which are used by the wel-eq-kv and wel-col-kv extractors which extract their own set of fields.

The wel-message seems to basically just separate everything before "Message=" from everything after it. (Apparently "Message" is a multi-line field.) In my WMI events, I don't seem to have any "Message" entries at all, which I'm assuming is because I'm not using WMI to collect log files, I'm using WinEventLog for that purpose.)

So if wel-message matches, then _pre_msg is searched for "key=value" pairs using wel-eq-kv, and then Message is searched for "key:value" fields with the wel-col-kv extractor.

At first glance it seemed odd that I didn't see a KV_MODE=none props entry, since I figured that with 3 custom transforms stanzas all the fields would be covered, but their actually not. The reason for this is that in wel-message (the first extractor), the _pre_msg field is ONLY extracted if the Message field is found, otherwise the regex doesn't match; therefore the auto key/value extraction is still needed based on the default setup.

...

So, does any one have any thoughts as to why these three extractors are singly handedly killing my source=WMI:* performance when I don't have "Message=" in any of my events, and therefore most of these extractors have no nothing to do.

For anyone who cares, here is the search I'm using. It's running for the last whole day (timeframe name "Yesterday")

source=WMI:* | fields - eventtype | stats count as events, avg(linecount) as lpe, dc(Name) as names, by sourcetype, host | rename sourcetype as st, host as h | eval lpe=round(lpe,1)

Lowell · ‎07-03-2010

Well, here's yet another option. I did a bunch more test and can confirm that the problem is completely contained within the wel-message transformer.

I simply added the following to $SPLUNK_HOME/etc/system/local/transformer.conf

Note: There are two different options that both work rather well.

Option 1: If you like only making minimalistic changes, then this single character change will give you the most drastic speedup:

[wel-message]
REGEX = (?sm)^(?<_pre_msg>.+)\nMessage=(?<Message>.+)

The only change here is anchoring the beginning of _pre_msg to the start of the string using "^". It appears that without this the regex engine wanders around aimlessly for a while to try to find the correct match (since the regex runs in "search" mode). Given the greedy nature of both .+, the whole string should always be matched (that is, if "\nMessage=" exists within the string) but the PCRE must not be optimizing this case so well. (Side note, if I use the default regex using the "rex" command, I don't see this performance issue at all, which seems weird to me. Perhaps there is some difference in how PCRE is used between transform entries vs rex commands... rather weird.)

Option 2: I think this is an even better regex.

[wel-message]
REGEX = (?sm)^[^\n]+\n(?<_pre_msg>.+?)\nMessage=(?<Message>.+)$

This anchors both ends of the regex to the first/last character of the text string (so the regex engine doesn't have any guess work in that regard). It also matches and disregards the timestamp portion of the event. Also _pre_msg is no longer greedy, so at the first occurrence of "Message=" the first half of the regex can be done and we don't need to keep searching for later occurrences of "Message=" and then backtracking. (This should be slightly faster and the only time you would see an actual different in the output would be if your event had more than one "Message" fields; which seems unlikely.)

Either of these two options have shown to be considerably faster on my system then the default wmi-message REGEX. I was able to extract a days worth of WMI events with the kv extractor only using 2 seconds of time, which is incredibly better than the original 40 seconds.

I'm hoping this will be address in future releases of splunk, but in the meantime you can speed up your own install with one of the entries show above.

View solution in original post

muebel · ‎07-06-2010

I haven't quite parsed through all of this, but it looks like it is full of awesomeness!

Lowell · ‎07-03-2010

Well, here's yet another option. I did a bunch more test and can confirm that the problem is completely contained within the wel-message transformer.

I simply added the following to $SPLUNK_HOME/etc/system/local/transformer.conf

Note: There are two different options that both work rather well.

Option 1: If you like only making minimalistic changes, then this single character change will give you the most drastic speedup:

[wel-message]
REGEX = (?sm)^(?<_pre_msg>.+)\nMessage=(?<Message>.+)

The only change here is anchoring the beginning of _pre_msg to the start of the string using "^". It appears that without this the regex engine wanders around aimlessly for a while to try to find the correct match (since the regex runs in "search" mode). Given the greedy nature of both .+, the whole string should always be matched (that is, if "\nMessage=" exists within the string) but the PCRE must not be optimizing this case so well. (Side note, if I use the default regex using the "rex" command, I don't see this performance issue at all, which seems weird to me. Perhaps there is some difference in how PCRE is used between transform entries vs rex commands... rather weird.)

Option 2: I think this is an even better regex.

[wel-message]
REGEX = (?sm)^[^\n]+\n(?<_pre_msg>.+?)\nMessage=(?<Message>.+)$

This anchors both ends of the regex to the first/last character of the text string (so the regex engine doesn't have any guess work in that regard). It also matches and disregards the timestamp portion of the event. Also _pre_msg is no longer greedy, so at the first occurrence of "Message=" the first half of the regex can be done and we don't need to keep searching for later occurrences of "Message=" and then backtracking. (This should be slightly faster and the only time you would see an actual different in the output would be if your event had more than one "Message" fields; which seems unlikely.)

Either of these two options have shown to be considerably faster on my system then the default wmi-message REGEX. I was able to extract a days worth of WMI events with the kv extractor only using 2 seconds of time, which is incredibly better than the original 40 seconds.

I'm hoping this will be address in future releases of splunk, but in the meantime you can speed up your own install with one of the entries show above.

Lowell · ‎07-02-2010

I setup the following in my local system folder:

props.conf:

[source::WMI...]
KV_MODE = none
# Disable splunk's built in message extraction stuff
REPORT-MESSAGE =
REPORT-kveq = my_wmi-kv-eq

transforms.conf:

[my_wmi-kv-eq]
DELIMS     = "\n","="
MV_ADD     = true

I can now re-run the same search, and instead of command.search.kv take 40+ seconds, it now takes just slightly over 2 seconds. (Now, if I could just stop typer from running I could shave off another 8 seconds.)

Update: I just found out that if you go with the standard auto key/value extraction and no custom transformer sections, the command.search.kv time drops to under a second. Which moves the bottleneck purely on typer and rawdata (I/O performance), right where it should be.

WMI kv field extraction is very slow... (How to make your WMI searches 20x faster)

Introducing the Splunk Community Dashboard Challenge!

Get the T-shirt to Prove You Survived Splunk University Bootcamp

Wondering How to Build Resiliency in the Cloud?