I'm populating a summary index with data that I would like to be able to search very quickly using tstats. I've got this mostly working, but I can't figure out whether I'm doing something wrong or why it isn't working as expected.
Summary index generating search: search_foo
Fields to index: a, b, c, d
I want to be able to write a search like this: | tstats sum(a), sum(b), values(c) WHERE index=summary source=search_foo by d
Here are the settings I'm trying to make work:
props.conf:
[source::search_foo]
TRANSFORMS-index-fields = search_foo_indexfields
transforms.conf:
[search_foo_indexfields]
REGEX = \b(a|b|c|d)=("?)([^"]*?)\2(?:,|$)
FORMAT = $1::$3
WRITE_META = true
REPEAT_MATCH = true
I know that I have all the names and meta settings correct because the first field does get added as an indexed field. (I confirmed this by running exporttool -csv on one of the buckets and checking that the field showed up in the _meta field.) Splunk seems to be ignoring the REPEAT_MATCH setting.
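One way to rule out the pattern itself is to run it outside Splunk. The following is only a sketch using Python's re module, which is not the regex engine Splunk uses, although this pattern relies only on syntax common to both. The sample events are hypothetical and not taken from the real summary data. It collects every match in an event, which is what a repeated match would be expected to produce.

```python
import re

# Same pattern as the REGEX line in transforms.conf above.
PATTERN = re.compile(r'\b(a|b|c|d)=("?)([^"]*?)\2(?:,|$)')


def extract_pairs(event):
    """Return a (key, value) tuple for every match of PATTERN in the event."""
    return [(m.group(1), m.group(3)) for m in PATTERN.finditer(event)]


# Hypothetical summary events (assumed format: comma-separated key=value pairs).
full_event = 'a=1, b=2, c="x y", d=host1'
missing_b = 'a=1, c="x y", d=host1'

print(extract_pairs(full_event))
print(extract_pairs(missing_b))
```

If all expected pairs come back here, the regex is matching each field on its own, and the remaining question is how the transform is applied at index time.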
So as a workaround, I've made REGEX match all 4 fields directly and index them all at once (e.g., FORMAT = a::$1 b::$2 c::$3 d::$4). This works, but I really don't like the approach because it assumes a hard-coded order of the fields, which seems unnecessarily fragile. In my actual use case, sometimes "a" or "b" is missing from the data. I've been able to make the regex cope with that, but it still results in an empty indexed field. (In other words, if "b" is missing from the data, I still see b:: in _meta when I run exporttool.) I also considered making 4 transforms entries, one for each field, but that seems silly as well.
Bonus question (somewhat related): how do I avoid double-escaping backslashes in my solution? One of my actual fields is "source", so Windows paths show up in the raw data with escaped backslashes ( \\ ), which get translated to double-escaped backslashes ( \\\\ ) in the _meta field. At search time, the indexed fields then look like "C:\\Windows\\..." instead of "C:\Windows\...".
Double-check that the source value for the data in your Summary Index matches your stanza header specification.
Many people do not know about _KEY_1 and _VAL_1 (you can search on it). Try this:
[search_foo_indexfields]
REGEX = \b(?<_KEY_1>a|b|c|d)=("?)(?<_VAL_1>[^"]*?)\2(?:,|$)
WRITE_META = true
MV_ADD= true
Okay, so this adds a new field with the name of the transforms stanza ("search_foo_indexfields") with the value of either "a" or "b".
Just confirmed it in the _meta field dumped out with exporttool: "... date_mday::25 date_zone::0 search_foo_indexfields::a"
From the docs, it's not 100% clear whether _KEY_x and _VAL_x are supported at index time, but they don't seem to be working.
You have to deploy these configurations to the INDEXING SERVER. In most cases this is your indexers; HOWEVER, in the case of Summary Indices, by default (unless you went out of your way to change it), the data is stored on the SEARCH HEAD, so you will have to EITHER deploy the configurations to the Search Head OR make sure that Summary Indexing happens on the Indexers.
I assume _KEY_! is a typo for _KEY_1? I was aware of that syntax, but didn't think it held any advantages here. (But I'll give it a try.) I haven't tried MV_ADD, as the docs say, "This attribute is only valid for search-time field extractions."
Yes, fixed.
How about creating a separate TRANSFORMS stanza for each field, so that even if one field is missing, the others show up independently?
For the double escaping, maybe try applying some command in the summary index search to remove the escaped backslashes.
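If the number of fields makes hand-written stanzas tedious, the per-field suggestion above could be generated rather than typed. This is only a sketch: the stanza prefix search_foo_index_ and the helper name build_transforms are made up for illustration, the per-field regex is adapted from the one in the question (untested against a real indexer), and field names are assumed to be plain words that need no regex escaping.

```python
def build_transforms(fields, prefix="search_foo_index_"):
    """Build one index-time transform stanza per field.

    Returns (props_line, transforms_text). The prefix is a hypothetical
    naming choice, not something Splunk requires.
    """
    names = []
    stanzas = []
    for field in fields:
        name = f"{prefix}{field}"
        names.append(name)
        stanzas.append(
            f"[{name}]\n"
            # Group 1 is the optional quote, group 2 is the value.
            f'REGEX = \\b{field}=("?)([^"]*?)\\1(?:,|$)\n'
            f"FORMAT = {field}::$2\n"
            "WRITE_META = true\n"
        )
    props_line = "TRANSFORMS-index-fields = " + ", ".join(names)
    return props_line, "\n".join(stanzas)


props_line, transforms_text = build_transforms(["a", "b", "c", "d"])
print("[source::search_foo]")
print(props_line)
print()
print(transforms_text)
```

Because each stanza matches only its own field, a missing field simply produces no match, so no empty b:: entry should be written for that event.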
I'd like to avoid one transforms stanza per field if possible. My real use case has more than just 4 fields. (Not an unmanageable number; it just seems like there has to be a better solution.)
I'm pretty sure the backslash escaping is happening automatically in the summary indexing plumbing commands (I'm just using the default built-in alert actions for summary indexing). In fact, I'm already dealing with escaped backslashes in part of my search, so I know they've been taken care of in my base search.
And yes I could remove them at search time, but since I'm in control of the data generation, it seems silly to deal with something in every search I write, if I could fix the issue once when the data is written.
Give this a try?
[search_foo_indexfields]
REGEX = \b(?<_KEY_1>(a|b|c|d))=("?)(?<_VAL_1>[^"]*?)\2(?:,|$)
WRITE_META = true
REPEAT_MATCH = true