Solved: Event pattern for sourcetype

krishnani · 08-10-2016

I'm troubleshooting some issues with one sourcetype and realized that Splunk is not breaking its events correctly. The format of these events is a little different, but the boundaries are clear: each event is prefixed by =LOGLEVEL REPORT====Date==== and ends with two line feeds. It would be nice if Splunk could split events on these boundaries.

Break events based on these boundaries
Define a logLevel field based on the text before "REPORT"

Example events:
=TYPE REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX

How should I configure props.conf for this?

jkat54 · 08-10-2016

[sourcetypeName]
TIME_PREFIX=\w+\sREPORT====
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
LINE_BREAKER=(=)\w+\s\w+====
EXTRACT-loglevel=^(?<loglevel>\w+)

This method assumes "TYPE" in your example was the loglevel.

Works fine with sample data I created based on your examples:

=ERROR REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX
=WARN REPORT==== 23-May-2016::16:12:05 ===
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX

It also uses LINE_BREAKER with SHOULD_LINEMERGE=false instead of SHOULD_LINEMERGE=true, which means it doesn't need the line-merging part of the pipeline, so it speeds up data ingestion and reduces resource usage.

This also removes the beginning "=" sign on each event (the text matched by the first capture group in LINE_BREAKER is discarded), but hey... that's what we call license optimization where I come from. 😉
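For anyone who wants to sanity-check the LINE_BREAKER regex outside Splunk, here is a rough Python sketch of the idea. It is not how Splunk is implemented: break_events is a made-up helper that only mimics the rule that the text matched by the first capture group is thrown away and the next event starts right after it. Trimming whitespace around each event is my own simplification.

```python
import re

# Same patterns as in the stanza above (Python spells named groups ?P<name>).
LINE_BREAKER = re.compile(r"(=)\w+\s\w+====")
LOGLEVEL = re.compile(r"^(?P<loglevel>\w+)")


def break_events(raw):
    """Hypothetical helper: split raw text the way a LINE_BREAKER would.

    The previous event ends where capture group 1 starts, the text inside
    group 1 is dropped, and the next event begins right after it.
    """
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        chunk = raw[start:match.start(1)].strip()  # strip() is a simplification
        if chunk:
            events.append(chunk)
        start = match.end(1)
    tail = raw[start:].strip()
    if tail:
        events.append(tail)
    return events


sample = (
    "=ERROR REPORT==== 23-May-2016::16:19:05 ===\n"
    "HTTP access requested:XXXXXX\n"
    "=WARN REPORT==== 23-May-2016::16:12:05 ===\n"
    "HTTP access requested:XXXXXX\n"
    "HTTP access requested:XXXXXX\n"
    "HTTP access requested:XXXXXX\n"
)

events = break_events(sample)
levels = [LOGLEVEL.match(event).group("loglevel") for event in events]
print(levels)
```

Because the leading "=" is consumed by the capture group, each event starts with the log level, which is why the EXTRACT regex can anchor on ^.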


krishnani · 08-10-2016

Thanks guys 🙂

lguinn2 · 08-10-2016

Well, as your example shows, new events do not always begin with "=LOGLEVEL REPORT====" (unless "TYPE" is a log level, or just a placeholder for one). But I would do this in props.conf:

[yoursourcetypehere]
TIME_PREFIX = \=\w+ REPORT====
MAX_TIMESTAMP_LOOKAHEAD=35
TIME_FORMAT=%d-%b-%Y::%H:%M:%S
EXTRACT-e1 = \=(?<loglevel>\w+) REPORT====
MAX_EVENTS = 500

This should actually be enough to get the events broken out correctly and with the right timestamp on each event. While it would be more efficient to create a LINE_BREAKER that precisely identifies the event boundary, I don't recommend that if you are new to Splunk or inexperienced with regular expressions.
By default, Splunk considers the line containing the timestamp to be the first line of the event. That default should work fine in your case.

# this is the default
BREAK_ONLY_BEFORE_DATE = true
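A quick way to check the TIME_PREFIX / TIME_FORMAT pair before deploying is to try the same patterns against a header line in Python, whose strptime accepts the directives used here. This is only a sketch: parse_header is a hypothetical helper, and taking the first whitespace-delimited token inside the lookahead window is my shortcut, not a description of Splunk's timestamp processor.

```python
import re
from datetime import datetime

# Values copied from the stanza above.
TIME_PREFIX = re.compile(r"=\w+ REPORT====")
TIME_FORMAT = "%d-%b-%Y::%H:%M:%S"  # %b assumes English month abbreviations
MAX_TIMESTAMP_LOOKAHEAD = 35
LOGLEVEL = re.compile(r"=(?P<loglevel>\w+) REPORT====")


def parse_header(line):
    """Hypothetical helper: return (loglevel, timestamp) or None."""
    prefix = TIME_PREFIX.search(line)
    if prefix is None:
        return None
    # Only look a limited number of characters past the prefix.
    window = line[prefix.end():prefix.end() + MAX_TIMESTAMP_LOOKAHEAD]
    tokens = window.split()
    if not tokens:
        return None
    timestamp = datetime.strptime(tokens[0], TIME_FORMAT)
    level = LOGLEVEL.search(line).group("loglevel")
    return level, timestamp


header = "=ERROR REPORT==== 23-May-2016::16:19:05 ==="
result = parse_header(header)
print(result)
```

If strptime raises ValueError on your real header lines, the TIME_FORMAT string probably needs adjusting before it goes into props.conf.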

Note that I also included a setting for MAX_EVENTS. Despite its name, this controls the maximum number of lines per event. The default is 256 lines per event, so if Splunk is not separating events properly, this could also be the cause. I set the limit to 500 arbitrarily, but you should make sure that it is set to something reasonable for your data.
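To picture how a line cap interacts with breaking before the timestamp line, here is a toy Python model. It assumes only two rules (start a new event when a line begins with the header and timestamp, or when the current event has reached the cap); Splunk's real line merging has more settings than this, and merge_lines and HEADER are names I made up for the illustration.

```python
import re

# A line that starts a new event: header followed by the timestamp.
HEADER = re.compile(r"=\w+ REPORT==== \d{2}-\w{3}-\d{4}::\d{2}:\d{2}:\d{2}")


def merge_lines(lines, max_lines=500):
    """Toy model: group lines into events, capped at max_lines per event."""
    events, current = [], []
    for line in lines:
        hit_cap = len(current) >= max_lines
        if current and (HEADER.match(line) or hit_cap):
            events.append(current)
            current = []
        current.append(line)
    if current:
        events.append(current)
    return events


lines = [
    "=ERROR REPORT==== 23-May-2016::16:19:05 ===",
    "HTTP access requested:XXXXXX",
    "=WARN REPORT==== 23-May-2016::16:12:05 ===",
    "HTTP access requested:XXXXXX",
    "HTTP access requested:XXXXXX",
    "HTTP access requested:XXXXXX",
]

normal = merge_lines(lines)               # cap is never reached
capped = merge_lines(lines, max_lines=2)  # cap is too small for the data
print([len(event) for event in normal])
print([len(event) for event in capped])
```

With the cap set too low, the second event is cut in two and the leftover lines form an event with no header, which is the kind of symptom to look for if long events appear split.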
