Event pattern for sourcetype

krishnani
New Member

I'm troubleshooting some issues with one sourcetype and realized that Splunk is not breaking its events correctly. The format of these events is a little unusual, but the boundaries are clear: each event is prefixed by =LOGLEVEL REPORT====Date==== and ends with two line feeds. It would be nice if Splunk could split events on these boundaries. I need to:

  1. Break events based on these boundaries
  2. Define a logLevel field based on the text before "REPORT"

Example events:
=TYPE REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX

How should I configure props.conf to do this?

1 Solution

jkat54
SplunkTrust
[sourcetypeName]
TIME_PREFIX=\w+\sREPORT====
SHOULD_LINEMERGE=false
NO_BINARY_CHECK=true
LINE_BREAKER=(=)\w+\s\w+====
EXTRACT-loglevel=^(?<loglevel>\w+)

This method assumes "TYPE" in your example was the loglevel.

Works fine with sample data I created based on your examples:

=ERROR REPORT==== 23-May-2016::16:19:05 ===
HTTP access requested:XXXXXX
=WARN REPORT==== 23-May-2016::16:12:05 ===
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX
HTTP access requested:XXXXXX

It also uses LINE_BREAKER with SHOULD_LINEMERGE=false instead of SHOULD_LINEMERGE=true, which means it doesn't need the line-merging part of the pipeline and thus speeds up data ingestion and reduces resource usage.

This also removes the beginning "=" sign on each event (LINE_BREAKER discards whatever its first capture group matches), but hey... that's what we call license optimization where I come from. 😉
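For anyone who wants to see what those two regexes do before touching props.conf, here is a rough Python sketch of the behavior. It is only an approximation: Splunk uses PCRE, Python's re module spells named groups as (?P<name>...), and the helper names split_events and extract_loglevel are made up for this illustration. Whitespace is trimmed purely for readability.

```python
import re

# Approximates LINE_BREAKER=(=)\w+\s\w+====
# The text matched by group 1 (the leading "=") is dropped at each break.
LINE_BREAKER = re.compile(r"(=)\w+\s\w+====")

# Approximates EXTRACT-loglevel=^(?<loglevel>\w+)
# Python needs (?P<name>...) for named groups.
LOGLEVEL = re.compile(r"^(?P<loglevel>\w+)")

SAMPLE = (
    "=ERROR REPORT==== 23-May-2016::16:19:05 ===\n"
    "HTTP access requested:XXXXXX\n"
    "=WARN REPORT==== 23-May-2016::16:12:05 ===\n"
    "HTTP access requested:XXXXXX\n"
    "HTTP access requested:XXXXXX\n"
    "HTTP access requested:XXXXXX\n"
)


def split_events(raw):
    """Split raw text into events, discarding what group 1 matched."""
    events = []
    start = 0
    for match in LINE_BREAKER.finditer(raw):
        chunk = raw[start:match.start(1)].strip()
        if chunk:
            events.append(chunk)
        # The next event begins right after the discarded "=".
        start = match.end(1)
    tail = raw[start:].strip()
    if tail:
        events.append(tail)
    return events


def extract_loglevel(event):
    """Return the first word of the event, or None if there is none."""
    match = LOGLEVEL.match(event)
    return match.group("loglevel") if match else None


if __name__ == "__main__":
    for event in split_events(SAMPLE):
        print(extract_loglevel(event), "|", event.splitlines()[0])
```

Because the "=" is gone by the time the search-time extraction runs, anchoring the loglevel regex at ^ is enough to pick up ERROR or WARN.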

krishnani
New Member

Thanks guys 🙂

lguinn2
Legend

Well, judging by your example, new events do not always begin with "=LOGLEVEL REPORT====" (unless "TYPE" is a log level, or maybe a placeholder). But I would do this in props.conf:

[yoursourcetypehere]
TIME_PREFIX = \=\w+ REPORT====
MAX_TIMESTAMP_LOOKAHEAD=35
TIME_FORMAT=%d-%b-%Y::%H:%M:%S
EXTRACT-e1 = \=(?<loglevel>\w+) REPORT====
MAX_EVENTS = 500

This should be enough to get the events broken out correctly, with the right timestamp on each event. While it would be more efficient to write a LINE_BREAKER that precisely identifies the event boundary, I don't recommend that if you are new to Splunk or inexperienced with regular expressions.
By default, Splunk considers a line containing a timestamp to be the first line of a new event. That default should work fine in your case.

# This is the default, so it does not need to be set explicitly
BREAK_ONLY_BEFORE_DATE = true
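
To make the timestamp-driven breaking a bit more concrete, here is a small Python sketch of the idea. It is not how Splunk is implemented, just a model under some assumptions: the function names parse_timestamp and merge_lines are invented for this example, the timestamp is taken to be the first whitespace-delimited token after the prefix, and %b is assumed to parse English month names (it depends on the locale).

```python
import re
from datetime import datetime

# Mirrors the settings from the stanza above.
TIME_PREFIX = re.compile(r"=\w+ REPORT====")
TIME_FORMAT = "%d-%b-%Y::%H:%M:%S"
MAX_TIMESTAMP_LOOKAHEAD = 35

LINES = [
    "=ERROR REPORT==== 23-May-2016::16:19:05 ===",
    "HTTP access requested:XXXXXX",
    "=WARN REPORT==== 23-May-2016::16:12:05 ===",
    "HTTP access requested:XXXXXX",
    "HTTP access requested:XXXXXX",
    "HTTP access requested:XXXXXX",
]


def parse_timestamp(line):
    """Return the datetime found right after TIME_PREFIX, or None."""
    match = TIME_PREFIX.search(line)
    if match is None:
        return None
    # Only look a limited distance past the prefix.
    window = line[match.end():match.end() + MAX_TIMESTAMP_LOOKAHEAD]
    tokens = window.split()
    if not tokens:
        return None
    try:
        return datetime.strptime(tokens[0], TIME_FORMAT)
    except ValueError:
        return None


def merge_lines(lines):
    """Start a new event at every line that carries a timestamp."""
    events = []
    for line in lines:
        if not events or parse_timestamp(line) is not None:
            events.append([line])
        else:
            events[-1].append(line)
    return events


if __name__ == "__main__":
    for event in merge_lines(LINES):
        print(parse_timestamp(event[0]), len(event), "lines")
```

The lines without a timestamp simply get attached to the most recent event, which is why the REPORT header ends up as the first line of each one.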

Note that I also included a setting for MAX_EVENTS. Despite its name, it controls the maximum number of lines per event. The default is 256 lines per event, so if Splunk is not separating long events properly, this could also be the cause. I set the limit to 500 arbitrarily, but you should make sure that it is set to something reasonable for your data.
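
If it helps to picture what the line cap does, here is a simplified Python model. It assumes Splunk just starts a new event once the cap is reached, and the names group_lines and is_header are hypothetical, not anything from Splunk itself.

```python
def is_header(line):
    """Rough stand-in for 'this line carries the event timestamp'."""
    return line.startswith("=") and " REPORT====" in line


def group_lines(lines, max_events=500):
    """Group lines into events, forcing a break after max_events lines."""
    events = []
    for line in lines:
        at_cap = bool(events) and len(events[-1]) >= max_events
        if not events or is_header(line) or at_cap:
            events.append([line])
        else:
            events[-1].append(line)
    return events


if __name__ == "__main__":
    lines = ["=ERROR REPORT==== 23-May-2016::16:19:05 ==="]
    lines += ["HTTP access requested:XXXXXX"] * 5

    # With a cap that is too low, one logical event is cut in two.
    print([len(e) for e in group_lines(lines, max_events=3)])
    # With a generous cap, it stays whole.
    print([len(e) for e in group_lines(lines, max_events=500)])
```

In this model the second half of a split event has no header line of its own, so a cap set too low tends to show up as events that begin mid-message.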
