Solved: Need to customize log4j sourcetype

Adam · ‎06-15-2010

The logs I'm trying to index are in a log4j style, and entries such as

2010-06-15 09:04:08,204 [[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel
.Default (self-tuning)'][intOrdId=17746,intVrsn=18846] DEBUG com.att.canopi.idis
.ordermanagement.sms.cramer.adapter.util.ReflectionsTools.ipagi1a1.snt.bst.bls.c
om - Set value1:Customer100 field name1:customerName

are properly split into unique entries, however some entries are multi-line, and have embedded XML, and these are split wherever a date (with a different format than the date at the start of an entry) is found, so the log entry

2010-06-15 09:04:07,686 [[ACTIVE] ExecuteThread: '9' for queue: 'weblogic.kernel
.Default (self-tuning)'][intOrdId=17746,intVrsn=18846] INFO  fooHandler - <env:Envelope xmlns:env="
http://schemas.xmlsoap.org/soap/envelope/">
  <env:Header/>
  <env:Body>
    <v1:getFooDetailRequest xmlns:v1="ns1" correlationId="coid" systemId="mybox" clientId="cid" mock="false" requestTime="2010-06-15T09:04:07.643-04:00">
       <v1:fooId>foo<v1:fooId>
    </v1:getFooDetailRequest>
  </env:Body>
</env:Envelope>

gets split on the "getFooDetailRequest" line into two entries.

I've tried writing my own sourcetype in a props.conf with

MORE_THAN_100 = ^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} \[.*?

and setting the files to that sourcetype manually, but I get the same result.

Does anyone know how to modify the built-in log4j sourcetype (since it is so close to being perfect), or have any other suggestions?

gkanapathy · ‎06-15-2010

You can override any Splunk default configurations by setting the corresponding setting name (under the same stanza header, in this case [log4j]), in the "local" directory. Basically, a setting in a "local" dir version of a file overrides the corresponding setting in the corresponding stanza in the "default" dir. Although the fact is, it's probably better to just define it over:

[log4j]
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25

This is probably the easiest to understand, though for high-volume systems (> 100 GB/day), use:

[log4j]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25

View solution in original post

gkanapathy · ‎06-15-2010

You can override any Splunk default configurations by setting the corresponding setting name (under the same stanza header, in this case [log4j]), in the "local" directory. Basically, a setting in a "local" dir version of a file overrides the corresponding setting in the corresponding stanza in the "default" dir. Although the fact is, it's probably better to just define it over:

[log4j]
BREAK_ONLY_BEFORE =
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE = true
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25

This is probably the easiest to understand, though for high-volume systems (> 100 GB/day), use:

[log4j]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})
TIME_FORMAT = %Y-%m-%d %H:%M:%S,%3N
TIME_PREFIX = ^
MAX_TIMESTAMP_LOOKAHEAD = 25

altinp · ‎12-01-2015

Watch out, reader! In the second, very useful, snippet, the backslash has been lost and is needed in front of each 'r', 'n', and 'd':

LINE_BREAKER = ([\r\n]+)(?=\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2},\d{3})

Adam · ‎06-15-2010

That worked! I was a little concerned because I saw a few items that still weren't split properly, but after a couple of minutes all the kinks worked out and now even XML with 10 timestamps isn't split. Thanks!

Need to customize log4j sourcetype

Threat Hunting Unlocked: How to Uplevel Your Threat Hunting With the PEAK Framework ...

Splunk APM: New Product Features + Community Office Hours Recap!

Index This | Forward, I’m heavy; backward, I’m not. What am I?