Splunk Search

How to do field Extraction for Complex Data Structure?

SplunkDash
Motivator

Hello,

I have source files with very inconsistent and complex event structures. I wrote an inline field extraction regex that works for most events but does not extract the fields as expected in some cases. I have included three sample events and my inline field extraction regex. Any help will be highly appreciated. Thank you!

Three Sample Events

June 10, 2021 10:41:39:993-0400 - INFO: 439749134|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439749134|4deef81s-6455-460b-bf41-c126700d1e9d|2607:fb91:118e:89c9:ad53:43b0:ccce:417c|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:41:39:993-0400  EDT 2021^iat= June 10, 2021 10:41:39:993-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

June 10, 2021 10:42:36:991-0400 - INFO: 439741123|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439741123|4deef81s-6455-460b-bf41-c126700d1e9d|65.115.214.106|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:42:36:991-0400  EDT 2021^iat= June 10, 2021 10:42:36:991-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

May 03, 2021 10:33:50:223-0400 - INFO: NON-8016|IdtokenAuth||authenticate‖lookupClaimVal is null|ERROR|SITEMINDER| QDIAUTH|vp22wsnnn012 |null|null|

 

My inline field extraction regex (works for the first two events but not the third):

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w+)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w+)\|(?P<SUBJECT>\w+)\|(?P<LESSION>\w+?\-?\w+?\-?\w+?\-?\w+?-\w+?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)


ITWhisperer
SplunkTrust

Does this help?

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w*)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w*)\|(?P<SUBJECT>\w*)\|(?P<LESSION>\w*?\-?\w*?\-?\w*?\-?\w*?\-?\w*?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

By the way, the third event may have been corrupted when it was pasted; I have assumed that there should be four pipes in the middle:

authenticate||||lookupClaimVal is null

It is often clearer to paste events etc into code blocks to avoid spurious substitutions being made!
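
For anyone who wants to check the relaxed pattern outside Splunk, here is a minimal Python sketch. It is not part of the original reply. It assumes the third event really did contain four consecutive pipes after "authenticate", uses a shortened copy of the second event (the "Application data" tail is trimmed), and relies on Python's re module accepting the same (?P<name>...) syntax. Splunk itself uses PCRE, so treat the result as indicative only. The names ORIGINAL, RELAXED, EVENT_2, EVENT_3, and extract are made up for this illustration.

```python
import re

# Pieces shared by both patterns, copied from the thread.
HEAD = r"^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)"
TAIL = r"\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)"

# Pattern from the question: every middle field needs at least one character.
ORIGINAL = re.compile(
    HEAD
    + r"\|(?P<SYSTEM>\w+)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w+)\|(?P<SUBJECT>\w+)"
    + r"\|(?P<LESSION>\w+?\-?\w+?\-?\w+?\-?\w+?-\w+?)"
    + TAIL
)

# Pattern from the reply: \w+ became \w* so that empty fields still match.
RELAXED = re.compile(
    HEAD
    + r"\|(?P<SYSTEM>\w*)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w*)\|(?P<SUBJECT>\w*)"
    + r"\|(?P<LESSION>\w*?\-?\w*?\-?\w*?\-?\w*?\-?\w*?)"
    + TAIL
)

# Second sample event, with the long "Application data" tail shortened.
EVENT_2 = (
    "June 10, 2021 10:42:36:991-0400 - INFO: 439741123|REGT|TEST|SITEMINDER"
    "|VALIDATE_ASSERTION|439741123|4deef81s-6455-460b-bf41-c126700d1e9d"
    "|65.115.214.106|00||Application data=^CSPProviderName=IDME^AppID=OLA"
)

# Third sample event, ASSUMING four pipes after "authenticate".
EVENT_3 = (
    "May 03, 2021 10:33:50:223-0400 - INFO: NON-8016|IdtokenAuth||authenticate"
    "||||lookupClaimVal is null|ERROR|SITEMINDER| QDIAUTH|vp22wsnnn012 |null|null|"
)


def extract(pattern, event):
    """Return the named groups as a dict, or None when the event does not match."""
    match = pattern.match(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for name, event in (("event 2", EVENT_2), ("event 3", EVENT_3)):
        print(name, "original:", extract(ORIGINAL, event))
        print(name, "relaxed: ", extract(RELAXED, event))
```

Under these assumptions the original pattern rejects the third event because SYSTEM, EVENTID, SUBJECT, and LESSION are empty there, while the relaxed pattern accepts it. Note that the third event has a different layout, so the field names are positional only: SRCADDR ends up holding "lookupClaimVal is null" and MSG holds "SITEMINDER".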

SplunkDash
Motivator

Hello,

Thank you so much for your quick response, truly appreciate it. I think we don't have a better choice based on the quality of data. Thank you again.
