Splunk Search

How to do field Extraction for Complex Data Structure?

SplunkDash
Motivator

Hello,

I have source files with very inconsistent/ complex events/data structure. I wrote field extraction (inline) codes which are working for most of the cases, however not extracting field as expected for some cases. I included 3 sample events and my inline field extraction codes. Ayn help will be highly appreciated. Thank you!

Three Sample Events

June 10, 2021 10:41:39:993-0400 - INFO: 439749134|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439749134|4deef81s-6455-460b-bf41-c126700d1e9d|2607:fb91:118e:89c9:ad53:43b0:ccce:417c|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:41:39:993-0400  EDT 2021^iat= June 10, 2021 10:41:39:993-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

June 10, 2021 10:42:36:991-0400 - INFO: 439741123|REGT|TEST|SITEMINDER|VALIDATE_ASSERTION|439741123|4deef81s-6455-460b-bf41-c126700d1e9d|65.115.214.106|00||Application data=^CSPProviderName=IDME^givenName=KELLIE^surName=THOMPSON^dateofBirth=1975-04-25^address=21341 E Valley Vista Dr^city=Liberty June 10, 2021 10:42:36:991-0400  EDT 2021^iat= June 10, 2021 10:42:36:991-0400 EDT 2021^AppID=OLA^cspTransactionID=7bdd62bb-966a-426a-9e47-8d2a5a772162

May 03, 2021 10:33:50:223-0400 - INFO: NON-8016|IdtokenAuth||authenticate‖lookupClaimVal is null|ERROR|SITEMINDER| QDIAUTH|vp22wsnnn012 |null|null|

 

My Inline field extraction codes: (Working for first 2 events but not the 3rd event)

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w+)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w+)\|(?P<SUBJECT>\w+)\|(?P<LESSION>\w+?\-?\w+?\-?\w+?\-?\w+?-\w+?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

Labels (1)
0 Karma

ITWhisperer
SplunkTrust
SplunkTrust

Does this help?

^(?P<TIMESTAMPT>.+)\s+\-\s\w+\:\s(?P<USER>.+)\|(?P<TYPE>\w+)\|(?P<SYSTEM>\w*)\|(?P<EVENT>\w+)\|(?P<EVENTID>\w*)\|(?P<SUBJECT>\w*)\|(?P<LESSION>\w*?\-?\w*?\-?\w*?\-?\w*?\-?\w*?)\|(?P<SRCADDR>.+)\|(?P<STATUS>\w+)\|(?P<MSG>\w*?)\|(?P<DATA>.+)

By the way, the pasting of the third message may have been corrupted and I have assumed that there should be 4 pipes in the middle

authenticate||||lookupClaimVal is null

It is often clearer to paste events etc into code blocks to avoid spurious substitutions being made!

SplunkDash
Motivator

Hello,

Thank you so much for your quick response, truly appreciate it. I think we don't have a better choice based on the quality of data. Thank you again.

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Get the T-shirt to Prove You Survived Splunk University Bootcamp

As if Splunk University, in Las Vegas, in-person, with three days of bootcamps and labs weren’t enough, now ...

Wondering How to Build Resiliency in the Cloud?

IT leaders are choosing Splunk Cloud as an ideal cloud transformation platform to drive business resilience,  ...