Getting Data In

How to parse out fields

a212830
Champion

Hi, I have an XML-like (but not proper XML) feed that I need to parse.

A sample is below, and I need to parse out each field.

Each field will not necessarily be in each event, so I need a method that will find it, without depending upon a previous field or the location within the event itself.

Can anyone help?

Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG><TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP><LOGID>9999999</LOGID><NODEID>1.2.3.4</NODEID><FACILITY>Packet Filtering</FACILITY><TYPE>Notification</TYPE><EVENT>New connection</EVENT><ACTION>Allow</ACTION><SRC>4.5.6.7</SRC><DST>X.X.X.X</DST><SERVICE>HTTP</SERVICE><PROTOCOL>2</PROTOCOL><SPORT>12345</SPORT><DPORT>99</DPORT><RULEID>60732.1</RULEID><SRCIF>5</SRCIF><COMPID>some text here</COMPID><RECEPTIONTIME>2019-04-22 15:54:29</RECEPTIONTIME><SENDERTYPE>Firewall</SENDERTYPE><SITUATION>Connection_Allowed</SITUATION><EVENTID>99999999999</EVENTID></STONEGATE_LOG>
1 Solution

harsmarvania57
Ultra Champion

Hi,

To extract the XML data at search time, you can use the config below on the search head.

props.conf

[yourSourcetype]
REPORT-test = xmlkv_alt

transforms.conf

[xmlkv_alt]
FORMAT = $1::$2
REGEX = <([^>]*)>([^<]*)<\/\1>

EDIT: Please find regex extraction with sample data on https://regex101.com/r/tJVD20/1
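
For anyone who wants to check the pattern offline, here is a minimal Python sketch of what the REGEX line above matches. It is only an approximation of the search-time extraction, not Splunk's own code: the event is a shortened copy of the sample from the question, and the names TAG_PAIR and extract_fields are made up for this illustration.

```python
import re

# Same pattern as the REGEX line in the xmlkv_alt stanza.
# The \/ escape from the stanza is optional in Python, so a plain / is used.
TAG_PAIR = re.compile(r"<([^>]*)>([^<]*)</\1>")

# Shortened copy of the sample event from the question.
event = (
    "Apr 22 19:54:29 138.126.78.80 <STONEGATE_LOG>"
    "<TIMESTAMP>2019-04-22 15:54:28</TIMESTAMP>"
    "<FACILITY>Packet Filtering</FACILITY>"
    "<ACTION>Allow</ACTION><SRC>4.5.6.7</SRC><DPORT>99</DPORT>"
    "</STONEGATE_LOG>"
)


def extract_fields(raw):
    """Return {tag: value} for every <TAG>value</TAG> pair found anywhere in raw."""
    return {name: value for name, value in TAG_PAIR.findall(raw)}


fields = extract_fields(event)
for name, value in fields.items():
    print(f"{name} = {value}")
```

Because the back-reference \1 requires the closing tag to repeat the opening tag's name, and the value part may not contain "<", the outer STONEGATE_LOG wrapper never matches. Each inner tag is found wherever it sits in the event, which is what the question asked for.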

woodcock
Esteemed Legend

All these answers are missing this setting in transforms.conf:

MV_ADD = true

So the full stanza is:

[YourNameHere]
REGEX = <([^\/][^>]+)>(.*?)<\/[^>]+>
FORMAT = $1::$2
MV_ADD = true
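
A rough Python sketch of the difference MV_ADD makes, as I understand the setting: with it on, a repeated field name keeps every value as a multivalue field, and with it off, values found after the first are discarded. This emulates the behavior only and is not how Splunk implements it. The event with two SRC tags is hypothetical, the extract helper is made up for this illustration, and the pattern is the one from the accepted answer, reused here for simplicity.

```python
import re
from collections import defaultdict

# Pattern from the accepted answer, reused for this illustration.
TAG_PAIR = re.compile(r"<([^>]*)>([^<]*)</\1>")


def extract(raw, mv_add):
    """Emulate the extraction with MV_ADD on (keep all values) or off (keep the first)."""
    fields = defaultdict(list)
    for name, value in TAG_PAIR.findall(raw):
        # With mv_add off, a value for an already-seen field name is dropped.
        if mv_add or not fields[name]:
            fields[name].append(value)
    return dict(fields)


# Hypothetical event in which the SRC tag appears twice.
event = "<SRC>4.5.6.7</SRC><SRC>8.9.10.11</SRC><ACTION>Allow</ACTION>"

print(extract(event, mv_add=True))
print(extract(event, mv_add=False))
```

If no tag ever repeats within an event, as in the sample from the question, both calls return the same fields.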

harsmarvania57
Ultra Champion

This will not work, because REPEAT_MATCH is only valid for index-time field extraction, and the solution I provided is for search-time extraction.

woodcock
Esteemed Legend

Quite correct; I always get MV_ADD and REPEAT_MATCH confused. I have corrected my answer.

a212830
Champion

Thanks. This works quite well. Is there any way of forcing field names to be lowercase?

woodcock
Esteemed Legend

You will have to stack a calculated field on top of this using lower(fieldname).

sloshburch
Splunk Employee

I expect that a props.conf entry for a calculated field would work with eval's lower().

ddrillic
Ultra Champion

Interesting, so the xml doesn't have to be well-formed, as the sample above isn't well-formed.

Amazing, because back then a similar solution for JSON was a big hit here: How can we extract a json document within an event?

We ended up with -

REPORT-extract = json_embedded


[json_embedded]
REGEX = "(\w+)"."(\S+?)"
FORMAT = $1::$2

harsmarvania57
Ultra Champion

Yes you can use regex for magic 😉

a212830
Champion

Thanks. I see them appearing on the regex site, but they don't appear as fields on the SH when I try that. Are there additional steps required?

harsmarvania57
Ultra Champion

If you modified the config files directly, you need to restart the Splunk service, or you can use the /debug/refresh web endpoint.

a212830
Champion

How will the fields appear? Will they automatically appear with the names?

harsmarvania57
Ultra Champion

Yes, they will appear automatically. I have tested this config in my lab and it works fine.
