Splunk Search

Field not being extracted from mainframe event.

SteveIves1
Engager

I have 2 events from a mainframe running z/OS (not sure whether that affects things):

1. {"MFSOURCETYPE":"SYSLOG","DATETIME":"2024-04-24 13:35:18.05 +0100","SYSLOGSYSTEMNAME":"A090","JOBID":"STC15694","JOBNAME":"RDSONLVP","SYSPLEX":"UKPPLX01","ACTION":"INFORMATIONAL","MSGNUM":"IEF234E","MSGTXT":"IEF234E K 449F,JE5207,PVT,RDSONLVP,RDSONLVP","MSGREQTYPE":""}

2. {"MFSOURCETYPE":"SYSLOG","DATETIME":"2024-04-24 13:34:47.92 +0100","SYSLOGSYSTEMNAME":"A090","JOBID":"STC15694","JOBNAME":"RDSONLVP","SYSPLEX":"UKPPLX01","CONSOLE":"INTERNAL","ACTION":"INFORMATIONAL","MSGNUM":"IEC147I","MSGTXT":"IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207,\nRDS.VPLS.PDLY0001.PFDRL.U142530.E240220\x9C\n \x80\x80","MSGREQTYPE":""} 

 

For event 1, everything works as it should. For event 2, the MSGTXT field is coming up blank:

[Screenshot: SteveIves1_0-1714043169610.png, showing the blank MSGTXT field]

I thought that the MSGTXT field might be populated and just not displaying because of the control characters (the mainframe doesn't use these, so I'm not sure where they are coming from), but running rex against MSGTXT, or using substr, still gives me nothing.

Adding the search command:

rex "MSGTXT(?<msgtext>.+):"

does create a msgtext field containing the MSGTXT value plus a few more characters:

":"IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207,\nRDS.VPLS.PDLY0001.PFDRL.U142530.E240220\x9C\n \x80\x80","MSGREQTYPE"           

So the data is present in the event and available to be extracted.

I can work with this to extract the comma-delimited field that I actually want, but it's a pain having to process this particular MSGNUM (IEC147I) differently.

Any suggestions as to how to go about getting these events parsed correctly?

Thanks,

Steve

0 Karma

SteveIves1
Engager

I've no idea where those control characters (\n, \x etc.) are coming from. They are not in the data that the mainframe sends to Splunk.

0 Karma

yuanliu
SplunkTrust

Not just Splunk. Python also treats "\xHH" inside double quotes as an escape sequence and rejects this data as JSON. Like Splunk, it does not object to "\n" in the input.


"I've no idea where those control characters (\n, \x etc.) are coming from. They are not in the data that the mainframe sends to Splunk."

Could you clarify how you verified that the \xHH sequences are not in the mainframe data? What do you use to inspect that data? Do you see newlines in the places where "\n" shows in Splunk? As @ITWhisperer says, Splunk is not in the habit of inserting characters into ingested data. Meanwhile, mainframes use an IBM-specific character set (EBCDIC) internally, so when data is sent out, something has to perform a conversion. Most importantly, not seeing those characters when you view the data in a mainframe terminal does not prove they are absent. Even an intermediary terminal emulator, such as one on a Unix machine, can interpret translated control characters according to IBM's definitions. After all, control characters exist to control visual effects in terminals and are by definition invisible to terminal users on the native platform, and a terminal emulator is expected to interpret converted control characters according to their native functions.

My hypothesis is that those control characters are present in the data stream sent from the mainframe. The best solution is either to fix that on the mainframe or to insert a pre-processor that escapes or strips the control characters.
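As a rough illustration of the pre-processor idea, here is a minimal Python sketch. It is not taken from the actual ingest path, which is unknown here. It assumes each event arrives as one line of text and that the only invalid escapes are of the \xHH form; the function name escape_hex_sequences is made up for this example. It doubles the backslash in front of each x so the sequence survives as literal text in the parsed field.

```python
import json
import re

# Matches one backslash plus the character that follows it. Walking the text
# pair by pair means an already-escaped backslash (\\) is consumed as a unit
# and a following "x41" is left alone.
ESCAPE = re.compile(r'\\(.)')


def escape_hex_sequences(raw: str) -> str:
    # Hypothetical helper: turn every \x escape into \\x so that the
    # result is valid JSON. All other escapes (\n, \", \\ ...) are untouched.
    def fix(match):
        if match.group(1) == "x":
            return "\\\\x"  # two literal backslashes followed by x
        return match.group(0)

    return ESCAPE.sub(fix, raw)


# Shortened stand-in for event 2 (raw string, so backslashes are literal text).
raw = r'{"MSGNUM":"IEC147I","MSGTXT":"E240220\x9C\n \x80\x80","MSGREQTYPE":""}'

event = json.loads(escape_hex_sequences(raw))
print(repr(event["MSGTXT"]))
```

After this step MSGTXT contains a real newline where "\n" was, and the literal text \x9C and \x80 where the invalid escapes were. Stripping the sequences instead of escaping them would be a one-line change in fix, but that discards information about what the mainframe sent.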

In the short term, instead of extracting fields from structured data with regex, I recommend using regex only to escape those control characters, then letting Splunk's robust functions do their job.

 

| fields _raw
| rex mode=sed "s/\\\\x/\\\\\\x/g"
| spath

 

Using the sample data, my output is

ACTION: INFORMATIONAL
CONSOLE: INTERNAL
DATETIME: 2024-04-24 13:34:47.92 +0100
JOBID: STC15694
JOBNAME: RDSONLVP
MFSOURCETYPE: SYSLOG
MSGNUM: IEC147I
MSGREQTYPE: (empty)
MSGTXT: IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207, RDS.VPLS.PDLY0001.PFDRL.U142530.E240220\x9C \x80\x80
SYSLOGSYSTEMNAME: A090
SYSPLEX: UKPPLX01
_raw: {"MFSOURCETYPE":"SYSLOG","DATETIME":"2024-04-24 13:34:47.92 +0100","SYSLOGSYSTEMNAME":"A090","JOBID":"STC15694","JOBNAME":"RDSONLVP","SYSPLEX":"UKPPLX01","CONSOLE":"INTERNAL","ACTION":"INFORMATIONAL","MSGNUM":"IEC147I","MSGTXT":"IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207,\nRDS.VPLS.PDLY0001.PFDRL.U142530.E240220\\x9C\n \\x80\\x80","MSGREQTYPE":""}

(Note that all "\xHH" sequences become "\\xHH" in _raw.)

This is an emulation you can play with and compare with real data:

 

| makeresults
| eval _raw =
"{\"MFSOURCETYPE\":\"SYSLOG\",\"DATETIME\":\"2024-04-24 13:34:47.92 +0100\",\"SYSLOGSYSTEMNAME\":\"A090\",\"JOBID\":\"STC15694\",\"JOBNAME\":\"RDSONLVP\",\"SYSPLEX\":\"UKPPLX01\",\"CONSOLE\":\"INTERNAL\",\"ACTION\":\"INFORMATIONAL\",\"MSGNUM\":\"IEC147I\",\"MSGTXT\":\"IEC147I 613-04,IFG0195B,RDSONLVP,RDSONLVP,IIII4004,449F,JE5207,\\nRDS.VPLS.PDLY0001.PFDRL.U142530.E240220\\x9C\\n \\x80\\x80\",\"MSGREQTYPE\":\"\"} "
``` data emulation above ```

 

0 Karma

PickleRick
SplunkTrust

The question is what is actually in this event. After all, Splunk wouldn't escape a newline character. Depending on the props for the data, it would either break the input stream into separate events or break the line inside a multiline event. It would not get rendered as \n.

I think a similar thing goes for \x80 and \x9c: they are not control characters (at least in ASCII-derivative encodings) but extended characters.

So unless I'm missing something, this means that this is not Splunk escaping data (as it does sometimes with control characters) but these are actually raw character sequences received on input.

I also don't understand why it should break json parsing. After all these are properly escaped characters so they should be parsed out of the json data...

My bet would be that someone didn't rely on automatic json extraction but instead fiddled with regexes to parse fields out of this sourcetype.

Edit: No. I forgot one thing. The JSON specification only allows such characters to be written as Unicode escapes of the form \uXXXX. The \xXX notation is not allowed. That's why JSON parsing fails.
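For anyone who wants to see the difference between the two escapes, this is a small check using only Python's standard json module. The two strings are shortened stand-ins for the MSGTXT value, not the full event, and try_parse is just a helper name for this example.

```python
import json

# Raw strings: the backslashes below are literal characters, as in the event.
ok = r'{"MSGTXT":"JE5207,\nRDS.VPLS"}'          # \n is a valid JSON escape
bad = r'{"MSGTXT":"E240220\x9C\n \x80\x80"}'    # \x is not a JSON escape


def try_parse(text):
    # Return (parsed, None) on success or (None, error message) on failure.
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        return None, exc.msg


parsed_ok, err_ok = try_parse(ok)
parsed_bad, err_bad = try_parse(bad)

print("valid escape:  ", parsed_ok, err_ok)
print("invalid escape:", parsed_bad, err_bad)
```

Here parsed_ok holds the decoded event with a real newline in MSGTXT, while parsed_bad is None and err_bad carries the decoder's complaint about the escape.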

0 Karma

ITWhisperer
SplunkTrust

It is unlikely that Splunk is adding them to the data it receives - what is your ingest path, i.e. how does the data get into Splunk and what configuration have you used along the way?

0 Karma

SteveIves1
Engager

I'm not sure, but only a tiny fraction of a percent of messages seem to be affected. Our Splunk team hasn't been able to help.

0 Karma

PickleRick
SplunkTrust

Splunk on its own doesn't have a z/OS component, so your data has to go through some external stages before it reaches Splunk.

We don't know what your ingestion process looks like.

If the events are written by some solution to an intermediate file that is later picked up by a forwarder, check the file contents and see if those \xXX codes are there.

If the events are pushed by syslog, sniff the traffic with tcpdump and see if they are there.

Most probably the answer to one of those questions (or a similar one regarding your particular transport channel) will be affirmative. That would mean the issue is external to Splunk: you're ingesting badly formatted data.
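If there does turn out to be an intermediate file, the check could be sketched roughly like this in Python. This is only an illustration: the sample lines are invented stand-ins and scan_lines is a made-up name. It reports two different things per line, because they point at different culprits: literal backslash-x text already in the file, or raw bytes of 0x80 and above that some later stage might be rendering as \xHH.

```python
import re

# Literal four-character text such as \x9C (backslash, x, two hex digits).
HEX_ESCAPE = re.compile(rb'\\x[0-9A-Fa-f]{2}')
# Actual single bytes in the range 0x80-0xFF.
RAW_HIGH = re.compile(rb'[\x80-\xff]')


def scan_lines(lines):
    # Yield (line number, literal escapes, raw high bytes) for suspicious lines.
    for number, line in enumerate(lines, start=1):
        literal = HEX_ESCAPE.findall(line)
        raw_high = RAW_HIGH.findall(line)
        if literal or raw_high:
            yield number, literal, raw_high


# Invented sample: a clean line, a line with literal escape text,
# and a line holding a real 0x9C byte.
sample = [
    rb'{"MSGNUM":"IEF234E","MSGTXT":"IEF234E K 449F"}',
    rb'{"MSGNUM":"IEC147I","MSGTXT":"E240220\x9C\n \x80\x80"}',
    b'{"MSGNUM":"IEC147I","MSGTXT":"E240220\x9c"}',
]

report = list(scan_lines(sample))
for number, literal, raw_high in report:
    print(number, literal, raw_high)
```

To run it against a real file, open the file in binary mode and pass the file object to scan_lines in place of sample. Binary mode matters here, since decoding the file as text first could hide or alter exactly the bytes you are looking for.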

0 Karma

ITWhisperer
SplunkTrust

Sounds like a data problem. You need to do some further analysis of what the failing messages have in common and how they differ from the successful ones, not just in the text, but in how, where, and when the messages are produced and how they are stored.

0 Karma

ITWhisperer
SplunkTrust

It looks like it is the control characters that are giving you grief. You could try replacing "\x" with "\\x" and then reparsing with spath (you may need to remove all the other fields already parsed, though).

0 Karma