Splunk Search

How to extract a value from the message using rex command?

Renunaren
Loves-to-Learn Everything

 

Dataframe row : {"_c0":{"0":"[","1":" {","2":" \"table_name\": \"pc_dwh_rdv.gdh_ls2lo_s99\"","3":" \"deleted_count\": 18","4":" \"redelivered_count\": 0","5":" \"load_date\": \"2023-07-27\"","6":" }","7":" {","8":" \"table_name\": \"pc_dwh_rdv.gdh_spar_s99\"","9":" \"deleted_count\": 8061","10":" \"redelivered_count\": 1","11":" \"load_date\": \"2023-07-27\"","12":" }","13":" {","14":" \"table_name\": \"pc_dwh_rdv.gdh_tf3tx_s99\"","15":" \"deleted_count\": 366619","16":" \"redelivered_count\": 0","17":" \"load_date\": \"2023-07-27\"","18":" }","19":" {","20":" \"table_name\": \"pc_dwh_rdv.gdh_wechsel_s99\"","21":" \"deleted_count\": 2","22":" \"redelivered_count\": 0","23":" \"load_date\": \"2023-07-27\"","24":" }","25":" {","26":" \"table_name\": \"pc_dwh_rdv.gdh_revolvingcreditcard_s99\"","27":" \"deleted_count\": 1285","28":" \"redelivered_count\": 0","29":" \"load_date\": \"2023-07-27\"","30":" }","31":" {","32":" \"table_name\": \"pc_dwh_rdv.gdh_phd_s99\"","33":" \"deleted_count\": 2484","34":" \"redelivered_count\": 204","35":" \"load_date\": \"2023-07-27\"","36":" }","37":" {","38":" \"table_name\": \"pc_dwh_rdv.gdh_npk_s99\"","39":" \"deleted_count\": 1705","40":" \"redelivered_count\": 0","41":" \"load_date\": \"2023-07-27\"","42":" }","43":" {","44":" \"table_name\": \"pc_dwh_rdv.gdh_npk_s98\"","45":" \"deleted_count\": 1517","46":" \"redelivered_count\": 0","47":" \"load_date\": \"2023-07-27\"","48":" }","49":" {","50":" \"table_name\": \"pc_dwh_rdv.gdh_kontokorrent_s99\"","51":" \"deleted_count\": 12998","52":" \"redelivered_count\": 0","53":" \"load_date\": \"2023-07-27\"","54":" }","55":" {","56":" \"table_name\": \"pc_dwh_rdv.gdh_gds_s99\"","57":" \"deleted_count\": 13","58":" \"redelivered_count\": 0","59":" \"load_date\": \"2023-07-27\"","60":" }","61":" {","62":" \"table_name\": \"pc_dwh_rdv.gdh_dszins_s99\"","63":" \"deleted_count\": 57","64":" \"redelivered_count\": 0","65":" \"load_date\": \"2023-07-27\"","66":" }","67":" {","68":" \"table_name\": \"pc_dwh_rdv_gdh_monat.gdh_phd_izr_monthly_s99\"","69":" \"deleted_count\": 1315","70":" \"redelivered_count\": 0","71":" \"load_date\": \"2023-07-27\"","72":" }","73":"]"}}

 

The above is the sample message of an event which we have in splunk we want to extract the deleted count values like "1315", "57", "13" etc and add those values as a separate fields using rex command .
Also from the above message we want to extract load_date value such as 2023-07-27 and add that value as a separate field. Please help us in this.

Labels (1)
Tags (3)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

As @yuanliu already said - this is a very strange piece of data. Something "json-like" containing a string which is again json-like.

1. Even if it was a proper json structure, the string field values are just strings. So you'd have to do multiple spath passes to parse it all properly.

2. Don't fiddle with regexes around structured data. You'll end up accounting for so many border cases that will make your head spin. It's not worth it.

0 Karma

gcusello
SplunkTrust
SplunkTrust

Hi @Renunaren ,

this seems to be a json format so ypu could extract fields using INDEXED_EXTRACTIONS=json in props.conf os the spath command.

Anyway, if you want to use a regex, You can use these two regexes:

| rex max_match=0 "deleted_count\\\":\s+(?<deleted_count>\d+)"
| rex max_match=0 "load_date\\\":\s+\\\"(?<load_date>[^\\]+)"

that you can test at https://regex101.com/r/w1XGFx/1 and https://regex101.com/r/w1XGFx/2

Ciao.

Giuseppe

0 Karma

Renunaren
Loves-to-Learn Everything

Hi Giuseppe,

Thanks for your response. We have tried using the mentioned REGEX commands but it hasn't worked here. In our environment we are not authorized to modify props.conf configuration files. Below are some of the snippets that will mention the result. 

Renunaren_0-1690780774503.png

 

Renunaren_1-1690781061633.png

Please look into this and help us on this.

0 Karma

gcusello
SplunkTrust
SplunkTrust

hi @Renunaren,

did you try the "spath" command (https://docs.splunk.com/Documentation/Splunk/9.1.0/SearchReference/Spath) ?

The problem wit backslashes sometimes is solved adding a fourth backslash.

Ciao.

Giuseppe

0 Karma

Renunaren
Loves-to-Learn Everything

Hi Giuseppe,

Thanks for your response. One of our team member has tried using the SPATH command. But he was unable to extract the required data, so we want to extract the data using rex command only. Please help us in this issue.

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

This data format is indeed very intriguing.  Each of the JSON node represents a fragment of a quasi-JSON object.  You will have to first extract the compliant JSON, then reassemble the fragments.  But according to the sample data you posted, the reassembled string resembles JSON but is not compliant.  So, you fix syntax errors in the assembly.  Here is one way to achieve that.  But note: each message contains many values.  So I need to apply mvexpand to the reassembled JSON array.

 

| eval message = replace(message, "Dataframe row : ", "")
| spath input=message
| foreach c0.*
    [eval assemble = mvappend(assemble, (printf("%04d", <<MATCHSTR>>) . "<>" . '<<FIELD>>'))]
| fields - _ c0.*
| eval assemble = mvsort(assemble)
| eval message = ""
| foreach assemble mode=multivalue
    [eval message = message . "," . replace(<<ITEM>>, ".+<>", "")]
| rex field=message mode=sed "s/,\[,\s*/[/ s/{,/{/g s/,\s*}/}/g s/,]/]/"
| spath input=message path={}
| mvexpand {}
| spath input={}

 

As you can see, the reverse engineering is maddening.  If you have influence on developers who rendered these logs, beg them to use JSON array for c0 instead of subnodes c0.*; if c0.* subnodes must be used, name them in lexicographic order; also to add commas in the right places inside escaped quotes.

Anyway, your sample data would give

deleted_countload_dateredelivered_counttable_name
182023-07-270pc_dwh_rdv.gdh_ls2lo_s99
80612023-07-271pc_dwh_rdv.gdh_spar_s99
3666192023-07-270pc_dwh_rdv.gdh_tf3tx_s99
22023-07-270pc_dwh_rdv.gdh_wechsel_s99
12852023-07-270pc_dwh_rdv.gdh_revolvingcreditcard_s99
24842023-07-27204pc_dwh_rdv.gdh_phd_s99
17052023-07-270pc_dwh_rdv.gdh_npk_s99
15172023-07-270pc_dwh_rdv.gdh_npk_s98
129982023-07-270pc_dwh_rdv.gdh_kontokorrent_s99
132023-07-270pc_dwh_rdv.gdh_gds_s99
572023-07-270pc_dwh_rdv.gdh_dszins_s99
13152023-07-270pc_dwh_rdv_gdh_monat.gdh_phd_izr_monthly_s99

Here is a data emulation you can play with and compare with real data

 

 

| makeresults
| eval message = "Dataframe row : {\"_c0\":{\"0\":\"[\",\"1\":\" {\",\"2\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_ls2lo_s99\\\"\",\"3\":\" \\\"deleted_count\\\": 18\",\"4\":\" \\\"redelivered_count\\\": 0\",\"5\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"6\":\" }\",\"7\":\" {\",\"8\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_spar_s99\\\"\",\"9\":\" \\\"deleted_count\\\": 8061\",\"10\":\" \\\"redelivered_count\\\": 1\",\"11\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"12\":\" }\",\"13\":\" {\",\"14\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_tf3tx_s99\\\"\",\"15\":\" \\\"deleted_count\\\": 366619\",\"16\":\" \\\"redelivered_count\\\": 0\",\"17\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"18\":\" }\",\"19\":\" {\",\"20\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_wechsel_s99\\\"\",\"21\":\" \\\"deleted_count\\\": 2\",\"22\":\" \\\"redelivered_count\\\": 0\",\"23\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"24\":\" }\",\"25\":\" {\",\"26\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_revolvingcreditcard_s99\\\"\",\"27\":\" \\\"deleted_count\\\": 1285\",\"28\":\" \\\"redelivered_count\\\": 0\",\"29\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"30\":\" }\",\"31\":\" {\",\"32\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_phd_s99\\\"\",\"33\":\" \\\"deleted_count\\\": 2484\",\"34\":\" \\\"redelivered_count\\\": 204\",\"35\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"36\":\" }\",\"37\":\" {\",\"38\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_npk_s99\\\"\",\"39\":\" \\\"deleted_count\\\": 1705\",\"40\":\" \\\"redelivered_count\\\": 0\",\"41\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"42\":\" }\",\"43\":\" {\",\"44\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_npk_s98\\\"\",\"45\":\" \\\"deleted_count\\\": 1517\",\"46\":\" \\\"redelivered_count\\\": 0\",\"47\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"48\":\" }\",\"49\":\" {\",\"50\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_kontokorrent_s99\\\"\",\"51\":\" \\\"deleted_count\\\": 12998\",\"52\":\" \\\"redelivered_count\\\": 0\",\"53\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"54\":\" }\",\"55\":\" {\",\"56\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_gds_s99\\\"\",\"57\":\" \\\"deleted_count\\\": 13\",\"58\":\" \\\"redelivered_count\\\": 0\",\"59\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"60\":\" }\",\"61\":\" {\",\"62\":\" \\\"table_name\\\": \\\"pc_dwh_rdv.gdh_dszins_s99\\\"\",\"63\":\" \\\"deleted_count\\\": 57\",\"64\":\" \\\"redelivered_count\\\": 0\",\"65\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"66\":\" }\",\"67\":\" {\",\"68\":\" \\\"table_name\\\": \\\"pc_dwh_rdv_gdh_monat.gdh_phd_izr_monthly_s99\\\"\",\"69\":\" \\\"deleted_count\\\": 1315\",\"70\":\" \\\"redelivered_count\\\": 0\",\"71\":\" \\\"load_date\\\": \\\"2023-07-27\\\"\",\"72\":\" }\",\"73\":\"]\"}}"
``` data emulation above ```

 

 

Tags (1)

gcusello
SplunkTrust
SplunkTrust

Hi @Renunaren,

it's strange that the spath comand doesn't run and the use of INDEXED_EXTRACTIONS at source level is always the best solution, anyway sometimes there's an issue on regexes with backslash, so please try this:

| rex max_match=0 "deleted_count\\\\":\s+(?<deleted_count>\d+)"
| rex max_match=0 "load_date\\\\":\s+\\\\"(?<load_date>[^\\]+)"

Ciao.

Giuseppe

0 Karma

PickleRick
SplunkTrust
SplunkTrust

Let me interject here a bit.

Indexed extractions are _not_ always the best solution. Don't mistake indexed extractions with auto-kv. Indexed extractions are run during the indexing process and extract the fields permanently whereas auto-kv extract the fields in search-time.

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...