Splunk Search

How can I extract multiple file names from an event and add them as a separate field using the rex command?

Renunaren
Loves-to-Learn Everything

Hi Team,

We have a raw event in which the message field contains multiple file names. We want to extract those file names and add them as a separate field. Below is a sample event for reference.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕\\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕\\\"status\\\": \\\"files arrived\\\"\",\"3\"😕\\\"files\\\": [\",\"4\"😕\\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\\\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\\\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\\\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\\\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\\\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\\\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\\\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\\\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}

Below is the sample SPL command used for this purpose.

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"

Please help us with this.

 


ITWhisperer
SplunkTrust

Please repost your raw event in a code block </> so that it doesn't get corrupted by formatting 


Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. As requested, the raw event is below.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\"😕"{\",\"1\"😕\\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\"😕\\\"status\\\": \\\"files arrived\\\"\",\"3\"😕\\\"files\\\": [\",\"4\"😕\\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\\\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\\\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\\\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\\\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\\\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\\\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\\\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\\\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\\\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\\\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\\\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\\\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\\\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\\\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\\\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\\\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\\\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\"😕"}\"}} ", "process": 32633, "processName": "MainProcess"}

I tried to extract the file names, such as PAKS_FACT_DWH2_D20220221.ok, PAKS_UBER_DWH2_D20220221.ok, HHE_SIT_check_file1.txt.ok, HHE_SIT_check_file2.txt.ok and HHE_SIT_check_file3.txt.ok, individually and add them as a separate field using the query below:

index= app_events_dwh2_de_int | rex max_match=0 "\\\\\\\\\\\\\"files\\\\\\\\\\\\\":\s*\\\\\\\\\\\\\"(?<File_Arrived>[^\\\]+)"

but this did not work. Please help us with this issue.


ITWhisperer
SplunkTrust

Because the event was not put in a code block </> as requested, it got corrupted.

Please use the code block button </> to insert your example event.

Renunaren
Loves-to-Learn Everything

Hi ITWhisperer,

Thanks for your response. Please see the sample event below.

{"timestamp": "2023-06-13T09:35:27.498033Z", "level": "INFO", "filename": "splunk_sample_csv.py", "funcName": "main", "lineno": 38, "message": "Dataframe row : {\"_c0\":{\"0\":\"{\",\"1\":\" \\\"Timestamp\\\": \\\"2023\\/06\\/13 11:22:45\\\"\",\"2\":\" \\\"status\\\": \\\"files arrived\\\"\",\"3\":\" \\\"files\\\": [\",\"4\":\" \\\"PAKS_FACT_DWH2_D20220221.ok\\\"\",\"5\":\" \\\"PAKS_UBER_DWH2_D20220221.ok\\\"\",\"6\":\" \\\"HHE_SIT_check_file1.txt.ok\\\"\",\"7\":\" \\\"HHE_SIT_check_file2.txt.ok\\\"\",\"8\":\" \\\"HHE_SIT_check_file3.txt.ok\\\"\",\"9\":\" \\\"PAKS_FACT_DWH2_D20220412.ok\\\"\",\"10\":\" \\\"PAKS_FACT_DWH2_D20220420.ok\\\"\",\"11\":\" \\\"PAKS_FACT_DWH2_D20211223.ok\\\"\",\"12\":\" \\\"PAKS_FACT_DWH2_D20211224.ok\\\"\",\"13\":\" \\\"PAKS_FACT_DWH2_D20211225.ok\\\"\",\"14\":\" \\\"NOSPKP2P_DLY_NOK_D230708.ok\\\"\",\"15\":\" \\\"DUMMY_DLY_NOK_D230613.ok\\\"\",\"16\":\" \\\"DUMMY_TEST_DLY_NOK_D230613.ok\\\"\",\"17\":\" \\\"TLX2DB.PROVD.DREAM_12.ok\\\"\",\"18\":\" \\\"TLX2DB.PROVD.DREAM_152.ok\\\"\",\"19\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-19-04.04.32.679000.csv.ok\\\"\",\"20\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-20-05.09.39.679000.csv.ok\\\"\",\"21\":\" \\\"TLX2DB.PROVD.DREAM_2023-04-18-05.09.39.679000.csv.ok\\\"\",\"22\":\" ]\",\"23\":\"}\"}} ", "process": 32633, "processName": "MainProcess"}

Please look at the event above and help us extract the file names, as described earlier, using the rex command.


ITWhisperer
SplunkTrust

First extract the list, then each file:

| rex "(?:\"files[\\\\]+\": \[)(?<fileslist>[^\s:]+[^\]]+)"
| rex field=fileslist max_match=0 "(?:[^\s:]+[^\s]+\s[\"\\\]+)(?<files>[^\\\]+)"
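
As a follow-up sketch (not part of the answer above): because the second rex uses max_match=0, files should come back as a multivalue field. If one result row per file name is wanted, mvexpand can split it. The index name is taken from the question, the two rex lines are copied unchanged from the answer, and the table columns are only an illustrative choice.

```spl
index=app_events_dwh2_de_int
| rex "(?:\"files[\\\\]+\": \[)(?<fileslist>[^\s:]+[^\]]+)"
| rex field=fileslist max_match=0 "(?:[^\s:]+[^\s]+\s[\"\\\]+)(?<files>[^\\\]+)"
| mvexpand files
| table _time files
```

To keep one row per event instead, dropping mvexpand and adding | eval file_count=mvcount(files) gives the number of extracted names alongside the multivalue field.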