Splunk Search

How to associate two data set when one set need to lookup for associated key but the other data set doesn't need ?

Jouman
Path Finder

Dear All,

I have one index and I use this index to store messages and summary report as well.

In report="report_b", it stores the running case name and the used device id(DEV_ID) in timestamp _time.

ex.

_timeDEV_IDcase_namecase_action
01:00111ping111.pystart
01:20111ping111.pyend
02:00222ping222.pystart
02:30222ping222.pyend
02:40111ping222.pystart
03:00111ping222.pyend

 

For Message_Name="event_a",  it is stored in index=A as below:

_timeLOG_IDMessage_Name
01:1001event_a
02:5002event_a


I would like to associate the case that is running when the event_a is sent.
So I use the code below:
Firstly, to find out the device id(DEV_ID) associated with this log(LOG_ID) 
Secondly, to associate event_a and case_name by DEV_ID
Finally, list those event_a only.

 

(index=A Message_Name="event_a") OR (index=A report="report_b")
| lookup table_A.csv LOG_ID OUTPUT DEV_ID
| sort 0 + _time 
| streamstats current=false last(case_name) as last_case_name, , last(case_action) as last_case_action by DEV_ID 
| eval case_name=if(isnull(case_name) AND last_case_action="start",last_case_name,case_name)

| where isnotnull(Message_Name)
| table _time Message_Name LOG_ID DEV_ID case_name

 

 
The output would be:

_timeMessage_NameLOG_IDDEV_IDcase_name
01:10event_a01111ping111.py
02:50event_a02111ping222.py

 

The code works fine but the amount of data is huge so the lookup command takes a very long time.  Furthermore, actually, it is no need to apply lookup command for report="report_b".
(index=A Message_Name="event_a") : 150000 records in 24 hour
(index=A report="report_b") : 700000 records in 24 hour

Is there any way to rewrite the code to make lookup only apply on events belongs to (index=A Message_Name="event_a") ? try to use subsearch, append, appendpipe to restrict find associated DEV_ID first but not working.

 

Thank you so much.

Labels (1)
0 Karma

bowesmana
SplunkTrust
SplunkTrust

It is more likely that your performance issue is caused by the sort+streamstats rather than the lookup

Here is an example that does not use sort or streamstats - it may or may not work in your data, but the principle is to use stats. You can run this example and it will give you your results. 

The piece you would want is shown by the comment before the fields statement.

 

| makeresults format=csv data="_time,DEV_ID,case_name,case_action
01:00,111,ping111.py,start
01:20,111,ping111.py,end
02:00,222,ping222.py,start
02:30,222,ping222.py,end
02:40,111,ping222.py,start
03:00,111,ping222.py,end"
| eval _time=strptime("2023-11-21 "._time.":00", "%F %T")
| append [ 
  | makeresults format=csv data="_time,LOG_ID,Message_Name
01:10,01,event_a
02:50,02,event_a"
  | eval _time=strptime("2023-11-21 "._time.":00", "%F %T")
  | eval DEV_ID=111
]

``` So use your first two lines of your search and then the following```
| fields _time DEV_ID case_name case_action LOG_ID Message_Name
| eval t=if(isnull(LOG_ID),printf("%d##%s##%s", _time, case_action, case_name), null())
| eval lt=if(isnull(LOG_ID),null,printf("%d##%s##%s", _time, LOG_ID, Message_Name))
| fields - LOG_ID Message_Name case_*
| stats values(*) as * by DEV_ID
| where isnotnull(lt)
| mvexpand lt
| eval s=split(lt, "##")
| eval _time=mvindex(s, 0), LOG_ID=mvindex(s, 1), Message_Name=mvindex(s,2)
| rex field=t max_match=0 "(?<report_time>\d+)##(?<case_action>[^#]*)##(?<case_name>.*)"
| eval min_ix=-1
| eval c = 0
| foreach mode=multivalue report_time [ eval min_ix=if(_time > '<<ITEM>>', c, min_ix), c=c+1 ]
| eval case_name=if(min_ix>=0, mvindex(case_name, min_ix), "unknown")
| eval case_action=if(min_ix>=0, mvindex(case_action, min_ix), "unknown")
| fields - s lt t c min_ix report_time
| table _time Message_Name LOG_ID DEV_ID case_name

 

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Just note: Often times it is better to describe your use case than trying to "fix" SPL.  Are you sure it is lookup that slows the search, not sort?  Sorting large amount of data is expensive in many ways while lookup is a very efficient command.

If you must try to not lookup in report_b, you can append after lookup.

(index=A Message_Name="event_a")
| lookup table_A.csv LOG_ID OUTPUT DEV_ID
append [search index=A report="report_b"]
| sort 0 + _time 
| streamstats current=false last(case_name) as last_case_name, , last(case_action) as last_case_action by DEV_ID 
| eval case_name=if(isnull(case_name) AND last_case_action="start",last_case_name,case_name)

| where isnotnull(Message_Name)
| table _time Message_Name LOG_ID DEV_ID case_name

Not sure how much this can speed search up, however.

0 Karma
Get Updates on the Splunk Community!

Index This | I’m short for "configuration file.” What am I?

May 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with a Special ...

New Articles from Academic Learning Partners, Help Expand Lantern’s Use Case Library, ...

Splunk Lantern is a Splunk customer success center that provides advice from Splunk experts on valuable data ...

Your Guide to SPL2 at .conf24!

So, you’re headed to .conf24? You’re in for a good time. Las Vegas weather is just *chef’s kiss* beautiful in ...