Re: How to associate two data set when one set nee...

Jouman · ‎11-20-2023

Dear All,

I have one index and I use this index to store messages and summary report as well.

In report="report_b", it stores the running case name and the used device id(DEV_ID) in timestamp _time.

ex.

_time	DEV_ID	case_name	case_action
01:00	111	ping111.py	start
01:20	111	ping111.py	end
02:00	222	ping222.py	start
02:30	222	ping222.py	end
02:40	111	ping222.py	start
03:00	111	ping222.py	end

For Message_Name="event_a", it is stored in index=A as below:

_time	LOG_ID	Message_Name
01:10	01	event_a
02:50	02	event_a

I would like to associate the case that is running when the event_a is sent.
So I use the code below:
Firstly, to find out the device id(DEV_ID) associated with this log(LOG_ID)
Secondly, to associate event_a and case_name by DEV_ID
Finally, list those event_a only.

(index=A Message_Name="event_a") OR (index=A report="report_b")
| lookup table_A.csv LOG_ID OUTPUT DEV_ID
| sort 0 + _time 
| streamstats current=false last(case_name) as last_case_name, , last(case_action) as last_case_action by DEV_ID 
| eval case_name=if(isnull(case_name) AND last_case_action="start",last_case_name,case_name)

| where isnotnull(Message_Name)
| table _time Message_Name LOG_ID DEV_ID case_name

The output would be:

_time	Message_Name	LOG_ID	DEV_ID	case_name
01:10	event_a	01	111	ping111.py
02:50	event_a	02	111	ping222.py

The code works fine but the amount of data is huge so the lookup command takes a very long time. Furthermore, actually, it is no need to apply lookup command for report="report_b".
(index=A Message_Name="event_a") : 150000 records in 24 hour
(index=A report="report_b") : 700000 records in 24 hour

Is there any way to rewrite the code to make lookup only apply on events belongs to (index=A Message_Name="event_a") ? try to use subsearch, append, appendpipe to restrict find associated DEV_ID first but not working.

Thank you so much.

bowesmana · ‎11-20-2023

It is more likely that your performance issue is caused by the sort+streamstats rather than the lookup

Here is an example that does not use sort or streamstats - it may or may not work in your data, but the principle is to use stats. You can run this example and it will give you your results.

The piece you would want is shown by the comment before the fields statement.

| makeresults format=csv data="_time,DEV_ID,case_name,case_action
01:00,111,ping111.py,start
01:20,111,ping111.py,end
02:00,222,ping222.py,start
02:30,222,ping222.py,end
02:40,111,ping222.py,start
03:00,111,ping222.py,end"
| eval _time=strptime("2023-11-21 "._time.":00", "%F %T")
| append [ 
  | makeresults format=csv data="_time,LOG_ID,Message_Name
01:10,01,event_a
02:50,02,event_a"
  | eval _time=strptime("2023-11-21 "._time.":00", "%F %T")
  | eval DEV_ID=111
]

``` So use your first two lines of your search and then the following```
| fields _time DEV_ID case_name case_action LOG_ID Message_Name
| eval t=if(isnull(LOG_ID),printf("%d##%s##%s", _time, case_action, case_name), null())
| eval lt=if(isnull(LOG_ID),null,printf("%d##%s##%s", _time, LOG_ID, Message_Name))
| fields - LOG_ID Message_Name case_*
| stats values(*) as * by DEV_ID
| where isnotnull(lt)
| mvexpand lt
| eval s=split(lt, "##")
| eval _time=mvindex(s, 0), LOG_ID=mvindex(s, 1), Message_Name=mvindex(s,2)
| rex field=t max_match=0 "(?<report_time>\d+)##(?<case_action>[^#]*)##(?<case_name>.*)"
| eval min_ix=-1
| eval c = 0
| foreach mode=multivalue report_time [ eval min_ix=if(_time > '<<ITEM>>', c, min_ix), c=c+1 ]
| eval case_name=if(min_ix>=0, mvindex(case_name, min_ix), "unknown")
| eval case_action=if(min_ix>=0, mvindex(case_action, min_ix), "unknown")
| fields - s lt t c min_ix report_time
| table _time Message_Name LOG_ID DEV_ID case_name

yuanliu · ‎11-20-2023

Just note: Often times it is better to describe your use case than trying to "fix" SPL. Are you sure it is lookup that slows the search, not sort? Sorting large amount of data is expensive in many ways while lookup is a very efficient command.

If you must try to not lookup in report_b, you can append after lookup.

(index=A Message_Name="event_a")
| lookup table_A.csv LOG_ID OUTPUT DEV_ID
append [search index=A report="report_b"]
| sort 0 + _time 
| streamstats current=false last(case_name) as last_case_name, , last(case_action) as last_case_action by DEV_ID 
| eval case_name=if(isnull(case_name) AND last_case_action="start",last_case_name,case_name)

| where isnotnull(Message_Name)
| table _time Message_Name LOG_ID DEV_ID case_name

Not sure how much this can speed search up, however.

How to associate two data set when one set need to lookup for associated key but the other data set doesn't need ?

subsearch

Index This | I’m short for "configuration file.” What am I?

New Articles from Academic Learning Partners, Help Expand Lantern’s Use Case Library, ...

Your Guide to SPL2 at .conf24!