Could someone help me derive a solution for the case below?
Background: We have an app in which all our saved searches are set as durable, because we don't want to miss any runs. If a scheduled search fails at its scheduled time for any reason, such as an infrastructure or resource issue, that period is covered by the next run. So I am trying to capture the final status of each run after the durable logic has been applied.
Let's say I have 4 events. The first two runs of alert ABC (scheduled_time=12345 and scheduled_time=12346) failed. The third run, at 12347, covered those two missed periods as well as its own, and all of them succeeded.
If I first run a query like this:
.. | stats last(status) by savedsearch_name scheduled_time
I get output like this:
savedsearch_name   last(status)   scheduled_time
ABC                skipped        12345
ABC                skipped        12346
ABC                success        12347
I need to write logic that takes:
A. Jobs whose last status is not success. Here that is ABC 12345 and ABC 12346.
B. Events where durable_cursor != scheduled_time. This picks the runs of that job that covered multiple scheduled periods for the missed duration. In this case it picks my EVENT 3.
C. Then I have to derive the result like this: take the failed saved search name together with the scheduled_time at which it failed, and check whether that scheduled_time falls between the durable_cursor and scheduled_time of another run of the same job with status=success.
.. TAKE FAILED SAVEDSEARCH NAME TIME as FAILEDTIME | where durable_cursor!=scheduled_time | eval Flag=if(FAILEDTIME>=durable_cursor AND FAILEDTIME<=scheduled_time, "COVERED", "NOT COVERED")
The events are:
EVENT 4 : savedsearch_name = ABC ; status = success; scheduled_time =12347
EVENT 3 : savedsearch_name = ABC ; status = success ; durable_cursor=12345 scheduled_time =12347
EVENT 2 : savedsearch_name = ABC ; status = skipped ; scheduled_time =12346
EVENT 1 : savedsearch_name = ABC ; status = skipped ; scheduled_time =12345
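To make steps A to C concrete, here is a rough Python sketch of the check I am after, run against the four events above. It only illustrates the logic and is not SPL. The function name find_coverage and the dict layout are made up for this example, and I am assuming the last status is taken per savedsearch_name and scheduled_time, as in the stats output.

```python
# Hypothetical in-memory copy of the four events, oldest first.
events = [
    {"savedsearch_name": "ABC", "status": "skipped", "scheduled_time": 12345},
    {"savedsearch_name": "ABC", "status": "skipped", "scheduled_time": 12346},
    {"savedsearch_name": "ABC", "status": "success",
     "durable_cursor": 12345, "scheduled_time": 12347},
    {"savedsearch_name": "ABC", "status": "success", "scheduled_time": 12347},
]


def find_coverage(events):
    # Step A: last status per (name, scheduled_time); keep the non-success ones.
    last_status = {}
    for e in events:
        last_status[(e["savedsearch_name"], e["scheduled_time"])] = e["status"]
    failed = [key for key, status in last_status.items() if status != "success"]

    # Step B: successful runs whose durable_cursor differs from scheduled_time.
    catch_ups = [
        e for e in events
        if e["status"] == "success"
        and "durable_cursor" in e
        and e["durable_cursor"] != e["scheduled_time"]
    ]

    # Step C: a failed time is covered if it lies inside a catch-up window
    # (durable_cursor <= failed time <= scheduled_time) of the same job.
    result = {}
    for name, failed_time in failed:
        covered = any(
            c["savedsearch_name"] == name
            and c["durable_cursor"] <= failed_time <= c["scheduled_time"]
            for c in catch_ups
        )
        result[(name, failed_time)] = "COVERED" if covered else "NOT COVERED"
    return result


print(find_coverage(events))
```

Note that both bounds of the window have to hold at the same time, so the comparison needs AND rather than OR.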
Here is how far I have got and where I am stuck.
I split this into two reports.
The first report takes all the jobs whose last status is not success and tables the output with the fields SAVEDSEARCH NAME, SCHEDULEDTIME AS FAILEDTIME, LAST(STATUS) as FAILEDSTATU
I then save this result in a lookup.
This has to run over the last one-hour window.
Second Report
It refers to the lookup, takes the failed saved search names from it, and searches only those events in the Splunk internal data where durable_cursor!=scheduled_time. It then checks whether the failed saved search's time falls between durable_cursor and the next scheduled_time, and whether the status is success.
This works fine if I have one failed run per saved search, but not when there are multiple values.
Let's say Job A itself has four runs in an hour, and all except the first are failures. In this case I cannot cover it, because the values referenced from the lookup come back as a multivalue field and do not match exactly.
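As a rough illustration of what I would like for the multi-run case, here is a Python sketch (again not SPL) that keeps one row per failed run and checks each row on its own against the catch-up windows. All the data values and the function name flag_failed_runs are made up for this example. Job A's catch-up run and Job B are hypothetical additions, there only to show both outcomes.

```python
from collections import defaultdict

# Made-up rows: one row per failed run, as the first report would produce them.
failed_runs = [
    ("A", 101), ("A", 102), ("A", 103),
    ("B", 201),
]
# Made-up successful catch-up runs: (name, durable_cursor, scheduled_time).
catch_up_runs = [
    ("A", 101, 104),
]


def flag_failed_runs(failed_runs, catch_up_runs):
    # Index the catch-up windows by saved search name.
    windows = defaultdict(list)
    for name, cursor, sched in catch_up_runs:
        windows[name].append((cursor, sched))

    # Check every failed run separately, so several failures of the same
    # saved search never collapse into one multivalue cell.
    flags = []
    for name, failed_time in failed_runs:
        covered = any(
            cursor <= failed_time <= sched for cursor, sched in windows[name]
        )
        flags.append((name, failed_time, "COVERED" if covered else "NOT COVERED"))
    return flags


for row in flag_failed_runs(failed_runs, catch_up_runs):
    print(row)
```

In SPL terms I think this means having one row per failed run before the comparison, for example by expanding the multivalue field with mvexpand, but I have not confirmed that this works with the lookup.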
Here is the question I posted for the same
If anybody has an alternative or better idea for this, could you please shed some light on it?