Alerting

How to find job durartion based on start time and alert based on separate event?

cmcdole
Path Finder

Situation: I have jobs that start running at different times because they are dependent on previous jobs to run successfully. There are two events I am concerned with. One event for "job started" and another event for "job finished".

Goal: Alert when the job has been running for 5 hours or more, regardless of when it started, and has not finished

My approach was to calculate the duration based on the _time from "job started" but I can't determine the syntax on how to prevent an alert if the job finished on time since the "job finished" _time is in a separate event.
I am able to determine the duration using | eval secondsAgoStr=tostring(now() - _time, "duration"
I am also able to calculate the delta of the time using | delta _time AS timeDeltaSeconds p=1
I can't use either of these when the job is currently running because the "job finished" event has not occurred.

0 Karma
1 Solution

somesoni2
Revered Legend

Assuming your events have a field to uniquely identify a job, by either a jobID or jobName field, you can do like this.

your current search to fetch both "job started" and "job finished" events
| eval eventType=if(searchmatch("job started"),"Start","End")
| chart values(_time) over jobID by eventType
| eval End=coalesce(End,now()) 
| eval jobduration=End-Start | where jobduration>=18000

View solution in original post

cmcdole
Path Finder

This helped me come up with the alert. Thanks

0 Karma

somesoni2
Revered Legend

Assuming your events have a field to uniquely identify a job, by either a jobID or jobName field, you can do like this.

your current search to fetch both "job started" and "job finished" events
| eval eventType=if(searchmatch("job started"),"Start","End")
| chart values(_time) over jobID by eventType
| eval End=coalesce(End,now()) 
| eval jobduration=End-Start | where jobduration>=18000
Get Updates on the Splunk Community!

Troubleshooting the OpenTelemetry Collector

  In this tech talk, you’ll learn how to troubleshoot the OpenTelemetry collector - from checking the ...

Adoption of Infrastructure Monitoring at Splunk

  Splunk's Growth Engineering team showcases one of their first Splunk product adoption-Splunk Infrastructure ...

Modern way of developing distributed application using OTel

Recently, I had the opportunity to work on a complex microservice using Spring boot and Quarkus to develop a ...