How to trigger an alert if the condition remains for a certain period of time?

splunking1
Explorer

I am trying to create an alert that fires when the field toState changes to OPEN and stays in the OPEN state for 5 minutes. I have tried the following, but it is not working. I would appreciate some pointers.

... CB_STATE_TRANSITION | timechart span=5m count(toState="OPEN") as state | stats count | where count > 1

The alert runs every 5 minutes and triggers when the number of results is greater than 0.

richgalloway
SplunkTrust

Perhaps this will help.

... CB_STATE_TRANSITION 
``` Get the most recent time and state ```
| stats latest(_time) as _time, latest(toState) as toState
``` Keep only the "OPEN" events with a timestamp at least 5 minutes old ```
| where (toState="OPEN" AND _time < relative_time(now(), "-5m"))
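
To make the logic concrete, here is a small Python model of what the search above evaluates. It is a sketch, not Splunk code: events are assumed to be hypothetical (timestamp, toState) tuples, and `open_for_at_least` is an illustrative name. Note that the real search only sees events inside its time range, so that range has to reach back further than 5 minutes for the `_time < relative_time(now(), "-5m")` condition to ever be true.

```python
from datetime import datetime, timedelta


def open_for_at_least(events: list[tuple[datetime, str]], now: datetime,
                      threshold: timedelta = timedelta(minutes=5)) -> bool:
    """Model of: stats latest(_time), latest(toState)
    | where toState="OPEN" AND _time < now - 5m.
    Each event is a hypothetical (timestamp, toState) tuple."""
    if not events:
        return False  # no events means no OPEN state, so no alert
    # latest() takes the values from the most recent event
    latest_time, latest_state = max(events, key=lambda e: e[0])
    return latest_state == "OPEN" and latest_time < now - threshold


now = datetime(2023, 1, 29, 14, 50)
events = [
    (datetime(2023, 1, 29, 14, 30), "CLOSED"),
    (datetime(2023, 1, 29, 14, 40), "OPEN"),
]
print(open_for_at_least(events, now))  # True: OPEN since 14:40, 10 minutes ago
```

Only the single most recent event matters: if a CLOSED event arrived after 14:40, the latest state would no longer be OPEN and nothing would be returned.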
---
If this reply helps you, Karma would be appreciated.

splunking1
Explorer

I appreciate the reply. I am curious about what is wrong with my approach, since conceptually it makes sense to me. What is the problem with it? I'm just trying to understand.

richgalloway
SplunkTrust

You said your query was "not working" without qualification, so I offered a query that should work.

Let's take a closer look at the original query. The timechart command counts the number of "OPEN" events in each 5-minute period of the search time window. Depending on the time window chosen, this produces one or more results, one row per 5-minute period, even for periods with no "OPEN" events.

| timechart span=5m count(toState="OPEN") as state
_time              state
01/29/2023 14:30   0
01/29/2023 14:35   0
01/29/2023 14:40   0

The stats command then counts the number of results produced by timechart.

| stats count | where count > 1

That gives us "3", which is greater than 1, so the alert triggers erroneously even though there were no "OPEN" events.

Now say we run the query over the previous 5 minutes, during which there were known "OPEN" events. The timechart command might produce something like this:

_time              state
01/29/2023 14:45   3

This time the stats command returns "1", which is not greater than 1, so the alert erroneously does NOT trigger.

Like a broken clock, the query will occasionally work, but the false positives and false negatives make it unreliable.
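
A simplified Python model of the original pipeline reproduces both failure modes. It assumes, as the tables above suggest, that timechart emits one row per 5-minute bucket in the search window, including buckets with a count of 0; real Splunk bucket alignment also depends on how the time range snaps, which this sketch ignores. The function names are hypothetical.

```python
from datetime import datetime, timedelta

SPAN = timedelta(minutes=5)


def timechart_open_counts(events, start, end):
    """Rough model of `timechart span=5m count(...)`: one row per 5-minute
    bucket in [start, end), including buckets with zero OPEN events."""
    rows = []
    bucket = start
    while bucket < end:
        n = sum(1 for t, state in events
                if bucket <= t < bucket + SPAN and state == "OPEN")
        rows.append((bucket, n))
        bucket += SPAN
    return rows


def original_alert_fires(events, start, end):
    # `stats count` counts the rows timechart produced, not the OPEN events,
    # and `where count > 1` compares that row count to 1.
    return len(timechart_open_counts(events, start, end)) > 1


# 15-minute window, no OPEN events: 3 rows, so it fires (false positive)
print(original_alert_fires([], datetime(2023, 1, 29, 14, 30),
                           datetime(2023, 1, 29, 14, 45)))  # True

# 5-minute window, three OPEN events: 1 row, so no alert (false negative)
opens = [(datetime(2023, 1, 29, 14, 45, s), "OPEN") for s in (10, 20, 30)]
print(original_alert_fires(opens, datetime(2023, 1, 29, 14, 45),
                           datetime(2023, 1, 29, 14, 50)))  # False
```

The outcome depends only on how many buckets the window contains, never on the OPEN counts inside them.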

The query could be improved by removing the stats command.

... CB_STATE_TRANSITION | timechart span=5m count(eval(toState="OPEN")) as state | where state > 0

This gives us only the 5-minute periods in which an "OPEN" event occurred, and the alert can be triggered if there are any results. That doesn't mean, however, that the state remained "OPEN" for all 5 of those minutes.
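
The limitation shows up in a small Python model that keeps only the 5-minute buckets containing at least one OPEN event. The (timestamp, toState) event tuples and the `buckets_with_open` name are hypothetical, and aligning buckets to the clock's 5-minute boundaries is a simplifying assumption.

```python
from datetime import datetime, timedelta


def buckets_with_open(events):
    """Rough model of counting OPEN events per 5-minute bucket and keeping
    only the buckets whose count is greater than zero."""
    counts = {}
    for t, state in events:
        # Align each event to the start of its 5-minute bucket
        bucket = t - timedelta(minutes=t.minute % 5, seconds=t.second,
                               microseconds=t.microsecond)
        if state == "OPEN":
            counts[bucket] = counts.get(bucket, 0) + 1
    return sorted(counts.items())


# OPEN at 14:45:30, then CLOSED again one minute later
events = [
    (datetime(2023, 1, 29, 14, 45, 30), "OPEN"),
    (datetime(2023, 1, 29, 14, 46, 30), "CLOSED"),
]
print(buckets_with_open(events))
# The 14:45 bucket is returned, so an alert would fire even though
# the state was OPEN for only about one minute.
```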

splunking1
Explorer

That makes a lot of sense. Thank you so much. One final question: if I were to extend your query to multiple hosts, would it still work?

... CB_STATE_TRANSITION | stats latest(_time) as _time, latest(toState) as toState by host | where (toState="OPEN" AND _time < relative_time(now(), "-5m"))

I have multiple hosts, and the alert might not trigger for one of them, even though its state has been OPEN for the last 5 minutes, because another host's state has since transitioned to CLOSED. That would be a false negative, so I want to evaluate each host separately. I will accept your answer once this thread is closed, so don't worry 🙂

richgalloway
SplunkTrust

Yes, adding "by host" should work as you expect.
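
A Python sketch of the per-host version shows why `by host` matters: each host keeps its own latest event, so a newer CLOSED event on one host no longer masks a long-running OPEN state on another. The hosts `hostA` and `hostB`, the (timestamp, host, toState) event shape, and the function name are hypothetical.

```python
from datetime import datetime, timedelta

THRESHOLD = timedelta(minutes=5)


def hosts_open_too_long(events, now):
    """Rough model of:
    stats latest(_time) as _time, latest(toState) as toState by host
    | where toState="OPEN" AND _time < relative_time(now(), "-5m")"""
    latest = {}  # host -> (timestamp, toState) of that host's newest event
    for t, host, state in events:
        if host not in latest or t > latest[host][0]:
            latest[host] = (t, state)
    return sorted(host for host, (t, state) in latest.items()
                  if state == "OPEN" and t < now - THRESHOLD)


now = datetime(2023, 1, 29, 14, 50)
events = [
    (datetime(2023, 1, 29, 14, 40), "hostA", "OPEN"),    # OPEN for 10 minutes
    (datetime(2023, 1, 29, 14, 48), "hostB", "CLOSED"),  # newest event overall
]
print(hosts_open_too_long(events, now))  # ['hostA']
```

Without `by host`, only the newest event overall (hostB's CLOSED) would be considered, so no alert would fire for hostA.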
