How to create a recurring alert based on message strings?

chaitanyaaiops
Explorer

Dear experts,
I've created an alert based on a message string to identify closed connections. However, the alert triggers only once, even though the problem isn't fixed until we bounce.

I'm looking for a query that produces a recurring alert until the success message string "*reconfigured with 'RabbitMQ' bean*" is the latest event compared with the failed strings across all events.

Failed messages: "*com.rabbitmq.client.ShutdownSignalException*" OR "*channel shutdown*"
Success message: "*reconfigured with 'RabbitMQ' bean*"

Current alert query, which triggers only once:

index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*") |stats count by cf_app_name, cf_foundation

Thank you for the help

richgalloway
SplunkTrust

Alerts are triggered each time the search criteria are met (unless throttled).  If the shutdown event is only received once then the alert will only be triggered once.  If you want the alert to repeat then the search must be written and scheduled to find the triggering event (or canceling event) each time it runs.

---
If this reply helps you, Karma would be appreciated.

chaitanyaaiops
Explorer

Thank you, Rich. However, I don't want to create noise with a recurring alert unless there is a need. That is, I want the alert to recur only if the reconfigured message is not the latest compared with the other strings.

richgalloway
SplunkTrust

That requirement can be included in the search.

index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*") 
| dedup <<field with message>>
| where NOT match(<<field with message>>, "reconfigured with 'RabbitMQ' bean")
| stats count by cf_app_name, cf_foundation

chaitanyaaiops
Explorer

Thank you once again, Rich.
To add more details:
The failure conditions appear in different fields of the event than the reconfigured message, which appears at a different position in the event. In short, if I extract them, this is how they would look:

msg1 = "channel shutdown"
msg2 = "com.rabbitmq.client.ShutdownSignalException"
msg3 = "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*"
msg4 = "*reconfigured with 'RabbitMQ' bean*"

The alert should keep triggering until msg4 is the latest of all four messages, even if the failure occurred only once.

richgalloway
SplunkTrust

If the failure and success messages are in different fields, then we can use the coalesce function to combine them for dedup.

index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*") 
| eval alert_field = coalesce(<<msg1 field>>, <<msg2 field>>, <<msg3 field>>, <<msg4 field>>)
| dedup alert_field
| where NOT match(alert_field, "reconfigured with 'RabbitMQ' bean")
| stats count by cf_app_name, cf_foundation
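
As a small illustration of how coalesce behaves, the sketch below builds one synthetic event with makeresults in which only a msg2 field is set. The names msg1 through msg4 are placeholders borrowed from the question, not the real extracted field names. Because coalesce returns its first non-null argument, alert_field ends up holding whichever message field the event actually contains.

```spl
| makeresults
| eval msg2="com.rabbitmq.client.ShutdownSignalException"
| eval alert_field=coalesce(msg1, msg2, msg3, msg4)
| table msg2, alert_field
```

Here msg1, msg3, and msg4 are never assigned, so they are null and alert_field takes the value of msg2.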

chaitanyaaiops
Explorer

Thanks, Rich. However, the challenge is that the alert is set to run every 15 minutes and the events occur only once. How can it recur every 15 minutes if the failure event doesn't occur again?

Thank you for your patience.

richgalloway
SplunkTrust

Yes, that's the tricky bit and goes back to the part of my first reply that said "the search must be written and scheduled to find the triggering event".  Rather than search back 15 minutes, it will be necessary for the alert to search back as far as necessary to find the events of interest.
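
A rough sketch of that idea, assuming a 24-hour lookback is long enough to cover the last failure or recovery (the window is an assumption; adjust it to how long an outage can last). The alert can still be scheduled every 15 minutes, but each run looks further back and returns an app only when its most recent matching event is not the recovery message. The names state, latest_state, and last_seen are made up for illustration, and the sketch assumes the message text is present in _raw.

```spl
index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" earliest=-24h latest=now AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*")
| eval state=if(like(_raw, "%reconfigured with 'RabbitMQ' bean%"), "recovered", "failed")
| stats latest(state) as latest_state, latest(_time) as last_seen by cf_app_name, cf_foundation
| where latest_state="failed"
```

With the alert condition set to trigger when the number of results is greater than zero, each scheduled run would fire again for as long as the latest event for an app is a failure, and stop once the recovery message becomes the latest.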

chaitanyaaiops
Explorer

That's right. However, if I use dedup and the failed message occurred after the success message, would that remove duplicates for the failed messages?

The only bit I don't get is: how do I compare timestamps so that msg4 (success) is the latest among all messages?

richgalloway
SplunkTrust

There's no need to compare timestamps.  The dedup command keeps the most recent event so whatever result you get must be the latest message.
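
One caveat, offered as a sketch rather than a definitive answer: dedup keeps the most recent event for each distinct value of the fields it is given. Deduplicating on the message field therefore keeps the latest event per distinct message, whereas deduplicating on cf_app_name and cf_foundation keeps only the single latest matching event per app. If the goal is to alert only when the latest message for an app is a failure, a variation along these lines may be closer. It assumes the success text appears in _raw and that the search time range reaches back far enough to include the relevant events.

```spl
index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*")
| dedup cf_app_name, cf_foundation
| where NOT like(_raw, "%reconfigured with 'RabbitMQ' bean%")
| stats count by cf_app_name, cf_foundation
```

After the dedup, each app contributes at most one event, so the final count is 1 for every app whose latest matching event is a failure, and the app drops out of the results once the recovery message is the newest.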
