Solved: Re: Create a recurring alert based on message stri...

chaitanyaaiops · ‎07-26-2022

Dear experts,
I've created an alert based on a message string to identify closed connections. However, the alert is triggered only once, even though the problem persists until we bounce.

I'm looking for a query that keeps the alert recurring until the success message string "*reconfigured with 'RabbitMQ' bean*" is the latest event compared with the failure strings across all events.

Failed messages: "*com.rabbitmq.client.ShutdownSignalException*" OR "*channel shutdown*"
Success message: "*reconfigured with 'RabbitMQ' bean*"

Current alert query, which triggers only once:

index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*") |stats count by cf_app_name, cf_foundation

Thank you for the help

richgalloway · ‎07-26-2022

Alerts are triggered each time the search criteria are met (unless throttled). If the shutdown event is only received once then the alert will only be triggered once. If you want the alert to repeat then the search must be written and scheduled to find the triggering event (or canceling event) each time it runs.

---
If this reply helps you, Karma would be appreciated.

chaitanyaaiops · ‎07-26-2022

Thank you, Rich. However, I don't want to create the noise of a recurring alert unless there is a need, i.e., I want the alert to recur only if the reconfigured message is not the latest in comparison to the other strings.

richgalloway · ‎07-26-2022

That requirement can be included in the search.

index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*") 
| dedup <<field with message>>
| where NOT match(<<field with message>>, "reconfigured with 'RabbitMQ' bean")
| stats count by cf_app_name, cf_foundation

chaitanyaaiops · ‎07-27-2022

Thank you once again, Rich.
To add more details:
The failure condition appears in a different field of the event than the reconfigured message, which appears at a different position in the event. In short, if I extract them, this is how it would look:

msg1 = "channel shutdown"
msg2 = "com.rabbitmq.client.ShutdownSignalException"
msg3 = "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*"
msg4 = "*reconfigured with 'RabbitMQ' bean*"

The alert should keep triggering until msg4 is the latest compared with the other three messages, even if a failure message occurs only once.

richgalloway · ‎07-27-2022

If the failure and success messages are in different fields, then we can use the coalesce function to combine them for dedup.

index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*") 
| eval alert_field = coalesce(<<msg1 field>>, <<msg2 field>>, <<msg3 field>>, <<msg4 field>>)
| dedup alert_field
| where NOT match(alert_field, "reconfigured with 'RabbitMQ' bean")
| stats count by cf_app_name, cf_foundation
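
To see what coalesce does in isolation, here is a minimal sketch that can be run on its own with makeresults. The field names msg1 through msg4 are the hypothetical extractions from the post above, not real field names from the data. Because coalesce returns its first non-null argument and only msg4 is set here, alert_field should end up holding the msg4 value.

```spl
| makeresults
| eval msg4 = "reconfigured with 'RabbitMQ' bean"
| eval alert_field = coalesce(msg1, msg2, msg3, msg4)
| table alert_field
```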

chaitanyaaiops · ‎07-27-2022

Thanks, Rich. However, the challenge is that the alert is set to run every 15 minutes and the events occur only once. How can it recur every 15 minutes when the failure event won't occur again?

Thank you for your patience.

richgalloway · ‎07-27-2022

Yes, that's the tricky bit and goes back to the part of my first reply that said "the search must be written and scheduled to find the triggering event". Rather than search back 15 minutes, it will be necessary for the alert to search back as far as necessary to find the events of interest.
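
As a rough illustration of "run every 15 minutes but search further back", the alert could be scheduled along these lines in savedsearches.conf. This is only a sketch: the stanza name and the 24-hour lookback are assumptions, and the window should be as long as a failure can realistically stay unresolved.

```
# savedsearches.conf (sketch; stanza name and lookback window are assumptions)
[RabbitMQ channel shutdown - recurring]
enableSched = 1
# Run every 15 minutes
cron_schedule = */15 * * * *
# Look back far enough to still see a failure event that occurred only once
dispatch.earliest_time = -24h
dispatch.latest_time = now
# Trigger whenever the search returns at least one result
counttype = number of events
relation = greater than
quantity = 0
# No throttling, so the alert repeats on every run while the condition holds
alert.suppress = 0
search = <<alert search from this thread>>
```

The same settings can be entered through the alert's schedule, time range, and trigger condition fields in the UI.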

chaitanyaaiops · ‎07-27-2022

That's right. However, if I use dedup and the failed message occurred after the success message, would that remove duplicates of the failed messages?

The only bit I don't get is: how do I compare timestamps so that msg4 (success) is the latest among all messages?

richgalloway · ‎07-28-2022

There's no need to compare timestamps. The dedup command keeps the most recent event so whatever result you get must be the latest message.
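
One caveat worth checking: dedup keeps the most recent event for each distinct value of the field it is given, so deduplicating on the message field can leave both a failure event and a success event in the results. A possible alternative, sketched below under assumptions, is to label each event and let stats pick the latest label per application. The field names status, latest_status and latest_time are illustrative, and matching on _raw assumes the success text appears in the raw event.

```spl
index IN ("devcf","devsc") cf_org_name IN(xxxx,yyyy) cf_app_name=* "rabbit*" AND ("channel shutdown*" OR "*com.rabbitmq.client.ShutdownSignalException*" OR "*rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error*" OR "*reconfigured with 'RabbitMQ' bean*")
| eval status = if(like(_raw, "%reconfigured with 'RabbitMQ' bean%"), "ok", "failed")
| stats latest(status) AS latest_status, latest(_time) AS latest_time BY cf_app_name, cf_foundation
| where latest_status = "failed"
```

With this shape, an application is returned only while its most recent matching event is a failure, so the alert stops recurring once the reconfigured message becomes the latest event. The search window still has to reach back far enough to include the original failure.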