Solved: Alert on delta based on percentage.

nocostk · ‎01-20-2011

I'm trying to monitor any sudden drops/increases into my Weblogic queue. I can get a search easy enough to visualise it - I'm just having a hard time formatting it to something I can alert off of.

Here's the visual search:

host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference

I thought maybe adding some eval at the end would work - which it kind of does. I do get a percentage, I'm just not sure what I can do next. I'd like the alert to trigger if there is a 50% change (positive/negative).

host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference | eval percdif=(difference/count)*100 | eval percdif=round(percdif,0)

Any help would be appreciated.

David · ‎01-21-2011

I'd simplify your statement a touch:

host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference | eval percdif=round(abs(difference/count)*100,0)

So you can then alert on if percdif > 50.

Without knowing your data, though (and knowing that this may be very obvious to you already), note that the above will alert on any sudden drops / increases into the number of times that message is logged, which will not necessarily equal your queue length. If that full message contains a QueueLength field, or anything like that, you might get more useful information by going for that field:

host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart last(QueueLength) as CurrentQueueLength span=10m | delta CurrentQueueLength as difference | eval percdif=round(abs(difference/CurrentQueueLength)*100,0)

View solution in original post

David · ‎01-21-2011

I'd simplify your statement a touch:

host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference | eval percdif=round(abs(difference/count)*100,0)

So you can then alert on if percdif > 50.

Without knowing your data, though (and knowing that this may be very obvious to you already), note that the above will alert on any sudden drops / increases into the number of times that message is logged, which will not necessarily equal your queue length. If that full message contains a QueueLength field, or anything like that, you might get more useful information by going for that field:

host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart last(QueueLength) as CurrentQueueLength span=10m | delta CurrentQueueLength as difference | eval percdif=round(abs(difference/CurrentQueueLength)*100,0)

David · ‎01-25-2011

Splunk put up a page with all the functions that are available in eval. It is quite helpful: http://www.splunk.com/base/Documentation/latest/SearchReference/CommonEvalFunctions

nocostk · ‎01-21-2011

Very nice, thanks, David. I didn't realize the the abs() existed.

nocostk · ‎01-20-2011

Hmm,I think I have it. Maybe I could get a spot check? host="weblogic*" JMS_Destination_Queue="CustomerAccountServiceQueue" JMS_Event="Produced" earliest=-10m | timechart span=10m count | delta count as difference | eval percdif=(difference/count)*100 | eval percdif=round(percdif,0) | where percdif < -50 OR percdif > 50 :: Then I schedule a job every 10 minutes?

Alert on delta based on percentage.

Extending Observability Content to Splunk Cloud

More Control Over Your Monitoring Costs with Archived Metrics GA in US-AWS!

New in Observability Cloud - Explicit Bucket Histograms