I'm running a somewhat large Splunk installation that monitors syslog for >40k hosts. Every once in a while, a host goes crazy and starts logging as fast as its network will carry it (SCSI errors, OOMKiller, etc). Does anyone know of a way to alert in Splunk on a host that exceeds a certain number of messages per minute? I'd like to kick off a script or email whenever one host goes over, say, 1000 mpm, but with so many different hosts I can't really create a search with the hostnames pre-defined. Any thoughts?
Well the good news is that you don't have to predefine the hosts... that's what fields are for :-) Create an alert for your search like this:
Schedule it to be run every minute, with a relative time span of:
with a custom condition to email you when:
From there you might want to tweak your search to throttle subsequent notifications, but that's an example of how you'd do what you're after. Hope this helps :-)

Definitely looks like the right direction, but I get the following error message when I try to specify the custom condition: "Encountered the following error while trying to update: In handler 'savedsearch': Cannot parse alert condition. Search operation 'count' is unknown. You might not have permission to run this operation." I'm setting up this alert as the admin user, so permissions shouldn't be an issue.
(16 Jul '12, 08:27)
rgisrael
What strange admin would disable the "count" command? Maybe a typo?
(16 Jul '12, 09:35)
yannK
Yep, edited the answer accordingly (it was late when I did that one, sorry!)
(16 Jul '12, 20:44)
R.Turk
Does this work on the free version? Will I be able to migrate alerts when I update from 3.x to 4.x?
(18 Jul '12, 10:34)
jyanga
Another way is to use time buckets (more flexible, because you can run the search over longer periods):
mysearch | bucket _time span=1m | stats count by _time host | where count > 10000
This will return all the hosts that had more than 10000 events per minute (and in which minute). Set up an alert condition on number of results > 0, attach the results to the email, and you have all the details: _time, host, count.
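For the numbers in the original question, the bucket approach might look something like the sketch below. The `sourcetype=syslog` filter, the five-minute window, and the final sort are assumptions for illustration and are not part of either answer; the 1000 threshold comes from the question. Swap in whatever base search actually matches your syslog data.

```spl
sourcetype=syslog earliest=-5m@m latest=@m
| bucket _time span=1m
| stats count by _time, host
| where count > 1000
| sort - count
```

Snapping both ends of the window to the minute (`@m`) keeps every bucket a full minute, so a partially elapsed minute doesn't undercount. If this were scheduled every five minutes with the alert condition set to number of results > 0, each result row would be one host in one minute that went over the threshold.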