I'm running a fairly large Splunk installation that monitors syslog for >40k hosts. Every once in a while, a host goes crazy and starts logging as fast as its network will carry it (SCSI errors, OOMKiller, etc.).
Does anyone know of a way to alert in splunk on a host that exceeds a certain number of messages per minute? I'd like to kick off a script or email whenever one host goes over, say, 1000 mpm, but with so many different hosts I can't really create a search with the hostnames pre-defined.
asked 16 Jul '12, 07:46
Well the good news is that you don't have to predefine the hosts... that's what fields are for :-)
Create an alert for your search like this:
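(The original search snippet didn't survive here; a minimal sketch, assuming a simple per-host event count over your syslog data -- swap in your own base search or index name:)

    index=syslog | stats count by host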
Schedule it to be run every minute, with a relative time span of:
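(The exact values were also lost; presumably a one-minute window snapped to the minute, something like:)

    earliest=-1m@m latest=@m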
with a custom condition to email you when:
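(Again the original snippet is missing; a custom condition matching the question's threshold would be:)

    search count > 1000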
From there you might want to tweak your search to throttle subsequent notifications, but there's an example of how you'd do what you're after.
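For the throttling part, one sketch (the stanza name is hypothetical, and this assumes a Splunk version with per-field alert suppression) is to suppress repeat alerts for the same host in savedsearches.conf:

    [syslog_flood_per_host]
    # suppress repeat alerts for the same host for 30 minutes
    alert.suppress = 1
    alert.suppress.fields = host
    alert.suppress.period = 30m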
Hope this helps :-)
Another way is to use time buckets. (This is more flexible, because you can run it over other, longer periods.)
    mysearch | bucket _time span=1m | stats count by _time host | where count > 10000

will return all the hosts that had more than 10000 events per minute (and in which minute). Set up an alert condition on "number of results > 0" and attach the results to the email, and you have all the details: _time, host, count.
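For example, to take advantage of the longer periods (a hypothetical variant, assuming the same mysearch base and the same per-minute rate), bucket by 5 minutes over the last hour:

    mysearch earliest=-60m | bucket _time span=5m | stats count by _time host | where count > 50000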
answered 16 Jul '12, 09:33