
We're using Splunk to monitor the logs of IBM's Tivoli Storage Manager (TSM), and we'd like to replace our current home-grown alerting system. We'd like to create alerts based on the TSM error code, and the idea is to have one alert per error code so that they can be managed and thresholded independently (i.e., we don't want several occurrences of a "benign" or understood error code eclipsing the others).

The problem is that there are many error codes we'd like to alert on, about 300 at the current count. We'd also like to alert on every other error code in case we miss something, but for those we should only get a single, generic alert.

Now, we could have alerts with searches like these:

  • Alert 1: "search tsmcode=ANR0102E"
  • Alert 2: "search tsmcode=ANR3423E"
  • ...
  • Alert 3XX (the generic one): "search eventtype=error NOT tsmcode=ANR0102E NOT tsmcode=ANR3423E NOT ..."

but this seems kind of hard to manage, not to mention messy. Is there a better way to do this?

asked 25 May '11, 11:09

alexiri

edited 30 May '11, 02:57


One Answer:

I think that using a lookup might be the best way. Your lookup file could look something like:

tsmcode,alert,severity
ANR0102E,1,low
ANR3423E,1,high
...

With automatic lookups your search would become more like:

# catch anything else?
eventtype=error NOT alert=*

or similar. And, of course, with the addition of severity to the mix, you could treat messages more appropriately, and likely from just a few searches.
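
To make the lookup idea concrete outside of Splunk, here is a minimal Python sketch of what the enrichment and the catch-all search do logically. It is not Splunk code: the function names are hypothetical, the sample events are invented, and `ANR9999E` is a made-up code used only to show an error that is missing from the lookup.

```python
import csv
import io

# Hypothetical lookup contents, mirroring the CSV layout shown in the answer.
LOOKUP_CSV = """tsmcode,alert,severity
ANR0102E,1,low
ANR3423E,1,high
"""


def load_lookup(text):
    """Parse the CSV text into a dict keyed by tsmcode."""
    return {row["tsmcode"]: row for row in csv.DictReader(io.StringIO(text))}


def enrich(event, lookup):
    """Copy alert and severity onto the event when its tsmcode is in the lookup."""
    enriched = dict(event)
    match = lookup.get(event.get("tsmcode"))
    if match:
        enriched["alert"] = match["alert"]
        enriched["severity"] = match["severity"]
    return enriched


def is_uncatalogued_error(event):
    """Same idea as the search: eventtype=error NOT alert=*"""
    return event.get("eventtype") == "error" and "alert" not in event


lookup = load_lookup(LOOKUP_CSV)
events = [
    {"eventtype": "error", "tsmcode": "ANR0102E"},
    {"eventtype": "error", "tsmcode": "ANR9999E"},  # made-up code, not in the lookup
]
enriched = [enrich(e, lookup) for e in events]
catch_all = [e for e in enriched if is_uncatalogued_error(e)]
```

Only the event whose code is absent from the lookup ends up in `catch_all`, which is the behaviour the generic alert is meant to have.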


answered 30 May '11, 06:49

mw

Hi Mike,

Yes, something like this may be the easiest way to deal with the generic alert. I could probably also generate the CSV file programmatically if I can get Splunk to give me a list of configured alerts. (Is this possible?)

Can you think of any solution to the first issue, i.e., having to create 300 alerts in Splunk?

Cheers,
Alex

(30 May '11, 08:01) alexiri
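
On generating the CSV programmatically: a minimal Python sketch, assuming the list of known codes and severities is already available from somewhere. How to obtain that list from Splunk is not covered here, and `known_codes` and `write_lookup` are hypothetical names.

```python
import csv
import io


def write_lookup(codes, out):
    """Write a tsmcode,alert,severity lookup file from a {code: severity} dict."""
    writer = csv.DictWriter(
        out, fieldnames=["tsmcode", "alert", "severity"], lineterminator="\n"
    )
    writer.writeheader()
    for code, severity in sorted(codes.items()):
        writer.writerow({"tsmcode": code, "alert": 1, "severity": severity})


# Hypothetical input; in practice this would come from wherever the
# list of handled error codes is maintained.
known_codes = {"ANR3423E": "high", "ANR0102E": "low"}

buf = io.StringIO()
write_lookup(known_codes, buf)
lookup_text = buf.getvalue()
```

Writing to a real file instead of `io.StringIO` only needs `open(path, "w", newline="")` passed as `out`.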

Why do you need to create 300 alerts? I would imagine that the same lookup would be used to limit yourself to just a few alert searches. In other words, at least from my experience, you wouldn't treat 300 error codes in 300 different ways; you would treat them in groups as "critical" severity, etc, etc. With a lookup, the severity would be added, and so you would only need one or a few searches, IMHO.

(30 May '11, 08:07) mw
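
The grouping argument can be sketched in Python as follows. This is only an illustration of the logic, not a Splunk search: events are counted per error code, but the threshold is chosen by severity, so a handful of severity groups can cover all the codes while a noisy "low" code still cannot hide a "high" one. The threshold values and the function name are assumptions.

```python
from collections import Counter

# Hypothetical thresholds: minimum number of events per code before alerting.
THRESHOLDS = {"high": 1, "low": 10}


def codes_to_alert(events, thresholds):
    """Count events per (severity, tsmcode) and keep pairs that reach the threshold."""
    counts = Counter(
        (e["severity"], e["tsmcode"]) for e in events if "severity" in e
    )
    return sorted(
        (severity, code, n)
        for (severity, code), n in counts.items()
        if n >= thresholds.get(severity, 1)
    )


# Invented sample data: three "low" events and one "high" event.
events = (
    [{"tsmcode": "ANR0102E", "severity": "low"}] * 3
    + [{"tsmcode": "ANR3423E", "severity": "high"}]
)
alerts = codes_to_alert(events, THRESHOLDS)
```

Here the three "low" events stay under their threshold, while the single "high" event is reported on its own.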

Asked: 25 May '11, 11:09

Seen: 727 times

Last updated: 30 May '11, 08:07

Copyright © 2005-2012 Splunk, Inc. All rights reserved.