We're ingesting Tomcat logs and looking for entries tagged [SEVERE]. I'd like to be able to pull a report of error rate and flag errors that are occurring at a significantly higher-than-average rate *for their error type*. In addition, we're getting data streams from multiple hosts, each of which is really its own instance with its own "native" error rate. So I need the average rate of occurrence of errors over the last week and over the last day, grouped by host and error type, and then I need to flag any error whose rate has risen by, say, 500%.

So far the best I've come up with is this:

index="tomcat" sourcetype="catalina.out_logs" SEVERE earliest=-7d latest=-1d
| rex field=_raw "\[SEVERE[\s\t]+\][\s\t]+(?<err>.+)[\n\r]"
| eval errhost = host + "::" + err
| bucket _time span=1h
| stats count as tcount by errhost, _time
| stats avg(tcount) as wk_avg by errhost
| appendcols
    [search index="tomcat" sourcetype="catalina.out_logs" SEVERE earliest=-1d latest=now()
    | rex field=_raw "\[SEVERE[\s\t]+\][\s\t]+(?<err>.+)[\n\r]"
    | eval errhost = host + "::" + err
    | bucket _time span=1h
    | stats count as tcount by errhost, _time
    | stats avg(tcount) as day_avg by errhost]
| eval perc = round(((day_avg - wk_avg) * 100 / wk_avg), 2)
| fields + perc errhost
| search perc > 500.0
| search NOT "Unable to serve form"
| sort perc desc

So: I'm pulling SEVERE errors, extracting just the error text, concatenating that to the host to get my group-by string, bucketing in 1-hour increments (note the `_time` in the first `stats` of each search, so the hourly buckets survive into the average), then building a result table with the 7-day average and the 1-day average for each host/error pair.

Wondering if anyone else has a better way to do it? Thanks!
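One thing I'm not sure about is `appendcols`: it pairs rows by position, so if the 1-day search and the 7-day search return different sets of `errhost` values, the averages can end up next to the wrong errors. A sketch of a single-search alternative I've been toying with, which splits the time range into the two windows with an `eval` instead (the `window` field name is my own; same index and rex as above):

```spl
index="tomcat" sourcetype="catalina.out_logs" SEVERE earliest=-7d latest=now()
| rex field=_raw "\[SEVERE[\s\t]+\][\s\t]+(?<err>.+)[\n\r]"
| eval errhost = host . "::" . err
``` Label each event by window: last 24h vs. the 6 days before it
| eval window = if(_time >= relative_time(now(), "-1d"), "day_avg", "wk_avg")
| bucket _time span=1h
| stats count as tcount by errhost, window, _time
| stats avg(tcount) as rate by errhost, window
``` Pivot so each errhost row has a day_avg column and a wk_avg column
| xyseries errhost window rate
| eval perc = round((day_avg - wk_avg) * 100 / wk_avg, 2)
| where perc > 500
| sort perc desc
```

That keeps each `errhost`'s two averages on the same row by construction, at the cost of scanning the full 7 days in one search.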