Splunk Search

Averaging by Field over Time - How to make monitoring more sleek?

acdewey
Explorer

Hi all!

I feel as if I'm overcomplicating an issue, but I haven't gotten any built-in Splunk tools to work. 

Here's the situation: I have a field that I extract from my logs using rex. I want to be able to take an average AND a standard deviation of each field value's daily occurrence count, so that I can detect any new abnormalities in this field. Here's the field extraction:

earliest=-7d@d
latest=-0d@d
index=prod
"<b>ERROR:</b>"
| rex "foo:\ (?<my_foo>[^\ ]*)"
| rex "bar:\ (?<my_bar>[^\<]*)"
| eval my_foo  = coalesce(my_foo,"-")
| eval my_bar = coalesce(my_bar, "-")
| rex mode=sed field=my_bar "s/[\d]{2,50}/*/g"
| strcat my_foo " - " my_bar my_foobar

I can use stats to get a total count by my_foobar. And I can use timechart to get a count by day for my_foobar. However, if I try to average by day after timechart, I get no output unless I give up the split by my_foobar.
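
For reference, the two searches that do work look something like this (a sketch; <initial search above> stands for the full extraction pipeline):

<initial search above>
| stats count by my_foobar

<initial search above>
| timechart span=1d@d count by my_foobar

The attempt that fails is averaging after the timechart: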

| timechart span=1d@d count as my_count by my_foobar
| stats avg(my_count)

No output

| bin span=1d@d my_chunk
| stats count(my_foobar) by my_chunk

No output

I did come up with a solution, but it's hideous. I basically made my own bins using joins:

<initial search above>
| chart count as my_count1 by my_foobar
| join my_foobar [search
                  <initial search above, with the count renamed my_count2>]
<x5 more joins, for my_count3 through my_count7>

| eval my_avg = (my_count1 + my_count2 + my_count3 + my_count4 + my_count5 + my_count6 + my_count7)/7
| eval my_std = sqrt((pow((my_count1 - my_avg),2) + pow((my_count2 - my_avg),2) + pow((my_count3 - my_avg),2) + pow((my_count4 - my_avg),2) + pow((my_count5 - my_avg),2) + pow((my_count6 - my_avg),2) + pow((my_count7 - my_avg),2))/7)
| eval my_last_day_dev = abs(my_count1 - my_avg)
| table my_foobar my_avg my_std my_last_day_dev
| where my_last_day_dev > my_std

I hate it and need to use this methodology for many of my monitoring plans. Any ideas on how to make this more sleek?

1 Solution

jdunlea
Contributor

The issue with your first example is that when you use timechart, you lose the field reference for my_count because it gets wrapped up in the split-by field (my_foobar). Essentially, if you use timechart before the last stats, you will notice when you view the data in statistics mode that the field called "my_count" does not exist as a usable field for the next part of the search.
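
To see this concretely, run the search only up through the timechart and view the statistics table. With hypothetical my_foobar values such as "login - *" and "payment - *", the results look roughly like:

_time        login - *   payment - *
2024-01-01   12          3
2024-01-02   15          7

There is one column per my_foobar value, but no column named my_count, so a following stats avg(my_count) has nothing to operate on.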

Instead, if you swap out the timechart for another stats, you can do what you want. See below:

earliest=-7d@d
latest=-0d@d
index=prod
"<b>ERROR:</b>"
| rex "foo:\ (?<my_foo>[^\ ]*)"
| rex "bar:\ (?<my_bar>[^\<]*)"
| eval my_foo  = coalesce(my_foo,"-")
| eval my_bar = coalesce(my_bar, "-")
| rex mode=sed field=my_bar "s/[\d]{2,50}/*/g"
| strcat my_foo " - " my_bar my_foobar
| bin _time span=1d
| stats count as my_count by _time my_foobar
| stats avg(my_count)

This should give you output.


jdunlea
Contributor

The reason that we are using _time for bin is that this allows us to use the first stats command in a similar way to how timechart works, BUT without losing the field called my_count.

Timechart is very similar to running "stats" with "by _time" at the end. Hence, I wanted to use a "stats" with "by _time" to create a similar result set. But stats groups by the exact _time value of each event, so I needed to use "bin" to bucket _time into days first, before replacing the timechart with the stats command.
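
As a sketch of the difference in shape, using the same hypothetical my_foobar values as in the earlier example, the stats version produces one row per day per my_foobar value, with _time, my_foobar, and my_count all present as real fields:

_time        my_foobar     my_count
2024-01-01   login - *     12
2024-01-01   payment - *   3

That row-oriented shape is what lets the final stats avg(my_count) work.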

The main thing to keep in mind is that, regardless of what you put in your query, you need to look at the fields that are produced in the RESULTS at each step of the query. Only the fields that are output by each phase of the query will be available to the next phase of the query.

So, in your example, you were using timechart. If you just ran that part of the search and checked the statistics table of the results, you would see that there is no field called "my_count", which means that "my_count" will not be available for the next part of the search (your last stats command).

So the reason I used _time with bin is to replace your timechart command with a stats command, which preserves the my_count field for use in the next part of the search (your last stats command).

I hope that helps!

acdewey
Explorer

You are amazing. Thank you so much! 100 karma points to you.

acdewey
Explorer

Hi jdunlea,

Thanks! I swear I tried this. Is there some magic behind using _time for bin?

Regardless, this worked. I will say, I did have to modify it slightly by adding "by my_foobar" to the stats avg:

| bin _time span=1d@d
| stats count as my_count by _time my_foobar
| stats avg(my_count) stdev(my_count) by my_foobar
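
For completeness, the last-day deviation check from the original join-based search can be folded into this shape as well. This is only a minimal sketch, assuming the same base search and full days in the search window; the eventstats and relative_time steps below are one way to express it:

<initial search above>
| bin _time span=1d@d
| stats count as my_count by _time my_foobar
| eventstats avg(my_count) as my_avg stdev(my_count) as my_std by my_foobar
| where _time = relative_time(now(), "-1d@d") AND abs(my_count - my_avg) > my_std

This keeps one row per day per my_foobar value, attaches each group's average and standard deviation with eventstats, and then keeps only yesterday's rows whose count deviates from the group average by more than one standard deviation.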
