How to find the most prevalent values for a field

bigtyma · ‎01-09-2013

Title Position Security_Template NonEE - Volunteer F01-7121-600002 IP VIEW ONLY CLINICAL NonEE - Volunteer F01-7121-600002 FRONT DESK AND IP ORDERS CLERK NonEE - Volunteer F01-7121-600002 IP VIEW ONLY CLINICAL NonEE - Volunteer F01-7121-600002 IP VIEW ONLY CLINICAL NonEE - Volunteer F01-7121-600002 ELINK VIEW-ONLY CLINICAL

Using the example above, I would like to have a report listing the most frequent or most common value of 'Security_Template' for every Position. In the example above the most prevalent would be 'IP VIEW-ONLY'

Bonus point is if we can list the percentage of records that have each value.

Thank you!

tgow · ‎01-09-2013

If the field is already extracted then you can run the "top" command. Here is a quick example:

source=... | top Security_Template, PositionID

bigtyma · ‎01-09-2013

woohoo! this works. I had to to change it slightly to ' | top Security_Template by PositionID

bigtyma · ‎01-09-2013

Thank you for replying, unfortunately I did not get the results I was hoping for. Security_Template is already extracted.

In this scenario I should see

PositionID SecurityTemplate Percentage
F01-7121-600002 IP VIEW ONLY CLINICAL 50%

bigtyma · ‎01-09-2013

"wdd-d{4}-d{6}" format?

Yes.

tgow · ‎01-09-2013

You will need to extract the field first. Here is a quick example that might work:

source = .... | rex field=_raw "-\d+\s+(?[^\n]+) | stats count by Security_Template | sort -count

If you want it by percentage then use the "top" search command.

source = .... | rex field=_raw "-\d+\s+(?[^\n]+) | top Security_Template

If this is a CSV file with a header then you might be able to set the CHECK_FOR_HEADER in the props.conf file. Here is a link to more information:

http://docs.splunk.com/Documentation/Splunk/5.0.1/admin/Propsconf

bigtyma · ‎01-09-2013

Thank you for replying, unfortunately I did not get the results I was hoping for. Security_Template is already extracted.

In this scenario I should see

PositionID SecurityTemplate Percentage
F01-7121-600002 IP VIEW ONLY CLINICAL 50%

alacercogitatus · ‎01-09-2013

The data is not easy to split into fields because of the number of spaces. Will the position column always be in the "\w\d\d-\d{4}-\d{6}" format?

How to find the most prevalent values for a field

Combine Multiline Logs into a Single Event with SOCK - a Guide for Advanced Users

Everything Community at .conf24!

Index This | I’m short for "configuration file.” What am I?