Splunk Search

How to find the most prevalent values for a field

bigtyma
Communicator

Title Position Security_Template
NonEE - Volunteer F01-7121-600002 IP VIEW ONLY CLINICAL
NonEE - Volunteer F01-7121-600002 FRONT DESK AND IP ORDERS CLERK
NonEE - Volunteer F01-7121-600002 IP VIEW ONLY CLINICAL
NonEE - Volunteer F01-7121-600002 IP VIEW ONLY CLINICAL
NonEE - Volunteer F01-7121-600002 ELINK VIEW-ONLY CLINICAL

Using the example above, I would like to have a report listing the most frequent or most common value of 'Security_Template' for every Position. In the example above the most prevalent would be 'IP VIEW-ONLY'

Bonus point is if we can list the percentage of records that have each value.

Thank you!

Tags (3)
0 Karma

tgow
Splunk Employee
Splunk Employee

If the field is already extracted then you can run the "top" command. Here is a quick example:

source=... | top Security_Template, PositionID

0 Karma

bigtyma
Communicator

woohoo! this works. I had to to change it slightly to ' | top Security_Template by PositionID

0 Karma

bigtyma
Communicator

Thank you for replying, unfortunately I did not get the results I was hoping for. Security_Template is already extracted.

In this scenario I should see

PositionID SecurityTemplate Percentage
F01-7121-600002 IP VIEW ONLY CLINICAL 50%

0 Karma

bigtyma
Communicator

"wdd-d{4}-d{6}" format?

Yes.

0 Karma

tgow
Splunk Employee
Splunk Employee

You will need to extract the field first. Here is a quick example that might work:

source = .... | rex field=_raw "-\d+\s+(?[^\n]+) | stats count by Security_Template | sort -count

If you want it by percentage then use the "top" search command.

source = .... | rex field=_raw "-\d+\s+(?[^\n]+) | top Security_Template

If this is a CSV file with a header then you might be able to set the CHECK_FOR_HEADER in the props.conf file. Here is a link to more information:

http://docs.splunk.com/Documentation/Splunk/5.0.1/admin/Propsconf

0 Karma

bigtyma
Communicator

Thank you for replying, unfortunately I did not get the results I was hoping for. Security_Template is already extracted.

In this scenario I should see

PositionID SecurityTemplate Percentage
F01-7121-600002 IP VIEW ONLY CLINICAL 50%

0 Karma

alacercogitatus
SplunkTrust
SplunkTrust

The data is not easy to split into fields because of the number of spaces. Will the position column always be in the "\w\d\d-\d{4}-\d{6}" format?

0 Karma
Get Updates on the Splunk Community!

Combine Multiline Logs into a Single Event with SOCK - a Guide for Advanced Users

This article is the continuation of the “Combine multiline logs into a single event with SOCK - a step-by-step ...

Everything Community at .conf24!

You may have seen mention of the .conf Community Zone 'round these parts and found yourself wondering what ...

Index This | I’m short for "configuration file.” What am I?

May 2024 Edition Hayyy Splunk Education Enthusiasts and the Eternally Curious!  We’re back with a Special ...