Splunk Search

How to return unique values with highest count, and sort them highest to lowest

ea-2023
Path Finder

Hello,

I am working on a search to find domains queried via a particular host, and list out a count of hits per unique domain on the host, along with the username. This search is returning the domains individually, but they are showing up as an entry in each "count" (see the Actual Results below). 

What I am looking to do is get the results to show only the values for the highest count of each domain, and to order these results from highest to lowest (see Expected Results below). 

index=foo
Host=<variable>
| streamstats count(query) as Domains by User query Workstation
| eval combo=Domains +" : "+ query
| stats values(combo) as "Unique Hits : Domain" by User Workstation
| sort - combo



Actual Results (truncated):
1 : www.youtube.com
2 : history.google.com
3 : history.google.com

Expected Results (truncated):
3 : history.google.com
2 : mail.google.com
1 : www.youtube.com

Labels (3)
0 Karma

PickleRick
SplunkTrust
SplunkTrust

OK. You're overcomplicating the issue.

If you're gonna do stats and throw all other stuff away, there's no point in doing streamstats which is a much "heavier" command.

So what I'd do would be simple

index=foo
| stats count as domaincount by Domain User Workstation

Now you'll get your count and you can do with it whatever you please - sorting, filtering, aggregating. You name it.

For example, if you want just three most often used domains per each user/workstation, just

| sort User Workstation - domaincount
| streamstats count by Domain User Workstation
| where domaincount<=3

(writing from the top of my head so the sorting might be a bit off).

0 Karma

ea-2023
Path Finder

This seems to technically work, however I am left with an unwanted "count" column at the end that I don't know how to remove. As an example of what I'm after, I've included the "Target Output" below. 

Actual Output:

Screenshot 2023-12-11 at 5.15.19 PM.png


Target Output:

 Screenshot 2023-12-11 at 5.07.55 PM.png

0 Karma

PickleRick
SplunkTrust
SplunkTrust

You can always remove extra fields when they are no longer needed. In this case it's as simple as

| fields - count

While it's relatively easy to transform the output from my search to something resembling "your output" - you just add

| stats list(Domain) as Domain list(domaincount) as "Domain Count" by Workstation User

You can of course reorder the results with the "fields" command at the end to move some to the left, others to the right.

There is one catch - you get two separate multivalued fields - Domain and Domain Count. The "problem" with Splunk is that these are two separate fields and there is no connection between them. So if you wanted to - for example sort one of them and sort the other one accordingly - there's no way to do it unless you zip them together into a list of single values, sort them and then split them back. Ugly.

Also if by any chance your search leading to the "stats list()" resulted in partially empty results, you wouldn't get multivalued fields with "holes" in them as the result.

 

0 Karma

yuanliu
SplunkTrust
SplunkTrust

Start by illustrating/mockup/explaining data input. (Anonymize as needed.)  What are data fields of interest?  In your search, what does the field 'query' stand for? What are sample/mock values?  What does "Workstation" do in your logic?

Anyway, your sort command has no effect because "combo" is no longer available after stats.  If anything, you can try

index=foo
Host=<variable>
| streamstats count(query) as Domains by User query Workstation
| eval combo=Domains +" : "+ query
| stats values(combo) as "Unique Hits : Domain" by User Workstation
| sort - "Unique Hits : Domain"
0 Karma

ea-2023
Path Finder

Hi Yuanliu,

The query field is the domain visited (e.g. www.youtube.com) I am just renaming that field to "Domains". The Workstation is the hostname of the user's computer (e.g. ABC193423). I am ultimately looking to return unique values for the domains seen, but only the highest count (e.g. in the current results, if a user goes to www.youtube.com 3 times, it will show up 3 times in the search as 1 :, 2 :, 3 :, etc.). The intention here is to see how many times a domain was visited on a given Workstation, and include the User that accessed the domains.

So the search can either return only the highest count seen for a domain, OR I can sort the counts descending to get the highest hits on top (this is meant to be a workaround to get the search working until I can figure out how to remove the "duplicated" lower counts).

I thank you for your suggestion, but unfortunately it did not change how the data was populated. It was still ascending the counts.

 

0 Karma
Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Get the T-shirt to Prove You Survived Splunk University Bootcamp

As if Splunk University, in Las Vegas, in-person, with three days of bootcamps and labs weren’t enough, now ...

Wondering How to Build Resiliency in the Cloud?

IT leaders are choosing Splunk Cloud as an ideal cloud transformation platform to drive business resilience,  ...