Splunk Search

Dedup within a MV field

pkashou
Explorer

I need the ability to dedup a multi-value field on a per event basis. Something like values() but limited to one event at a time. The ordering within the mv doesn't matter to me, just that there aren't duplicates. Any help is greatly appreciated.

My search:

host=test* | transaction Customer maxspan=3m | eval logSplit = split(_raw,",") | eval eventSplit = mvfilter(match(logSplit, "^[E|e]vent-")) | table eventSplit

Normal output:

event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1

Preferred output:

event-001 = date:02/14/2013 12:48:09 -0500|result:available_retrieve_success
event-002 = date:02/14/2013 12:48:10 -0500|result:scan_success|token:uf
event-001 = date:02/13/2013 12:49:20 -0500|result:log_success
event-003 = date:02/14/2013 12:48:11 -0500|result:retrieve_success|txType:P|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad
event-001 = date:02/14/2013 12:48:16 -0500|result:p_success|txRefId:c0544ec1-bce5-4c4e-bc9d-f6e9072131ad|total:6.1

Tags (1)
1 Solution

martin_mueller
SplunkTrust
SplunkTrust

You could make use of the regular dedup like this:

...  | streamstats count | mvexpand eventSplit | dedup count eventSplit | mvcombine eventSplit | fields - count

View solution in original post

emiller42
Motivator

I know this is an old question, but I stumbled upon this while trying to do the same thing, and there is now a much cleaner solution:

eval mvfield=mvdedup(mvfield)

danbutterman
Explorer

Exactly what I was looking for.

Love this community.

redc
Builder

I ran into this need today and stumbled across this post...

It's worth noting for anyone else who finds this post while trying to figure out how to do this that <code>mvdedup</code> was only introduced in 6.2.0.

0 Karma

sideview
SplunkTrust
SplunkTrust

Another idea is to use stats values(), but do a weird trick to make it calculate unique values only within each row.

| streamstats count as row_number | stats values(mvField) as mvField by row_number | fields - row_number

martin_mueller
SplunkTrust
SplunkTrust

You could make use of the regular dedup like this:

...  | streamstats count | mvexpand eventSplit | dedup count eventSplit | mvcombine eventSplit | fields - count

pkashou
Explorer

Thanks to both of you as these both worked to a certain degree. The stats weird trick did some strangeness to the output so I ended up using the mvexpand/mvcombine approach along with eventstats.

Much appreciated!

Get Updates on the Splunk Community!

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer at Splunk .conf24 ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...

Share Your Ideas & Meet the Lantern team at .Conf! Plus All of This Month’s New ...

Splunk Lantern is Splunk’s customer success center that provides advice from Splunk experts on valuable data ...