Solved: Best way to merge results? (ver 5.0.5)

yuwtennis · ‎05-21-2014

Hi!

I would like some advice on how to merge two results.

I have a search like the one below.

index=A [
search index=A
.....
| fields a b
]

The parent search takes the fields a and b and searches index A again.
However, this is a bit slow if the subsearch returns thousands of results.

As a workaround, I believe the results can be merged in either of the following ways.

1. Combination of lookup table and inner join

index=A [
search index=A
.....
| fields a b
| outputlookup hoge.csv
| return ""
]
| join type=inner a b [| inputlookup hoge.csv]

2. Use map

index=A
......
| fields a b
| map search="search index=A a=$a$ b=$b$" maxsearches=xxxxxx

Since the cost of the map command depends heavily on the number of result rows, I prefer the combination of join and a lookup table.

What would be the best way to merge the results?

Thanks,
Yu

martin_mueller · ‎05-21-2014

For filtering a search based on a different search's results, your first approach is usually best.

Let's make up a realistic example: You have events that form a transaction with some transaction_id... somewhere down the line of that transaction there is a user field, and you want to grab the transactions for user=yuwtennis.
A slow search would go like this:

sourcetype=transactions | transaction transaction_id | search user=yuwtennis

That'll build ALL the transactions and then throw out most of them.

Pre-filtering like this doesn't work if the user field isn't present in every event:

sourcetype=transactions user=yuwtennis | transaction transaction_id

So you'll have to pick out the transaction_id values you need before you build the transaction:

sourcetype=transactions [search sourcetype=transactions user=yuwtennis | dedup transaction_id | fields transaction_id] | transaction transaction_id

That will take a bit more time due to running two searches, but will almost always be miles faster than the first naïve search.
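To see why the subsearch acts as a filter rather than a join, it can help to look at what it hands to the outer search. A minimal sketch, reusing the made-up sourcetype and field names from the example above: running the subsearch on its own with the format command appended should show the filter expression that gets generated.

```
sourcetype=transactions user=yuwtennis
| dedup transaction_id
| fields transaction_id
| format
```

The result should be a single row whose search field looks roughly like ( ( transaction_id="..." ) OR ( transaction_id="..." ) ), which the outer search then applies while loading events. Subsearches are subject to result-count and runtime limits (the [subsearch] stanza in limits.conf), so with many thousands of values it is worth checking that the list is not being cut short.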

Your workaround #1 looks slow because joining will always be very slow compared to filtering before loading events.
Your workaround #2 is probably going to be worse: as you say, the subsearch may return thousands of values, so map would have to run thousands of searches, and that can't be fast.

lguinn2 · ‎05-21-2014

I am unclear about why you want to "merge results".

I can't figure out why you can't simply run the search on index=A and be done. More details are needed to figure out the best approach.
