Refine your search:

4
1

I have indexed the contents of a relational database along with a log file. My log contains these fields:

  • cost - this is an integer that I want to report on)
  • userid - this is an ID into the Users table that I've indexed from my database

My users table, now indexed in Splunk, contains these fields:

  • userid - same ID as above
  • username - human-readable text of their name

I want to end up with a chart which shows the top 20 users ordered by how much they've spent (sum of cost field) over a specific time range (e.g. last 2 weeks). I want to show them by username, not by ID. What's the best way to do this in Splunk?

asked 23 Mar '10, 01:25

Justin%20Grant's gravatar image

Justin Grant
1.5k6740
accept rate: 50%

edited 04 May '11, 17:28

jlaw's gravatar image

jlaw ♦
20113


3 Answers:

It is easy to look at this problem and think that since this would definitely be a `join` in SQL, therefore you need to use the `join` command in Splunk.

However you do not need the join command here and you will be much better off without it. Not only is join slower for having to run the second search and have a second process getting events off disk, but its searches will quietly truncate at I think 50,000 events, and will quietly self-finalize in some number of seconds in limits.conf whichever comes first. Both of these problems will bite you and it's usually easy to avoid joins by thinking about how to solve it with lookups or with the plain old stats command and some grouping.

In your particular case I'll echo Gerald's answer that the lookup is probably the way to go, particularly if the userids and usernames dont change very much relative to the scale of the scheduled search maintaining the lookup table.

But here's how to do it just with search and stats. One big OR clause in search mashes the two data sets together and then we do a little bit of stats command to merge it all into what we want.

Run this search over the last 2 weeks:

source=costtable OR source=usertable | stats first(username) as name sum(cost) as total by userid | sort - total | fields username total

link

answered 22 Apr '10, 22:39

nick's gravatar image

nick ♦
14.2k1318
accept rate: 46%

edited 30 Jul '11, 11:16

2

You might want to read "Splunk for SQL Users"...

If you are familiar with SQL this quick comparison might be helpful to jump-starting you into using Splunk.

http://www.innovato.com/splunk/SQLSplunk.html

(22 Apr '10, 23:58) carasso ♦♦

The best way is to set your users table as a lookup. Periodically (or whenever you update it, or whenever you need it, run):

source=usertable | outputlookup usertable.csv

You could also just skip putting the user table into the Splunk index and just export it as CSV and place it into a lookup table directly. If you define the lookup table in transforms.conf, you can use that name rather than usertable.csv

Then:

source=logfile | lookup usertable.csv userid OUTPUT username | stats sum(cost) by username | sort 20 -sum(cost)

You could also set the lookup as an automatic lookup in props.conf so it runs every time you view your log source (or sourcetype or whatever).

link

answered 23 Mar '10, 02:07

gkanapathy's gravatar image

gkanapathy ♦
26.5k1622
accept rate: 42%

how does a lookup approach differ (in performance, ease of management, etc.) from @Equalis's suggestion above to use the JOIN command?

(23 Mar '10, 18:24) Justin Grant
3

Splunk 4.0 executes lookups more efficiently than join. Making the lookup automatic lets you efficiently search using the username (e.g. source=costtable username=myname), while with join you must execute the join on the entire costtable. If there is no match on (e.g.) userid, then join will drop your costtable row, while lookup will keep it (join=inner join vs lookup=left join). Lookups can be transparently replaced with a script. Note, with join, you need to specify option max=0, or you'll only get one result per row in the usertable.

(24 Mar '10, 14:31) gkanapathy ♦

Alternatively you could use the JOIN command. Assuming this first search returns the usernames

source=usertable | dedup username | fields + userid username

And your cost data has the common field userid, you could use the following to join the outputs.

source=costtable | join userid [source=usertable | fields + userid username]

Bob

alt text

link

answered 23 Mar '10, 15:13

BobM's gravatar image

BobM
1.5k110
accept rate: 31%

Post your answer
toggle preview

Follow this question

Log In to enable email subscriptions

RSS:

Answers

Answers + Comments

Markdown Basics

  • *italic* or _italic_
  • **bold** or __bold__
  • link:[text](http://url.com/ "Title")
  • image?![alt text](/path/img.jpg "Title")
  • numbered list: 1. Foo 2. Bar
  • to add a line break simply add two spaces to where you would like the new line to be.
  • basic HTML tags are also supported

Tags:

×1,090
×194
×189

Asked: 23 Mar '10, 01:25

Seen: 2,546 times

Last updated: 30 Jul '11, 11:16

Copyright © 2005-2012 Splunk, Inc. All rights reserved.