Solved: Re: how to join 3 different data sources

Laya123 · ‎03-16-2016

Hi,

I have 3 different sources. I want to merge splunk search data with another data of 2 different csv files using inputlookup.

From all 3 different sources p_name is the common field. But csv files having more number of records (this file is having duplicate project names) than IVW logs. I want to lookup with csv file for latest values.

Example:

from IVW logs, I got output like

p_name; respondents; host
A 20 X
B 25 Y
C 5 Z
D 12 W
E 6 V

From A-N.csv, I have fileds like

p_code; O_Status; O_Server; Contact; SVer DateTime
A; Pass; AB1; abc@f ; 2; 23/01/2016 06:30:20

A; Pass; AB1; abc@f ; 2; 22/02/2016 05:10:20
A; fail; AB1; abc@f ; 2; 13/03/2016 08:30:20
A; Pass; AB1; abc@f ; 2; 05/03/2016 07:20:20
A; Pass; AB1; abc@f ; 2; 06/03/2016 05:30:20
A; Pass; AB1; abc@f ; 2; 11/03/2016 05:30:20
C; fail; NA Not found NA 15/01/2016 05:30:20
C; Pass; AB1; abc@f ; 2; 05/02/2016 05:30:20
C; Pass; AB1; abc@f ; 2; 15/02/2016 05:30:20
C; fail; AB1; abc@f ; 2; 02/03/2016 05:30:20
C; fail; AB1; abc@f ; 2; 13/03/2016 05:30:20
B; Pass; AB1; abc@f ; 2; 12/01/2016 05:30:20
B; Pass; AB1; abc@f ; 2; 11/02/2016 05:30:20
B; Pass; AB1; abc@f ; 2; 10/02/2016 05:30:20
B; Pass; AB1; abc@f ; 2; 25/02/2016 05:30:20
B; Pass; AB1; abc@f ; 2; 02/03/2016 05:30:20
B; Pass; AB1; abc@f ; 2; 07/03/2016 05:30:20
B; pass; AB1; abc@f ; 2; 12/03/2016 05:30:20
H; Pass; AB1; abc@f ; 2; 10/02/2016 05:30:20
H; Pass; AB1; abc@f ; 2; 25/02/2016 05:30:20
H; Pass; AB1; abc@f ; 2; 02/03/2016 05:30:20
H; Pass; AB1; abc@f ; 2; 07/03/2016 05:30:20
M; fail; AB1; abc@f ; 2; 12/03/2016 05:30:20
M; Pass; AB1; abc@f ; 2; 02/03/2016 05:30:20
M; Pass; AB1; abc@f ; 2; 07/03/2016 05:30:20
M; fail; AB1; abc@f ; 2; 12/03/2016 05:30:20
M; Pass; AB1; abc@f ; 2; 10/02/2016 05:30:20

From D1.csv, I have fields like

Project; CCoun; CO1; Contacts

A; ABC ; AB ; AB@g
B; XYZ ; CD ; BC@g

C; LMN ; EF ; CD@g
D; PQR ; GH ; EF@g
E; STU ; YZ ; GH@g

I want my output like

p_name; respondents; host; CCoun; CO1; Contancts; O_status O_Server SVer

A ; 20 ; X; ABC ; AB ; AB@g ; Pass ; AB1 2
B ; 25 ; Y XYZ; CD ; BC@g ; pass ; AB1 2

C ; 5 ; Z LMN ; EF ; CD@g fail ; AB1 2
D ; 12 ; W PQR ; GH; EF@g

E ; 6 ; V STU ; YZ ; GH@g

Below query is giving results but in some places there is data in A-N.csv but not showing in output. Can any one help me to do this.

there is duplicate project names in A-N.csv, but when I am mapping with ivw data it has to show latest values based on the date in A-N.csv

index=ibm sourcetype=ivw host=ABC |eval p_name=upper(p_name) |chart distinct_count(respondentID) as respondents, values(host) as "host", count as Entries, |join max=0 type=outer "p_name" [|inputlookup A-N.csv |rename PCode as p_name|values(O_Server) as O_Server values(O_Status) as O_Status values(SVer) as SVer by p_name] |join max=0 type=outer "p_name" [|inputlookup D1.csv |rename project as p_name|stats values(CCoun) as CCoun values(CO1) as CO1 values(Contacts) as Contacts by p_name ]

Thanks in advance

somesoni2 · ‎03-16-2016

Looks like you've csv fiels ingested as lookup table files, so you don't need join at all. It could be as simple as this

index=ibm sourcetype=ivw host=ABC |eval p_name=upper(p_name) |chart distinct_count(respondentID) as respondents, values(host) as "host", count as Entries, 
| lookup A-N.csv PCode as p_name OUTPUT O_Server OStatus Sver 
| lookup D1.csv project as p_name OUTPUT CCount CO1 Contacts

Update

Give this a try

index=ibm sourcetype=ivw host=ABC |eval p_name=upper(p_name) |chart distinct_count(respondentID) as respondents, values(host) as "host", count as Entries by p_name
| lookup D1.csv project as p_name OUTPUT CCount CO1 Contacts
| append [| inputlookup A-N.csv | rename PCode as p_name | sort 0 p_name -DateTime | dedup p_name | table p_name O_Server OStatus Sver ]
| stats values(*) as * by p_name

View solution in original post

somesoni2 · ‎03-16-2016

Looks like you've csv fiels ingested as lookup table files, so you don't need join at all. It could be as simple as this

index=ibm sourcetype=ivw host=ABC |eval p_name=upper(p_name) |chart distinct_count(respondentID) as respondents, values(host) as "host", count as Entries, 
| lookup A-N.csv PCode as p_name OUTPUT O_Server OStatus Sver 
| lookup D1.csv project as p_name OUTPUT CCount CO1 Contacts

Update

Give this a try

index=ibm sourcetype=ivw host=ABC |eval p_name=upper(p_name) |chart distinct_count(respondentID) as respondents, values(host) as "host", count as Entries by p_name
| lookup D1.csv project as p_name OUTPUT CCount CO1 Contacts
| append [| inputlookup A-N.csv | rename PCode as p_name | sort 0 p_name -DateTime | dedup p_name | table p_name O_Server OStatus Sver ]
| stats values(*) as * by p_name

Laya123 · ‎03-16-2016

Hi,

When I used this query I am getting all values from A-N.csv but I don't want all values from A-N.csv for one project I want to display only latest value from the A-N.csv.

Means from my sample data If Project A is having more than one OStatus but I want only OStatus of latest. If Ostatus for A is having fail on 15/Feb/2016 but on 16/Mar/2016 the project status was Pass I want to display only this pass in my output not all values from A-N.csv

Thanks

Laya123 · ‎03-16-2016

Thank you so much the update one worked for me

Thanks

asimagu · ‎03-16-2016

this may point you in the right direction

https://answers.splunk.com/answers/129424/how-to-compare-fields-over-multiple-sourcetypes-without-jo...

Laya123 · ‎03-16-2016

Thank you for your immediate response.

I tried this but no luck. Here I am mapping values with csv

Thanks & Regards

how to join 3 different data sources

Detecting Remote Code Executions With the Splunk Threat Research Team

Enter the Splunk Community Dashboard Challenge for Your Chance to Win!

.conf24 | Session Scheduler is Live!!