I have a CSV file which has a some columns. There is one column named GROUP and the data in that column are in the format {'a':1,'b':2}, there can be any number of strings. There is another column VALUE and the data are in the format {'a','b'}.
I want to check if the strings in VALUE column are present in GROUP column and create a separate column named DATA with the strings not present. I am not sure how to achieve it in Splunk using commands. Can anyone have any suggestions?
Example:
Group Value Data
{'a':'1','b':'2'} {'a','b'}
{'a':1,'b':'2'} {'a'} {'b'}
{'a':1,'b':'2','c':'3'} {'a'} {'b','c'}
There are many columns like these and strings present in GROUP column can be more.
Hi @anooshac,
these seems to be data in json format.
If they aren't in a lookup, you could parse them and store in an index using the INDEXED_EXTRACTIONS=json option of props.conf (https://docs.splunk.com/Documentation/Splunk/9.1.1/Admin/Propsconf).
In this way you extract all the fields and you can use the values for your matches.
Ciao.
Giuseppe
You can do it like this runnable example with your data - using from the rex statement
| makeresults
| eval _raw="Group Value Data
{'a':'1','b':'2'} {'a','b'}
{'a':1,'b':'2'} {'a'} {'b'}
{'a':1,'b':'2','c':'3'} {'a'} {'b','c'}"
| multikv forceheader=1
| table Group Value Data
``` This is your Splunk SPL ```
| rex field=Group max_match=0 "'(?<g>\w)':"
| rex field=Value max_match=0 "'(?<v>\w)'"
| eval Calculated_Data=mvmap(g, if(g!=v, g, null()))
| eval Calculated_Data="{'".mvjoin(Calculated_Data, "','")."'}"
| fields - g v
So, if you have a CSV file with Group and Value in it, then
| inputlookup your_csv.csv
| rex field=Group max_match=0 "'(?<g>\w)':"
| rex field=Value max_match=0 "'(?<v>\w)'"
| eval Data=mvmap(g, if(g!=v, g, null()))
| eval Data="{'".mvjoin(Data, "','")."'}"
| fields - g v
Should the csv file uploaded as lookup file? Can we avoid that?
Also the strings can contain _ as well as they can be capital letters and small letters.. how can i do this? This [A-Za-z_] regular expression is fine?
Where is your data coming from? Data has to be either
a) In a Splunk index
b) In a lookup in Splunk
c) Part of your search, as in my | makeresults example
Hi @bowesmana the data comes from Splunk index and it is csv file
Just change the regex in the 2 rex statements
| makeresults
| eval _raw="Group;Value;Data
{'Ala_ABC':'1','Bob_XX_':'2'};{'Ala_ABC','Bob_XX_'};
{'Ala_ABC':1,'Bob_XX_':'2'};{'Ala_ABC'};{'Bob_XX_'}
{'Ala_ABC':1,'Bob_XX_':'2','_c_is_for_Charlie':'3'};{'Ala_ABC'};{'Bob_XX_','_c_is_for_Charlie'}"
| multikv forceheader=1
| table Group Value Data
``` This is your Splunk SPL ```
| rex field=Group max_match=0 "'(?<g>[A-Za-z_]+)':'"
| rex field=Value max_match=0 "'(?<v>[A-Za-z_]+)'"
| eval Calculated_Data=mvmap(g, if(g!=v, g, null()))
| eval Calculated_Data="{'".mvjoin(Calculated_Data, "','")."'}"
| fields - g v
Hi @bowesmana , the method you mentioned is working fine..
Sorry for not mentioning before, the column VALUE can have data like {'a','b'} or a. In such cases how can i change the regex?
Just add any additional character groupings into the allowed character ranges, i.e.
| rex field=Group max_match=0 "'(?<g>[A-Za-z_\.]+)':'"
| rex field=Value max_match=0 "'(?<v>[A-Za-z_\.]+)'"