
How to collect and dedup the newest to a new Index

sai33
Explorer

Hello Splunkers,

I've got an existing index which I would like to process and collect into a new index. My rough idea is as follows:

  • Use sort to get the latest (newest) event in the existing index, grouped by ID.
  • Collect (copy) only that first (newest) event per ID from the existing index into a new index.

My sample data in the existing index looks like this:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
1, Purchase, 11.08.2019-15:30
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

The data in my new index should be a collect of the above index:

ID, Action, DateTime
1, Purchase, 11.08.2019-16:00
2, Purchase, 11.08.2019-13:00
3, Purchase, 11.08.2019-16:00

If you observe, the second event for ID 1 is not present in the new index.

I believe this should be possible using sort, dedup, and collect. Please suggest the best possible method. I have to move an index of around 5 GB.
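Something like the following is what I have in mind, though I'm not sure it's correct (just a rough sketch, assuming DateTime is the event time and the new index already exists; the index names are placeholders):

<yourExistingIndexQuery>
| sort 0 - _time
| dedup ID
| collect index=<yourNewIndex>

I added sort 0 because I read that sort otherwise stops at the default limit of 10,000 results, and dedup ID should then keep only the newest event per ID since the results are sorted newest first.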

Thanks!!


niketn
Legend

@sai33 does the DateTime field in index1 correspond to the _time field in your data?

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"

sai33
Explorer

I'm not exactly sure which _time you're referring to, but this is the timestamp (date and time of the event).
Being a newbie to Splunk, I'm relatively new to the technical terms.
Sorry for the trouble!


niketn
Legend

In order for the community to assist you better, you would need to provide your current SPL (mock/anonymize any sensitive information before posting it).

Can you run the following search and see whether _time has the same value as DateTime or not?

<yourIndex1Query>
| table _time ID Action DateTime

_time is the time of the event that is assigned while indexing the data in Splunk. It is one of the most crucial pieces of information Splunk needs while indexing, as an incorrect timestamp in an indexed event means that none of your correlations/queries will work as expected.
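If _time turns out not to match DateTime, one option (a rough sketch, assuming your DateTime is day.month.year as in your sample) is to re-evaluate it at search time before you dedup/collect:

<yourIndex1Query>
| eval _time=strptime(DateTime, "%d.%m.%Y-%H:%M")
| table _time ID Action DateTime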

While this is not directly related to the answer to your question here, I would recommend understanding timestamp extraction as the first step for indexing data correctly. So refer to the documentation: https://docs.splunk.com/Documentation/Splunk/latest/Data/HowSplunkextractstimestamps
The second most crucial step is event breaking, which tells Splunk the boundary of each event as it processes streaming data input. Incorrect event breaks mean that events may overlap or get dropped. So read the following documentation as well: https://docs.splunk.com/Documentation/Splunk/latest/Data/Configureeventlinebreaking
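For one-line events like your sample, a props.conf along these lines would cover both timestamp extraction and event breaking (just a sketch; the sourcetype name is a placeholder and I am assuming the date is day.month.year):

# props.conf on the indexer/heavy forwarder
[my_purchase_sourcetype]
# one event per line
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)
# timestamp is the third comma-separated value, e.g. 11.08.2019-16:00
TIME_PREFIX = ^[^,]+,[^,]+,\s*
TIME_FORMAT = %d.%m.%Y-%H:%M
MAX_TIMESTAMP_LOOKAHEAD = 16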

____________________________________________
| makeresults | eval message= "Happy Splunking!!!"