
How to avoid duplicates when a log file is read from a URL using a scripted input?

vallurupalli
New Member

We are reading HTTP logs from a web URL using the curl command. The web server log is exposed at http://host/webserver.log and is read by a scripted data input every 5 minutes.

The log file keeps its older entries alongside the new ones, so on the next read Splunk loads the old entries again along with the new ones. How do we avoid duplicates when the log file is not rotated but keeps gaining new entries next to the old ones that were already read during a previous call?


gkanapathy
Splunk Employee

Your script must keep track of what has already been read and output only the new items. If it is helpful and you are on version 5.0, you can use modular inputs, which provide a checkpointing facility that makes this tracking easier.

http://docs.splunk.com/Documentation/Splunk/5.0/AdvancedDev/ModInputsCheckpoint
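
As an illustration, here is a minimal Python 3 sketch of such a script, assuming the web server honors HTTP Range requests; the URL and the checkpoint file location are placeholders, not anything prescribed by Splunk:

```python
#!/usr/bin/env python
# Sketch of a scripted input that remembers how much of the remote log it has
# already emitted, so each run outputs only the new tail. Assumes the web
# server honors HTTP Range requests; URL and checkpoint file are placeholders.
import os
import sys
import urllib.error
import urllib.request

LOG_URL = "http://host/webserver.log"
CHECKPOINT = os.path.join(os.path.dirname(os.path.abspath(__file__)),
                          "webserver_log.offset")

def read_offset():
    try:
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    except (IOError, ValueError):
        return 0  # first run: start from the beginning of the file

def write_offset(offset):
    with open(CHECKPOINT, "w") as f:
        f.write(str(offset))

def main():
    offset = read_offset()
    req = urllib.request.Request(LOG_URL,
                                 headers={"Range": "bytes=%d-" % offset})
    try:
        resp = urllib.request.urlopen(req)
    except urllib.error.HTTPError as err:
        if err.code == 416:  # range not satisfiable: nothing new since last run
            return
        raise
    data = resp.read()
    if resp.status == 200:
        # Server ignored the Range header and returned the whole file,
        # so skip the part that was already indexed on earlier runs.
        data = data[offset:]
    sys.stdout.write(data.decode("utf-8", "replace"))
    write_offset(offset + len(data))

if __name__ == "__main__":
    main()
```

The modular input checkpointing linked above serves the same purpose: persist a high-water mark between runs so that only unseen data is emitted to Splunk.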

theouhuios
Motivator

You can use the sort command to list the newest events first. Alternatively, you can restrict the timeframe by setting earliest and latest in your search query.

E.g., earliest=-5h@h latest=@h returns data that occurred in the last 5 hours, snapped to the hour.
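
If you need to run such a time-bounded search from a script, here is a hedged sketch using the Splunk Python SDK (splunklib); the host, credentials, index name, and the use of a recent SDK version with JSONResultsReader are all assumptions, not details from this thread:

```python
# Sketch: run a time-bounded Splunk search from Python using splunklib.
# Host, credentials, and the index name are placeholders.
import splunklib.client as client
import splunklib.results as results

service = client.connect(
    host="localhost", port=8089,           # assumed Splunk management endpoint
    username="admin", password="changeme"  # placeholder credentials
)

# Equivalent of searching with earliest=-5h@h latest=@h, newest events first.
rr = service.jobs.oneshot(
    "search index=main | sort -_time",
    earliest_time="-5h@h",
    latest_time="@h",
    output_mode="json",
)
for event in results.JSONResultsReader(rr):
    if isinstance(event, dict):  # skip informational Message objects
        print(event.get("_raw"))
```

Note that this only limits what a search returns; it does not prevent duplicate events from being indexed in the first place, which is what the checkpointing approach above addresses.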

0 Karma