How can I tell Splunk to read log files only once, but keep monitoring the folder for new files?

Path Finder

I have an ActiveBatch setup that generates many files (tens of thousands) in a folder. I'd like to have Splunk read only files freshly generated in these ActiveBatch folders. I am using the setting followTail=1 for now, and it works OK. Is there a better way to do this?

It took Splunk several hours of 100% CPU usage to go through a couple of such folders (with 30K files each). The files are generated once and are never modified after that, so "following their tail" is useless.

Is there a way to tell Splunk that? I am looking for a setting similar to followTail that would make it look only at new files in a folder and ignore any files that existed before the input was defined in Splunk. Some background:

  • Each file is created when the corresponding job starts running, and it grows for some time (anywhere from 1 second to several hours, depending on how long the job takes to complete).
  • Once the job has finished, the log file is never modified again, so there is no use tailing it anymore.
  • There are tens of thousands of such files in several folders, and tailing all of them seems to be taking a serious toll on splunkd.
  • Each file ends with a common section that can be used to determine that no more monitoring is necessary (you can see that common section in this question).
1 Solution

Splunk Employee

There is a setting for ignoring old files:

ignoreOlderThan = <time window>
* Causes the monitored input to stop checking files for updates if their modtime has passed this threshold.
  This improves the speed of file tracking operations when monitoring directory hierarchies with large numbers
  of historical files (for example, when active log files are colocated with old files that are no longer
  being written to).
* A file whose modtime falls outside this time window when seen for the first time will not be indexed at all.
* Value must be: <number><unit> (e.g., 7d is one week).  Valid units are d (days), m (minutes), and s (seconds).
* Default: disabled.
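
As a rough sketch of how this might look in inputs.conf: the monitored path below is a placeholder (it is not from the original post), and the two-day window is chosen only for illustration. Pick a window comfortably longer than the longest-running job, since the cutoff is based on each file's modtime and not on when the input was defined.

```
# inputs.conf (sketch; the path is a hypothetical placeholder)
[monitor:///path/to/activebatch/logs]
disabled = false
# Stop checking files whose modtime is more than 2 days old.
# Files already older than this when first seen are not indexed at all.
ignoreOlderThan = 2d
```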

Path Finder

Excellent! This seems to be quite suitable for this. Ignoring files older than 2 days will cover every situation in this case. Thanks!

New Member

I'm not getting 'ignoreOlderThan' to work?

disabled = false
index = [redacted]
blacklist = 201[0-9]-[0-1][0-8]
sourcetype = syslog

The directory is full of syslog files from rsyslog. When I run 'splunk list monitor', it still shows files dated as far back as 2017-12. (PS: the blacklist was my attempt to stop it from monitoring old files.)

Like the OP above, I have files created each day, thousands of them. I don't want the UF to keep 'monitoring' the files, just to import any new ones. Once the files are created, they are never written to.
