I have an input set up to monitor a folder where new log files are generated daily. Today, however, a bad process got stuck in an infinite loop and wrote the same log output over and over, producing a 4 GB file before we noticed and killed it. That single file blew through my Splunk license limit for the day in one pass. I have 2 questions: can I delete that file's events from the index and reclaim the space, and can I stop Splunk from indexing a file that large in the future?
asked 20 May '11, 21:38
First, there is only one way to delete a file's events from an index and reclaim the space: use the splunk clean command to remove everything from the index, then re-index only what you want to keep. (The delete search command hides events from search results, but it does not free disk space or give back license volume.) This is rarely a practical solution for production environments.
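As a sketch, the clean operation looks like this (the index name myindex is a placeholder; Splunk must be stopped first, and this wipes the entire index, not just the one file):

```
# Stop Splunk first -- clean will not run against a live index
$SPLUNK_HOME/bin/splunk stop

# Destructive: removes ALL indexed events from the target index
$SPLUNK_HOME/bin/splunk clean eventdata -index myindex

$SPLUNK_HOME/bin/splunk start
```

After restarting, you would re-index whatever data you still need from the original files.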
AFAIK, there is no way to tell Splunk to index (or skip) an input based on its size. However, you could write a script that examines file sizes on disk and outputs that information. If you index the script's output, you could easily write a search that fires an alert when a file exceeds a size threshold. How hard that is depends on your programming skills.
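A rough sketch of such a script, suitable for running as a scripted input (the directory path and the 1 GB threshold are illustrative placeholders, not values from the original post):

```python
#!/usr/bin/env python
"""Scripted-input sketch: print one line per file in a log directory,
with its size, so Splunk can index the output and alert on oversized files."""
import os
import time

LOG_DIR = "/var/log/myapp"   # hypothetical folder being monitored
THRESHOLD = 1024 ** 3        # flag files larger than 1 GB

def report(log_dir=LOG_DIR, threshold=THRESHOLD):
    """Return one key=value line per regular file in log_dir."""
    lines = []
    now = time.strftime("%Y-%m-%d %H:%M:%S")
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        size = os.path.getsize(path)
        status = "OVERSIZED" if size > threshold else "OK"
        lines.append('%s file="%s" size_bytes=%d status=%s'
                     % (now, path, size, status))
    return lines

if __name__ == "__main__":
    for line in report():
        print(line)
```

Indexed output in this key=value form is easy to search, e.g. alerting whenever status=OVERSIZED appears.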
You might also want to look at the Deployment Monitor that is part of Splunk 4.2. It has dashboards and alerts that can notify you when Splunk is indexing "too much" from a particular source, sourcetype, etc. Even if the Deployment Monitor is not exactly what you want, its searches and alerts are a good starting point for writing your own.
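For example, a hand-rolled alert search along these lines (sketched against the standard per_source_thruput metrics that Splunk logs to _internal; the 1 GB threshold is a placeholder) could catch a runaway source:

```
index=_internal source=*metrics.log group=per_source_thruput
| stats sum(kb) AS total_kb BY series
| where total_kb > 1048576
```

Saved as an alert running on a schedule, this would flag any source that indexed more than roughly 1 GB in the search's time window.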