To identify web content that hasn't been requested in a while, I thought I would use Splunk, since a) my Apache logs are already in Splunk, and b) I can easily create a scripted input to get a list of the files under the various directories. Initially, I'm going to do this for our .cgi and .pl files. So, I have one index for the standard Apache access logs, with a field extraction called file (more on that later). I then created a scripted input that runs once per day to pull a list of the files under our content subdirectory (we're talking 13,000+ files). An example of the input looks like this:
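A scripted input of this kind can be as small as a script that walks the content tree and prints one matching path per line, since Splunk indexes whatever a scripted input writes to stdout. The sketch below is only an illustration: the choice of Python, the `CONTENT_ROOT` path and the one-path-per-line output format are assumptions, not the script from the post.

```python
import os
import sys

# Hypothetical content root; the post does not give the real path.
CONTENT_ROOT = "/var/www/content"
# Only the script types mentioned in the post.
EXTENSIONS = (".pl", ".cgi")


def list_scripts(root, extensions=EXTENSIONS):
    """Return the sorted paths of files under root whose names end in one of extensions."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else CONTENT_ROOT
    # Emit one path per line so each path becomes its own event.
    for path in list_scripts(root):
        print(path)
```

Scheduling the script to run once per day, as described above, happens in the scripted input's configuration rather than in the script itself.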
I can do a query that looks like this:
It returns only 36 of the 125 .pl/.cgi files, which is not what I'm looking for. Basically, I want to take the list of files from one query and check how many times each of those files appears in the Apache logs, including the files with zero hits. I've spent a couple of days trying to get this working without success. Any ideas on how to do this? Is it even possible?
Your best strategy here is to use an OR search to load data from both prod_ohs_logs and prod_coldfusion_files at the same time and see, for each file, whether it appears in one index, the other, or both. For example:
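The logic behind the OR search can be sketched outside Splunk: treat every event from either index as an (index, file) pair, then count per file and per index, so a file that never shows up in one index gets an explicit 0 instead of disappearing from the results. This is a Python illustration of the idea under made-up sample events and paths, not how Splunk evaluates the search.

```python
from collections import Counter

# The two index names from the thread.
INDEXES = ("prod_ohs_logs", "prod_coldfusion_files")


def count_by_file_and_index(events, indexes):
    """Pivot (index, file) events into {file: {index: count}}, using 0 where a file never appears in an index."""
    counts = Counter(events)  # keys are (index, file) tuples
    files = sorted({f for _idx, f in events})
    # Counter returns 0 for missing keys, which supplies the zero-hit cells.
    return {f: {idx: counts[(idx, f)] for idx in indexes} for f in files}


# Hypothetical events: two files listed on disk, one of them requested twice.
events = [
    ("prod_coldfusion_files", "/content/a.pl"),
    ("prod_coldfusion_files", "/content/b.cgi"),
    ("prod_ohs_logs", "/content/a.pl"),
    ("prod_ohs_logs", "/content/a.pl"),
]
table = count_by_file_and_index(events, INDEXES)
print(table)
```

Here `/content/b.cgi` ends up with a count of 0 under prod_ohs_logs, which is exactly the "zero results" row the question asks for.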
Great, that's a starting point. Now I need to figure out how to list only the files that have 1 as the result under prod_coldfusion_files.
(30 Sep '10, 15:31)
Brian Osburn
Just add "... | search prod_coldfusion_files=0" to your search.
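Appending `| search field=value` keeps only the result rows where that column has that value. In the Python sketch below, which is an illustration rather than Splunk's behavior, a per-file count table is filtered the same way. The table contents and the `files_where` helper are hypothetical. Which column to test depends on how the original search laid out its results, and that search isn't shown here. The example filters on the log-count column to find files that exist on disk but never appear in the access logs, which is the goal stated in the question.

```python
def files_where(table, column, value):
    """Return, sorted, the files whose count in `column` equals `value` (missing columns count as 0)."""
    return sorted(f for f, row in table.items() if row.get(column, 0) == value)


# Hypothetical per-file counts for the two indexes.
table = {
    "/content/a.pl": {"prod_ohs_logs": 2, "prod_coldfusion_files": 1},
    "/content/b.cgi": {"prod_ohs_logs": 0, "prod_coldfusion_files": 1},
    "/content/c.pl": {"prod_ohs_logs": 0, "prod_coldfusion_files": 1},
}

# Files present on disk that were never requested.
unused = files_where(table, "prod_ohs_logs", 0)
print(unused)
```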
(30 Sep '10, 16:06)
Stephen Sorkin
Pure awesomeness, Stephen. Thank you!
(30 Sep '10, 17:27)
Brian Osburn
