|
Hey all,

I'm very interested in setting up Splunk to monitor all of my logs. One of the main requirements is the Apache log files on my web host. I have remote access to these files via HTTP (htaccess protected) and FTP. Because of how the log files are handled, their file names include the date. Because these servers belong to ISPs, I cannot install the normal Splunk forwarding agents. The same scenario exists with a client that I'm looking to set Splunk up for (minus the HTTP access to the logs).

I also have cron access on my personal web host, as well as CGI capabilities. However, I would rather set this up as a pull-type solution so that I don't need to start opening things up on the firewall.

I'm sure this scenario is pretty common, so I would imagine one of the Splunk aficionados out there has already tackled it many times.

Many thanks in advance,
Greg
|
I think the biggest thing you have to decide is how up-to-date you need your log data to be. Here are some options, starting from the most "live" to the most infrequent.
Here is a quick example of doing a recursive copy using
NOTES: This assumes you've installed a

Hi Lowell, Thanks for your clear and insightful response. A couple of added questions / clarifications, however...

The Forwarder: Can it be set up to run simply using cron? I can only run things using cron (or CGI, of course) on my ISP, but I thought that perhaps there was a way to execute a "collect data and report" type function for the Forwarder.

Re: what platform. Oops... sorry for not including this point! The web servers are Unix, and the servers running Splunk are Windows (test machine XP Home OEM, production server will be Windows 2003 Server R2).
(26 Apr '10, 19:15)
geva
No, I don't believe the forwarder can be set up to run via cron. Even if you could rig up something like this, it's probably not a good idea. Running on Windows will make the rsync option more difficult (and jrodman's
(26 Apr '10, 19:36)
Lowell ♦
|
|
Lowell's 5 star answer gives all the key points to think about, but I might consider using a simple shell script wrapping wget -C or curl to update the log files incrementally as they're built out on the remote site. This is somewhat dependent upon the provider having reliable behavior, but they probably do.

They are pretty reliable, so no prob there. What would this accomplish? (I'm not a *nix expert)
(26 Apr '10, 19:21)
geva
Just a minor correction. I believe Jrodman means
(26 Apr '10, 19:34)
Lowell ♦
geva, this would be a possible replacement for the
(26 Apr '10, 19:41)
Lowell ♦
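
For anyone who would rather not depend on a shell, here is a rough Python sketch of the "resume the download" idea discussed above: ask the web server only for the bytes that are missing from the local copy and append them. The URL, credentials, and function names are made up for illustration, and it assumes the host uses plain HTTP Basic auth (the usual .htaccess password prompt) and honours `Range` requests. Treat it as a starting point, not a tested tool.

```python
import base64
import os
import urllib.error
import urllib.request


def build_resume_request(url, local_path, user, password):
    """Build a GET request that asks only for the bytes we do not have yet."""
    offset = os.path.getsize(local_path) if os.path.exists(local_path) else 0
    request = urllib.request.Request(url)
    # HTTP Basic auth (assumed to be what the .htaccess protection uses).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    if offset > 0:
        # Ask the server to start at the current end of the local copy.
        request.add_header("Range", f"bytes={offset}-")
    return request, offset


def fetch_new_bytes(url, local_path, user, password):
    """Append any new remote bytes to local_path; return how many were written."""
    request, _offset = build_resume_request(url, local_path, user, password)
    try:
        with urllib.request.urlopen(request, timeout=60) as response:
            # 206 means the server honoured the Range header; any other
            # success status means it sent the whole file, so overwrite.
            mode = "ab" if response.status == 206 else "wb"
            data = response.read()
    except urllib.error.HTTPError as err:
        if err.code == 416:  # Range not satisfiable: nothing new to fetch
            return 0
        raise
    with open(local_path, mode) as handle:
        handle.write(data)
    return len(data)
```

Calling something like `fetch_new_bytes("http://example.com/logs/access.log", "access.log", "user", "secret")` from a scheduled task would keep the local file growing, and Splunk could then monitor that local file. Appending only makes sense for a plain-text log that is still being written; an already rotated and gzipped file is better fetched whole.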
|
|
Hi Lowell, Thanks for your clear and insightful response. A couple of added questions / clarifications, however...

The Forwarder: Can it be set up to run simply using cron? I can only run things using cron (or CGI, of course) on my ISP, but I thought that perhaps there was a way to execute a "collect data and report" type function for the Forwarder.

Re: what platform. Oops... sorry for not including this point! The web servers are Unix, and the servers running Splunk are Windows (test machine XP Home OEM, production server will be Windows 2003 Server R2).

There is a very strong likelihood that I will end up using the option to pull the log files daily/weekly. Perhaps I'm lazy, or expecting way more from Splunk than it can provide, but is there any way to set up such scheduled transfers easily? I could of course find some tool to do recurring scheduled FTP downloads, but this starts to make things more complex and kludgy.

Thanks for your help,
Greg

Greg, looks like we are going to need some more info to help you out. First, do you have
(26 Apr '10, 19:50)
Lowell ♦
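
As a rough illustration of the "recurring scheduled FTP download" idea raised above, the following Python sketch pulls every rotated log that is not already present in a local folder. The host, credentials, directory names, and the `.gz` suffix filter are assumptions for the example (the thread only says the logs are date-named and gzipped), and the function names are hypothetical.

```python
import os
from ftplib import FTP


def missing_files(remote_names, local_dir, suffix=".gz"):
    """Return the remote log names (sorted) that are not yet in local_dir."""
    have = set(os.listdir(local_dir)) if os.path.isdir(local_dir) else set()
    return sorted(n for n in remote_names if n.endswith(suffix) and n not in have)


def pull_rotated_logs(host, user, password, remote_dir, local_dir):
    """Download every rotated log that is not present locally yet."""
    os.makedirs(local_dir, exist_ok=True)
    fetched = []
    with FTP(host, timeout=60) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in missing_files(ftp.nlst(), local_dir):
            # Download under a temporary name first so a half-finished
            # transfer never sits under its final file name.
            partial = os.path.join(local_dir, name + ".part")
            with open(partial, "wb") as handle:
                ftp.retrbinary("RETR " + name, handle.write)
            os.replace(partial, os.path.join(local_dir, name))
            fetched.append(name)
    return fetched
```

Run once a day from whatever scheduler is available on the Splunk machine, a call such as `pull_rotated_logs("ftp.example.com", "user", "secret", "logs", r"C:\weblogs")` would only transfer files it has not seen before, because the date in each file name makes the names unique. Splunk could then be pointed at the local folder.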
|
|
Hi Lowell,

OK, so I've been in touch with the other ISP, and I now better understand the options on both ISPs. Both provide SSH access (I'm working on getting it activated on the one). My personal provider provides scheduled tasks; the other does not. Both have CGI access for things like PHP/Perl. Both have FTP access to the log files. Both automatically name, rotate, and gzip the log files.

The one provider has the log files appear under domain.com/logs, with access controllable by permissions and .htaccess. The other keeps the log files in ~/logs, where ~/www is the domain web root.

I do not have any Unix systems, and the boss does not want any, so Cygwin could be a good option for running wget --continue. However, I may be better off trying to convince the boss that having one Linux box that only does monitoring is not a risk to the company. This option is looking more and more interesting.

I am not concerned with the data being encrypted. It's just web site traffic data. No forms or anything like that exist on the site.

I've checked for rsync, wget, and curl on the hosting providers. My personal host has curl, but that is it. Neither has wget or rsync.

Cheers,
Greg