Hey all,
I have a system that generates a log that I need to have indexed and pulled into Splunk. The system runs on several individual boxes, so each one spits out its own output and was set up with Splunk 3.4.5 to forward to our central server.
The only problem is that it is now taking 10+ minutes for entries to get from the system to the saved searches on the Splunk server. The system times are in sync, and there are no time zone differences to screw up the timing. The entries appear in order, so it doesn't look like entries are being lost or anything.
I am thinking that by adjusting some of the configuration, I can help reduce the problem. I saw these things that looked somewhat promising:
If anyone else has any suggestions on how we might improve the speed from soup to nuts I'm very interested in hearing about it.
Thanks!
By default, light and universal forwarders will usually limit themselves to sending data at a maximum of 256 KBps (kilobytes per second, the maxKBps setting in limits.conf). If your boxes produce more than that, increasing the limit may help you get closer to real-time results.
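As a rough sketch of what raising that limit looks like on a forwarder version that supports the setting: the file location and the value 512 below are assumptions for illustration, not a recommendation, and the forwarder needs a restart to pick up the change.

```
# Assumed location: $SPLUNK_HOME/etc/system/local/limits.conf on the forwarder
[thruput]
# Default on light/universal forwarders is 256. 512 is an arbitrary example value.
# 0 removes the limit entirely.
maxKBps = 512
```

If the delay does not change after raising the limit, the throughput cap was probably not the bottleneck and it is worth looking at the indexer side instead.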
You should find out the true delay of the data and check for any indexing problems.
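One way to measure the true delay, assuming your version exposes _indextime, is to compare each event's index time with its event timestamp. The index and sourcetype names below are placeholders for your own values:

```
index=<your_index> sourcetype=<your_sourcetype>
| eval lag_seconds = _indextime - _time
| stats avg(lag_seconds) AS avg_lag max(lag_seconds) AS max_lag by host
```

A lag that is large on every host points toward the indexer or the network; a lag that is large on only some hosts points toward those forwarders.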
If Splunk is behind with respect to indexing, you will see a delay like this. To check if Splunk is behind on indexing, look for blocked or filled queues:
index=_internal source=*metrics.log blocked
OR
index=_internal source=*metrics.log group=queue | timechart avg(current_size) by name
If your queues are consistently blocked or full (1000 is the max value), then you will need to debug why Splunk is queueing data.
Back again with an update...
We're down to about 5 to 7 minutes of delay in getting from the log to the forwarder to the index. Our times are all synchronized, so there are no issues from there.
We are looking at settings and tweaking. Ideally, we want it to go down to about 3 minutes.
Thanks for allowing me to pick your great brains.
Wwhitener
Thanks everyone.
We're looking at the issue of possible time lags and time zone difficulties right now.
Is there any reason why you couldn't update the system to a more up to date universal forwarder?
The system footprint will be smaller (although it still clearly shouldn't take as long as you have indicated).
The number of files shouldn't slow down the forwarder, thanks to the way it CRC-checks each file. Have you looked in splunkd.log to see if there are any errors or issues happening?
I guess with stuff like this you want to verify how often these logs are being written, and be sure that they are updating very frequently. Perhaps even write an entry to the log manually and track it through the system?
If it is a large log file you can play around with maxKBps. This shouldn't "restrict" events from showing up; it may delay some, but I suppose that with a large file it could have an adverse effect if set too low.
Are you doing extractions on the data before sending? That is going to slow things down. I believe the speed of this has been improved in 4+, but in most circumstances it is best to simply define a target index for the data and let the indexer handle the rest.
You probably have too many files on the forwarder and Splunk is getting bogged down in the housekeeping of checking each one of them for changes (changes that will probably never happen). Try moving/deleting the old files and see if this helps.