I'm deploying a Splunk environment with two universal forwarders and two indexers. I've got a primary indexer and a backup one. The backup will be manually brought up if the primary goes down, so I'll have only one indexer up at any time. Both forwarders are up at all times.
In case of failure of the primary indexer, I'll need human intervention to launch the second one. During this time, my forwarders will still be sending logs (and a lot of them).
To avoid losing logs, I'm using a queue on the forwarders. Here's what my outputs.conf (on both forwarders) looks like (141 is my primary indexer, 142 is the backup):
[tcpout]
defaultGroup = lb

[tcpout:lb]
server = 192.168.100.141:9997,192.168.100.142:9997
autoLB = true
maxQueueSize = 4GB
dropEventsOnQueueFull = 1
If my primary indexer crashes, here's what happens:
- meanwhile, the queue grows on the forwarders,
- I manually bring up the backup one,
- when the backup indexer is up, the forwarders automatically send their logs to it (as per outputs.conf).
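For reference, the manual failover step can be sketched with the standard Splunk CLI. The install paths here (/opt/splunk and /opt/splunkforwarder) are assumptions; adjust them to your environment, and note that receiving on port 9997 must already be enabled on the backup indexer.

```shell
# On the backup indexer (192.168.100.142): bring Splunk up.
# Assumes receiving was enabled earlier with: splunk enable listen 9997
/opt/splunk/bin/splunk start

# On each forwarder: check which configured output is currently active.
# "list forward-server" distinguishes active from configured-but-inactive
# destinations, so you can confirm the switch to .142 actually happened.
/opt/splunkforwarder/bin/splunk list forward-server
```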
This is working great, but I've noticed that sometimes, for no apparent reason, memory usage on the forwarder starts growing and never stops until it reaches the maximum RAM. When this happens, the splunk process is killed for using too much memory.
Do you know what's wrong? What do I have to do to correct this?
Please let me know if you need further details.
Regards, and thanks a lot in advance.
FYI : case created on Splunk Support : 87035.
I don't think this is necessarily a good strategy for your forwarders. Basically, what's happening is that your forwarders are throwing connection errors and timing out against the second Splunk indexer (because you keep it offline), causing them to grow their heap footprint due to their queuing mechanism.
Personally, I would keep a second set of outputs on my forwarders that simply points to the second indexer, and then manually swap the outputs when you're bringing up your second indexer. You should see much better performance and manageability of your forwarders after doing that. Moreover, you could use something like Tivoli to orchestrate this whole process for you.
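As a sketch of that approach (the file names and 4GB queue size here are illustrative, carried over from the question): keep two prepared outputs.conf variants on each forwarder, each pointing at exactly one indexer, so the forwarder never tries to reach an offline destination.

```
# outputs.conf.primary -- normal operation, primary indexer only
[tcpout]
defaultGroup = primary

[tcpout:primary]
server = 192.168.100.141:9997
maxQueueSize = 4GB

# outputs.conf.backup -- swapped in only while the backup indexer is live
[tcpout]
defaultGroup = backup

[tcpout:backup]
server = 192.168.100.142:9997
maxQueueSize = 4GB
```

During failover you'd copy the appropriate variant over $SPLUNK_HOME/etc/system/local/outputs.conf and restart the forwarder (splunk restart) so it picks up the change; that swap is exactly the step a tool like Tivoli could automate.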
answered 14 Jun '12, 06:47