|
Hi everyone,

I'm deploying a Splunk environment with two universal forwarders and two indexers: a primary indexer and a backup one. The backup will be brought up manually if the primary goes down, so only one indexer is up at any time. Both forwarders are up at all times.

If the primary indexer fails, a human has to launch the second one. During this time, my forwarders will still be sending logs (and a lot of them). To avoid losing logs, I'm using a queue on the forwarders. This is what my outputs.conf looks like on both forwarders (141 is my primary indexer, 142 is the backup):

    [tcpout]
    defaultGroup = lb

    [tcpout:lb]
    server = 192.168.100.141:9997, 192.168.100.142:9997
    autoLB = true
    maxQueueSize = 4GB
    dropEventsOnQueueFull = 1

If my primary indexer crashes, here's what happens:

- meanwhile, the queue grows on the forwarders,
- I manually bring up the backup one,
- when the backup indexer is up, the forwarders automatically send their logs to it (as per outputs.conf).

This is working great, but I've noticed that sometimes, for no specific reason, the memory usage on the forwarder starts growing and never stops until it reaches the maximum RAM. When this happens, the splunk process is killed because it's using too much memory.

Do you know what's wrong? What do I have to do to correct this? Please let me know if you need further details.

Regards, and thanks a lot in advance.

FYI: case created on Splunk Support: 87035. |
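For readers following along, here is an annotated copy of the two queue settings from the configuration above. The comments reflect my reading of the outputs.conf spec and are worth checking against the spec for the installed Splunk version; they are not a diagnosis of the problem in this thread.

```ini
[tcpout:lb]
# With a KB/MB/GB suffix, maxQueueSize is (as I understand the spec) a limit on
# the RAM used by the output queue, so 4GB allows up to roughly 4 GB of queued
# data to sit in the forwarder's memory while no indexer is reachable.
maxQueueSize = 4GB
# A positive value means: once the queue is full, wait this many seconds and
# then drop new events until the queue has space again.
dropEventsOnQueueFull = 1
```

If that reading is right, the forwarder host needs enough free memory to hold a full queue on top of its normal footprint.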
|
I don't think this is necessarily a good strategy for your forwarders. Basically, your forwarders are throwing connection errors and timing out against the second Splunk indexer (because you keep it offline), and that causes each forwarder's heap footprint to grow because of its queuing mechanism. Personally, I would keep a second set of outputs on my forwarders that simply points to the second indexer, and then manually switch the outputs when you bring up your second indexer. You should see much better performance and manageability of your forwarders after doing that. Moreover, you could use something like Tivoli to orchestrate this whole process for you.
2
Thanks for your answer. I understand that it's not a standard deployment. I thought of having two outputs.conf files and switching to the second one if indexer number 1 goes down. From my understanding, the autoLB feature was the way to avoid that, so I gave it a go this way. What I don't get is that it works great for a while and then, for no specific reason, memory usage starts growing. It is able to deal with a huge volume of events for several days, and then in the middle of the night, when there are not a lot of events, the problem shows up again. Thoughts?
(14 Jun '12, 07:18)
Mahieu
1
I honestly can't say why it grows in heap, but I've seen this behavior before on some of my larger implementations: when indexers go down, for whatever reason, forwarders tend to grow in size because they expect their list of indexers to be available. I would also be interested in what's really happening on the systems that you're using as forwarders. Have you created a ticket for this and uploaded a diag?
(14 Jun '12, 08:26)
Lamar
2
Yes, I've created a case; its number is 87035. I'll upload a diag soon, I haven't done it yet. Do you suggest I change my outputs.conf?
(15 Jun '12, 01:13)
Mahieu
1
That's what I would do personally. If your forwarders still behave that way then you may actually have an issue with them.
(15 Jun '12, 06:39)
Lamar
2
Just uploaded the diag to the support case. I changed my outputs.conf to use a single indexer. I'll have to manually edit outputs.conf in case of a failure. That's far from perfect, but if the load balancing config is the problem, I think I have no other choice.
(18 Jun '12, 03:17)
Mahieu
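A minimal sketch of what the single-indexer setup with a manual switch might look like, combining the change described above with the earlier suggestion to keep a second set of outputs. The group names `primary` and `backup` are hypothetical; the addresses are the ones from the question.

```ini
# outputs.conf on each forwarder (hypothetical layout, one group per indexer)
[tcpout]
# Only the group named here receives data.
# During a failover, change this to "backup".
defaultGroup = primary

[tcpout:primary]
server = 192.168.100.141:9997

[tcpout:backup]
server = 192.168.100.142:9997
```

With this layout a failover is a one-line edit rather than a file swap. The forwarder will typically need a restart to pick up the change.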
Hi there, any thoughts on this issue? Thanks in advance.
(19 Jun '12, 06:43)
Mahieu
|