Hi,
I have a forwarder that collects WMI data (CPU, disk, processes, memory) from ~150 servers (Win2008R2, Win2003). In splunk-launch.conf I have several variables for groups of these servers, which look like:
SPLUNK_SRVRGROUP1=SERVER1,SERVER2...SERVERN
SPLUNK_SRVRGROUP2=SERVER1,SERVER2...SERVERN
so in wmi.conf:
server = $SPLUNK_SRVRGROUP1,$SPLUNK_SRVRGROUP2
Sometimes the forwarder stops collecting data from some random servers in a group, and I don't know why: there are no errors in the log files. After restarting the Splunk service everything is OK, but after some time (1 day, 3 days, it's random) it happens again.
The Splunk indexer/forwarder version is 4.2.2.
Any suggestions?
The "problem" was the max_retries_at_max_backoff parameter in wmi.conf:
max_retries_at_max_backoff = <integer>
* Once max_backoff is reached, tells Splunk how many times to attempt to reconnect to the WMI provider.
* Splunk will try to reconnect every max_backoff seconds.
* If reconnection fails after max_retries, give up forever (until restart).
* Defaults to 2.
I've set it to 30000. Will see.
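For anyone who wants to try the same change, here is a minimal sketch of how it might look in wmi.conf. It assumes the parameter belongs in the [settings] stanza alongside the other backoff settings, as laid out in wmi.conf.spec; check the spec file shipped with your version before relying on it.

```ini
# wmi.conf on the forwarder (sketch; stanza placement is an assumption,
# verify against wmi.conf.spec for your Splunk version)
[settings]
# Default is 2: after two failed reconnects at max_backoff, Splunk gives up
# on that WMI provider until the service is restarted.
max_retries_at_max_backoff = 30000
```

With a value this large the forwarder should keep retrying an unreachable server every max_backoff seconds for a very long time instead of silently dropping it. The forwarder most likely needs a restart for the new value to take effect.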