"Unable to distribute to peer" -- adjustable timeout?

batzel
Engager

I'm getting quite a few "Unable to distribute to peer..." messages when searching in Splunk.

The reasons given tend to be '...because peer has status = "Down".' or Authentication Failed.

Sometimes just reloading the page will get through to the search peers. Sometimes it gives me that error a number of times in a row. But I've verified that the peer is not down, and I can connect to it from the search head with no problems.

The Splunk servers are in different datacenters, so my only guess is that a bit of network lag is keeping the connections from being made quickly enough.

Is there a config option to alter whatever timeout there is for this? Am I on the right track, or can someone suggest what else to look at?

drrushi_splunk
Splunk Employee

Additionally, splunkd_access.log on the indexing peer will show the POST requests to this endpoint: /services/admin/auth-tokens
If these requests are taking longer than 10000 ms, you are hitting the default receive timeout (authTokenReceiveTimeout, 10 seconds).
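One way to check this is to scan splunkd_access.log on the peer for slow requests to that endpoint. The following is a minimal Python sketch, not a Splunk-provided tool. It assumes that each access-log line ends with the request duration written as an integer followed by "ms" (for example "... - - - 12ms"); verify that against your own log before relying on it. The function name slow_auth_token_requests and the threshold_ms parameter are made up for illustration.

```python
import re

# Assumption: the request duration is the last field on the line, e.g. "12ms".
DURATION_RE = re.compile(r"(\d+)ms\s*$")
ENDPOINT = "/services/admin/auth-tokens"


def slow_auth_token_requests(lines, threshold_ms=10000):
    """Return (duration_ms, line) pairs for auth-token POSTs at or above threshold_ms."""
    hits = []
    for line in lines:
        # Only POST requests to the auth-token endpoint are of interest.
        if "POST" not in line or ENDPOINT not in line:
            continue
        match = DURATION_RE.search(line)
        if match is None:
            continue  # line does not end with a duration; skip it
        duration_ms = int(match.group(1))
        if duration_ms >= threshold_ms:
            hits.append((duration_ms, line.rstrip("\n")))
    return hits
```

To use it, pass any iterable of log lines, such as an open file handle for splunkd_access.log (normally under $SPLUNK_HOME/var/log/splunk). Lowering threshold_ms, for example to 5000, may help spot requests that are close to the limit without actually timing out.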

raziasaduddin
Path Finder

Where else would we see these authToken related messages in the log? The indexers are still intermittently down and I cannot figure out why.

I tried:
[distributedSearch]
authTokenConnectionTimeout = 20
authTokenReceiveTimeout = 30
authTokenSendTimeout = 30

I still see this error after a minute or so:

Unable to distribute to peer named BLAH at uri https://BLAH:8089 because replication was unsuccessful. replicationStatus Failed

drrushi_splunk
Splunk Employee

The timeout settings for the authentication token exchange between the search head and its peers are now exposed as configurable values in distsearch.conf (since v4.3.6):

authTokenConnectionTimeout =
* Maximum number of seconds to connect to a remote search peer, when getting its auth token
* Default is 5

authTokenSendTimeout =
* Maximum number of seconds to send a request to the remote peer, when getting its auth token
* Default is 10

authTokenReceiveTimeout =
* Maximum number of seconds to receive a response from a remote peer, when getting its auth token
* Default is 10

drrushi_splunk
Splunk Employee

If you don't see any offending requests on the peer and the auth status is still failed, then the request is not making it to the peer at all. In that case, investigate general connectivity to the peer and adjust authTokenConnectionTimeout and authTokenSendTimeout.
For failed connections, check splunkd.log on the search head for WARN messages from the UserManagerPro component:

WARN UserManagerPro - Unable to connect to peeruri=
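To pull those warnings out of a large splunkd.log, a small filter along the following lines may help. This is only a sketch in Python: it assumes the log level and the component name appear as separate whitespace-delimited tokens on each line, as in the message above, and the function name user_manager_warnings is hypothetical.

```python
def user_manager_warnings(lines):
    """Yield splunkd.log lines that are WARN messages from UserManagerPro."""
    for line in lines:
        fields = line.split()
        # Assumption: level and component are separate whitespace-delimited
        # tokens, as in "... WARN UserManagerPro - Unable to connect to ...".
        if "WARN" in fields and "UserManagerPro" in fields:
            yield line.rstrip("\n")
```

Feeding it the search head's splunkd.log and comparing the timestamps of the matches with the times the "Unable to distribute to peer" banner appeared should show whether the two line up.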

Ayn
Legend

There is indeed. Have a look at distsearch.conf (http://www.splunk.com/base/Documentation/latest/Admin/Distsearchconf), particularly the following parameters:

connectionTimeout = <int, in seconds>
* Amount of time in seconds to use as a timeout during search peer connection establishment.

sendTimeout = <int, in seconds>
* Amount of time in seconds to use as a timeout while trying to write/send data to a search peer.

receiveTimeout = <int, in seconds>
* Amount of time in seconds to use as a timeout while trying to read/receive data from a search peer.

The defaults for these (and other) settings are set in $SPLUNK_HOME/etc/system/default/distsearch.conf.
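To see which values a given distsearch.conf actually sets, you could read the stanza with a short script. The Python sketch below is an illustration under assumptions, not how Splunk itself loads configuration: it treats the file as plain INI text, reads a single file only (it does not merge the default and local directories the way Splunk does), and assumes the three settings live in the [distributedSearch] stanza. The function name read_timeouts is made up.

```python
import configparser

TIMEOUT_KEYS = ("connectionTimeout", "sendTimeout", "receiveTimeout")


def read_timeouts(conf_text, stanza="distributedSearch"):
    """Return the timeout settings found in one stanza of distsearch.conf text."""
    parser = configparser.ConfigParser(
        interpolation=None,    # treat "%" in values literally
        strict=False,          # tolerate repeated stanzas or keys
        allow_no_value=True,   # tolerate keys that have no "= value"
    )
    parser.optionxform = str   # keep key names case-sensitive
    parser.read_string(conf_text)
    if not parser.has_section(stanza):
        return {}
    return {
        key: parser.get(stanza, key)
        for key in TIMEOUT_KEYS
        if parser.has_option(stanza, key)
    }
```

Values come back as strings exactly as written in the file. Running it once on the file under etc/system/default and once on the one under etc/system/local shows whether a local override is present.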

raziasaduddin
Path Finder

Did you ever solve this?

mslvrstn
Communicator

We are working this case with support. They've said:

After some further inquiries with our Dev team, I've learned that the timeout settings in distsearch.conf will not actually have any effect on the problem.
It seems that what is happening is that we are timing out at times while trying to read the auth token from the peer (Unable to connect to peer uri...). The httpclient timeouts that affect this behavior are actually hardcoded and NOT configurable.

connectionTimeout = 5;
sendTimeout = 10;
rcvTimeout = 10;

No setting is exposed that you could use to control these timeouts.
