I saw a similar issue.
Key Errors:
[-failing tgt-]# tail /opt/splunk/var/log/splunk/splunkd.log -f
04-28-2019 17:18:50.639 +0000 ERROR S2SFileReceiver - event=onFileClosed replicationType=eJournalReplication bid=unix~564~3980C11F-0463-420B-8584-F58CF055EC0E state=eComplete src=B78B7685-1AEF-477F-B50C-BB65C1633777 bucketType=warm status=failed err="bucket is already registered, registered not as a streaming hot target (SPL-90606)"
04-28-2019 17:18:50.639 +0000 WARN S2SFileReceiver - event=processFileSlice bid=unix~564~3980C11F-0463-420B-8584-F58CF055EC0E msg='aborting on local error'
04-28-2019 17:18:50.699 +0000 WARN CMSlave - event=addTargetDone bid=unix~564~3980C11F-0463-420B-8584-F58CF055EC0E but we no longer have the bucket lets remove it from the master as well
04-28-2019 17:18:50.699 +0000 WARN CMSlave - deleting bucket=unix~564~3980C11F-0463-420B-8584-F58CF055EC0E but failed to delete, reason=unable to find bucket
04-28-2019 17:18:50.882 +0000 INFO CMRepJob - job=CMReplicationErrorJob bid=unix~564~3980C11F-0463-420B-8584-F58CF055EC0E failingGuid=14E1138E-A7E7-499E-A7AD-84BC5797B164 srcGuid=B78B7685-1AEF-477F-B50C-BB65C1633777 tgtGuid=14E1138E-A7E7-499E-A7AD-84BC5797B164 succeeded
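When triaging, it helps to pull the bid and src GUIDs out of those error lines mechanically so you can match them against the peers in cluster-status. A minimal sketch (my own helper using `sed`, not a Splunk tool), run against the ERROR line above:

```shell
# Sample S2SFileReceiver error line (pasted from splunkd.log above)
line='04-28-2019 17:18:50.639 +0000 ERROR S2SFileReceiver - event=onFileClosed replicationType=eJournalReplication bid=unix~564~3980C11F-0463-420B-8584-F58CF055EC0E state=eComplete src=B78B7685-1AEF-477F-B50C-BB65C1633777 bucketType=warm status=failed err="bucket is already registered, registered not as a streaming hot target (SPL-90606)"'

# Extract the bid= and src= key/value fields (values contain no spaces)
bid=$(printf '%s\n' "$line" | sed -n 's/.* bid=\([^ ]*\).*/\1/p')
src=$(printf '%s\n' "$line" | sed -n 's/.* src=\([^ ]*\).*/\1/p')

echo "bid=$bid"
echo "srcGuid=$src"
```

In a live tail you would feed `grep 'ERROR S2SFileReceiver' splunkd.log` through the same `sed` expressions instead of a hard-coded line.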
[attempting fixes in order of least to most resistance]
First try:
Enable maintenance mode; restart the affected splunkd on the IDXs.
You can correlate the affected GUIDs by running this on the master:
/opt/splunk/bin/splunk show cluster-status
[It gives output of hostname/GUID/site]
Disable maintenance mode.
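For reference, the maintenance-mode sequence above as CLI commands (standard Splunk clustering commands; adjust paths/hosts for your own environment):

```shell
# On the cluster master: enter maintenance mode so bucket fix-up
# activity pauses while peers restart.
/opt/splunk/bin/splunk enable maintenance-mode

# On each affected peer (IDX):
/opt/splunk/bin/splunk restart

# Back on the master: confirm the peers rejoined (hostname/GUID/site),
# then leave maintenance mode.
/opt/splunk/bin/splunk show cluster-status
/opt/splunk/bin/splunk disable maintenance-mode
```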
Second try:
Master>Bucket Status
Resync the non-tgt failing bucket
Third try:
Delete the non-tgt failing bucket
Fourth try:
Delete a copy [I felt OK deleting a "copy" since RF3]
Deleting the failing tgt was not an option;
~ first I did the failing source [bid_GUID]
~ second, there was another option for another IDX at another site; I did that, and finally it stopped bouncing and the Bucket Status error/fix-up finished [srcGuid]
Reviewing now, tailing splunkd on all the peers: initially I was only paying attention to ERRORs WRT [tgtGuid] and [bid_GUID]; now that I am looking at it in review, there is an INFO log identifying the [srcGuid].
[I have seen this many times, when IT pulls the plug on my precious IDX peers; but that's why we have 3:2] [Usually another restart of single components in maintenance mode will bring 'em back in the game.]
This was on a 4-peer cluster with RF3 SF2 running 7.0.3.
In my situation, after IT Support applied some updates and rebooted my IDXs, the affected tgt was bouncing in:
/opt/splunk/bin/splunk show cluster-status and MasterGUI>Settings>IndexerClustering
Searchable: NO
Status: Stopped