|
We were running some load over the weekend and ran into an issue where one of our Forwarder nodes went unresponsive. We are now attributing it to a combination of a large maxQueueSize in outputs.conf, all Indexer nodes being unreachable, and splunkd consuming all available memory. In the problem case, maxQueueSize was set to 1000000, and a recorded snapshot showed the splunkd process with about 3 GB resident:
maxQueueSize=1000000
8947 root 15 0 3482m 3.1g 7300 S 2.0 39.4 0:35.23 splunkd
On investigation, I restarted splunkd with varying values for maxQueueSize (10,000; 1,000; and 100) and saw a corresponding reduction in memory consumption:
A few questions:
Thanks, |
|
Yes, that behaviour is expected. Generally, if your deployment is performing well, there is no reason to increase maxQueueSize beyond the default, as the queue should never even get as high as 1000. If, however, you were receiving UDP data on your forwarder and it was imperative to capture as much as possible while the indexers are unreachable, that would be a reason to raise it to a high number. That said, if data retention is a priority, I would question the suitability of UDP in the first place. |
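For reference, a minimal sketch of where this setting lives on the forwarder. The stanza placement and the value shown are assumptions for illustration, not a recommendation from this thread; check the outputs.conf spec for your Splunk version, since the accepted units for maxQueueSize have changed between releases.

```
# outputs.conf on the forwarder (illustrative sketch only)
[tcpout]
# Assumed value for illustration. The thread tested 1000000, 10000, 1000, and 100;
# a larger queue lets splunkd hold more data in memory while indexers are unreachable.
maxQueueSize = 1000
```

A change to outputs.conf takes effect after splunkd is restarted, which is how the different values were tried in the question above.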
