Hello,
We have determined that in Splunk 5.0, active UDP inputs cause the main splunkd process to leak memory. The rate of this memory leak appears to be proportional to the rate of data that is being received on the UDP input(s). For that reason, it is possible for a very active UDP input to cause splunkd to eventually exhaust all available memory on the host.
This behavior is tracked as bug SPL-58075, which has been added to the list of known issues for Splunk 5.0.
We are actively working towards the release of a fix to this issue in the next few days.
In the meantime, we can propose four possible workarounds:
Install a 4.3.4 universal forwarder on the same machine that is currently receiving the UDP traffic and migrate the UDP inputs to that instance. The universal forwarder should be configured to send all incoming data to the indexer on the same host.
Where possible, customers should switch from sending data via UDP to sending via TCP, as this reduces potential data loss and is more in line with best practices for sending network data to Splunk. Be advised that a TCP input does not perform syslogd event augmentation (e.g. timestamp and hostname prepending).
Schedule a restart of the impacted instance(s) at regular intervals to prevent memory exhaustion. This solution is not strongly recommended, as it can introduce data loss and log users out of the system.
Disable UDP inputs. This solution is not recommended since all data from the sending host(s) will be lost.
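As a rough sketch of the first two workarounds, the universal forwarder on the indexer host might be configured as follows. The port numbers (UDP 514, TCP 1514, indexer port 9997) and the sourcetype are illustrative assumptions only; adjust them to match your environment.

```ini
# inputs.conf on the 4.3.4 universal forwarder
# Workaround 1: the UDP input migrated off the 5.0 splunkd process
[udp://514]
sourcetype = syslog

# Workaround 2: where senders support it, a TCP input avoids UDP entirely
# (note: no syslogd-style timestamp/hostname prepending on this input)
[tcp://1514]
sourcetype = syslog
```

```ini
# outputs.conf on the same forwarder: send all incoming data
# to the indexer running on the same host (assumed listening on 9997)
[tcpout]
defaultGroup = local_indexer

[tcpout:local_indexer]
server = 127.0.0.1:9997
```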
We disabled the Splunk Deployment Monitor app and our memory consumption has been flat ever since.
@richgalloway: Are you running the most recent 5.x version (5.0.2 as of now)? I would typically recommend to use the S.o.S app to track the resource usage of Splunk processes and establish a clear pattern. With that information, you'll want to open a support case to get this investigated further.
We're also running out of memory, but have zero UDP inputs configured. Could there be another source of the leak? We increased our VM from 8GB to 16GB and splunkd used it up in two days.
What types of input do you have, and are you running any special apps? You should probably run a "splunk diag" and attach the diag file to the support case.
/k
I upgraded Splunk 4.3.4 to v5 and have a handful of event sources. Apart from having to restart the service, the performance hasn't changed a bit. Response time is just the same as well.
Yes, Splunk 5.0 uses more resources. But in our case, it may also be a configuration issue (deployment server, apps, backfill ...).
I'm seeing similar issues on our heavy forwarders. I'll follow the above suggestions and see if that sheds any light!
I am seeing this same issue on one of two indexers after the 5.0 upgrade.
Memory is leaking on one of them (the splunkd process) until fully consumed.
There are some configuration differences between the two. I will troubleshoot this issue some more.
Hi,
I also see a steady increase of memory usage on our search head since the upgrade to v5.0. The indexers and universal forwarders are running fine, though.
To investigate a memory leak, track the resource usage of the splunkd process over time (for example with the S.o.S app) and capture screenshots of the trend. Then create a support case and attach a diag and the screenshots; also specify whether you turned on special features of 5.0 (like replication or search acceleration).
I have to agree something is causing rapid loss of memory (and not my turning 50 this December!).
We are seeing the same thing since 5.0.
4 GB system, not that many inputs... It was running smoothly on 4.3.4, but since the upgrade it exhausted all memory and froze... We added 2 GB of extra RAM, and while the server started up fine, within half an hour all the RAM had been consumed and it was unusable again.
We have had to reboot our v5 indexer twice in the last 24 hours. The system crawls to a halt, and looking at the stats, the memory grows in a straight line until all is consumed and the system effectively stops. We are only taking syslogs from 6 firewalls and event logs from two Windows boxes. The indexer has 8 GB of memory available. It looks like it might be a memory leak.