Somewhat odd behavior: my deployment server stopped generating log entries. I've been using the searches from http://answers.splunk.com/questions/2038/whats-the-best-way-to-monitor-deployment-activity to monitor activity, but the entries have pretty much stopped. Specifically, these searches haven't returned anything for the past week or so:
index="_internal" sourcetype="splunkd" source=*metrics.log Component="DeploymentMetrics"
index=_internal sourcetype=splunkd Component="DeployedApplication"
source="/opt/splunk/var/log/splunk/splunkd_access.log" index=_internal /services/broker/phonehome | rex "/services/broker/phonehome/connection_[\\d|\\.]*_\\d*_(?<src>.*)_(?<client>.*)_" | dedup client
Yet my deployment server still works: I can see new apps being deployed, the forwarders are restarting, etc. These searches were helpful because we have a forwarder deployed on a machine that I cannot access. I can see via the REST interface that the apps that should have been deployed to it have not been, so I want to make sure it can at least phone home.
Any ideas why my logs went silent?
Starting in Splunk 4.1, a normal (non-lightweight/LWF) forwarder will not forward events from indexes whose names begin with _ (such as _internal), except for _audit. As such, 'Installing app' events, like those cited above, will not get forwarded to your indexer. The source of this behavior is the set of filters in $SPLUNK_HOME/etc/system/default/outputs.conf (NOTE: Please do not edit files in the /default/ directory).
To override this behavior, add these lines to $SPLUNK_HOME/etc/system/local/outputs.conf:
[tcpout]
forwardedindex.filter.disable = true
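If disabling the filter entirely forwards more than you want, a narrower option may be to extend the filter chain rather than turn it off. This is only a sketch: it assumes the default filters in your version occupy slots 0 through 2 (ending with the _audit whitelist), so that slot 3 is free. Check $SPLUNK_HOME/etc/system/default/outputs.conf for the actual slot numbers before using it.

```
# $SPLUNK_HOME/etc/system/local/outputs.conf
[tcpout]
# Assumption: slots 0-2 are used by the default filters and slot 3 is unused.
# Filters are applied in order, so this whitelist re-admits _internal after
# the default blacklist on index names beginning with _.
forwardedindex.3.whitelist = _internal
```

Use either this or forwardedindex.filter.disable = true, not both; with the filter disabled, the whitelist and blacklist entries are ignored.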
Assuming you're running 4.0, please try these two searches. I have a support case open on the issue of deployment metrics disappearing randomly, and I'm wondering if you are having the same issue.
index="_internal" sourcetype="splunkd" DeploymentMetrics "event=install" "status=ok" | timechart count by appName
index="_internal" sourcetype=splunkd DeployedApplication "Installing app:" | rex "app: (?<appName>[\w._-]+?) to location: (?<location>.*)$" | timechart count by appName
I'm curious to see if search 1 works, but search 2 does not. (Note that I have all my _internal events forwarded to a central server, which is our deployment server, so I can run these searches on a single Splunk instance.)
Update: oreoshake, since you've been able to confirm that you are also seeing a discrepancy between the deployment server's metrics and the deployment client install logs, would you be willing to provide some sample logs to Splunk support? (I'm pretty sure that if multiple people report the same issue, there's a better chance it can be found and corrected.)
My open Splunk support case is # 43023 (Accuracy of DeploymentMetrics, ref:00D49oyL.5004AS90Y:ref). If you include that info, they should be able to tie the two issues together. Support is looking for sample splunkd.log files (and probably also the metrics.log files) from both a deployment client and server. Right now I don't have any example deployments in my logs, since these get rotated so quickly and it's been a few days since my last deploy.
Chris, I'm not sure what you're asking exactly. I don't see any outputs.conf entries in any of my unix app folders. (I'm assuming Neil=oreoshake?) Is there an upgrade that I should be trying? I'm still seeing the same sporadic behavior. (I am in the process of upgrading my forwarders to 4.1.3, so maybe that will make a difference eventually.)
Lowell, how are your DS metrics now? I had a bug open because of both of your cases, but Neil's turned out to be caused by a rogue outputs.conf in the unix app.
I reproduced the issue today by publishing a trivial config change and sent the corresponding log files to Splunk support. Hopefully this issue can move forward. Do you find this problem happening for both Windows and Linux deployment clients? I'm only seeing this problem on my Linux machines.
Will do, I'll reference your case number
Any chance you would be willing to provide some sample logs to Splunk support? I posted some reference info about the open case I have with them about this problem. (Although in my case, the metrics events seem to be intermittent rather than completely missing, which may be easier to track down.)
Still on 4.0.11
The first search stopped returning results around the same time as my other ones; the second search returns results, probably because its events come from the forwarders themselves.