|
I've created a custom command in Python that needs to see the entire set of events as a single batch, because it compares consecutive events. Unfortunately, Splunk is sending events to the custom command in chunks of <= 50,000 events. The commands.conf has streaming = false. Setting run_in_preview = false only changes the way the results are displayed, as expected. In case it's relevant, the command is running on a search head which receives events from several distributed search nodes. Here's the basic code -- run() is invoked by a minimal plugin "manager":
When invoked by a single splunk search, these results are generated:
Once the search is complete, only the 3 results from the last batch of events are shown. For completeness, here's commands.conf:
So, is there any way aside from the settings in commands.conf to really convince Splunk not to stream events into a custom command? Maybe an intermediate command I could insert into the pipeline? |
|
Well, either someone else can spot what's missing or can confirm that it's a bug, but for the time being an easy way to make sure no streaming events make it to your command is just to put a non-streaming command in front of it.
should do it.

Adding the non-streaming command does keep Splunk from sending multiple chunks of events to the custom script. Unfortunately, only the last 50k events are sent. Since I've asked for unlimited inputs, this is almost certainly a bug.
(28 Aug '11, 17:16)
mute_dammit
It didn't work for me. I am using a dedup command in my search; the search scanned roughly 4000 events and the result set is only 11 events.
(08 Mar '12, 14:50)
asingla
scanCount numbers below 5000 or 10000 can be pretty misleading. Splunk will pretty much always scan at least that deep into any search before potentially shutting down the stream, because that's the sort of "chunk" size that the search process uses when talking to the index. Or so I understand.
(08 Mar '12, 14:56)
sideview ♦
mute_dammit: yea once you're out of the streaming portion I'm afraid 50,000 is the default in limits.conf. It can be changed although it's to be filed under "do at your own risk"...
(08 Mar '12, 14:58)
sideview ♦
|
|
That behavior doesn't seem right to me, but streaming = false was never intended to make Splunk deliver all the events to the search command regardless of event quantity. To my understanding, it is supposed to influence how the search machinery thinks, and encourage it to give only one chunk to the search command. Essentially, you could view this flag as "I'm only designed for small datasets".

In order to make your tool work over large datasets, you'll want to be streaming, and you'll want to be able to handle the data chunk by chunk. For some problems that opens up an entirely new topic: how to store your state efficiently, whether it is valid to emit nothing until the last call, and how to know when it's the last call.

Hi, any idea how we can determine which call is the last one, so that we can collate all the results and emit nothing until then?
(25 Apr '12, 23:28)
mridus
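For what it's worth, here is a rough sketch of the chunk-by-chunk idea from the answer above, for a command that compares consecutive events. It does not use any Splunk API and it does not answer the "which call is the last one" question. It only shows one way to carry state across chunk boundaries, by saving the last event of each chunk to a small file. The names process_chunk, state_path and the "value" field are made up for illustration, and it assumes the caller can pick a state file path that is unique per search and clean it up afterwards.

```python
import json
import os


def load_last_event(state_path):
    """Return the final event saved by the previous chunk, or None."""
    if not os.path.exists(state_path):
        return None
    with open(state_path) as f:
        return json.load(f)


def save_last_event(state_path, event):
    """Persist the final event of this chunk for the next invocation."""
    with open(state_path, "w") as f:
        json.dump(event, f)


def process_chunk(events, state_path, field="value"):
    """Report changes in `field` between consecutive events.

    `events` is a list of dicts (hypothetical shape). The first event of
    a chunk is compared against the last event of the previous chunk,
    which is read back from `state_path`.
    """
    previous = load_last_event(state_path)
    results = []
    for event in events:
        if previous is not None and event.get(field) != previous.get(field):
            results.append({"from": previous.get(field), "to": event.get(field)})
        previous = event
    # Remember where this chunk ended so the next chunk can continue.
    if previous is not None:
        save_last_event(state_path, previous)
    return results
```

Each chunk is passed to process_chunk() with the same state_path, so a change that straddles two chunks is still detected. Whether emitting partial results per chunk is acceptable depends on the command, and that is exactly the open question raised above.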
|