I've created a custom command in Python that needs to see an entire set of events as a single batch, because it compares subsequent events. Unfortunately, Splunk is sending events to the custom command in chunks of at most 50,000 events. commands.conf has streaming = false. Setting run_in_preview = false only changes how the results are displayed, as expected.

In case it's relevant, the command is running on a search head which receives events from several distributed search nodes.

Here's the basic code -- run() is invoked by a minimal plugin "manager":

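from datetime import datetime

import splunk.Intersplunk as intersplunk

# SplunkPlug is my own plugin base class (not shown here).
class RemoteLogins( SplunkPlug ):
  def run( self, events, keywords, options ):
    out_events = []
    # Nothing to compare: emit an empty result set and stop.
    if not events:
      intersplunk.outputResults( out_events )
      return
    # Log how many events this invocation received.
    now = datetime.now()
    with open( "/opt/splunk/var/log/test.log", "a" ) as f:
      f.write( "Running at %s with %s events\n" % ( now, len( events ) ) )
    # Compare events within each group of related events.
    for related_events in self.related( events ):
      self.find_overlap( related_events, out_events )
    # Log how many results this invocation produced, then emit them.
    with open( "/opt/splunk/var/log/test.log", "a" ) as f:
      f.write( "Ending %s with %s results\n" % ( now, len( out_events ) ) )
    intersplunk.outputResults( out_events )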

When invoked by a single splunk search, these results are generated:

Running at 2011-08-27 16:56:18.619245 with 25 events
Ending 2011-08-27 16:56:18.619245 with 0 results
Running at 2011-08-27 16:56:19.078111 with 2942 events
Ending 2011-08-27 16:56:19.078111 with 0 results
Running at 2011-08-27 16:56:20.900458 with 19980 events
Ending 2011-08-27 16:56:20.900458 with 1 results
Running at 2011-08-27 16:56:31.590848 with 50000 events
Ending 2011-08-27 16:56:31.590848 with 4 results
Running at 2011-08-27 16:56:55.376255 with 50000 events
Ending 2011-08-27 16:56:55.376255 with 3 results

Once the search is complete, only the 3 results from the last batch of events are shown.

For completeness, here's commands.conf:

[py]
type = python
filename = py.py
streaming = false
run_in_preview = false
maxinputs = 0

So, is there any way aside from the settings in commands.conf to really convince Splunk not to stream events into a custom command? Maybe an intermediate command I could insert into the pipeline?

asked 27 Aug '11, 15:35

mute_dammit

edited 28 Aug '11, 17:09


2 Answers:

Well, either someone else can spot what's missing or can confirm that it's a bug, but for the time being an easy way to make sure no streaming events make it to your command is just to put a non-streaming command in front of it.

`<your search> | table * | py`

should do it.

answered 27 Aug '11, 19:47

sideview ♦

edited 27 Aug '11, 19:48

Adding the non-streaming command does keep Splunk from sending multiple chunks of events to the custom script. Unfortunately, only the last 50k events are sent. Since I've asked for unlimited inputs, this is almost certainly a bug.

(28 Aug '11, 17:16) mute_dammit

It didn't work for me. I am using a dedup command in my search and the search scanned roughly 4000 events and the result set size is only 11 events.

(08 Mar '12, 14:50) asingla

scanCount numbers below 5000 or 10000 can be pretty misleading. Splunk will pretty much always scan at least that deep into any search before potentially shutting down the stream, because that's the sort of "chunk" size that the search process uses when talking to the index. Or so I understand.

(08 Mar '12, 14:56) sideview ♦

mute_dammit: yea once you're out of the streaming portion I'm afraid 50,000 is the default in limits.conf. It can be changed although it's to be filed under "do at your own risk"...

(08 Mar '12, 14:58) sideview ♦

That behavior doesn't seem right to me, but streaming=false was never intended to make Splunk deliver all the events to the search command regardless of how many there are. As I understand it, the flag is supposed to influence how the search machinery plans the search, and to encourage it to give the search command only one chunk.

Essentially, you could view this flag as "I'm only designed for small datasets".

In order to make your tool work over large datasets, you'll want to be streaming, and you'll want to be able to handle the data chunk by chunk.

For some problems that opens up a whole new set of questions: how do you store your state efficiently, is it valid to emit nothing until the last call, and how do you know when it's the last call?
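
As a rough illustration of the "handle it chunk by chunk and store your state" idea, here is a minimal sketch in plain Python. It assumes that each chunk arrives in a separate invocation of the script (which the timestamps in the question's log suggest, but which is not confirmed here), so the state is carried between invocations in a JSON file. The function names, the `user` field, and the state-file location are all hypothetical, and nothing in it is Splunk API. It also does not answer the "which call is the last one" question.

```python
import json
import os
import tempfile


def load_state(path):
    """Return the carried-over state, or an empty dict on the first chunk."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)


def save_state(path, state):
    """Write the state via a temp file so a crash cannot leave it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def process_chunk(events, state, key="user"):
    """Pair each event with the previous event seen for the same key.

    `state` maps a key value to the last event seen for it, including
    events from earlier chunks, so a pair that straddles a chunk
    boundary is still found. `key` is a hypothetical field name.
    """
    pairs = []
    for event in events:
        k = event[key]
        previous = state.get(k)
        if previous is not None:
            pairs.append((previous, event))
        state[k] = event
    return pairs
```

Each invocation would call load_state, then process_chunk on its own chunk, then save_state. The state file would need to be unique per search and cleaned up afterwards, which this sketch leaves out.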

answered 12 Oct '11, 13:04

jrodman ♦

Hi, any idea how we can determine which call is the last one, so that we can collate all the results and emit nothing until then?

(25 Apr '12, 23:28) mridus

Asked: 27 Aug '11, 15:35

Seen: 890 times

Last updated: 25 Apr '12, 23:28

Copyright © 2005-2012 Splunk Inc. All rights reserved.