Pros and Cons: External lookup script vs custom search command?

Lowell
Super Champion

What are the pros and cons of using an external lookup script vs a custom search command when the purpose is simply to augment your results with additional fields derived from a given field?

Here is my scenario: I have a field holding a hexadecimal value that packs several bit-level encoded fields (5 single-bit flags, some multi-bit lookups, and a multi-bit value). I've written a Python function that takes the hex field and returns a dictionary of new fields, and now I'm wondering which approach is better.
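For readers who want something concrete, here is a minimal sketch of the kind of decoding function described above. The bit layout, the field names (`flag_a` ... `flag_e`, `mode`, `level`), and the `MODE_NAMES` table are all made up for illustration; the actual layout is not given in this thread.

```python
# Hypothetical layout (not from the original post):
#   bits 0-4  : five single-bit flags
#   bits 5-6  : 2-bit code translated through a small lookup table
#   bits 7-15 : 9-bit numeric value
FLAG_NAMES = ("flag_a", "flag_b", "flag_c", "flag_d", "flag_e")
MODE_NAMES = {0: "off", 1: "standby", 2: "active", 3: "fault"}


def decode_status(hex_value):
    """Decode a hex string into a dict of new fields."""
    raw = int(hex_value, 16)  # accepts "1A5" as well as "0x1A5"
    fields = {}
    # Single-bit flags: shift the bit of interest down and mask it.
    for bit, name in enumerate(FLAG_NAMES):
        fields[name] = (raw >> bit) & 1
    # Multi-bit lookup: extract two bits, then map the code to a label.
    fields["mode"] = MODE_NAMES.get((raw >> 5) & 0b11, "unknown")
    # Multi-bit value: extract nine bits as a plain integer.
    fields["level"] = (raw >> 7) & 0x1FF
    return fields
```

Either approach discussed below would end up calling a function like this once per input value; the difference is in how Splunk feeds values in and reads fields back.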

1 Solution

Lowell
Super Champion

I've tried both approaches and found the following:

External lookups:

  • Pro: Seems to be slightly faster.
  • Pro: Fewer inputs for the script to handle (since only unique values are passed to the script).
  • Con: Can't natively handle multi-valued fields. (You can return the values joined with ";" and then split them using eval, but that may not always work.)
  • Con: Less flexible. For example, the mapping from input to output must be deterministic (or static), which happens to work for my scenario.
  • Con: No way to pass in authentication, which makes it difficult to make REST calls, for example to look up configuration settings stored in Splunk or username/password info for remote resources.
  • Pro: You can set up a new external lookup script without restarting Splunk. (I had to trick Splunk into reloading the metadata file to pick up my [searchscripts/my_lookup.py] entry, since I don't think you can set up these permissions via the UI yet.)
  • Pro: A lookup can be set up to be applied automatically based on source/sourcetype/...
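To illustrate the external lookup side, here is a rough sketch of what such a script can look like. It assumes the usual external lookup contract, where Splunk sends CSV on stdin with a header row and empty output columns, and expects the same CSV back on stdout with those columns filled in. The field names (`status_hex`, `flag_a`, `level`) and the simplified `decode` helper are placeholders, not the actual script from this thread.

```python
import csv
import sys

INPUT_FIELD = "status_hex"  # hypothetical name of the hex field


def decode(hex_value):
    """Simplified stand-in for the real decoding function."""
    raw = int(hex_value, 16)
    return {"flag_a": raw & 1, "level": (raw >> 7) & 0x1FF}


def run_lookup(infile, outfile):
    """Read CSV rows, fill in the output columns, write CSV back out."""
    reader = csv.DictReader(infile)
    if not reader.fieldnames:
        return  # nothing was sent, so there is nothing to answer
    writer = csv.DictWriter(
        outfile, fieldnames=reader.fieldnames, extrasaction="ignore"
    )
    writer.writeheader()
    for row in reader:
        try:
            row.update(decode(row[INPUT_FIELD]))
        except (KeyError, TypeError, ValueError):
            pass  # unparseable or missing input: leave outputs empty
        writer.writerow(row)


if __name__ == "__main__":
    run_lookup(sys.stdin, sys.stdout)
```

Note how every output is a single flat CSV cell, which is why multi-valued results need the ";" workaround mentioned above, and how the script only ever sees the lookup columns rather than the whole event.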

Custom search command:

  • Pro: Full flexibility. Access to all fields.
  • Pro: Can return multi-value fields.
  • Con: Slightly slower. (Enabling "streaming" improved performance by 6x on my test query, but it's still slightly slower than the "lookup" approach.) Also take a look at the v2 interface and the Python SDK for example scripts.
  • Con: Can't be set up to run automatically.
  • Con: You have to get everything set up properly via config files. Enabling getinfo does let you do more of this without as many restarts.
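As a rough illustration of the streaming style, the core of such a command can be written as a generator that receives full records and yields them back with extra fields. This sketch is plain Python with no Splunk imports; the field names and flag layout are invented. If you use the Python SDK, my understanding is that the body of a `StreamingCommand.stream(self, records)` method has this same shape and that a Python list value is emitted as a multi-value field, but treat both as assumptions and check them against the SDK examples.

```python
FLAG_NAMES = ("flag_a", "flag_b", "flag_c", "flag_d", "flag_e")  # made up


def stream(records, field="status_hex"):
    """Yield each record, augmented with fields decoded from `field`."""
    for record in records:
        value = record.get(field)
        try:
            raw = int(value, 16) if value else None
        except ValueError:
            raw = None  # not valid hex: pass the record through untouched
        if raw is not None:
            # Single-valued numeric field taken from bits 7-15.
            record["level"] = (raw >> 7) & 0x1FF
            # One entry per set flag; a list stands in for a
            # multi-value field here.
            record["flags_set"] = [
                name
                for bit, name in enumerate(FLAG_NAMES)
                if (raw >> bit) & 1
            ]
        yield record
```

Unlike the lookup script, this sees every field on every record and is called for each event rather than once per unique value, which fits the flexibility and speed trade-offs listed above.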

Please let me know if you have additional thoughts or if you find any mistakes in either of these lists.

Practically speaking, it's a good idea to wrap all of this in a macro. That way, if you ever change your mind about which approach to use, your existing searches don't need to change. And if your new approach breaks, you can switch back quickly.
