Pros and Cons: External lookup script vs custom search command?

Lowell
Super Champion

What are the pros and cons of using an external lookup script vs. a custom search command when the purpose is simply to augment your results with additional fields based on a given field?

Here is my scenario: I have a field containing a hexadecimal value that encodes several bit-level fields (5 single-bit flags, some multi-bit lookups, and a multi-bit value). I've written a Python function that takes the hex field and returns a dictionary of new fields, and now I'm wondering which approach is better.
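
For readers who want something concrete, here is a minimal sketch of what such a decoding function might look like. The post does not give the real bit layout, so the 16-bit layout, the field names (flag_a ... flag_e, mode, level), and the MODE_NAMES table below are all hypothetical.

```python
# Hypothetical layout of a 16-bit value (the real layout is not given in the post):
#   bits 0-4   five single-bit flags
#   bits 5-7   3-bit mode code, mapped through a lookup table
#   bits 8-15  8-bit numeric value
FLAG_NAMES = ["flag_a", "flag_b", "flag_c", "flag_d", "flag_e"]
MODE_NAMES = {0: "off", 1: "standby", 2: "active", 3: "fault"}


def decode_hex(hex_value):
    """Return a dict of new fields decoded from a hex string such as '0x2A45'."""
    # With base 16, int() accepts the string with or without a '0x' prefix.
    raw = int(hex_value, 16)
    fields = {}
    # Single-bit flags: shift the bit of interest to position 0 and mask it.
    for bit, name in enumerate(FLAG_NAMES):
        fields[name] = (raw >> bit) & 1
    # Multi-bit lookup: extract 3 bits, then translate the code to a label.
    mode_code = (raw >> 5) & 0b111
    fields["mode"] = MODE_NAMES.get(mode_code, "unknown")
    # Multi-bit value: the upper byte is used as a plain number.
    fields["level"] = (raw >> 8) & 0xFF
    return fields
```

With this assumed layout, decode_hex("0x2A45") sets flag_a and flag_c to 1, the other flags to 0, mode to "active", and level to 42.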

1 Solution

Lowell
Super Champion

I've tried both approaches and found the following:

External lookups:

  • Pro: Seems to be slightly faster.
  • Pro: Fewer inputs for the script to handle (since only unique values are passed to the script).
  • Con: Can't natively handle multi-valued fields. (You can return the values joined with ";" and then split them using eval, but that may not always work.)
  • Con: Less flexible. For example, the mapping from input to output must be deterministic (or static), which works for my given scenario.
  • Con: No way to pass in authentication, which makes it difficult to make REST calls to look up, for example, configuration settings stored in Splunk or username/password info for remote resources.
  • Pro: You can set up a new external lookup script without restarting Splunk. (I had to trick Splunk into reloading the metadata file to pick up my [searchscripts/my_lookup.py] entry, since I don't think you can set up these permissions via the UI yet.)
  • Pro: A lookup can be set up to run automatically based on source/sourcetype/...
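
To make the external lookup side more concrete, here is a minimal sketch of such a script. It assumes the usual external lookup convention of receiving CSV on stdin (a header row plus rows with the input field filled in and the output fields empty) and writing the completed CSV to stdout. The field names hexfield, flags, and level, and the bit layout in decode, are hypothetical. Because multi-valued fields are not handled natively, the active flag names are joined with ";" so they can be split again in the search.

```python
import csv
import sys

# Hypothetical flag names for bits 0-4 of the value.
FLAG_NAMES = ["flag_a", "flag_b", "flag_c", "flag_d", "flag_e"]


def decode(hex_value):
    """Decode a hex string into output fields (assumed layout, see lead-in)."""
    raw = int(hex_value, 16)
    active = [name for bit, name in enumerate(FLAG_NAMES) if (raw >> bit) & 1]
    # Join with ';' because a lookup cannot return a true multi-valued field.
    return {"flags": ";".join(active), "level": (raw >> 8) & 0xFF}


def run_lookup(infile, outfile, in_field="hexfield"):
    """Read lookup rows as CSV, fill in the decoded fields, write CSV back."""
    reader = csv.DictReader(infile)
    # extrasaction="ignore" drops decoded keys that are not in the header.
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames or [],
                            extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        try:
            row.update(decode(row[in_field]))
        except (KeyError, TypeError, ValueError):
            pass  # leave the output fields empty for missing or bad input
        writer.writerow(row)


if __name__ == "__main__":
    run_lookup(sys.stdin, sys.stdout)
```

Keeping the decoding in a plain function means the same logic can be tried outside Splunk by passing io.StringIO objects to run_lookup.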

Custom Search Command

  • Pro: Full flexibility. Access to all fields.
  • Pro: Can return multi-value fields.
  • Con: Slightly slower. (I found that enabling "streaming" improved performance by 6x on my test query, but it's still slightly slower than the "lookup" approach.) Also take a look at the v2 interface and the Python SDK for example scripts.
  • Con: Can't be set up to run automatically.
  • Con: You have to deal with getting everything set up properly via config files. Enabling getinfo does let you do more of this without as many restarts.
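
For comparison, here is a rough sketch of the same idea as a streaming custom search command built on the Python SDK (splunklib.searchcommands). The command class name, the field option, the output field names, and the bit layout are all hypothetical. As far as I understand the SDK, a Python list assigned to a field is written out as a multi-value field, which is what the flags field relies on here. The import is guarded only so the decoding function can be exercised on a machine without the SDK installed.

```python
import sys

# Hypothetical flag names for bits 0-4 of the value.
FLAG_NAMES = ["flag_a", "flag_b", "flag_c", "flag_d", "flag_e"]


def decode_record(record, field):
    """Add decoded fields to one result record (a dict) and return it."""
    try:
        raw = int(record.get(field, ""), 16)
    except (TypeError, ValueError):
        return record  # missing or non-hex value: pass the record through
    # A list value is intended to become a multi-value field in the results.
    record["flags"] = [n for bit, n in enumerate(FLAG_NAMES) if (raw >> bit) & 1]
    record["level"] = (raw >> 8) & 0xFF
    return record


try:
    from splunklib.searchcommands import (
        Configuration, Option, StreamingCommand, dispatch)
except ImportError:
    StreamingCommand = None  # SDK not installed; only decode_record is usable

if StreamingCommand is not None:

    @Configuration()
    class DecodeHexCommand(StreamingCommand):
        """Streaming command: sees every field of each record as it passes."""

        field = Option(require=True)  # name of the field holding the hex value

        def stream(self, records):
            for record in records:
                yield decode_record(record, self.field)

    if __name__ == "__main__":
        dispatch(DecodeHexCommand, sys.argv, sys.stdin, sys.stdout, __name__)
```

Unlike the lookup script, this version receives whole result records rather than only the unique input values, which is where both the extra flexibility and the extra per-event cost come from.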

Please let me know if you have additional thoughts or if you find any mistakes in either of these lists.

Practically speaking, it's a good idea to wrap all of this in a macro. That way, if you ever change your mind about which approach to use, there are no changes to existing searches. And if your new approach breaks, you can switch back quickly.
