I'm currently receiving an excessive amount of data from the VMware app (sample below) and would like to keep only a few of the fields before the data is indexed. Is there a way to do this?
_raw: vm-1111 501170cc-8439-1cb3-04ba-8dc34434b33c 4001 20 0 0 0 0 0 0 0 21 0 0 0 0 21
Field Extractions:
p_average_net_bytesRx_kiloBytesPerSecond 0
p_average_net_bytesTx_kiloBytesPerSecond 0
p_average_net_received_kiloBytesPerSecond 0
p_average_net_transmitted_kiloBytesPerSecond 0
p_average_net_usage_kiloBytesPerSecond 0
p_summation_net_broadcastRx_number 21
p_summation_net_broadcastTx_number 0
p_summation_net_droppedRx_number 0
p_summation_net_droppedTx_number 0
p_summation_net_multicastRx_number 0
p_summation_net_multicastTx_number 0
p_summation_net_packetsRx_number 21
p_summation_net_packetsTx_number 0
I'm looking to keep only these fields before indexing (for example):
p_average_net_received_kiloBytesPerSecond 0
p_average_net_transmitted_kiloBytesPerSecond 0
p_summation_net_droppedRx_number 0
p_summation_net_droppedTx_number 0
p_summation_net_packetsRx_number 21
p_summation_net_packetsTx_number 0
You can route to nullqueue based on patterns in the events you want to drop:
https://docs.splunk.com/Documentation/Splunk/latest/Forwarding/Routeandfilterdatad
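The following would prevent any events containing the string p_average_net_ from getting indexed. The drop_p_summation transform referenced in props.conf needs its own stanza of the same form, with REGEX = p_summation_net_.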
props.conf:
[vmware:sourcetype]
TRANSFORMS-null = drop_p_avg, drop_p_summation
transforms.conf:
[drop_p_avg]
REGEX = p_average_net_
DEST_KEY = queue
FORMAT = nullQueue
These would need to be placed on your indexers.
I tried this approach. I created a test message (in JSON format, for example):
_raw: {"message": "Running ITBSA Common Module", "field1": "some text", "state": "OK"}
On the Search Head / Indexer (my test system is a combined one)
Updated file: /opt/splunk/etc/system/local/props.conf
[common]
TRANSFORMS-null = drop_message
Updated file: /opt/splunk/etc/system/local/transforms.conf
[drop_message]
REGEX = state
DEST_KEY = queue
FORMAT = nullQueue
I restarted splunkd and now no data is coming in.
On a forwarder, I have this input configured to create the test data:
[script://./bin/common.py]
source = monitoring::test
sourcetype = common
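Is it ALL messages that stopped, or just the messages of the sourcetype common?
If it's ALL messages, I'm not sure what your issue is.
If only messages of the sourcetype common are affected, the problem could be that your REGEX is matching more than expected. REGEX = state matches any event that contains the string state anywhere in _raw, and in your sample that is every event, since each one carries a "state" key.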
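To illustrate the "matching more than expected" point, here is a small offline check. It is only a sketch: Python's re.search stands in for Splunk's unanchored REGEX match against _raw, the second event is made up for the test, and the narrower pattern is a hypothetical alternative, not something from the original config.

```python
import re

# First event is the sample from the thread; the second is a made-up event
# of the same shape, added only for this illustration.
events = [
    '{"message": "Running ITBSA Common Module", "field1": "some text", "state": "OK"}',
    '{"message": "Another run", "field1": "other text", "state": "FAILED"}',
]

# REGEX = state is an unanchored search, approximated here with re.search.
broad = re.compile(r'state')

# Hypothetical narrower pattern: only events whose state value is "OK".
narrow = re.compile(r'"state":\s*"OK"')

# Events that would be routed to nullQueue under each pattern.
dropped_broad = [e for e in events if broad.search(e)]
dropped_narrow = [e for e in events if narrow.search(e)]

print(len(dropped_broad), "of", len(events), "events match the broad pattern")
print(len(dropped_narrow), "of", len(events), "events match the narrow pattern")
```

Because every event carries the key name "state", the broad pattern sends the whole sourcetype to nullQueue, which would explain seeing no data at all.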
I need the raw input to go from
{"message": "Running ITBSA Common Module", "field1": "some text", "state": "OK"}
to
{"field1": "some text", "state": "OK"}
for the specified sourcetype. However, the catch is that the real incoming data does not fit that format; it looks like tab-separated data:
_raw: vm-1111 501170cc-8439-1cb3-04ba-8dc34434b33c 4001 20 0 0 0 0 0 0 0 21 0 0 0 0 21
I misunderstood; I thought you were looking to get rid of the events, not specific fields.
If you want to get rid of specific fields, you probably want to look at SEDCMD:
http://docs.splunk.com/Documentation/Splunk/7.1.0/Admin/Propsconf#Field_extraction_configuration
You should be able to use sed-like syntax to remove the unwanted data.
Something like:
props.conf
[sourcetype]
SEDCMD-removeunwanted1 = s/\{"message":\s*"[^"]*",\s*/{/
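One way to sanity-check a substitution like this before putting it in props.conf is to try it offline against the sample event. The sketch below uses Python's re.sub as a stand-in for SEDCMD (an assumption; the two regex engines are similar but not identical), with a pattern of my own that targets the "message" key explicitly and assumes its value contains no escaped double quotes.

```python
import re

# Sample event from the thread.
raw = '{"message": "Running ITBSA Common Module", "field1": "some text", "state": "OK"}'

# Hypothetical pattern: the opening brace, the "message" key, its quoted
# value, and the trailing comma plus whitespace.
pattern = r'\{"message":\s*"[^"]*",\s*'

# Put the opening brace back so the remainder is still valid JSON.
cleaned = re.sub(pattern, '{', raw, count=1)

print(cleaned)
```

Run against the sample, this leaves only the field1 and state keys, which is the target output asked for above.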
That's what I was leaning towards, but as the data is tab-separated I was unsure how the field extractions would handle that. I was hoping to specify just the names of the fields to be excluded.
And writing the regex would be just 'fun'.
Example data; I would need to remove the bold parts (if I have to deal with the raw data):
vm-125620 5006a450-f3f4-3794-ecb7-a50b97a8bec4 vmnic5 20 0 0 0 0 0 0 0
vm-1111 501170cc-8439-1cb3-04ba-8dc34434b33c 4000 20 58738 1108 1108 0 379 0 379 11612 0 0 0 1487 0
vm-163268 5006d319-719a-d56c-3e3c-eb1cab4163de aggregated 20 656 2 2 0 91 0 91 1591 0 0 0 94 60
Assuming the field at the front is the VMware eventId, you might need to create one SEDCMD per eventId:
SEDCMD-vm125620 = s/(vm-125620)\s(\w)\s(\w)\s(\w)\s(\w)\s(\w)\s(\w)\s(\w)\s(\w)\s(\w)\s(\w)/$1 $2 $3 $4 $5 $6 $7 $8 $10/
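Correction: I should have used backreferences (\1) instead of variables ($1) in the SEDCMD. Each group also needs to be (\S+) rather than (\w), because (\w) matches a single character and the UUID field contains hyphens:
SEDCMD-vm125620 = s/(vm-125620)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)\s(\S+)/\1 \2 \3 \4 \5 \6 \7 \8 \10/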
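As a rough offline check of the positional approach, here is a sketch in Python. Assumptions on my part: the event is tab-separated with exactly 11 fields (as in the vm-125620 sample), Python's re module stands in for Splunk's regex engine, and the groups are written as (\S+) so they can match multi-character fields such as the UUID.

```python
import re

# vm-125620 sample from the thread, rebuilt with tabs between the fields.
raw = "\t".join(
    "vm-125620 5006a450-f3f4-3794-ecb7-a50b97a8bec4 vmnic5 20 0 0 0 0 0 0 0".split()
)

# A single-character group cannot match the UUID field, so this finds nothing.
single_char = re.search(r'(vm-125620)\s(\w)\s', raw)

# Eleven whitespace-separated fields, each captured as a run of non-whitespace.
pattern = r'^(\S+)' + r'\s(\S+)' * 10 + r'$'

# Keep fields 1-8 and 10, i.e. drop fields 9 and 11. The \g<N> form avoids
# any ambiguity between group 10 and group 1 followed by a literal "0".
kept = (1, 2, 3, 4, 5, 6, 7, 8, 10)
replacement = "\t".join(r'\g<%d>' % i for i in kept)

cleaned = re.sub(pattern, replacement, raw)

print(single_char)
print(cleaned.split("\t"))
```

This version keeps tabs as the separator in the output so that any delimiter-based field extraction still lines up; whether the remaining columns then map to the right field names depends on how the app's extractions are defined, which the thread doesn't show.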