How do I get auto field detection on forwarded CSV?

bmgilmore · 02-27-2012

I set up a Splunk instance on a server with a local CSV file that updates once a minute. Using the Add Data wizard, Splunk automatically detected all the appropriate timestamp, metadata, and value fields. I then set Splunk to forward to another instance (to test forwarding), and the data forwards fine, but it's all in raw format. I looked for a props.conf file on the original server to see whether the wizard had created something I could copy over, but no luck.

Also, if you can help with setting this up on the receiving instance, could you also mention whether there is a way to go through all the data that has already been indexed and extract the fields into the indexes?

Sorry, I'm totally new to Splunk. I'm just trying to build a business case and do some due diligence before committing to it as a platform!

lguinn2 · 02-27-2012

Good news - field extraction is done at search time. This means that you can create fields for data that has already been indexed.

If you selected csv as your sourcetype (under More Settings in Manager » Data inputs » Files & directories » Add new), then Splunk would be doing the field extraction for you. But since you are forwarding the data, you didn't set that up on the forwarder.

Option 1 - Set sourcetype to csv

Here is one way to do this. This technique has you set the sourcetype of the input to csv manually, on the forwarder, as the data is collected. Edit inputs.conf on the forwarder to tell Splunk the correct sourcetype of the input file. I am using the filename "example.csv" here:

inputs.conf

[monitor:///mydirpath/example.csv]
sourcetype=csv

Important - this will only affect new data. It will not change the sourcetype of data that has already been indexed.

Option 2 - Set field extraction for a sourcetype

And here is another way. This technique assumes that the data has already been indexed, and has been assigned a sourcetype that is not csv. Let's say that you have two csv files: one of the files has sourcetype X and the second file has sourcetype Y.
Create an entry in props.conf and transforms.conf for each type of csv file.

You may need to create the props.conf and transforms.conf files. Put them under $SPLUNK_HOME/etc/system/local on the instance you search from, which here is the receiving instance ($SPLUNK_HOME is wherever you installed Splunk).

props.conf

[X]
SHOULD_LINEMERGE = false
TRANSFORMS-t01 = csv1-fieldextraction

[Y]
SHOULD_LINEMERGE = false
TRANSFORMS-t02 = csv2-fieldextraction

transforms.conf

[csv1-fieldextraction]
DELIMS=","
FIELDS="User","UID","Session#","CPU","Memory","Status"

[csv2-fieldextraction]
DELIMS=","
FIELDS="PID","PPID","UID","CPU","Memory","CMD"

Now, as you add more data to splunk, you can continue to use sourcetype X and sourcetype Y, or create new sourcetypes as needed.

helge · 12-11-2013

Shouldn't REPORT be used instead of TRANSFORMS so search-time extractions are used instead of index-time extractions?
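Following that suggestion, here is a sketch of the Option 2 props.conf rewritten for search-time extraction. It keeps the same sourcetypes X and Y and reuses the csv1-fieldextraction and csv2-fieldextraction stanzas from transforms.conf unchanged; only the TRANSFORMS- lines become REPORT- lines. Because DELIMS/FIELDS transforms referenced through REPORT- are applied when you search rather than when the data is indexed, this version should also cover events that were indexed before the change. It belongs on the instance you search from (here, the receiving instance).

props.conf

# Search-time, delimiter-based field extraction for sourcetype X
[X]
# Line breaking is decided when data is parsed for indexing, so this affects only newly indexed events
SHOULD_LINEMERGE = false
# REPORT- refers to a transforms.conf stanza and is evaluated at search time
REPORT-t01 = csv1-fieldextraction

# Same pattern for sourcetype Y, using its own field list
[Y]
SHOULD_LINEMERGE = false
REPORT-t02 = csv2-fieldextraction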

sideview · 06-13-2012

Right. The csv sourcetype is configured with CHECK_FOR_HEADER, and that setting generates AutoHeader configuration that is written to '$SPLUNK_HOME/etc/apps/learned' and stays trapped on the forwarder. So although the data itself gets forwarded, and the sourcetypes arguably come across (modulo the odd "foo-2" renaming that CHECK_FOR_HEADER applies to its sourcetypes), the field extractions do not make it to the indexer.

bmgilmore · 02-27-2012

Thanks. I was most interested in having this work from scratch, so I uninstalled Splunk on both servers. I set up the first server again and used the wizard with Preview to add the file. Even though I made sure, both on the first screen and under more options, that the sourcetype was csv, the data source was assigned a sourcetype of csv-2 when it was saved. I then set up the primary to forward to the newly reinstalled secondary. The data is sent to the secondary server, but it is not broken out into fields the way it is on the primary. Same data source and data type, yet I have 21 fields on the primary and 17 on the secondary. THX!
