Is there a way to export the incorrectly typed data and then re-import it with the correct sourcetype? If not, is there another way to change the sourcetype after the data has been indexed?
The easiest method is to wipe the data and reindex.
Wiping the data can be global (splunk clean eventdata -index myindex) or more focused (splunk search "some data | delete"). The full wrinkles of these methods are discussed elsewhere.
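For reference, here are both wipe options as concrete commands. The index name and search are placeholders; `splunk clean` requires splunkd to be stopped, and `| delete` only marks events unsearchable (it doesn't reclaim disk) and requires a role with the can_delete capability. The fragment is guarded so it's a no-op on a box without the splunk CLI:

```shell
#!/bin/sh
INDEX=myindex   # placeholder index name

if command -v splunk >/dev/null 2>&1; then
    # Full wipe of one index (destructive; -f skips the confirmation prompt):
    splunk stop
    splunk clean eventdata -index "$INDEX" -f
    splunk start

    # Targeted removal: mark matching events as deleted via the delete command.
    splunk search "index=$INDEX sourcetype=wrong_sourcetype | delete"
else
    echo "splunk CLI not found; skipping"
fi
```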
Another option is sourcetype renaming: if you want to alias an entire sourcetype to another one, you can do so in props.conf, e.g.:
[wrong_sourcetype]
rename = right_sourcetype
This clearly doesn't work if your [wrong_sourcetype] is a valid sourcetype on its own.
It's also possible to dump a bucket to csv format, manipulate that, and then generate a new bucket from the modified or filtered csv data. This is sort of 'for wizards'.
The command to emit a bucket to csv is splunk cmd exporttool bucketname filename.csv -csv
To generate a new bucket from the csv, you can use splunk cmd importtool new_bucket_dir filename.csv
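Putting the two commands together, the round trip might look like the sketch below. The Splunk invocations are commented out so the fragment runs anywhere; all paths and sourcetype names are made up, and exporttool's exact csv layout can vary between versions, so inspect the exported file before editing it:

```shell
#!/bin/sh
# 1) Export the bucket to csv (hypothetical paths; uncomment on a real indexer):
# splunk cmd exporttool "$SPLUNK_DB/myindex/db/db_1395000000_1390000000_42" \
#     /tmp/bucket.csv -csv

# 2) Rewrite the sourcetype. Demonstrated on a fake one-line csv; the real
#    exporttool output has more columns. The blanket sed is crude -- verify
#    with grep first that the string doesn't also occur in event text.
printf '1390000001,host1,wrong_sourcetype,src1,"some event text"\n' > /tmp/bucket.csv
sed 's/wrong_sourcetype/right_sourcetype/g' /tmp/bucket.csv > /tmp/bucket.fixed.csv

# 3) Rebuild a bucket from the edited csv (again, uncomment on a real indexer):
# splunk cmd importtool /tmp/new_bucket /tmp/bucket.fixed.csv

grep -c right_sourcetype /tmp/bucket.fixed.csv   # -> 1
```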
You will either have to manually assign the correct Splunk bucket name to the bucket directory, for example by naming it the same as the original, or use some kind of script to name it. I used the following shell fragment, where $bucket was the old bucket:
bucket_id=$(echo "$bucket" | sed 's/.*_//')   # trailing numeric id of the old bucket
(cd "$NEW_BUCKET" && ls *.tsidx | sed 's/-[0-9][0-9]*\.tsidx$//' | sed 's/-/ /') | {
    global_low=0
    global_high=0
    # Each .tsidx name starts with "<latest>-<earliest>"; track the extremes.
    while read high low; do
        if [ "$global_high" -eq 0 ] || [ "$high" -gt "$global_high" ]; then
            global_high=$high
        fi
        if [ "$global_low" -eq 0 ] || [ "$low" -lt "$global_low" ]; then
            global_low=$low
        fi
    done
    # The rename has to happen inside this block: the pipeline runs it in a
    # subshell, so global_high/global_low vanish once the block ends.
    REAL_BUCKET_NAME=db_${global_high}_${global_low}_${bucket_id}
    mv "$NEW_BUCKET" "$bucket_dir/$REAL_BUCKET_NAME"
}
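To sanity-check the renaming logic without a Splunk install, here is a self-contained version run against fake .tsidx filenames in a temp directory (all epochs and bucket ids below are invented):

```shell
#!/bin/sh
# Simulate a rebuilt bucket directory with two fake .tsidx files.
tmp=$(mktemp -d)
touch "$tmp/1393000000-1390000000-7.tsidx" "$tmp/1395000000-1392000000-8.tsidx"
bucket=db_1395000000_1390000000_42   # pretend this was the old bucket's name

bucket_id=$(echo "$bucket" | sed 's/.*_//')
name=$( (cd "$tmp" && ls *.tsidx | sed 's/-[0-9][0-9]*\.tsidx$//' | sed 's/-/ /') | {
    global_low=0
    global_high=0
    while read high low; do
        if [ "$global_high" -eq 0 ] || [ "$high" -gt "$global_high" ]; then
            global_high=$high
        fi
        if [ "$global_low" -eq 0 ] || [ "$low" -lt "$global_low" ]; then
            global_low=$low
        fi
    done
    echo "db_${global_high}_${global_low}_${bucket_id}"
} )
echo "$name"   # -> db_1395000000_1390000000_42
rm -rf "$tmp"
```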
Once you have a newly constructed, duplicated bucket, you can remove the old one from your index and insert the new one.
The main problem with exporttool/importtool is that they're not all that optimized, so they consume a significant amount of RAM, and a significant amount of CPU for a significant amount of time. We'll be making them faster, but for now you should make sure you have plenty of headroom on the box where you're processing them.
If you want to go down that path, the full script (treat it as an example) is in the wiki over here: http://www.splunk.com/wiki/Community:Modifying_indexed_data_via_export_and_import
No and no; once data has been indexed, that's the state it's going to stay in. An export/import capability has been requested on a number of occasions, but it's not built yet. If you want to change the sourcetype value, all you can really do is re-index the data.
If that's not possible, then the next best solution is to just use tags - http://docs.splunk.com/Documentation/Splunk/5.0/Knowledge/Defineandusetags
