How do I reclaim my disk space after deleting a large number of events from an index?

The "Remove data from Splunk" page in the docs says:

Currently, piping to delete does not reclaim disk space, but Splunk will be delivering a utility in a future release that reclaims the disk space--this will go through and permanently remove all the events marked by the delete operator.

Is there any other way of reclaiming this space in the meantime?

asked 13 May '10, 13:55

Lowell ♦

As of December 2011, Splunk 4.2.5 still does not provide this functionality. The docs still say "Note: Piping to delete does not reclaim disk space." I heard this is still on the roadmap, but it is not available yet.

(19 Dec '11, 15:33) stefanlasiewski

2 Answers:

It is possible to reclaim disk space in this type of scenario by re-indexing the affected buckets.

Note: This may also be useful if you've deleted sensitive information, such as a password, that really needs to be purged completely. For example, this approach keeps that indexed term from showing up in typeahead.

There are several steps to this process.

  1. Identify all buckets in each index that were affected by your deletion. (This alone can be a complicated task. Also keep in mind that the delete command forces hot buckets to roll.)
  2. For each bucket, do the following:
    1. Export the bucket data to a .csv file
    2. Import the .csv file into a new empty bucket (with a temporary name/location)
    3. Optimize the new bucket.
    4. Replace the original bucket with the newly created bucket.
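For step 1, one rough way to find candidate buckets is by time range. The sketch below assumes the usual warm/cold bucket directory naming db_<latestTime>_<earliestTime>_<id> (epoch seconds) and that you know the earliest and latest _time of the events you deleted. The function name overlapping_buckets is hypothetical. It only lists buckets that could contain deleted events, so verify each candidate before re-indexing it. Hot buckets (hot_*) are skipped, since delete rolls them anyway.

```shell
#!/bin/bash
# Hypothetical helper: print bucket directories whose time span overlaps
# the window [FROM, TO] (epoch seconds), assuming warm/cold buckets are
# named db_<latestTime>_<earliestTime>_<id>.
# Usage: overlapping_buckets FROM TO INDEX_DIR...
overlapping_buckets() {
    local from="$1" to="$2"
    shift 2
    local dir bucket name prefix latest earliest rest
    for dir in "$@"; do                       # e.g. .../myindex/db and .../myindex/colddb
        for bucket in "$dir"/db_*; do
            [ -d "$bucket" ] || continue      # skip the literal pattern when nothing matches
            name="${bucket##*/}"
            IFS=_ read -r prefix latest earliest rest <<< "$name"
            # Ignore directories that don't follow the assumed naming scheme.
            [[ "$latest" =~ ^[0-9]+$ && "$earliest" =~ ^[0-9]+$ ]] || continue
            # Two ranges overlap when each one starts before the other ends.
            if [ "$earliest" -le "$to" ] && [ "$latest" -ge "$from" ]; then
                printf '%s\n' "$bucket"
            fi
        done
    done
}
```

For example, overlapping_buckets 1273700000 1273760000 "$SPLUNK_DB/myindex/db" "$SPLUNK_DB/myindex/colddb", where $SPLUNK_DB is your index root (by default $SPLUNK_HOME/var/lib/splunk) and the index name and timestamps are placeholders.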

For users running on a Unix platform, the following shell script may be of use. (Note that it combines the export and import steps into a single operation using a pipe.)

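#!/bin/bash
# Usage: <this script> /path/to/bucket_dir
# Stop on any failed command (including either side of the export | import pipe)
# so the original bucket is never swapped out after a failed step.
set -euo pipefail

BUCKET="${1:?usage: $0 BUCKET_DIR}"
BUCKET="${BUCKET%/}"   # drop a trailing slash so ${BUCKET}.new is a sibling directory

# Be sure to compare the imported/exported event count.  They should be the same.
exporttool "${BUCKET}" /dev/stdout -csv meta::all | importtool "${BUCKET}.new" /dev/stdin

# Make sure that bucket .tsidx files are optimized (and merged_lexicon.lex is up to date)
splunk-optimize "${BUCKET}.new"
splunk-optimize-lex "${BUCKET}.new"

# Compress all rawdata files that were not gzipped by importtool
find "${BUCKET}.new/rawdata" -name '[0-9]*[0-9]' -size +1k -print0 | xargs -0 -r gzip -v9

# Swap buckets
mv "${BUCKET}" "${BUCKET}.old"
mv "${BUCKET}.new" "${BUCKET}"

# Uncomment next line if you really want to remove the original bucket automatically
# rm -rf "${BUCKET}.old"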

Note: If you plan on using this script, please be sure to add return-code checking. You wouldn't want to remove the original bucket if the export/import failed to complete, for example.
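Step 2 repeats the process for each bucket. A minimal driver sketch, assuming the script above is saved under a hypothetical name such as reindex_bucket.sh, could run it once per bucket and stop at the first non-zero exit status so no further buckets are touched after a failure. The function name reindex_all is also hypothetical; it takes the per-bucket command as its first argument.

```shell
#!/bin/bash
# Hypothetical driver: run CMD once per bucket, stopping at the first failure.
# Usage: reindex_all CMD BUCKET...
reindex_all() {
    local cmd="$1" bucket
    shift
    for bucket in "$@"; do
        echo "Re-indexing ${bucket}" >&2
        if ! "$cmd" "$bucket"; then
            echo "FAILED on ${bucket}; remaining buckets were not processed" >&2
            return 1
        fi
    done
}
```

For example, reindex_all ./reindex_bucket.sh followed by the bucket directories identified in step 1. Passing the command as an argument also makes it easy to dry-run the loop with something harmless like echo first.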


Other considerations:

  • Keep in mind that importtool does not respect your segmentation settings. The default segmentation is used for all imported events. For many setups, this will not matter, but it is something to be aware of.
  • It's possible to lose data using this approach. This is a use-at-your-own-risk kind of operation, and you may not even reclaim much disk space.
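Because the savings may be small, it can be worth measuring them before removing ${BUCKET}.old. A minimal sketch using du -sk (allocated size in kilobytes); the function name bucket_savings_kb is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print how many kilobytes the rebuilt bucket saves
# compared with the original (negative if the new bucket is larger).
# Usage: bucket_savings_kb OLD_BUCKET NEW_BUCKET
bucket_savings_kb() {
    local old_kb new_kb path
    # du -sk prints "<kilobytes><TAB><path>"; read fails if du printed nothing.
    read -r old_kb path < <(du -sk -- "$1") || return 1
    read -r new_kb path < <(du -sk -- "$2") || return 1
    echo $(( old_kb - new_kb ))
}
```

After the swap in the script above, this would be bucket_savings_kb "${BUCKET}.old" "${BUCKET}". If the result is close to zero, the re-index may not have been worth the risk for that bucket.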

answered 13 May '10, 14:26

Lowell ♦

Not sure exactly what you want to do, but if you're deleting most of an index whose source logs are still around, you'd probably be better off cleaning out the index and re-indexing the events that you want to keep:

$SPLUNK_HOME/bin/splunk stop

$SPLUNK_HOME/bin/splunk clean eventdata -index myindex

$SPLUNK_HOME/bin/splunk start


answered 13 May '10, 16:18

rayfoo

Yes, the docs page linked in the question mentions that option too. If you want to delete almost everything in an index, then sure, this would work. But this is NOT something you would want to do after running Splunk for any considerable length of time. Also remember that re-indexing the log files counts toward your license usage, and you also have to use tricks to get Splunk to re-read the log files you want to keep.

(13 May '10, 16:43) Lowell ♦
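To gauge the license impact mentioned in that comment before re-indexing, you can total the raw size of the log files you plan to re-read, since Splunk meters license usage by the volume of raw data indexed per day. A rough sketch (the function name raw_bytes is hypothetical; it counts bytes as stored, so point it at uncompressed logs):

```shell
#!/bin/bash
# Hypothetical helper: sum the sizes (in bytes) of the given files as a rough
# estimate of how much license volume re-indexing them would consume.
# Usage: raw_bytes FILE...
raw_bytes() {
    local total=0 f size
    for f in "$@"; do
        size=$(wc -c < "$f") || return 1   # fail if a file can't be read
        total=$(( total + size ))
    done
    echo "$total"
}
```

Compare the result against your daily license quota; if it is large, spreading the re-indexing over several days may help.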

Copyright © 2005-2012 Splunk, Inc. All rights reserved.