This is commonly caused by a corrupted journal.gz. You can try the following to repair the bucket:
Locate the corrupt bucket using this query in Splunk. This query will return the bucket name as the Bucket field value. The splunk_server is the name of the server with the corrupt bucket.
index=_internal sourcetype=splunkd IndexerService corrupt earliest=-7d | stats earliest(_time) as time by splunk_server, idx, Bucket | convert ctime(time) as time
If you know which index was being queried when the error was returned, narrow the search by adding idx=<index name>, e.g.:
index=_internal sourcetype=splunkd idx=main IndexerService corrupt earliest=-7d | stats earliest(_time) as time by splunk_server, idx, Bucket | convert ctime(time) as time
SSH to the splunk_server, stop the Splunk indexer instance in question, and run the following command from $SPLUNK_HOME/bin to repair the bucket (replace the index-name and bucket-name values with your own):
./splunk fsck repair --one-bucket --index-name=main --bucket-name=db_1490291523_1489620455_12 --try-warm-then-cold
If the command fails, do the following:
1. cd to the bucket's rawdata directory.
2. gunzip journal.gz (this produces an uncompressed journal file)
3. gzip -c journal > journal.gz (recompresses the journal file into journal.gz)
4. Delete the uncompressed journal file.
5. Re-run the repair command above and restart the Splunk server.
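The manual recompression steps (1 through 4) can be sketched as a small shell function. This is only a sketch under some assumptions: the function name recompress_journal is made up for illustration and is not part of Splunk, the indexer is assumed to be stopped already, and the final gzip -t integrity check is an extra that is not in the original steps.

```shell
#!/bin/sh
# Hypothetical helper (not a Splunk tool): recompress journal.gz in a bucket's
# rawdata directory. Assumes the Splunk instance is already stopped.
recompress_journal() {
    rawdata_dir="$1"
    (
        # Step 1: change into the bucket's rawdata directory.
        cd "$rawdata_dir" || exit 1
        # Step 2: decompress; this replaces journal.gz with "journal".
        gunzip journal.gz || exit 1
        # Step 3: recompress the journal into a fresh journal.gz.
        gzip -c journal > journal.gz || exit 1
        # Step 4: delete the uncompressed journal.
        rm journal || exit 1
        # Extra check (not in the original steps): verify the new archive.
        gzip -t journal.gz
    )
}
```

Call it with the path to the bucket's rawdata directory, for example recompress_journal /path/to/bucket/rawdata (the path is a placeholder), then re-run the fsck repair command as in step 5.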
I'm curious as to why this works. Correct me if I'm wrong, but you are simply uncompressing the journal file, recompressing the file you just uncompressed, and then deleting the uncompressed version to get rid of it. Finally, you run the single-bucket fix. How does that actually fix the issue? I'm not saying it doesn't work, I'm just wondering why the unzip/zip thing works. Thanks.
worked perfectly.. thanks
Good information. If you are in a cluster environment, wrap the repair in maintenance mode:
1. Enable the indexer cluster maintenance mode
2. Stop the indexer in question
3. Follow the above steps 1 through 5
4. Start the indexer in question
5. Disable the indexer cluster maintenance mode.
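As a rough sketch, the cluster ordering above maps onto the Splunk CLI as shown below. The function only prints the commands and does not run them. The "manager" and "peer" labels and the <index> and <bucket> placeholders are illustrative. It assumes maintenance mode is toggled on the cluster manager while the stop, repair, and start commands are run on the affected indexer, and that the CLI may ask for confirmation when enabling maintenance mode.

```shell
#!/bin/sh
# Prints the cluster-safe command order; nothing is executed.
# "manager" / "peer" indicate where each command would be run (labels are illustrative).
print_cluster_repair_plan() {
    cat <<'EOF'
manager: splunk enable maintenance-mode
peer:    splunk stop
peer:    splunk fsck repair --one-bucket --index-name=<index> --bucket-name=<bucket> --try-warm-then-cold
peer:    splunk start
manager: splunk disable maintenance-mode
EOF
}

print_cluster_repair_plan
```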
If gunzip does not work, try 7z.
Someone else ran into this problem:
https://answers.splunk.com/answers/755426/trying-to-fix-the-corrupted-bucket-error-journalsl.html