Hi,
I have indexed 6 GB of CSV data in Splunk. When I check the compression ratio with this search:
| dbinspect index=sca_rs_index2
| fields state,id,rawSize,sizeOnDiskMB
| stats sum(rawSize) AS rawTotal, sum(sizeOnDiskMB) AS diskTotalinMB
| eval rawTotalinMB=(rawTotal / 1024 / 1024) | fields - rawTotal
| eval compression=tostring(round(diskTotalinMB / rawTotalinMB * 100, 2)) + "%"
| table rawTotalinMB, diskTotalinMB, compression
It reports 500%, i.e. the index takes five times more disk space than the raw data.
Indeed, when I run du -h /opt/splunk/var/lib/splunk/sca_rs_index2, I get 25 GB.
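As a rough cross-check of the two sizes, here is a minimal Python sketch that applies the same formula as the eval in the search (disk size divided by raw size, times 100). The function name compression_percent is hypothetical, and the inputs are only the rounded 6 GB and 25 GB figures quoted in this post, not exact byte counts:

```python
def compression_percent(disk_mb: float, raw_mb: float) -> float:
    """Disk usage as a percentage of raw size, mirroring the eval in the search."""
    return round(disk_mb / raw_mb * 100, 2)


# Rounded figures from the post (assumed, not exact): 6 GB raw CSV, 25 GB from du.
raw_mb = 6 * 1024
disk_mb = 25 * 1024

print(f"{compression_percent(disk_mb, raw_mb)}%")
```

With these rounded inputs the result is a bit over 400%, so the du output is roughly in line with what the search reports. An exact match is not expected, since both inputs are approximations.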
The CSV files contain mainly float data, so I can understand that the compression ratio would not be very good, but I did not expect the index to be five times bigger than the raw data. Here is a sample line from the indexed files:
156,Jun-25-2015 03:53:56:765 PM (CEST),4.46,4.24,-33.79,-36.85,2.9007191883900015E-9,2.9265995391399987E-9,7.07,12.803,2.184,2.128,1.7805797178279998E-9,1.8405463108899995E-9,-0.8554831,-1.4905629,-0.23367512,-0.04267813,-0.85057795,-1.4899217,-0.24520056,-0.03800104,2.9247010642799994E-9,2.91267178384E-9,1.3790839E-7,1.36082E-7,0.38341087,97,156,2.9231726E-9,0.02,2.8500933E-9,2.996252E-9,2.760363E-9,3.0509275E-9,2.6058726E-9,3.1849554E-9,1.3411523E-9,2.2089566E-9,-1.0E-8,1.9596868E-7,-1.0E-8,1.959884E-7,,,-0.009,0.55262375,-0.9074077,-0.8010149,-1.5971898,-1.379061,-0.5492179,0.022210669,-0.3409165,0.23051207,-0.90580165,-0.79919446,-1.5967877,-1.3784232,-0.5492179,0.022210669,-0.3409165,0.23051207,-35.85059,-31.78982,-39.128872,-34.6483,-2.9167533,7.0832467,-2.962398,7.037602,3.2690098,2.9473321,1.4245242,7.4673486,12.490728347227515
I have 5 accelerated reports on this data. There is a summary subfolder in /opt/splunk/var/lib/splunk/sca_rs_index2. Does it contain the summary data for these accelerated reports?
What could be improved to reduce the disk usage? How can I investigate this further?
Thanks,