I would like to check that a given file has been fully indexed by Splunk.

I tried comparing the line count of the source file (from "wc -l") with the number of events indexed in Splunk, but the two don't match because some of my events span multiple lines.

How can I do this?

asked 18 May '11, 10:17


hexx ♦


One Answer:

Checking the line count of a source file against the number of lines indexed by Splunk is straightforward. Here is an example with a file of 1117 lines indexed as 7 events:

  • Count the number of lines in the source file, excluding all lines that are empty or contain only whitespace characters, which Splunk doesn't index:

[root@beefysup01 ~]# grep -v -e "^\s*$" /var/log/Xorg.0.log | wc -l
1116

  • In Splunk, search for all events for that specific source over all time, and aggregate the values of the "linecount" field:

source="/var/log/Xorg.0.log" | stats sum(linecount)



The two numbers should match, provided that you are not working on a live file that is part of a rotation (for example, /var/log/messages or $SPLUNK_HOME/var/log/splunk/metrics.log) and that you are not routing events from this file to the null queue.
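The file-side half of this check can be wrapped in a small script. This is only a sketch: the function names `count_nonblank_lines` and `compare_line_counts` are made up for illustration, and the Splunk total is assumed to be copied by hand from the `stats sum(linecount)` result. It uses the POSIX class `[[:space:]]` instead of `\s`, since `\s` is a GNU grep extension and may not work with other grep implementations.

```shell
#!/bin/sh
# Hypothetical helper: count the lines Splunk is expected to index,
# i.e. every line that is not empty or whitespace-only.
count_nonblank_lines() {
    # -c prints the number of selected lines, -v inverts the match,
    # [[:space:]] is the portable equivalent of \s.
    grep -c -v -E '^[[:space:]]*$' "$1"
}

# Hypothetical helper: compare the file's count with the sum(linecount)
# value copied from the Splunk search result.
# $1 = path to the source file, $2 = total reported by Splunk
compare_line_counts() {
    file_count=$(count_nonblank_lines "$1")
    if [ "$file_count" -eq "$2" ]; then
        echo "match: $file_count lines"
    else
        echo "mismatch: file has $file_count non-blank lines, Splunk reports $2"
        return 1
    fi
}
```

With the functions loaded in the current shell, a call would look like `compare_line_counts /var/log/Xorg.0.log 1116`, where the second argument is whatever number the search returned.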


Another method, although often less accurate, is to measure the byte count of the source file (again, excluding empty lines) and compare it against the aggregated byte count of all events indexed for that source:

  • Count the number of bytes in the source file, excluding all lines that are empty or contain only whitespace characters, which Splunk doesn't index:

[root@beefysup01 ~]# grep -v -e "^\s*$" /var/log/anaconda.log | wc -c
841067

  • In Splunk, search for all events for that specific source over all time, compute the size of each event in the "esize" field using the len() eval function on the raw event data (the "_raw" field), sum those sizes, and add the event count for the file. The last step is important to get an accurate byte count because Splunk "loses" one byte per event when it drops the final newline character of each event:

source="/var/log/anaconda.log" | eval esize=len(_raw) | stats sum(esize) AS sum_esize, count | eval fsize=sum_esize + count | fields fsize
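To make the "add the event count" step concrete, here is a minimal sketch of both sides of the comparison. The function names `nonblank_bytes` and `splunk_fsize` are hypothetical, and the two Splunk values are assumed to be copied by hand from the search result. One possible source of the inaccuracy mentioned above is that len() counts characters while `wc -c` counts bytes, so files with multi-byte characters may not line up exactly.

```shell
#!/bin/sh
# Hypothetical helper: byte count of the file with empty and
# whitespace-only lines removed (same idea as the grep | wc -c
# pipeline above, using the portable [[:space:]] class).
nonblank_bytes() {
    # tr strips the padding some wc implementations add around the number
    grep -v -E '^[[:space:]]*$' "$1" | wc -c | tr -d ' '
}

# Hypothetical helper: redo the final eval of the search by hand.
# $1 = sum(esize) and $2 = event count, both taken from Splunk.
# Each event is missing its trailing newline, so one byte per event
# is added back.
splunk_fsize() {
    echo $(( $1 + $2 ))
}
```

As a small worked case, a file containing the lines `abc` and `de` separated by a blank line gives 7 bytes from `nonblank_bytes` (4 for `abc` plus its newline, 3 for `de` plus its newline). If each line is indexed as its own event, the event lengths are 3 and 2, so `splunk_fsize 5 2` also gives 7.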


answered 18 May '11, 10:26


hexx ♦

edited 19 May '11, 07:14


Copyright © 2005-2012 Splunk, Inc. All rights reserved.