What is the purpose of these merged_lexicon.lex files? Some get to be quite large, for example:

1179300775 Feb 9 16:45 merged_lexicon.lex

asked 04 Mar '10, 18:37

Chris R.


3 Answers:

These files are part of the search index. They are mostly used to support typeahead. I would not consider them large. They are usually quite a bit smaller than the .tsidx files that constitute the main part of the index.

answered 05 Mar '10, 06:10

gkanapathy ♦

In a bit more detail, a tsidx file consists of two parts: a lexicon and a set of postings. The lexicon is a list of terms in alphabetical order, each followed by a pointer to its posting list. A term's posting list records which events (in the rawdata files) contain that term.

So essentially you have something like this:

tsidxfile 1:
lexicon:  a     b     c
          |     |     |
          v     v     v
postings: 2 4 | 1 5 | 2

tsidxfile 2: (smaller)
lexicon:  d
          |
          v
postings: 2 8
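To make the diagram concrete, here is a toy Python model of those two files. It assumes the postings are plain lists of integer event IDs; the class name TsidxFile and its methods are hypothetical and do not reflect Splunk's real on-disk format or any Splunk API.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class TsidxFile:
    """Toy model of one tsidx file (hypothetical, not Splunk's format)."""
    lexicon: list[str] = field(default_factory=list)         # terms, alphabetical order
    postings: list[list[int]] = field(default_factory=list)  # postings[i] belongs to lexicon[i]

    @classmethod
    def from_dict(cls, index: dict[str, list[int]]) -> "TsidxFile":
        terms = sorted(index)
        return cls(terms, [sorted(index[t]) for t in terms])

    def lookup(self, term: str) -> list[int]:
        """Binary-search the sorted lexicon, then follow the 'pointer' to the postings."""
        i = bisect_left(self.lexicon, term)
        if i < len(self.lexicon) and self.lexicon[i] == term:
            return self.postings[i]
        return []  # term is not in this file


# The two files from the diagram above.
tsidx1 = TsidxFile.from_dict({"a": [2, 4], "b": [1, 5], "c": [2]})
tsidx2 = TsidxFile.from_dict({"d": [2, 8]})

print(tsidx1.lookup("b"))  # [1, 5]
print(tsidx2.lookup("a"))  # []
```

Keeping the lexicon sorted is what makes the binary search possible; the postings are only touched once a term is known to exist.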

The lexicon tells us what terms exist and the postings tell us where to find them. However, we have to look in every tsidx file to find out all the terms. So if there are 20 tsidx files and you type in 'gromblhyozorktooks', which doesn't exist, splunkd has to open all 20 tsidx files to figure out you're crazy.
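The cost described here can be sketched by counting how many lexicons a search has to read when there is no merged lexicon. This is a simplified model under assumed inputs: each tsidx file is reduced to its sorted lexicon, "opening" a file is just a counter, and the 20 lexicons are made-up data.

```python
from bisect import bisect_left


def contains(lexicon: list[str], term: str) -> bool:
    """Binary search for an exact term in one sorted lexicon."""
    i = bisect_left(lexicon, term)
    return i < len(lexicon) and lexicon[i] == term


def naive_search(tsidx_lexicons: list[list[str]], term: str) -> tuple[list[int], int]:
    """Check every tsidx file's lexicon for `term`.

    Returns the indexes of the files that contain the term and how many files
    had to be opened. Without a merged lexicon that is always all of them,
    even when the term exists nowhere.
    """
    hits = []
    opened = 0
    for n, lexicon in enumerate(tsidx_lexicons):
        opened += 1  # stands in for opening and reading one tsidx file
        if contains(lexicon, term):
            hits.append(n)
    return hits, opened


# 20 hypothetical tsidx files, each with a small made-up lexicon.
files = [sorted({f"term{n}", "error", "login"}) for n in range(20)]
print(naive_search(files, "gromblhyozorktooks"))  # ([], 20)
print(naive_search(files, "term7"))               # ([7], 20)
```

A term that exists nowhere costs exactly as much as one that exists in a single file: every file is opened either way.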

The merged_lexicon.lex file simply collects all of those lexicons, which are much smaller than the full tsidx files, into one file. It looks more like this:

a b c d 

This lets typeahead answer its question (which terms exist?) much more quickly, and lets negative lookups fail much faster. The typical case is that some buckets contain your term and some do not, so the merged lexicon allows the buckets that don't to be ruled out completely, much faster.
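Both uses of the merged lexicon can be sketched in a few lines of Python. This is a toy model only, assuming the merged lexicon is just a sorted, de-duplicated list of terms with no postings; the function names are hypothetical and this is not how Splunk actually stores or reads merged_lexicon.lex.

```python
import heapq
from bisect import bisect_left


def merge_lexicons(lexicons: list[list[str]]) -> list[str]:
    """Merge already-sorted per-file lexicons into one sorted list without
    duplicates (a toy stand-in for merged_lexicon.lex: terms only)."""
    merged: list[str] = []
    for term in heapq.merge(*lexicons):
        if not merged or merged[-1] != term:
            merged.append(term)
    return merged


def typeahead(merged: list[str], prefix: str, limit: int = 10) -> list[str]:
    """Return up to `limit` known terms that start with `prefix`."""
    out = []
    i = bisect_left(merged, prefix)  # first term >= prefix
    while i < len(merged) and merged[i].startswith(prefix) and len(out) < limit:
        out.append(merged[i])
        i += 1
    return out


def bucket_may_contain(merged: list[str], term: str) -> bool:
    """One lookup in the merged lexicon. In this model, False means none of
    the bucket's tsidx files need to be opened for this term."""
    i = bisect_left(merged, term)
    return i < len(merged) and merged[i] == term


merged = merge_lexicons([["a", "b", "c"], ["d"]])
print(merged)                                            # ['a', 'b', 'c', 'd']
print(typeahead(["error", "errors", "event"], "err"))    # ['error', 'errors']
print(bucket_may_contain(merged, "gromblhyozorktooks"))  # False
```

Because all terms sharing a prefix sit next to each other in a sorted list, typeahead is one binary search followed by a short forward scan.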

answered 05 Mar '10, 22:33

jrodman ♦

edited 11 Mar '10, 22:35

If you don't need typeahead and are looking to save some space on your Splunk partition, deleting these files can save you about 10% on your total index size.
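To see what deleting them would actually save on your own system, you could first measure their share of the index. This is a hypothetical helper that simply walks a directory tree and totals file sizes; the example path assumes a default install layout and should be adjusted to wherever your buckets live.

```python
import os


def lexicon_share(index_root: str) -> float:
    """Return the fraction of bytes under `index_root` that belong to
    merged_lexicon.lex files (0.0 if nothing is found)."""
    total = 0
    lexicon_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(index_root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so nothing is counted twice
            size = os.path.getsize(path)
            total += size
            if name == "merged_lexicon.lex":
                lexicon_bytes += size
    return lexicon_bytes / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical path: point this at your own index's bucket directory.
    root = "/opt/splunk/var/lib/splunk/defaultdb/db"
    print(f"merged_lexicon.lex share: {lexicon_share(root):.1%}")
```

os.walk yields nothing for a missing directory, so a wrong path reports 0.0% rather than raising an error.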

answered 06 Mar '10, 00:36

the_wolverine ♦

I think there's been some optimization to the merged_lexicon files. They're currently under 5% of total index size for me.

(11 Mar '10, 22:47) jrodman ♦

Apparently they can take up anywhere from 5% to 20% of the total index size.

(30 Mar '10, 16:39) the_wolverine ♦

Last updated: 11 Mar '10, 22:35

Copyright © 2005-2012 Splunk, Inc. All rights reserved.