Hi everyone!
We've been randomly running into a rather annoying and critical issue while working with lookups:
sometimes only a few entries get lookup fields when there should be many more. Rewriting the lookup file helps in most cases.
But the behavior is not stable, which means you cannot trust the results, especially when the lookup is used in scheduled searches.
I can't say for sure what conditions cause this behavior: it often happens with large CSV files (over 1 million lines) that are rewritten on a daily basis, and it happens in all Splunk versions.
This time I managed to save the search logs for the same query, both when the lookup worked incorrectly and when it worked fine (after it was rewritten).
Comparing the two files showed that they are mostly identical except for several lines.
In the "good" log file there is an entry that is missing from the "bad" log:
INFO CMBucketId CMIndexId: New indexName=main inserted, mapping to id=1
Also, in the good log:
INFO DispatchThread SrchOptMetrics optimize_toJson=1
While in the bad log:
INFO DispatchThread SrchOptMetrics optimize_toJson=2
An Excel file with the comparison is attached.
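For anyone who wants to repeat this kind of comparison on their own search logs, here is a minimal Python sketch. It assumes each log line starts with a timestamp of the form MM-DD-YYYY HH:MM:SS.mmm (adjust the pattern if your logs differ); the function names and file paths are hypothetical. It strips the timestamps and reports the lines that appear in only one of the two files.

```python
import re

# Assumed timestamp prefix, e.g. "08-15-2018 10:23:45.123 "; adjust if your logs differ.
TIMESTAMP = re.compile(r"^\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}\s+")


def normalize(lines):
    """Strip the leading timestamp and surrounding whitespace; skip blank lines."""
    return [TIMESTAMP.sub("", line).strip() for line in lines if line.strip()]


def only_in_first(first, second):
    """Return normalized lines that occur in `first` but not in `second`, in order."""
    seen = set(normalize(second))
    return [line for line in normalize(first) if line not in seen]


def compare_logs(good_path, bad_path):
    """Return (lines only in the good log, lines only in the bad log)."""
    with open(good_path, encoding="utf-8", errors="replace") as f:
        good = f.readlines()
    with open(bad_path, encoding="utf-8", errors="replace") as f:
        bad = f.readlines()
    return only_in_first(good, bad), only_in_first(bad, good)
```

Lines that embed run-specific values (search IDs, durations) will still show up as differences, so expect some noise to filter by hand.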
Hoping for your help, guys!
Try moving the lookup file that's being updated daily to etc/system/lookups instead of etc/apps/some_app/lookups. My bet is on a knowledge bundle replication issue, as it's a large lookup.
Thanks for your suggestion. But is knowledge bundle replication applicable if we have a single-machine installation?
When you say "rewritten" on a daily basis, what exactly do you mean? There is a time lag during propagation. Is it possible that the lookup file was updated during or shortly before the search that is in error?
It seems to me that there was a technique for shipping the lookup under a different name and then renaming it to put it in place "instantaneously". I'll have to review and see if I can find the description.
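The general idea of writing under a different name and then renaming can be sketched at the operating-system level as follows. This is a generic Python illustration, not a Splunk feature and not necessarily the technique referred to above; the function name and paths are hypothetical. It assumes the temporary file and the lookup live on the same filesystem, so that the final rename is a single atomic step on POSIX systems and readers never see a half-written file.

```python
import os
import shutil
import tempfile


def replace_lookup_atomically(new_csv_path, lookup_path):
    """Copy new_csv_path next to lookup_path, then rename it over the target."""
    target_dir = os.path.dirname(os.path.abspath(lookup_path))
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, suffix=".tmp")
    os.close(fd)
    try:
        shutil.copyfile(new_csv_path, tmp_path)
        # mkstemp creates the file readable by the owner only; reuse the old file's mode.
        if os.path.exists(lookup_path):
            shutil.copymode(lookup_path, tmp_path)
        # Swap the new file into place in one step.
        os.replace(tmp_path, lookup_path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Whether this helps with the problem in this thread is untested; it only rules out a search reading the file while it is still being written.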
I mean that the same search is executed every day and its results overwrite the CSV file.
The resulting CSV looks consistent and is searchable, but somehow lookups against it don't work correctly.
As for errors, do you mean errors in the search that creates this lookup? If so, there are none. If you mean errors in other searches that could have appeared while the "csv-writing-search" was running, the _internal index didn't show anything critical in that period either.
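One way to back up the "looks consistent" impression with numbers is to check the file outside of Splunk. The sketch below is a hypothetical Python helper (the name `check_csv` is an assumption, not part of any Splunk tooling) that counts the data rows and lists the line numbers of rows whose field count differs from the header.

```python
import csv


def check_csv(path):
    """Return (data_row_count, line numbers of rows whose field count differs from the header)."""
    bad_rows = []
    count = 0
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            # Completely empty file: no header, no rows.
            return 0, []
        for row in reader:
            count += 1
            if len(row) != len(header):
                # line_num is the number of physical lines read so far.
                bad_rows.append(reader.line_num)
    return count, bad_rows
```

Running it right after the lookup is rewritten and again when the lookup misbehaves would show whether the file itself changed between the two moments.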
At what time does the lookup get rewritten? At what time do the searches that return erroneous results run?
The lookup is rewritten early in the morning, and the last time we got erroneous results was when we used it in the evening.