I have a splunk search that is returning the wrong results from a kvstore if the secondUID field is set to itself before doing the lookup. This is distilled from the actual search for simply showing the bug. Both secondUID and uID should be represented as strings.
Does anybody know why | eval secondUID=secondUID causes the lookup command to return the wrong results? When it is commented out the correct results are returned. The results are consistently the same wrong results when they are wrong and the errors are event count dependent. So for instance, if I switch the head command on line 4 from 4000 results up to 10000 results, the lookup wrong result rate goes from 4.3% to 11.83% given the lines I am passing in for this example. If I pass in a different set of events, the results would still be wrong and consistently the same results wrong, but not necessarily the same % of wrong results compared to the other starting events.
If you either comment out that eval on line 8 or do | eval secondUID=tostring(secondUID) then the correct results are returned from the lookup command. If you switch tostring() with tonumber() the number of wrong lookups goes up.
I don't think this is intended functionality because | eval secondUID=secondUID should not be changing the results IMO, and the % of errors depend on how many events are passed through the search. More events = higher % of errors. The string comparison functions in the wheres also show nothing should be changing.
| inputlookup kvstore_560k_lines_long max=10000
| stats dc(uID) as uID by secondUID
| where uID=1
| head 4000 ```keep 4000 results with the 1=1 uID to secondUID relationship established```
| eval secondUIDArchive=secondUID ```save the initial value ```
| where match(secondUIDArchive, secondUID) and like(secondUIDArchive, secondUID) ```initial value is unchanged```
| eval secondUID=secondUID ```this line causes the search to return different results compared to when commented out```
| where match(secondUIDArchive, secondUID) and like(secondUIDArchive, secondUID) ```string comparison methods show they are the same still```
| lookup kvstore_560k_lines_long secondUID output uID ```output the first UID again where there should be a 1=1 relationship```
| table uID secondUID secondUIDArchive
| stats count by uID ```the final output counts of uID vary based on whether the eval on line 8 is commented out.```
Hi @Derson,
Thank you for the nice and detailed explanation and the test results. I cannot think of anything else to say.
I agree with you that may be a bug, it seems better to open a support case for this issue.
Well... Splunk sometimes does some unintuitive casts between number and string and it's just how it is, I think although so far I've only encountered the opposite situations - the need to explicitly call tonumber() because the field was a string even though it looked like a number. Typing in Splunk isn't that strict (maybe except for datamodels) so I think you just have to get used to it. BTW, you can use typeof() to see the type of a field.
I wouldn't expect it to be an issue in a dynamically typed language either. In my other response to scelikok, I included a search where the typeof() command either isn't showing the real type of the field, or confirming it is a memory allocation/referencing issue because I think the c++ tostring() function re-allocates the variables (its been a while so I could be wrong though).
Do you know if typeof() is returning the actual type of the variable? Or is it all stored as an object behind the scenes so the best typeof() can do is guess based on the contents? Even when I am casting tostring() Splunk still calls it a number. Not calling it a float or an int makes me think it is taking a best guess based on object contents.
Reposting the other response's search below:
| inputlookup kvstore_560k_lines_long max=15000
| stats count by secondUID
| where count=1 AND match(secondUID, "^\d+$")
| head 4000
| eval initialType=typeof(secondUID)
| eval secondUID=tostring(secondUID)
| eval subsiquentType=typeof(secondUID)
| where like(initialType, subsiquentType) ```all of the types still read as "numbers" so no events are removed```
My biggest clue that this is abnormal is the event count affecting the % of lookup error in the initial search. If I run the same events through the search that have errors initially but with a smaller search event count, they return the correct results.
Well, you can to some extent treat SPL as dynamically typed - the fields are created on the fly and there is really not much way to force a field to be of a given type. The tonumber() seems to be doing the trick for converting to number but indeed a number string seems to be treated like a number.
I remember there used to be some cases with displaying values which suggest that multivalue fields are also stored as strings. So it seems (but that's just an observation from outside, I'm no Splunk developer) that values are internally, at least to some extent stored as strings.
MV fields being stored as strings would make sense, or an object containing strings and a hidden delim field at least. I wish one of the architects would be able to give a technical talk on how this stuff works more behind the scenes so we could help with more pointed debugging info.
I considered not having strict typing on the kvstore collection as being the issue, but regardless it shouldn't be returning the wrong value instead of nulls. That would bring it back to the lookup command being the issue.
I also think I have found a bug with mvmap that exhibits very similarly to this one. But since it was such an edge case I didn't even make a blog post about it.
Hoping a splunk employee will see this and track down the issue behind the scenes.
Hi @Derson ,
I don't call this behavior as a bug. eval command works for strings and numbers. For example, if the values are strings it concatenates if secondUID values are numbers it makes a calculation.
I think some of your secondUID values are being processed as numbers, that is why lookup cannot match.
Your solution is wise and assures that the values are processed as strings.
Hey @scelikok,
Thanks for jumping in to look at this.
I agree that part of the problem is the typing here because it only returns the wrong lookup uID values for secondUID values that could be reasonably interpreted as numbers. And maybe this comes down to poor error handling.
I wouldn't be satisfied just writing this off to a typing issue though for multiple reasons:
| inputlookup kvstore_560k_lines_long max=15000
| stats count by secondUID
| where count=1 AND match(secondUID, "^\d+$")
| head 4000
| eval initialType=typeof(secondUID)
| eval secondUID=tostring(secondUID) ```this line causes the search to return different results```
| eval subsiquentType=typeof(secondUID)
| where like(initialType, subsiquentType)
Maybe it really is just poor error handling in |lookup combined with typeof() being a lie. But I don't know how to come to a definitive conclusion without having access to the Splunk code (c/c++?) behind the scenes, or analyzing Splunk while it is running to rule out an allocation/referencing issue in absence of the |lookup code... Maybe you would have other ideas on how to narrow it down. Or maybe I'm just crazy