Re: Eval or Lookup bug

Derson · ‎01-22-2024

I have a splunk search that is returning the wrong results from a kvstore if the secondUID field is set to itself before doing the lookup. This is distilled from the actual search for simply showing the bug. Both secondUID and uID should be represented as strings.

Does anybody know why | eval secondUID=secondUID causes the lookup command to return the wrong results? When it is commented out the correct results are returned. The results are consistently the same wrong results when they are wrong and the errors are event count dependent. So for instance, if I switch the head command on line 4 from 4000 results up to 10000 results, the lookup wrong result rate goes from 4.3% to 11.83% given the lines I am passing in for this example. If I pass in a different set of events, the results would still be wrong and consistently the same results wrong, but not necessarily the same % of wrong results compared to the other starting events.

If you either comment out that eval on line 8 or do | eval secondUID=tostring(secondUID) then the correct results are returned from the lookup command. If you switch tostring() with tonumber() the number of wrong lookups goes up.

I don't think this is intended functionality because | eval secondUID=secondUID should not be changing the results IMO, and the % of errors depend on how many events are passed through the search. More events = higher % of errors. The string comparison functions in the wheres also show nothing should be changing.

| inputlookup kvstore_560k_lines_long max=10000 
| stats dc(uID) as uID by secondUID 
| where uID=1 
| head 4000 ```keep 4000 results with the 1=1 uID to secondUID relationship established```
| eval secondUIDArchive=secondUID ```save the initial value ``` 
| where match(secondUIDArchive, secondUID) and like(secondUIDArchive, secondUID) ```initial value is unchanged```

| eval secondUID=secondUID ```this line causes the search to return different results compared to when commented out``` 
| where match(secondUIDArchive, secondUID) and like(secondUIDArchive, secondUID) ```string comparison methods show they are the same still```

| lookup kvstore_560k_lines_long secondUID output uID ```output the first UID again where there should be a 1=1 relationship```
| table uID secondUID secondUIDArchive 
| stats count by uID ```the final output counts of uID vary based on whether the eval on line 8 is commented out.```

scelikok · ‎01-23-2024

Hi @Derson,

Thank you for the nice and detailed explanation and the test results. I cannot think of anything else to say.

I agree with you that may be a bug, it seems better to open a support case for this issue.

If this reply helps you an upvote and "Accept as Solution" is appreciated.

PickleRick · ‎01-23-2024

Well... Splunk sometimes does some unintuitive casts between number and string and it's just how it is, I think although so far I've only encountered the opposite situations - the need to explicitly call tonumber() because the field was a string even though it looked like a number. Typing in Splunk isn't that strict (maybe except for datamodels) so I think you just have to get used to it. BTW, you can use typeof() to see the type of a field.

Derson · ‎01-23-2024

I wouldn't expect it to be an issue in a dynamically typed language either. In my other response to scelikok, I included a search where the typeof() command either isn't showing the real type of the field, or confirming it is a memory allocation/referencing issue because I think the c++ tostring() function re-allocates the variables (its been a while so I could be wrong though).
Do you know if typeof() is returning the actual type of the variable? Or is it all stored as an object behind the scenes so the best typeof() can do is guess based on the contents? Even when I am casting tostring() Splunk still calls it a number. Not calling it a float or an int makes me think it is taking a best guess based on object contents.
Reposting the other response's search below:

| inputlookup kvstore_560k_lines_long max=15000
| stats count by secondUID 
| where count=1 AND match(secondUID, "^\d+$")
| head 4000

| eval initialType=typeof(secondUID)
| eval secondUID=tostring(secondUID) 
| eval subsiquentType=typeof(secondUID)
| where like(initialType, subsiquentType) ```all of the types still read as "numbers" so no events are removed```

My biggest clue that this is abnormal is the event count affecting the % of lookup error in the initial search. If I run the same events through the search that have errors initially but with a smaller search event count, they return the correct results.

PickleRick · ‎01-23-2024

Well, you can to some extent treat SPL as dynamically typed - the fields are created on the fly and there is really not much way to force a field to be of a given type. The tonumber() seems to be doing the trick for converting to number but indeed a number string seems to be treated like a number.

I remember there used to be some cases with displaying values which suggest that multivalue fields are also stored as strings. So it seems (but that's just an observation from outside, I'm no Splunk developer) that values are internally, at least to some extent stored as strings.

Derson · ‎01-23-2024

MV fields being stored as strings would make sense, or an object containing strings and a hidden delim field at least. I wish one of the architects would be able to give a technical talk on how this stuff works more behind the scenes so we could help with more pointed debugging info.

I considered not having strict typing on the kvstore collection as being the issue, but regardless it shouldn't be returning the wrong value instead of nulls. That would bring it back to the lookup command being the issue.
I also think I have found a bug with mvmap that exhibits very similarly to this one. But since it was such an edge case I didn't even make a blog post about it.

Hoping a splunk employee will see this and track down the issue behind the scenes.

scelikok · ‎01-23-2024

Hi @Derson ,

I don't call this behavior as a bug. eval command works for strings and numbers. For example, if the values are strings it concatenates if secondUID values are numbers it makes a calculation.

I think some of your secondUID values are being processed as numbers, that is why lookup cannot match.

Your solution is wise and assures that the values are processed as strings.

If this reply helps you an upvote and "Accept as Solution" is appreciated.

Derson · ‎01-23-2024

Hey @scelikok,

Thanks for jumping in to look at this.

I agree that part of the problem is the typing here because it only returns the wrong lookup uID values for secondUID values that could be reasonably interpreted as numbers. And maybe this comes down to poor error handling.

I wouldn't be satisfied just writing this off to a typing issue though for multiple reasons:

First (opaque), this would be completely opaque to the end user. Maybe you have enough experience to tell me, but if it is a typing issue, then the typeof command does not return the actual types of the variables. I have to believe it returns what Splunk expects it would be. For instance in the search below, the last |like command does not remove any results.

| inputlookup kvstore_560k_lines_long max=15000
| stats count by secondUID 
| where count=1 AND match(secondUID, "^\d+$")
| head 4000

| eval initialType=typeof(secondUID)
| eval secondUID=tostring(secondUID) ```this line causes the search to return different results```
| eval subsiquentType=typeof(secondUID)
| where like(initialType, subsiquentType)

Second (inconsistent), casting the secondUID with tonumber() makes more wrong results be returned. So whatever is happening is not consistently happening for secondUIDs that are integers. I initially found this by rexing a dashboard token list of secondUIDs so it isn't a matter of how the variables are being initially loaded from the lookup table. This is also shown by needing the eval command to cause the error which is distilled logic from an if statement. And as far as the inconsistency goes, there is no pattern to the numbers that are treated as numbers vs strings. Its not like all numbers starting with 0s have this happen or whitespace being around some.
Third (unexpected lookup function), if you looked at any of the events they would look correct. The values for secondUID would be correct and they have results returned for uID. Unless this is just another quirk of Splunk where wrong lookups are possible without warning, I would expect the |lookup to output a null() value when it is unable to use the key it is given.
Fourth (noticing the error), because Splunk doesn't give a warning and outputs a result, anybody using dedup or stats without a predetermined output expected would likely never know this exists. The only way I noticed it was because I knew there should be a 1-1 relation between my keys and values and there wasn't at the end of the search.
Fifth (event count dependent), since there is a higher % of errors happening when the initial event count goes up without a pattern in the secondUID values being passed in other than being "\d+", this makes me worried there is a memory allocation/referencing bug behind the scenes. Something which would probably also be fixed by tostring if this was the case. Counterpoint being that it reliably returns the same wrong lookup values.

Maybe it really is just poor error handling in |lookup combined with typeof() being a lie. But I don't know how to come to a definitive conclusion without having access to the Splunk code (c/c++?) behind the scenes, or analyzing Splunk while it is running to rule out an allocation/referencing issue in absence of the |lookup code... Maybe you would have other ideas on how to narrow it down. Or maybe I'm just crazy

Eval or Lookup bug

eval

lookup

Enter the Splunk Community Dashboard Challenge for Your Chance to Win!

.conf24 | Session Scheduler is Live!!

Introducing the Splunk Community Dashboard Challenge!