|
Hi... we are using Splunk to look at indexed logs, and the Google Maps add-on is enabled so we can see where queries originate. The problem is that when we search with the query sourcetype="*" | geoip, it should supposedly give information for all events (nearly 5 billion events, by the way!), but it only shows 19,000 events! This is a disaster, as Splunk's geoip is really important to us. Any clues about what might be causing the problem and how to fix it?
|
|
The observed behavior is caused by a postprocessing limitation in Splunk. If you take a look at the default maps view, you will notice that the results are post-processed. If you search Splunkbase, you'll find multiple discussions of the 10k postprocess limitation. The results are summarized behind the scenes for the user: the module automatically applies the following postprocess search to the base search:
So the results are aggregated into a count per unique (distinct) location. In most cases the resulting number of records is lower by an order of magnitude. For example, when dealing with results based on a geo-IP database, there will not be a huge number of unique locations, since the number of records in the GeoLite City database is not that big and many IP addresses share the same location. The GoogleMaps module will only fetch 100,000 results from the search endpoint. This is a hard-coded limitation at the moment, since the browser wouldn't be able to handle more records at once. A better approach is to summarize the results in the base search, by searching for something like:
Here's a short explanation of what this search does:
Reduce the result in the base search to those events that contain the relevant IP field
Aggregate by distinct IP address
Do the geo-ip lookup
Filter out those results that do not contain geo-information
Aggregate again to the summarized count of events by distinct location (i.e. distinct combination of latitude and longitude).
If you're really dealing with an even bigger number of distinct locations (more than 100k), which I doubt, then you will need to perform some kind of server-side clustering. There will be support for accurate geo-clustering in a future version of the Google Maps app. In the meantime you can use the kmeans command or craft a custom search command.
Thanks a lot, ziegfried, for the comprehensive, detailed answer... Everyone here kindly tried to help me with this problem and finally there is an answer. Two thoughts, though: 1) Searching with your suggested query does not fetch anything for me. It seems to be searching, but the fetched results stay at 0 and the search progress sticks at 46% for a very long time (almost a day). 2) Your doubt is actually wrong: I do have nearly 1.5 million "distinct locations"...
(26 Feb '12, 17:59)
nina15
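The five steps in the answer above (filter, aggregate by IP, look up, drop unlocated IPs, re-aggregate by location) can be sketched in plain Python to show why the result set shrinks so much. This is only an illustration, not Splunk's implementation: GEO_DB is a hypothetical, made-up stand-in for the geo-IP database, and the function names are invented for this sketch.

```python
from collections import Counter
from typing import Optional

# Hypothetical stand-in for the geo-IP database: IP -> (latitude, longitude).
# These entries are invented for illustration only.
GEO_DB: dict[str, tuple[float, float]] = {
    "203.0.113.10": (48.14, 11.58),
    "203.0.113.11": (48.14, 11.58),  # different IP, same location
    "198.51.100.7": (40.71, -74.01),
}


def lookup(ip: str) -> Optional[tuple[float, float]]:
    """Return (lat, lon) for an IP, or None if the database has no entry."""
    return GEO_DB.get(ip)


def summarize_by_location(events: list[dict], ip_field: str) -> Counter:
    # 1. Keep only events that carry the IP field.
    with_ip = [e for e in events if e.get(ip_field)]
    # 2. Aggregate by distinct IP address (event count per IP).
    per_ip = Counter(e[ip_field] for e in with_ip)
    per_location: Counter = Counter()
    for ip, count in per_ip.items():
        # 3. Do the geo-IP lookup once per distinct IP, not once per event.
        loc = lookup(ip)
        # 4. Drop IPs that have no geo information.
        if loc is None:
            continue
        # 5. Aggregate again: sum the per-IP counts by distinct (lat, lon).
        per_location[loc] += count
    return per_location
```

For example, eight events (five from two IPs that share one location, one from another IP, one from an unknown IP, one without the field) collapse to two rows, one per distinct location. Aggregating by IP before the lookup means the lookup runs once per distinct IP instead of once per event.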
I forgot the geoip command in the search. You should take a closer look at the kmeans command to do server-side clustering of the results.
(27 Feb '12, 06:18)
ziegfried ♦
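Server-side clustering with k-means, as suggested above, groups nearby points so that only one center per cluster needs to be sent to the map. k-means works on numeric values, so the natural inputs are latitude/longitude pairs rather than a text field. The following is a minimal Python sketch of Lloyd's k-means algorithm, not Splunk's kmeans command. It assumes plain Euclidean distance on degrees (a simplification that ignores Earth's curvature and the date line) and uses a deterministic "first k points" initialization only to keep the example reproducible.

```python
import math


def kmeans(
    points: list[tuple[float, float]], k: int, iterations: int = 20
) -> tuple[list[tuple[float, float]], list[int]]:
    """Lloyd's k-means on (lat, lon) pairs; returns (centroids, cluster index per point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    # Deterministic initialization for the sketch: the first k points.
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (Euclidean distance in degrees, a simplification).
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments no longer change the centroids: converged
        centroids = new_centroids
    return centroids, labels
```

With k = 100, the map would only have to draw at most 100 cluster centers (each could carry the number of events in its cluster) instead of roughly 1.5 million distinct locations.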
OK, from what I understand, your comments are about the Google Maps limit of 100,000: 1) Why does Splunk only go up to 10k instead of 100k? If possible, how can I change that? 2) Why does running the geoip command in Splunk's main search (the flashtimeline view) have the same issue, even though the geoip command has nothing to do with Google Maps?
(01 Mar '12, 20:30)
nina15
I'm trying sourcetype="*" | geoip SourceIP | kmeans k=100 SourceIP_country_name, and it's not giving me anything... Any suggestions on the search command?
(06 Mar '12, 01:16)
nina15
I'm still waiting for an answer... Did I miss any answers here?
(15 Mar '12, 23:22)
nina15
|
|
sourcetype="*" | geoip ??? How can that work? You need to pass a field like src_ip or client_ip to geoip so it knows which value to look up. Whatever your IP address field is called, just use it in the search: sourcetype="*" | geoip src_ip |
It's very unlikely that this is a major bug that has gone unnoticed until now; more likely it's a problem with your configuration. Also remember that this is a community-based forum, which means a lot of the people who read it are probably on vacation or busy with the holidays. |
|
You must pass the field name that contains the IP addresses after the geoip command or it will not work. |
|
Answering my own question from my previous post: the parameter "maxout" could solve the misery, since it's under the [subsearch] stanza, which makes sense as geoip is a subsearch, not a whole search... but changing that parameter's value still did not help at all. No effect!
In this context geoip is not a subsearch, which is why the limits.conf parameter you mention has no influence on it.
(30 Jan '12, 23:33)
hexx ♦
Noted, and thanks. I am still waiting to find out the source of the problem...
(31 Jan '12, 01:28)
nina15
|
|
I think I found the source of the problem... I think it's neither Google Maps nor geoip, but actually any view or special search other than the normal search! To prove it, try a simple search "*", which retrieves everything in the normal search, and then try it in the "Advanced chart view": again, fewer than 15-20 thousand results are shown (while the normal search goes up to a few billion)! I think there is a bug in Splunk's views, or in any kind of advanced searching for that matter, so I've started a new thread here: Bug? Splunk advanced searching/views does not display correctly |
|
Yes, I do get new fields like SourceIP_city, SourceIP_country_code, SourceIP_country_name, etc., which are generated by geoip. The search you suggested takes forever to complete (as I have nearly 5 billion events). This search is supposed to count per day, right? I'm more and more convinced that my geoip configuration has some sort of limit of about 20000 on displayed results. For example, when I search the query
it searches all events, per day, and fetches millions of events, but since it's daily, each day counts as one result overall, so it does not stop the search. But when I search this query:
by itself, it stops at 19940... |
|
OK, I found more evidence for the limitation I was talking about. I tried this query in the Google Maps view:
It says 22,539 "matching events", and above the map it says: "10000 results with location information ( 1 distinct locations ) over all time". And when I tried
I get 19,946 "matching events" (same as before), and above the map it says: "9984 results with location information ( 1315 distinct locations ) over all time". I said before that I think some limit is acting as a barrier here and that it is somehow related to results; I think this confirms that it is related to the number of "results", not matching events, and that the limit seems to be 10000. Any ideas about this kind of configuration? |
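The figures above fit a cap applied to the base results before the map aggregates them by location. The Python sketch below models that assumption: POSTPROCESS_CAP is taken from the "10k postprocess limitation" mentioned earlier in this thread, and map_summary is a hypothetical helper that only imitates the "N results with location information ( M distinct locations )" line. It is not how Splunk is actually implemented.

```python
from typing import Optional

# Assumed cap, taken from the "10k postprocess limitation" discussed in this thread.
POSTPROCESS_CAP = 10_000

Location = tuple[float, float]


def map_summary(
    base_results: list[Optional[Location]], cap: int = POSTPROCESS_CAP
) -> tuple[int, int]:
    """Imitate the map header: (results with location, distinct locations).

    Only the first `cap` base results reach the aggregation step;
    anything beyond the cap is silently ignored.
    """
    visible = base_results[:cap]
    # Results without geo information (None) are not counted.
    located = [loc for loc in visible if loc is not None]
    return len(located), len(set(located))
```

With 22,539 results that all share one location, the sketch returns (10000, 1), the same shape as the first map header quoted above, no matter how many events match beyond the cap.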
|
In case you want to take a look at the limits, they are set in $SPLUNK_HOME/etc/system/default/limits.conf. Find the one you'd like to change, create a new limits.conf, and place it under $SPLUNK_HOME/etc/system/local/limits.conf. |
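Splunk merges configuration files attribute by attribute, with system/local taking precedence over system/default, so the local limits.conf only needs the stanza and attributes you actually want to change. The sketch below imitates that precedence with Python's configparser as a stand-in (Splunk uses its own parser). The stanza example_stanza and the attributes max_rows and timeout are hypothetical placeholders, not real limits.conf settings or defaults.

```python
import configparser

# Hypothetical file contents; stanza and attribute names are placeholders.
DEFAULT_LIMITS = """
[example_stanza]
max_rows = 10000
timeout = 60
"""

# The local file only repeats the stanza and the one attribute being changed.
LOCAL_LIMITS = """
[example_stanza]
max_rows = 50000
"""


def effective_limits() -> configparser.ConfigParser:
    parser = configparser.ConfigParser()
    # Read default first, then local: keys read later override earlier ones,
    # while keys absent from the local file keep their default values.
    parser.read_string(DEFAULT_LIMITS, source="default/limits.conf")
    parser.read_string(LOCAL_LIMITS, source="local/limits.conf")
    return parser
```

The merged result has max_rows from the local file and timeout from the default file, which is why the file under system/default can be left untouched.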
I'm surprised no one has an answer for this post... Does this mean that the geoip plugin has this huge problem and no one has ever fixed it?
Thanks for the responses...
If I want it to show all events that exist, what should my search be? I need all events counted on the map. One more issue... is this wrong?
I get some errors when I perform this search...
OK, so now this is my search query:
But again, it only shows me 19940 events (as I said, the total is nearly 5 billion!). Is it possible that geoip has some kind of limitation on displaying results?
P.S.: the results from * | geoip SourceIP are exactly the same as with my own search method:
Are you not seeing the new fields on the left, near the bottom, when you run a search like this?
SourceIP=* | geoip SourceIP | timechart span=1d count(SourceIP_city) as "City Count" by SourceIP_city | rename SourceIP_city as City
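A search like the one above returns one row per day, not one row per event, which is why a daily search over billions of events produces so few results. Below is a rough Python equivalent of a one-day timechart that counts a field split by its own values. It assumes events are dicts with an epoch `_time` and uses UTC day buckets for simplicity; daily_counts_by_city is a hypothetical name, and the default field SourceIP_city is the field geoip SourceIP produces in this thread.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone


def daily_counts_by_city(
    events: list[dict], city_field: str = "SourceIP_city"
) -> dict[str, Counter]:
    """Count events per calendar day (UTC), split by city, like a span=1d timechart."""
    table: dict[str, Counter] = defaultdict(Counter)
    for e in events:
        city = e.get(city_field)
        if city is None:
            continue  # count(field) only counts events where the field is present
        day = datetime.fromtimestamp(e["_time"], tz=timezone.utc).strftime("%Y-%m-%d")
        table[day][city] += 1
    return dict(table)
```

The number of rows in the output equals the number of days covered, independent of how many events fall into each day.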
I played around with some larger firewall logs and observed the same behavior: geoip always stops after approximately 20000 events.
I will do some further testing later.
I haven't been able to find the right parameter so far. geoip still stops working after around 20k events and 12 seconds.
Thanks for your answers... Yes, I figured that out, and I've also been looking for it for a few days, but still could not find the right parameter. In limits.conf the last parameter (max_count) seemed to be the one, but changing it has no effect...
And one other thing I cannot understand: limits.conf should supposedly control all parts of Splunk (since it's located under system/default), so how come when I enter a search query without geoip I can easily get billions of results, while the limit problem only occurs when geoip is used in the search?!
Could this problem come from geoip's free license rather than from limits.conf's parameters?