I'm looking for an efficient way to retrieve the single most recent event from each of about 2000 sources.
It seems that something like:
source=prefix* | stats first(_raw) as _raw by source
scans a lot of events.
Is there a better way?
Yoel
Have a look at the metadata command.
| metadata type=sources
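Since you only care about a subset of sources, you can filter the metadata results the same way you would filter events. A sketch, using the prefix from the question (each result row carries per-source fields such as firstTime, lastTime and totalCount):

| metadata type=sources | search source=prefix*

This only reads index metadata, so it should be far cheaper than scanning the raw events.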
UPDATE: So, to get the actual event, you could still use metadata, but in a subsearch that feeds the outer search with specific info on where/when to look for events. I messed around a bit to get a search that's working. For clarity I can also show what did NOT work 😉
[| metadata type=sources | rename lastTime as _time | fields _time source]
The idea here was to get output from the subsearch like this:
( ( _time=1341991627 AND source="source1" ) OR ( _time=1342119251 AND source="source2" ) OR ([...]))
However, the problem is that _time is considered to be an internal field, and as such won't get picked up by the format command that is implicitly run by the subsearch. So the output will only contain the source parts.
However, using a clever (or well... you be the judge of that) hack we can get the output we want anyway, by creating a field called query which will contain the _time filtering string. query is a special field whose value is returned from the subsearch as-is rather than Splunk adding "query=" before it.
[| metadata type=sources | eval query="_time=".lastTime | fields source query]
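With the query field in place, the subsearch should expand into the outer search roughly like this (the timestamps are the example values from above; the exact parenthesization is up to the implicit format command):

( ( source="source1" AND _time=1341991627 ) OR ( source="source2" AND _time=1342119251 ) )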
This should create a search filter that searches on a number of source/_time pairs as shown above. Because multiple events from a source could potentially occur within the same second, you might still need to add a | dedup source at the end of the outer search to make sure you only get one event per source. I hope this gives you what you were looking for.
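Putting the pieces together, the whole search might look like this (a sketch, not tested against your data; prefix* stands in for whatever source filter you need):

source=prefix* [| metadata type=sources | search source=prefix* | eval query="_time=".lastTime | fields source query] | dedup source

The subsearch narrows the outer search to one specific second per source, and dedup guards against multiple events from the same source within that second.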
If you want the actual indexed event, that's how you do it. If you just want to know when the last event occurred for each source, you could do this:
| metadata type=sources | search source="*prefix*" | convert ctime(lastTime) as timestamp | sort - lastTime
I would like to have an external system that caches a "snapshot" of a group of sources. The cache will serve hundreds of requests per second for some field values. For my purpose I only need the last event from each source, but I would like to update the cache every few minutes. I'm looking for a way to update this cache as efficiently as possible. As this is still in design, I'm open to suggestions (such as using multiple indexes etc.).
It would, because you're inspecting the raw events as opposed to the metadata of your events. The approaches that both Ayn and I showed are just for practical timing purposes.
Help us understand what problem you're trying to solve and we may be able to find a better way.
Unfortunately it seems my initial method scans all the events over the time range.
Brilliant!
I wonder how it will scale for >1000 sources. The subsearch will create a very large filter (although far from the maxout limit of 10500).
Very slick Ayn.
Updated my answer in an attempt to solve what I think you want to accomplish.
Thanks, but I would like the raw events, not metadata on the sources.