What's the best way to determine how many events I'm pulling off disk during a query, and what numbers am I looking for?
Here's a query I wrote that takes a stab at it, but if there's a better and/or existing way, I'd like to hear it.
index=foo | eval n=time() | eval s=len(_raw)
| stats count sum(s) as bytes min(n) as min max(n) as max avg(s) as "Average event size"
| eval megs=bytes/1024/1024 | eval "Megabytes per second" = megs/(max-min)
| eval "Events per second" = count/(max-min)
Note:
Turn off Field Discovery. On a test machine with good disks, I'm seeing 2,500 events per second with Field Discovery on, and 15,000 with Field Discovery off.
The byte count of the returned events is not really a meaningful measure of disk performance: event data is stored compressed, and the figure does not account for bytes read but not returned, bytes decompressed but not returned, or reads of the index itself as opposed to retrieval of the raw data.
Also please note that this is a "dense" query: it returns most of the raw data in an entire index without having to seek for it, so it mostly measures decompression speed, which is CPU-bottlenecked. It does not tell you as much about disk seek or disk transfer rates.
No, that will tend to measure decompression as well. Choose an extremely rare term, something that occurs in fewer than 1 in a million events, and time a search for it over a given time range of data, e.g., search for "vbumgarburnerization".
Can you think of a query that would be better for testing disk seek? Maybe date_second=0?
Take a look at $SPLUNK_HOME/share/splunk/search_mrsparkle/templates/parser/inspector.html
Wow, that seems really low. What is your setup? Is this a VM?
Well, I get slightly better values, but I ran the search on a different index where the average event size is only about 160 rather than 500 as in the previous index. That was just to make sure Splunk doesn't read from the FS cache.
My values:
Index 1: avg event size 503.08, events per second 2021.67
Index 2: avg event size 150.88, events per second 2731.79
sorry to block your question with my problems 🙂 but this might be interesting for other people
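For anyone comparing numbers like these, here is a rough Python sketch that turns average event size and events per second into raw throughput. The function name is made up for illustration, and it assumes the event size is in bytes. The result is uncompressed data returned by the search, not bytes actually read from disk, so treat it as a ballpark only.

```python
def raw_mb_per_second(avg_event_size, events_per_second):
    """Approximate uncompressed throughput in MB/s.

    Assumes avg_event_size is in bytes (hypothetical helper, not a Splunk API).
    """
    bytes_per_second = avg_event_size * events_per_second
    return bytes_per_second / 1024 / 1024


# Values quoted in the comment above.
index1 = raw_mb_per_second(503.08, 2021.67)
index2 = raw_mb_per_second(150.88, 2731.79)

print(f"Index 1: {index1:.2f} MB/s")
print(f"Index 2: {index2:.2f} MB/s")
```

With these inputs both indexes come out at under 1 MB/s of raw event data, which is far below what a disk can transfer and fits the point that a dense search is limited by something other than disk transfer.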
Actually, take the head 1000000 out of that query and see what you get. That appears to be hindering the performance greatly.
I think this is a good question. We discussed a few performance issues with Splunk and never got a clear answer on what a good value is. Splunk never came up with a command to measure the number of events per second, so I think it doesn't exist.
What are your values? We never reach 50,000 events per second; we get values around 1,700 events per second.
Yes, you can do the math in inspect, and get the same answers, but I was hoping there's an existing command that provides more info.
Is there a query that populates inspect?
Does the "Inspect search job" not work for you? i.e., "This search has completed and has returned 6 results by scanning 72 events in 0.149 seconds."
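Doing the math on that inspector message by hand looks roughly like this in Python. The helper name is hypothetical; it only divides the scanned event count by the run time reported in the message.

```python
def events_per_second(scanned_events, runtime_seconds):
    """Scan rate from the two numbers in the job inspector message."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return scanned_events / runtime_seconds


# "returned 6 results by scanning 72 events in 0.149 seconds"
rate = events_per_second(72, 0.149)
print(f"{rate:.0f} events scanned per second")
```

A search this short is probably too small a sample to say much about sustained throughput, so a longer-running search would likely give a more useful number.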