Standard Deviation for Security Events

ericl42
Path Finder

I've read a few forum posts on using standard deviations or MLTK to alert on large increases in events but unfortunately the solutions vary greatly.

For the sake of this post, I'm using the general logic from the various Security Essentials searches that look for significant increases, and tailoring it to a password-spraying scenario where one host fails to log in to multiple user accounts.

index=os_logs (source=WinEventLog:Security OR sourcetype=linux_secure) action=failure 
| bucket _time span=1h 
| stats dc(user) as count by _time src_ip 
| eventstats max(_time) as maxtime 
| stats count as num_data_samples 
    max(eval(if(_time >= relative_time(maxtime, "-1h@h"), 'count', null()))) as count 
    avg(eval(if(_time < relative_time(maxtime, "-1h@h"), 'count', null()))) as avg 
    stdev(eval(if(_time < relative_time(maxtime, "-1h@h"), 'count', null()))) as stdev 
    by src_ip
| eval lowerBound=(avg-stdev*2), upperBound=(avg+stdev*2)

For this search, everything works the way I expect until I get to the eventstats maxtime line and start doing the math. Every scenario I've seen via Security Essentials either looks back 30 days with 1-day buckets, or in one case goes back 7 days with a 70-minute bucket, which makes no sense to me.

I essentially just want to see the following:

  • 2 hours ago, 192.168.1.10 failed logging into 3 different user accounts
  • 1 hour ago, 192.168.1.10 failed logging into 10 different user accounts

Therefore my num_data_samples should be 2, since I'm only looking at a two-hour span. My count would be 13, I believe, since that's the total number of users (assuming the 10 are different users and don't overlap with the 3). The lowerBound and upperBound would then come from the 3 and the 10, and I could alert on stdev > X.

When I try to apply this logic to just a two-hour period, I never get the correct stdev or upper/lower bounds. I'm not very familiar with maxtime vs. the relative_time of -1h@h, so I assume something is wrong there. In my mind, the time range should just be -2 hours, bucketed into one-hour blocks to compare against each other.
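To make those two rows concrete, here is a self-contained makeresults version (synthetic data, same stats pipeline as my search above) that anyone can paste in to see the math I'm getting. The src_ip and counts are just the hypothetical values from my bullets:

| makeresults count=2 
| streamstats count as hours_ago 
| eval _time=relative_time(now(), "-" . hours_ago . "h@h"), count=if(hours_ago=1, 10, 3), src_ip="192.168.1.10" 
| eventstats max(_time) as maxtime 
| stats count as num_data_samples 
    max(eval(if(_time >= relative_time(maxtime, "-1h@h"), 'count', null()))) as count 
    avg(eval(if(_time < relative_time(maxtime, "-1h@h"), 'count', null()))) as avg 
    stdev(eval(if(_time < relative_time(maxtime, "-1h@h"), 'count', null()))) as stdev 
    by src_ip 
| eval lowerBound=(avg-stdev*2), upperBound=(avg+stdev*2)

With only two buckets, only the older bucket (the 3) can ever feed avg and stdev, which I suspect is part of why the numbers come out wrong for me.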

The closest post I found was here and it has some similar logic but I'd like to base mine off of Security Essentials if that is truly the best way to do this.


David
Splunk Employee

Hello!

The foundation of standard-deviation-based detections is that you pull a large baseline, and in the case of these detections, a per-entity baseline: you compare one src_ip to its own history rather than to any other src_ip in the organization. That's why you need a long baseline: if you don't know what "normal" looks like, you can't define what "abnormal" looks like.

In your case, you probably want to compare a src_ip to all src_ips rather than to its own baseline. For that, I would use the "Sources Sending Many DNS Requests" search as the template rather than the standard time-series approach, since it also works over the short term (grouping by the hour) and compares across multiple different hosts.
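Adapted to the fields in your search, that format looks roughly like this. It's a sketch, not the exact SSE query, and the failed_users > 5 floor is just an arbitrary noise filter you'd tune:

index=os_logs (source=WinEventLog:Security OR sourcetype=linux_secure) action=failure 
| bucket _time span=1h 
| stats dc(user) as failed_users by _time, src_ip 
| eventstats avg(failed_users) as avg, stdev(failed_users) as stdev by _time 
| eval upperBound=avg+stdev*2 
| where failed_users > upperBound AND failed_users > 5

Here each src_ip's hourly count is compared against all src_ips in the same hour, so you don't need a long per-entity baseline to get a meaningful stdev.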

https://docs.splunksecurityessentials.com/content-detail/showcase_huge_volume_dns_requests/

Minor note: you're using the "demo data" version of the SSE queries rather than the live-data version (the live-data version doesn't use the eventstats). This is minor, but swapping maxtime for now() and dropping that eventstats may make the search easier to follow. (Tried to build it out, but I'm having issues with the formatting here on SplunkBase. I can always email it to you.)
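Roughly, the live-data shape of your search would look like the following. Treat it as a sketch from memory, with the 30-day window and daily buckets borrowed from the SSE examples rather than tested against your data:

index=os_logs (source=WinEventLog:Security OR sourcetype=linux_secure) action=failure earliest=-30d@d 
| bucket _time span=1d 
| stats dc(user) as count by _time, src_ip 
| stats count as num_data_samples 
    max(eval(if(_time >= relative_time(now(), "-1d@d"), 'count', null()))) as count 
    avg(eval(if(_time < relative_time(now(), "-1d@d"), 'count', null()))) as avg 
    stdev(eval(if(_time < relative_time(now(), "-1d@d"), 'count', null()))) as stdev 
    by src_ip 
| eval lowerBound=(avg-stdev*2), upperBound=(avg+stdev*2) 
| where count > upperBound

Since now() replaces maxtime, the eventstats line goes away entirely.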
