Splunk Search

How to combine rows in groups of 2, grouped by nearest time?

JJ_Yam
Explorer

Title may be a bit confusing, so here's an example of what I'm trying to achieve:

I want to convert a table that looks like this:

_time                 user    action
2022-01-01 10:00:00   user_1  login
2022-01-01 10:00:10   user_2  login
2022-01-01 11:30:20   user_1  logout
2022-01-01 11:40:00   user_1  login
2022-01-01 12:00:00   user_1  logout
2022-01-01 12:01:00   user_2  logout

 

Into this:

user     login_time            logout_time
user_1   2022-01-01 10:00:00   2022-01-01 11:30:20
user_2   2022-01-01 10:00:10   2022-01-01 12:01:00
user_1   2022-01-01 11:40:00   2022-01-01 12:00:00

 

1 Solution

johnhuang
Motivator

This looks like a good use case for transactions.

 

<base_search>
``` group each user's login/logout pair into a single event ```
| transaction user startswith=(action="login") endswith=(action="logout")
``` _time is the start of the transaction (the login); duration is calculated automatically in seconds ```
| eval login_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| eval logout_time=strftime(_time+duration, "%Y-%m-%d %H:%M:%S")
| table user login_time logout_time


bowesmana
SplunkTrust

I would postulate that there is no good reason for "transaction" 😀 - although in this case the example data is small.

The general issue with transaction for long-lived transactions is that if you have a login period of, say, 1 hour 30 minutes, Splunk has to retain the intervening data in memory until it finds the logout. If there is no logout, you need to define a maximum span for the transaction and then handle the open transactions, otherwise you can get unpredictable results with large datasets.
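For example, here is a minimal sketch of capping and flagging open transactions - the 4-hour maxspan is an assumed value, so adjust it to your real session lengths:

<base_search>
| transaction user startswith=(action="login") endswith=(action="logout") maxspan=4h keepevicted=true
``` closed_txn=0 marks transactions that were evicted without finding a matching logout ```
| eval session_status=if(closed_txn=1, "complete", "open")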

Just in case this is a simplified example of a bigger picture, here is a different technique that will do the calculation. The first part sets up your example data, but the work starts at streamstats...

| makeresults
| eval _raw="_time	user	action
2022-01-01 10:00:00	user_1	login
2022-01-01 10:00:10	user_2	login
2022-01-01 11:30:20	user_1	logout
2022-01-01 11:40:00	user_1	login
2022-01-01 12:00:00	user_1	logout
2022-01-01 12:01:00	user_2	logout"
| multikv forceheader=1
| eval _time=strptime(time, "%F %T")
| table _time user action
``` -------- This is the logic to convert the table from here --------```
``` pair each event with the previous event for the same user ```
| streamstats window=2 global=f list(action) as actions min(_time) as min max(_time) as max by user
``` keep only pairs where a login is followed by a logout ```
| where mvcount(mvdedup(actions))=2 AND mvindex(actions, 0)="login"
| eval duration=max-min
| eval login_time=strftime(min, "%F %T")
| eval logout_time=strftime(max, "%F %T")
| sort min
| table user login_time logout_time

This will also handle a missing logout (i.e. sequential logins).
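To see that, here is a minimal sketch with an assumed unmatched login for user_1 at 10:30 - the orphaned 10:00 login never forms a login/logout pair, so it is dropped:

| makeresults
| eval _raw="_time	user	action
2022-01-01 10:00:00	user_1	login
2022-01-01 10:30:00	user_1	login
2022-01-01 11:00:00	user_1	logout"
| multikv forceheader=1
| eval _time=strptime(time, "%F %T")
``` only the 10:30 login pairs with the 11:00 logout ```
| streamstats window=2 global=f list(action) as actions min(_time) as min max(_time) as max by user
| where mvcount(mvdedup(actions))=2 AND mvindex(actions, 0)="login"
| eval login_time=strftime(min, "%F %T"), logout_time=strftime(max, "%F %T")
| table user login_time logout_time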

 

johnhuang
Motivator

@bowesmana, I was wondering how long it would take until the transaction police read this post. 😀


bowesmana
SplunkTrust

🚓🚓🚓🚓🚓🚓🚓🚓🚓

JJ_Yam
Explorer

Hey, thanks for the answer! However, the line:

| eval logout_time=strftime(_time+duration, "%Y-%m-%d %H:%M:%S")

Doesn't work for my case, as the logs do not include a duration field. Instead, there are two _time values, one for the login time and one for the logout time. Do you know how I could turn these two values into separate columns - login_time and logout_time, as shown in the question?


johnhuang
Motivator

The duration field is automatically calculated by the transaction command. By default, transaction takes the start and end times of the transaction and calculates the duration in seconds. The _time value represents the beginning of the transaction, which is the user's login event.
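If you want to sanity check it, assuming the same base search, something like this will show the fields that transaction generates for you:

<base_search>
| transaction user startswith=(action="login") endswith=(action="logout")
``` duration and eventcount are created automatically by the transaction command ```
| table _time user duration eventcount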

Are you actually having issues?


JJ_Yam
Explorer

Apologies, I just realized the duration field exists. Thank you!
