Title may be a bit confusing, so here's an example of what I'm trying to achieve:
I want to convert a table that looks like this:
| _time | user | action |
|---|---|---|
| 2022-01-01 10:00:00 | user_1 | login |
| 2022-01-01 10:00:10 | user_2 | login |
| 2022-01-01 11:30:20 | user_1 | logout |
| 2022-01-01 11:40:00 | user_1 | login |
| 2022-01-01 12:00:00 | user_1 | logout |
| 2022-01-01 12:01:00 | user_2 | logout |
Into this:
| user | login_time | logout_time |
|---|---|---|
| user_1 | 2022-01-01 10:00:00 | 2022-01-01 11:30:20 |
| user_2 | 2022-01-01 10:00:10 | 2022-01-01 12:01:00 |
| user_1 | 2022-01-01 11:40:00 | 2022-01-01 12:00:00 |
This looks like a good use case for transactions.
<base_search>
| transaction user startswith=(action="login") endswith=(action="logout")
| eval login_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| eval logout_time=strftime(_time+duration, "%Y-%m-%d %H:%M:%S")
| table user login_time logout_time
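Outside of Splunk, the grouping that `transaction` performs here can be sketched in a few lines of Python. This is only an illustration of the pairing logic (the function name and data layout are my own, not anything Splunk provides): each user's `login` is held until the next `logout` for that user arrives.

```python
def pair_sessions(events):
    """Pair each user's 'login' with their next 'logout', roughly what
    `transaction user startswith=(action="login") endswith=(action="logout")`
    produces. `events` is a list of (time, user, action) tuples sorted by time."""
    open_login = {}   # user -> login time of the currently open session
    sessions = []
    for t, user, action in events:
        if action == "login":
            open_login[user] = t          # a second login replaces the first
        elif action == "logout" and user in open_login:
            sessions.append((user, open_login.pop(user), t))
    return sessions

events = [
    ("2022-01-01 10:00:00", "user_1", "login"),
    ("2022-01-01 10:00:10", "user_2", "login"),
    ("2022-01-01 11:30:20", "user_1", "logout"),
    ("2022-01-01 11:40:00", "user_1", "login"),
    ("2022-01-01 12:00:00", "user_1", "logout"),
    ("2022-01-01 12:01:00", "user_2", "logout"),
]
for user, login_time, logout_time in pair_sessions(events):
    print(user, login_time, logout_time)
```

Note the `open_login` dictionary: like `transaction`, something has to hold the open session in memory until the matching logout shows up, which is exactly the concern raised below for long-lived transactions.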
I would postulate that there is no good reason to use "transaction" here 😀 - although in this case the example data is small.
The general issue with transaction for long-lived transactions is that if you have a login period of, say, 1 hour 30 minutes, then Splunk has to retain the intervening data in memory until it finds the logout. If there is no logout, you need to define a maximum span for a transaction and then handle the open transactions yourself, otherwise you can get unpredictable results with large datasets.
Just in case this is a simplified example of a bigger picture, here is a different technique that will do the calculation. The first part sets up your example data; the real work starts at streamstats...
| makeresults
| eval _raw="_time user action
2022-01-01 10:00:00 user_1 login
2022-01-01 10:00:10 user_2 login
2022-01-01 11:30:20 user_1 logout
2022-01-01 11:40:00 user_1 login
2022-01-01 12:00:00 user_1 logout
2022-01-01 12:01:00 user_2 logout"
| multikv forceheader=1
| eval _time=strptime(time, "%F %T")
| table _time user action
``` -------- This is the logic to convert the table from here --------```
| streamstats window=2 global=f list(action) as actions min(_time) as min max(_time) as max by user
| where mvcount(mvdedup(actions))=2 AND mvindex(actions, 0)="login"
| eval duration=max-min
| eval login_time=strftime(min, "%F %T")
| eval logout_time=strftime(max, "%F %T")
| sort min
| table user login_time logout_time
This will also handle a missing logout (i.e. sequential logins).
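The `streamstats window=2 ... by user` logic above can be sketched the same way - a minimal Python illustration (the names are mine, not Splunk's) that compares each event with the previous event for that user and keeps only login→logout pairs, so a login followed by another login is simply dropped, which is why missing logouts are handled:

```python
def window2_pairs(events):
    """Mimic `streamstats window=2 ... by user` followed by the `where`
    filter: for each event, look at the previous event for the same user
    and keep the pair only if it is exactly (login, logout)."""
    last = {}    # user -> (time, action) of that user's previous event
    pairs = []
    for t, user, action in events:
        prev = last.get(user)
        if prev and prev[1] == "login" and action == "logout":
            pairs.append((user, prev[0], t))
        last[user] = (t, action)
    return sorted(pairs, key=lambda p: p[1])   # like `| sort min`

events = [
    ("2022-01-01 10:00:00", "user_1", "login"),
    ("2022-01-01 10:00:10", "user_2", "login"),
    ("2022-01-01 10:30:00", "user_1", "login"),   # first login has no logout
    ("2022-01-01 11:30:20", "user_1", "logout"),
    ("2022-01-01 12:01:00", "user_2", "logout"),
]
print(window2_pairs(events))
```

The orphaned first login for user_1 produces no output row; only the 10:30:00 login pairs with the 11:30:20 logout.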
@bowesmana, I was wondering how long it would take until the transaction police read this post. 😀
🚓🚓🚓🚓🚓🚓🚓🚓🚓
Hey, thanks for the answer! However, the line:
| eval logout_time=strftime(_time+duration, "%Y-%m-%d %H:%M:%S")
Doesn't work for my case, as the logs do not include a duration field. Instead there are two _time values, one for the login time and one for the logout time. Do you know how I could turn these two values into separate columns - login_time and logout_time, as shown in the question?
The duration field is automatically calculated by the transaction command. By default, transaction takes the start and end times of the transaction and calculates the duration in seconds. The _time value represents the beginning of the transaction, which is the user's login event.
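The arithmetic behind `_time + duration` can be shown with a quick Python sketch (the duration value here is just an assumed example, 1h30m20s):

```python
import time

# transaction sets _time to the first (login) event and duration to the
# span in seconds, so the logout time is simply _time + duration
login_epoch = time.mktime(time.strptime("2022-01-01 10:00:00", "%Y-%m-%d %H:%M:%S"))
duration = 5420   # assumed: 1 hour, 30 minutes, 20 seconds
logout_epoch = login_epoch + duration
print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(logout_epoch)))
# -> 2022-01-01 11:30:20
```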
Are you actually having issues?
Apologies, I just realized the duration field exists. Thank you!