Title may be a bit confusing, so here's an example of what I'm trying to achieve:
I want to convert a table that looks like this:
| _time | user | action |
|---|---|---|
| 2022-01-01 10:00:00 | user_1 | login |
| 2022-01-01 10:00:10 | user_2 | login |
| 2022-01-01 11:30:20 | user_1 | logout |
| 2022-01-01 11:40:00 | user_1 | login |
| 2022-01-01 12:00:00 | user_1 | logout |
| 2022-01-01 12:01:00 | user_2 | logout |
Into this:
| user | login_time | logout_time |
|---|---|---|
| user_1 | 2022-01-01 10:00:00 | 2022-01-01 11:30:20 |
| user_2 | 2022-01-01 10:00:10 | 2022-01-01 12:01:00 |
| user_1 | 2022-01-01 11:40:00 | 2022-01-01 12:00:00 |
This looks like a good use case for transactions.
<base_search>
| transaction user startswith=(action="login") endswith=(action="logout")
| eval login_time=strftime(_time, "%Y-%m-%d %H:%M:%S")
| eval logout_time=strftime(_time+duration, "%Y-%m-%d %H:%M:%S")
| table user login_time logout_time
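Outside of Splunk, the grouping that `transaction` performs here can be sketched in a few lines of Python. This is only an illustration of the pairing logic (the function name and data layout are my own, not anything Splunk provides): each user's `login` is held until the next `logout` for that user arrives.

```python
def pair_sessions(events):
    """Pair each user's 'login' with their next 'logout', roughly what
    `transaction user startswith=(action="login") endswith=(action="logout")`
    produces. `events` is a list of (time, user, action) tuples sorted by time."""
    open_login = {}   # user -> login time of the currently open session
    sessions = []
    for t, user, action in events:
        if action == "login":
            open_login[user] = t          # a second login replaces the first
        elif action == "logout" and user in open_login:
            sessions.append((user, open_login.pop(user), t))
    return sessions

events = [
    ("2022-01-01 10:00:00", "user_1", "login"),
    ("2022-01-01 10:00:10", "user_2", "login"),
    ("2022-01-01 11:30:20", "user_1", "logout"),
    ("2022-01-01 11:40:00", "user_1", "login"),
    ("2022-01-01 12:00:00", "user_1", "logout"),
    ("2022-01-01 12:01:00", "user_2", "logout"),
]
for user, login_time, logout_time in pair_sessions(events):
    print(user, login_time, logout_time)
```

Note the `open_login` dictionary: like `transaction`, something has to hold the open session in memory until the matching logout shows up, which is exactly the concern raised below for long-lived transactions.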
I would postulate that there is no good reason to use "transaction" here 😀 - although in this case the example data is small.
The general issue with transaction for long-lived transactions is that if you have a login period of, say, 1 hour 30 minutes, then Splunk has to retain the intervening data in memory until it finds the logout. If there is no logout, you need to define a maximum span for a transaction and then handle the open transactions yourself, otherwise you can get unpredictable results with large datasets.
Just in case this is a simplified example of a bigger picture, here is a different technique that will do the calculation. The first part sets up your example data; the real work starts at streamstats...
| makeresults
| eval _raw="_time user action
2022-01-01 10:00:00 user_1 login
2022-01-01 10:00:10 user_2 login
2022-01-01 11:30:20 user_1 logout
2022-01-01 11:40:00 user_1 login
2022-01-01 12:00:00 user_1 logout
2022-01-01 12:01:00 user_2 logout"
| multikv forceheader=1
| eval _time=strptime(time, "%F %T")
| table _time user action
``` -------- This is the logic to convert the table from here --------```
| streamstats window=2 global=f list(action) as actions min(_time) as min max(_time) as max by user
| where mvcount(mvdedup(actions))=2 AND mvindex(actions, 0)="login"
| eval duration=max-min
| eval login_time=strftime(min, "%F %T")
| eval logout_time=strftime(max, "%F %T")
| sort min
| table user login_time logout_time
This will also handle a missing logout (i.e. sequential logins).
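The `streamstats window=2 ... by user` logic above can be sketched the same way - a minimal Python illustration (the names are mine, not Splunk's) that compares each event with the previous event for that user and keeps only login→logout pairs, so a login followed by another login is simply dropped, which is why missing logouts are handled:

```python
def window2_pairs(events):
    """Mimic `streamstats window=2 ... by user` followed by the `where`
    filter: for each event, look at the previous event for the same user
    and keep the pair only if it is exactly (login, logout)."""
    last = {}    # user -> (time, action) of that user's previous event
    pairs = []
    for t, user, action in events:
        prev = last.get(user)
        if prev and prev[1] == "login" and action == "logout":
            pairs.append((user, prev[0], t))
        last[user] = (t, action)
    return sorted(pairs, key=lambda p: p[1])   # like `| sort min`

events = [
    ("2022-01-01 10:00:00", "user_1", "login"),
    ("2022-01-01 10:00:10", "user_2", "login"),
    ("2022-01-01 10:30:00", "user_1", "login"),   # first login has no logout
    ("2022-01-01 11:30:20", "user_1", "logout"),
    ("2022-01-01 12:01:00", "user_2", "logout"),
]
print(window2_pairs(events))
```

The orphaned first login for user_1 produces no output row; only the 10:30:00 login pairs with the 11:30:20 logout.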
@bowesmana, I was wondering how long it would take until the transaction police read this post. 😀
🚓🚓🚓🚓🚓🚓🚓🚓🚓
Hey, thanks for the answer! However, the line:
| eval logout_time=strftime(_time+duration, "%Y-%m-%d %H:%M:%S")
Doesn't work for my case, as the logs do not include a duration field. Instead there are two _time values, one for the login time and one for the logout time. Do you know how I could turn these two values into separate columns - login_time and logout_time, as shown in the question?
The duration field is automatically calculated by the transaction command. By default, transaction takes the start and end times of the transaction and calculates the duration in seconds. The _time value represents the beginning of the transaction, which is the user's login event.
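The arithmetic behind `_time + duration` can be shown with a quick Python sketch (the duration value here is just an assumed example, 1h30m20s):

```python
import time

# transaction sets _time to the first (login) event and duration to the
# span in seconds, so the logout time is simply _time + duration
login_epoch = time.mktime(time.strptime("2022-01-01 10:00:00", "%Y-%m-%d %H:%M:%S"))
duration = 5420   # assumed: 1 hour, 30 minutes, 20 seconds
logout_epoch = login_epoch + duration
print(time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(logout_epoch)))
# -> 2022-01-01 11:30:20
```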
Are you actually having issues?
Apologies, I just realized the duration field exists. Thank you!