The transaction search command will automatically compute the duration from the first event to the last event within each grouping. I would like to create a new multivalue field which computes the difference in time between each event within a grouping. For example if I have
sourcetype=my_message |transaction req
I might get this as one transaction event:
Mon Jun 14 17:03:20.543 EDT 2010 req=43300 hi there
Mon Jun 14 17:03:20.743 EDT 2010 req=43300 Another message
Mon Jun 14 17:03:20.845 EDT 2010 req=43300 status=invalid_request
Now, I know the duration field for this would be set to 0.302 in this example. However, I would like the time deltas between each subsequent event. In this example, I'd like a new multivalue field set to 0.200 and 0.102. If the group had 4 events, I'd have 3 time differences; if it had 5 events, I'd have 4 time differences, and so on. How can this be done?
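For clarity, here's the arithmetic being asked for, sketched in Python rather than SPL (the timestamps are the epoch values of the example events above):

```python
# Illustrative sketch (not SPL): the desired multivalue field is the list of
# pairwise differences between consecutive event timestamps in a transaction.
def event_deltas(timestamps):
    """Given the sorted epoch times of a transaction's events,
    return the n-1 deltas between consecutive events."""
    return [round(b - a, 3) for a, b in zip(timestamps, timestamps[1:])]

# The three-event example transaction above:
times = [1276549400.543, 1276549400.743, 1276549400.845]
print(event_deltas(times))  # [0.2, 0.102]
```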
I was thinking that you could use the delta command to build a list of time differences between events before transaction, but I don't think this will work: delta simply looks at each pair of sequential events overall, not necessarily sequential events within the same transaction.
Here is another thought. What if you simply make a copy of _time before transaction? That should give you a multivalue field preserving all of the timestamps for the events within your transaction. Then you may be able to find a way to calculate the deltas from there.
You could try this and see how far it gets you:
sourcetype=my_message | eval timecopy=_time | transaction req mvlist="timecopy"
This should do the trick. It's pretty ugly and you lose tons of fields in the process (you'll probably have to do some tweaking).
sourcetype=my_message | eval timecopy=_time | transaction req mvlist=timecopy | eval txn_serial=mvindex(_serial,0) | eval txn_raw=_raw | fields + host source sourcetype timecopy txn_* duration | mvexpand timecopy | streamstats window=2 last(timecopy) as t1,first(timecopy) as t2 by txn_serial | eval timediff=t1-t2 | stats values(_time) as _time, values(txn_raw) as _raw, list(timediff) as timediff by txn_serial
Basic theory: We assign each transaction a unique number (based on the serial number of the first event within the transaction). Then we blow apart the events into multiple events based on the timecopy multivalue field. (It's possible that mvexpand squashes duplicate field values, so keep your eye out for that; it may or may not matter for your overall purpose.) Then we use streamstats to grab the two side-by-side timestamp copies for a single transaction (which are only known to belong together via the "txn_serial" value). An eval gets the diff between the two side-by-side timestamps. (This part might not be quite right; I didn't have very good data to test with.) Then we use a stats command to try to pull it all back together.
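As a rough non-SPL illustration of that theory, here is the same pairwise-diff-per-transaction logic sketched in Python (the rows and serial values are made up to mirror the example):

```python
# Rough Python analogue of the SPL above (hypothetical data; field names match
# the search): each transaction's timecopy values are expanded into rows, then
# adjacent pairs are differenced per txn_serial, like
# streamstats window=2 ... by txn_serial.
from itertools import groupby

# (txn_serial, timecopy) rows, as they would look after mvexpand
rows = [
    (0, 1276549400.543), (0, 1276549400.743), (0, 1276549400.845),
    (1, 1276549500.100), (1, 1276549500.400),
]

deltas_by_txn = {}
for serial, grp in groupby(rows, key=lambda r: r[0]):
    ts = [t for _, t in grp]
    # window=2: last(timecopy) - first(timecopy) over each adjacent pair
    deltas_by_txn[serial] = [round(b - a, 3) for a, b in zip(ts, ts[1:])]

print(deltas_by_txn)  # {0: [0.2, 0.102], 1: [0.3]}
```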
Unfortunately, due to the "stats" command (and probably a few others), Splunk displays the search output as a results table rather than in "Event listing" mode, so you can't really see the raw event anymore. (In other words, my attempt to save "_raw" and then restore it using values(txn_raw) as _raw doesn't quite do the trick.)
NOTE: It's quite possible that wrong values are being returned by this query. It wouldn't surprise me if there are logical bugs that need to be squashed. I would recommend that you rebuild this search step by step and confirm the data at every point.
Best of luck.
First of all, I would be interested to know why someone thinks they need this data, i.e., what are they going to do once they have every single delta? The specific use cases might be much simpler and more efficient to solve than the general one of getting every delta.
But try this:
sourcetype=my_message
| eval etime=_time
| transaction req
| eval tr_id=mvindex(_serial,0)
| mvexpand etime
| streamstats current=f global=f window=1
last(etime) as letime
by tr_id
| eval dtime=coalesce(etime-letime,0)
| fields - letime,etime
| eventstats
list(dtime) as dtime
by tr_id
| dedup tr_id
| fields - tr_id
Instead of adding (and removing) a field tr_id, you could use req in your specific example, but if you had transactions where the id field wasn't unique and unitary, you would need something like tr_id.
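A quick Python sketch of what the streamstats current=f window=1 plus coalesce combination computes per event (illustrative only; the data is the example transaction):

```python
# Sketch of the current=f window=1 trick (not SPL): each event sees the
# previous event's etime within its transaction, and coalesce turns the
# first event's missing letime into a delta of 0.
def dtimes(etimes):
    """Per-event deltas as the search computes them: 0 for the first
    event, etime - previous etime for the rest."""
    out = []
    prev = None
    for t in etimes:
        out.append(round(t - prev, 3) if prev is not None else 0)
        prev = t
    return out

print(dtimes([1276549400.543, 1276549400.743, 1276549400.845]))
# [0, 0.2, 0.102]
```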
Changed. The old one worked, but this is probably more efficient. I didn't test with subseconds, so adjust the eval etime as necessary.
I added another more complete solution. Good luck.
I tried it again and got the double-decimal thing too. I don't know, very weird.
I'm not sure how you got the subseconds value in there twice; I wasn't seeing it on a test search I ran, so I added it on. timecopy should always be either integer or decimal format, so that's weird. I think Splunk does some special behind-the-scenes voodoo with _time, which may explain this... You probably want the subseconds, otherwise your difference will be 0 between all your events, at least in your example case. (BTW, "UTC" is a timezone, not a timestamp "format". I think the term you're looking for is "unix epoch" timestamp, which is based on the UTC timezone.)
The coalesce adds the subseconds again to the _time result, making Splunk think it is not in UTC format (e.g., 1276549400.543.543), which will keep you from doing any more math upon it. The best I could come up with is to find deltas between any two times in the transaction manually:
sourcetype="my_message" req=43300 | eval timecopy=_time | transaction req mvlist="timecopy" | eval first=mvindex(timecopy,0) | eval second=mvindex(timecopy,1) | eval mydelta=second-first
That works for specific queries, but it won't automate the query where you want to find all "mydelta" within a transaction > .xxx.
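If you do need that "any delta > .xxx" check automated, the underlying logic (sketched in Python outside SPL, with a hypothetical 0.15s threshold) would be:

```python
# Illustrative only: flag transactions containing any consecutive-event delta
# above a threshold (hypothetical 0.15s here); this is the check that the
# fixed mvindex(...,0)/mvindex(...,1) approach can't express for
# arbitrary-length transactions.
def has_slow_gap(timestamps, threshold=0.15):
    return any(b - a > threshold for a, b in zip(timestamps, timestamps[1:]))

print(has_slow_gap([1276549400.543, 1276549400.743, 1276549400.845]))
# True (the 0.200s gap exceeds 0.15s)
```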