I have the following output:
DEV#: 0 DEVICE NAME: vpath0 TYPE: 2107900 POLICY: Optimized
SERIAL: 123bac
=======================================================================
Path# Adapter/Hard Disk State Mode Select Errors
0 fscsi0/hidsk22 Open NORMAL 123456 0
1 fscsi0/hidsk29 Open NORMAL 456789 0
I would like to extract four fields from this: The "path" numbers (in this case 0 and 1). Fields should be named path0 and path1. The "select" values (in this case 123456 and 456789). Fields should be named select0 and select1.
I can't figure out how to get a regex to separate two lines and create the field extraction for me.
My ultimate goal is to be able to compare the two select fields against a common vpath (DEVICE NAME).
Is this possible?
Thanks!
You may have some luck with the multikv command.
You could do something like this:
sourcetype=your_source_type | rex "^DEV#:\s+(?<dev_no>\d+)\s+DEVICE NAME:\s+(?<device_name>\S+)\s+TYPE:\s+(?<type>\d+)\s+POLICY:\s+(?<policy>\S+)" | rex "SERIAL:\s+(?<serial>\S+)" | multikv | stats list(Select), list(Disk), sum(Errors) by device_name, serial
The stats operation is pretty bogus at this point; it's mostly just demoing which fields you have after the multikv command.
Some of the column names are less than ideal, but you can always rename them if you really need to.
Basically, the multikv search command looks for a header line (the 4th line in your example) and then looks for fixed-width rows beneath it. In your case you have two rows of data, and each row will be transformed into its own event. (This is why it's important to extract the top-level fields, like dev_no, device_name, serial..., prior to using the multikv command: after you call multikv, everything but the individual "row" is removed from your raw event, but all the fields extracted so far are kept.)
So if you look at the fields that exist after the multikv command, you'll see that the "Path#" column gets named "Path_" (because "#" is not valid in a field name, so it's replaced with a "_"). The next column is called just "Disk" (it looks like the "Adapter/Hard" prefix is dropped for whatever reason, due to the spaces I guess; like I said, it's kind of a kludgey command). The remaining columns (State, Mode, Select, and Errors) are straightforward: the fields are named exactly as the column names appear in the text.
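To make the "one event per row" idea concrete, here is a rough Python sketch of the transformation described above. It is not how multikv is implemented, only an imitation of the outcome: the top-level fields are pulled out first with the same two regexes as the rex commands (rewritten with Python's (?P<name>...) syntax), and then each table row becomes its own dictionary that carries those fields along. The function name split_into_row_events is made up for illustration, the row field names are simply the ones reported in this answer, and the value multikv really puts in "Disk" may differ from what a plain whitespace split gives.

```python
import re

# Sample event from the question (rows assumed to start at column 0).
RAW_EVENT = (
    "DEV#: 0 DEVICE NAME: vpath0 TYPE: 2107900 POLICY: Optimized\n"
    "SERIAL: 123bac\n"
    "=======================================================================\n"
    "Path# Adapter/Hard Disk State Mode Select Errors\n"
    "0 fscsi0/hidsk22 Open NORMAL 123456 0\n"
    "1 fscsi0/hidsk29 Open NORMAL 456789 0\n"
)

# Same patterns as the two rex commands, using Python's named-group syntax.
HEADER_RE = re.compile(
    r"^DEV#:\s+(?P<dev_no>\d+)\s+DEVICE NAME:\s+(?P<device_name>\S+)"
    r"\s+TYPE:\s+(?P<type>\d+)\s+POLICY:\s+(?P<policy>\S+)"
)
SERIAL_RE = re.compile(r"SERIAL:\s+(?P<serial>\S+)")

# Field names as reported in the answer above (an assumption, not derived here).
ROW_FIELDS = ["Path_", "Disk", "State", "Mode", "Select", "Errors"]


def split_into_row_events(raw):
    """Return one dict per table row, each carrying the top-level fields."""
    top_level = {}
    top_level.update(HEADER_RE.search(raw).groupdict())
    top_level.update(SERIAL_RE.search(raw).groupdict())

    lines = raw.strip().splitlines()
    # Locate the table header line; the data rows follow it.
    header_idx = next(
        i for i, line in enumerate(lines) if line.lstrip().startswith("Path#")
    )

    events = []
    for line in lines[header_idx + 1:]:
        values = line.split()
        if len(values) != len(ROW_FIELDS):
            continue  # skip anything that does not look like a data row
        event = dict(top_level)  # top-level fields are kept on every row
        event.update(zip(ROW_FIELDS, values))
        events.append(event)
    return events


if __name__ == "__main__":
    for event in split_into_row_events(RAW_EVENT):
        print(event["device_name"], event["Path_"], event["Select"])
```

The point to notice is that device_name and serial appear on both resulting events, which is what lets you group the rows back together later.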
So essentially you are now looking at multiple events. Instead of having "path0" and "path1" as you originally described, you will now have a single field called "Path_"; the first event will have the value "0" and the second will have the value "1". How you combine these back together depends entirely on what you are trying to do with your data. You can recombine your events using stats or transaction, but without a specific example of how you would like to interact with your fields, it's hard to give a usable example. If you never want to deal with your fields individually like this, then perhaps the multi-line regex approach is the best for you.
If you're still struggling to figure out how all of this works, you may find it helpful to recreate the search I've shown above one search command at a time while looking at one event at a time. (Sometimes just simplifying the problem into its smallest parts will help you see what's going on.) If you're very new to Splunk, the whole thing can seem like voodoo (I've been there). I suggest just taking it one step at a time, and eventually it will all make sense.
Yeah, that sounds doable. Best of luck! Glad I could help.
I get it now. 🙂 I really appreciate you taking the time to explain it. My next goal is to take the two "select" values and determine the difference between them. If the difference is greater than 20%, I need an alert. Sounds reasonable, right? I'm going to try to tackle that one on my own for now. Sounds like a good challenge. 🙂
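For what it's worth, the arithmetic behind that check can be sketched in a few lines of Python. This is only an illustration of the logic, not a Splunk search, and it rests on an assumption the thread never settles: that "difference" means the gap between the two select counts relative to the larger one. The names select_imbalance and devices_to_alert, and the hard-coded events, are made up for the example.

```python
from collections import defaultdict

# Per-row events, shaped like what you have after multikv (values are strings).
ROW_EVENTS = [
    {"device_name": "vpath0", "Path_": "0", "Select": "123456"},
    {"device_name": "vpath0", "Path_": "1", "Select": "456789"},
]

THRESHOLD_PCT = 20.0  # alert threshold taken from the comment above


def select_imbalance(events):
    """Map each device to the percent gap between its highest and lowest Select.

    Assumption: the gap is measured relative to the larger value.
    """
    selects = defaultdict(list)
    for event in events:
        selects[event["device_name"]].append(int(event["Select"]))

    result = {}
    for device, values in selects.items():
        hi, lo = max(values), min(values)
        result[device] = 0.0 if hi == 0 else (hi - lo) / hi * 100.0
    return result


def devices_to_alert(events, threshold=THRESHOLD_PCT):
    """Return the devices whose imbalance exceeds the threshold."""
    imbalance = select_imbalance(events)
    return sorted(d for d, pct in imbalance.items() if pct > threshold)


if __name__ == "__main__":
    print(select_imbalance(ROW_EVENTS))
    print(devices_to_alert(ROW_EVENTS))
```

Grouping by device_name here plays the same role as the "by device_name" clause in the stats example: it is what brings the two per-path rows back together so they can be compared.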
I've added some additional explanation; hopefully this will help.
Okay, I'll just be brutally honest here: Your example worked great. But I don't understand WHY it worked. 🙂
I mean it extracted the Select field... But I don't see how/where you extracted it in your code. I'll have to look at it more closely.
One more question: using that, how do I pick out/separate the two different "selects"? Suppose I want to take the difference of the two values or something like that... what are they called?
Thank you so much, you both have been great. 🙂
Yeah, multikv can be intimidating at first. I generally stay away from it as much as possible myself, but there are times when it is the most direct option, and your example is the classic use case for multikv. I've updated my answer to include an example search; hope it helps.
I looked at multikv, but even after reading the document I don't understand how to apply it in this case. From what I read, it seems to assume the fields have already been defined.
Try this regex:
\s+(?P<path0>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select0>[\d]+)\s+[\d]+\n\s+(?P<path1>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select1>[\d]+)\s+[\d]+
Note, I'm assuming that there's not going to be more than 2 paths specified.
We could also capture more information if necessary...
Brian
Thank you for the response. Strange that it works on your end but not mine. I tried a few variants too with no success, including just extracting the fields from just one of the two lines. Still doesn't work... strange.
Hrrm.. It worked on my machine. Granted, I'm using the same test files I did for your other answer.
I am treating the output as one big string, using the \n (newline) as a delimiter for the lines.
Lowell's answer is more elegant than my regex-happy self.
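To illustrate the "one big string" approach, here is a small Python sketch that runs the pattern over the sample from the question. Python's re module accepts the (?P<name>...) groups unchanged. One thing it shows, under the assumption that the two data rows start at column 0 exactly as they appear in the post: the pattern requires at least one whitespace character after the \n, so it finds nothing, while a variant that uses \s* at that spot does match. Whether that has anything to do with the difference between the two machines is only a guess; the real command output may well be indented. The name extract_paths is made up for the example.

```python
import re

# Sample from the question, with rows assumed to start at column 0.
SAMPLE = (
    "DEV#: 0 DEVICE NAME: vpath0 TYPE: 2107900 POLICY: Optimized\n"
    "SERIAL: 123bac\n"
    "=======================================================================\n"
    "Path# Adapter/Hard Disk State Mode Select Errors\n"
    "0 fscsi0/hidsk22 Open NORMAL 123456 0\n"
    "1 fscsi0/hidsk29 Open NORMAL 456789 0\n"
)

# The pattern as posted: "\n\s+" demands whitespace before the second path number.
ORIGINAL = re.compile(
    r"\s+(?P<path0>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select0>[\d]+)\s+[\d]+\n"
    r"\s+(?P<path1>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select1>[\d]+)\s+[\d]+"
)

# Hypothetical variant: "\n\s*" also accepts rows with no leading whitespace.
RELAXED = re.compile(
    r"\s+(?P<path0>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select0>[\d]+)\s+[\d]+\n"
    r"\s*(?P<path1>[\d]+)\s+\S+\s+\S+\s+\S+\s+(?P<select1>[\d]+)\s+[\d]+"
)


def extract_paths(pattern, text):
    """Return the four named groups as a dict, or None if nothing matches."""
    match = pattern.search(text)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print("original:", extract_paths(ORIGINAL, SAMPLE))
    print("relaxed: ", extract_paths(RELAXED, SAMPLE))
```

If your raw events do have leading spaces on the data rows, both patterns should behave the same, so it is worth checking the raw text of one event before changing anything.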
Hmm... Didn't seem to work. It didn't error, but I don't see the fields extracted.
So are you basically treating the last two lines of the output as one big string?
(And, no, there will never be more than 2 paths specified. Not unless we redo our entire SAN configuration. :-))