I'm having trouble crafting a regex that would pull key=value pairs where the pairs are separated by a character sequence, "+++" for example. I'd like to use a sequence because it's far less likely to show up in my log events than a single-character delimiter. Each log event is a single line. Here is a sample:
ts=2011-11-16T21:41:21Z+++aid=1167949209+++ip=1.1.1.2+++g=ir4205+++id=http://xyz.com/not/Import/10+++t="The Best ++Pt. 1 Book:Roland"+++sz=13880020+++pl=X_YZ+++pc=1+++pvid=1001+++rg=jfkdjd+++pid=asdf_1234+++rs=720+++rt=2
I've been testing with a transforms.conf entry like:
REGEX = (\w+)=([^\+]+(?!\+{3}))
FORMAT = $1::$2
But this regex leaves off the last character of every value that is followed by the delimiter.
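For anyone wondering why the character goes missing, here is a minimal sketch in Python. It assumes Python's `re` module behaves like Splunk's PCRE engine for these constructs (it does for greedy classes and lookaheads, as far as I know), and the shortened event and variable names are only illustrative. The greedy `[^\+]+` runs right up to the delimiter, the negative lookahead `(?!\+{3})` then fails because "+++" follows, and the engine backtracks one character to a position where the lookahead succeeds.

```python
import re

# Shortened event in the same shape as the sample above (illustrative only).
event = "ts=2011-11-16T21:41:21Z+++aid=1167949209+++rt=2"

# The regex from the transforms.conf entry in the question.
original = re.compile(r"(\w+)=([^\+]+(?!\+{3}))")

pairs = original.findall(event)
for key, value in pairs:
    # Values followed by "+++" come back one character short;
    # the final value is intact because nothing follows it.
    print(key, "=", value)
```

So the lookahead is not stopping the match at the delimiter; it is only forcing the match to end somewhere that is not immediately followed by "+++".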
I am essentially trying to get around the same issue described here: I started with DELIMS, but I can't guarantee that my delimiter won't appear in my log entries, and there doesn't appear to be a way to escape the delimiter.
http://splunk-base.splunk.com/answers/3231/escaping-characters-in-an-event
I tweaked Ayn's regex a little bit and now it captures the last key=value pair:
(\w+)=(.+?)(?:\+{3}|$)
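A quick way to sanity-check this outside Splunk is a small Python sketch run against the sample event from the question. This assumes Python's `re` is close enough to Splunk's PCRE for this pattern; the names `KV`, `event`, and `fields` are just for illustration.

```python
import re

# The sample event from the question, split across lines only for readability.
event = (
    'ts=2011-11-16T21:41:21Z+++aid=1167949209+++ip=1.1.1.2+++g=ir4205'
    '+++id=http://xyz.com/not/Import/10'
    '+++t="The Best ++Pt. 1 Book:Roland"+++sz=13880020+++pl=X_YZ+++pc=1'
    '+++pvid=1001+++rg=jfkdjd+++pid=asdf_1234+++rs=720+++rt=2'
)

# Lazy value, terminated by either the "+++" delimiter or end of line.
KV = re.compile(r"(\w+)=(.+?)(?:\+{3}|$)")

# findall returns (key, value) tuples, which dict() accepts directly.
fields = dict(KV.findall(event))
for key, value in fields.items():
    print(f"{key} = {value}")
```

All fourteen pairs come back, including the trailing rt=2 that has no delimiter after it.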
Here's another way to do it:
(\w+)=(.+?(?:(?=\+{3})|$))
How about
(\w+)=(.+?)\+{3}
?
hee hee we got to the same place
Ah, forgot about that case. Well, just have the regex match either +++ or end-of-line ($):
(\w+)=(.+?)(?:\+{3}|$)
Actually, Ayn, your regex leaves off the last key=value pair because that last pair is not followed by the delimiter sequence. It is a good solution if I get my devs to end every log event with the delimiter sequence, though.
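The difference is easy to see side by side. A small sketch in Python, assuming its `re` module matches Splunk's behaviour for these patterns; the two-pair event is made up to keep the output short.

```python
import re

# Hypothetical two-pair event; the last pair has no trailing delimiter.
event = "rs=720+++rt=2"

# Requires "+++" after every value.
delimiter_only = re.compile(r"(\w+)=(.+?)\+{3}")
# Accepts "+++" or end of line after a value.
delimiter_or_end = re.compile(r"(\w+)=(.+?)(?:\+{3}|$)")

missing_last = delimiter_only.findall(event)
with_last = delimiter_or_end.findall(event)

# If every event ended with the delimiter, the first pattern would be enough.
terminated = delimiter_only.findall(event + "+++")

print(missing_last)
print(with_last)
print(terminated)
```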
Nice, that works! This also works: (\w+)=(.+?(?:(?=\+{3})|$))
Not sure which is less resource intensive.
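One rough way to compare the two is to time them, with the big caveat that this sketch uses Python's `re` rather than Splunk's PCRE engine, so the numbers are only indicative and may not carry over. The repeat count and variable names are arbitrary choices for illustration.

```python
import re
import timeit

# The sample event from the question.
event = (
    'ts=2011-11-16T21:41:21Z+++aid=1167949209+++ip=1.1.1.2+++g=ir4205'
    '+++id=http://xyz.com/not/Import/10'
    '+++t="The Best ++Pt. 1 Book:Roland"+++sz=13880020+++pl=X_YZ+++pc=1'
    '+++pvid=1001+++rg=jfkdjd+++pid=asdf_1234+++rs=720+++rt=2'
)

# Variant that consumes the delimiter as part of the match.
consuming = re.compile(r"(\w+)=(.+?)(?:\+{3}|$)")
# Variant that only peeks at the delimiter with a lookahead.
lookahead = re.compile(r"(\w+)=(.+?(?:(?=\+{3})|$))")

# Both variants should extract identical pairs before timing means anything.
same_result = consuming.findall(event) == lookahead.findall(event)

runs = 5000  # arbitrary repeat count
t_consuming = timeit.timeit(lambda: consuming.findall(event), number=runs)
t_lookahead = timeit.timeit(lambda: lookahead.findall(event), number=runs)

print(f"same result: {same_result}")
print(f"consuming: {t_consuming:.4f}s  lookahead: {t_lookahead:.4f}s")
```

Whatever the timings show here, it would be worth repeating the comparison inside Splunk on real data before drawing conclusions.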
Hmm. I tried making it a code block but it still looks pretty nasty. That whole block should be a single line.
I started to clean up your question and put the event data into a code block so it formats easier to read and you don't have to escape any characters. But, I was afraid I'd mess up the context of your fairly complex event data. You might want to re-paste it and put it in a code block so that folks can decipher it more easily.
Note: I purposefully put a couple of "+"s in the value for key "t" to make sure the regex ignores them (since they aren't 3 consecutive "+"s).
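That case can be checked in isolation with a short Python sketch, using the `(\w+)=(.+?)(?:\+{3}|$)` pattern from this thread. It assumes Python's `re` behaves like Splunk's PCRE here, and the snippet is just a slice of the sample event.

```python
import re

KV = re.compile(r"(\w+)=(.+?)(?:\+{3}|$)")

# A slice of the sample event around the "t" key, which contains "++".
snippet = 'g=ir4205+++t="The Best ++Pt. 1 Book:Roland"+++sz=13880020'

fields = dict(KV.findall(snippet))

# The two "+" characters stay inside the value because the lazy match
# only stops at three consecutive "+" characters (or end of line).
print(fields["t"])
```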