|
Hi, I'm trying to parse some logs generated by Broadsoft SIP servers. The log format follows a general pattern, but the details vary from event to event and field meanings can be context-sensitive. The events are multiline, each starting with a datetime string, and the first portion of each event is pipe-separated. The number and meaning of these fields can differ. If I use DELIMS on the pipe character, it works except for the last field, which runs on into the remainder of the event. The first thing I'd like to do is stop the delimiter split at a defined point, which appears to be a newline character. The following transform using "| or newline" doesn't work. If I make it "| or tab", it works better for the first line but also matches unwanted fields in the remainder of the event (many of whose lines start with a tab).
Event sample:
|
|
I think there are several options here, as you seem to have a variable number of fields in each event and their meaning varies too. One solution is to use a combination of props and transforms definitions to pull out the major, high-level extractions on a first pass and then pull out additional fields on a second pass. You could have a props.conf like this to efficiently break events, extract the timestamp, and call the field extraction pieces:
and a corresponding transforms.conf like this to first pull out the static, known fields (pass_one), then pull out the colon-separated values (pass_two), and finally add further passes against sipFields (extracted in pass_one) to handle anything else.
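To make the two-pass idea concrete outside of Splunk, here is a minimal Python sketch of the same logic. It is only an illustration under stated assumptions: the sample event is made up (it is not a real Broadsoft log line), the regexes are guesses at the general shape described in the question, and the dictionary keys are hypothetical. Only the names pass_one, pass_two and sipFields come from the description above. Pass one cuts the event at the first newline, so the pipe split can never run into the rest of the event; pass two then works only on the remainder.

```python
import re

# Hypothetical event for illustration only; NOT a real Broadsoft log line.
SAMPLE_EVENT = (
    "2012.06.27 04:12:00|Info|Sip|udp 10.0.0.1:5060\n"
    "\tCall-ID: abc123\n"
    "\tCSeq: 1 INVITE\n"
)

# pass_one: capture everything up to the first newline as the header and
# keep the rest of the event as sipFields for later passes.
PASS_ONE = re.compile(
    r"\A(?P<header>[^\r\n]*)\r?\n?(?P<sipFields>.*)\Z", re.DOTALL
)

# pass_two: "Key: value" pairs, one per line, with optional leading
# spaces or tabs. Applied to sipFields only, never to the header.
PASS_TWO = re.compile(
    r"^[ \t]*(?P<key>[A-Za-z][\w-]*):[ \t]*(?P<value>[^\r\n]*?)[ \t]*\r?$",
    re.MULTILINE,
)


def extract(event):
    """Run both passes and return the extracted pieces as a dict."""
    first = PASS_ONE.match(event)  # always matches; every part is optional
    header = first.group("header")
    sip_fields = first.group("sipFields")
    return {
        # The split is confined to the first line, so the last
        # pipe-separated field cannot absorb the rest of the event.
        "header_fields": header.split("|"),
        "sipFields": sip_fields,
        "pairs": dict(PASS_TWO.findall(sip_fields)),
    }


result = extract(SAMPLE_EVENT)
print(result["header_fields"])
print(result["pairs"])
```

The point of the design is that each pass has a narrow job: the delimiter split never sees the body, and the colon-separated matching never sees the header, so neither needs to guess where the other one ends.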
Excellent, thanks. I came across a similar "2-phase" strategy in a question about FIX logs. It's a really useful way of working with ugly log formats. I can pull out other values with rex in the search command. You also resolved some other issues I was having with line breaking.
(27 Jun '12, 04:12)
inglisn
|