Hi,
Another regex problem I'm afraid.....
I've got a very long event with 37 fields, where all the fields are quoted and separated by commas. There are no key=value pairs.
For the most part my regex works nicely with the event data, but occasionally a quote also appears in the actual field data, which breaks the separator my regex relies on.
Working example (extremely simplified regex and event):
^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"]+)","(?P<response>[^\n]+)"$
Data:
"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/lpbm/select?fl=city","Logging rate limit reached"
No problem with this; all the fields parse out OK. However, this next event fails. Note the additional " in the fourth field:
"192.0.0.20","80","fl=city,name,code,group=true&group.field=city","GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"
This now breaks the [^"]+)"," part of my regex and distorts the field extractions.
Is there a way to do the equivalent of:-
......","(?P<request>[^","]+)",".......
I know that this is invalid, but I don't know what the alternative looks like 😞 !!
Thanks for any help,
Mark.
Your problem should be solvable by using non-greedy (lazy) quantifiers instead of the [^"] syntax. The advantage is that you can use the whole sequence "," as the separator instead of just a single ". However, I'm not sure whether Splunk's regex engine behaves the way I expect, so try something like this:
^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)","(?P<request>.+?)","(?P<response>[^\n]+)"$
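What's the difference?
The [^"] syntax is "old school": the parser simply consumes everything until a " is found. With .+? the parser consumes as little as possible until the rest of the pattern can match, so a field may contain a lone " but not the full separator sequence "," itself, because then the pattern would no longer match as a whole. (It should also be a little bit slower, again in theory.)
Edit, just as info: a ? after a quantifier makes it lazy (here .+? means "lazily consume at least one character").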
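To illustrate the difference, here is a minimal sketch that runs both patterns against the two sample events from the question. It uses Python's re module as a stand-in for Splunk's regex engine, which is an assumption; the named-group and lazy-quantifier syntax is the same for these patterns. The names STRICT, LAZY, GOOD_EVENT, BAD_EVENT and extract are made up for this example.

```python
import re

# Pattern from the question: fields 1-4 may not contain any double quote.
STRICT = re.compile(
    r'^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)",'
    r'"(?P<request>[^"]+)","(?P<response>[^\n]+)"$'
)

# Lazy variant from this answer: each field ends at the first "," sequence.
LAZY = re.compile(
    r'^"(?P<dest_ip>.+?)","(?P<dest_port>.+?)","(?P<uri>.+?)",'
    r'"(?P<request>.+?)","(?P<response>[^\n]+)"$'
)

# The two sample events from the question.
GOOD_EVENT = ('"192.0.0.20","80","fl=city,name,code,group=true&group.field=city",'
              '"GET /solr/lpbm/select?fl=city","Logging rate limit reached"')
BAD_EVENT = ('"192.0.0.20","80","fl=city,name,code,group=true&group.field=city",'
             '"GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"')


def extract(pattern, event):
    """Return the named groups as a dict, or None if the event does not match."""
    match = pattern.match(event)
    return match.groupdict() if match else None


for name, pattern in (("strict", STRICT), ("lazy", LAZY)):
    for label, event in (("good", GOOD_EVENT), ("bad", BAD_EVENT)):
        print(name, label, extract(pattern, event))
```

In this sketch the anchored [^"] pattern does not match the second event at all, while the lazy pattern keeps the embedded quotes inside the request field. A field that itself contains the full "," sequence would still be split in the wrong place.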
Try the following:
^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)","(?P<request>[^"][^,]+)","(?P<response>[^\n]+)"$
You can test it here: https://regex101.com/r/nD3sL1/2
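If you would rather check it locally, the following is a rough sketch using Python's re module (an assumption; Splunk uses its own regex engine, though this syntax should behave the same). ANSWER2, QUOTED_EVENT, COMMA_EVENT and request_field are names invented for the example, and COMMA_EVENT is a hypothetical event, not one from the question.

```python
import re

# The pattern suggested above: the request field is one non-quote character
# followed by anything that is not a comma.
ANSWER2 = re.compile(
    r'^"(?P<dest_ip>[^"]+)","(?P<dest_port>[^"]+)","(?P<uri>[^"]+)",'
    r'"(?P<request>[^"][^,]+)","(?P<response>[^\n]+)"$'
)

# Failing event from the question (extra quotes in the fourth field).
QUOTED_EVENT = ('"192.0.0.20","80","fl=city,name,code,group=true&group.field=city",'
                '"GET /solr/"lpbm"/select?fl=city","Logging rate limit reached"')

# Hypothetical event with a comma inside the request field.
COMMA_EVENT = ('"192.0.0.20","80","fl=city",'
               '"GET /solr/lpbm/select?fl=city,name","Logging rate limit reached"')


def request_field(event):
    """Return the extracted request field, or None if the event does not match."""
    match = ANSWER2.match(event)
    return match.group("request") if match else None


print(request_field(QUOTED_EVENT))
print(request_field(COMMA_EVENT))
```

The sketch suggests the pattern handles the embedded quotes, but because the request field is defined as [^,]+ it would not match an event whose request contains a comma, so it is worth checking whether that can occur in your data.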