Splunk newbie in search of advice. Here's the situation:
I have two sources that provide e-mail info: tag::host="es1" and source="/data/elog.txt". One source reports SMTP_RCPT_TO and the other reports MAIL_TO. The values stored in each are all over the place, e.g. "foo user ", FOO@user.org, foo@smtp.user.org...
I want to find all lines that match a set of users, e.g. "foo, bar, and baz" (including any permutation of the receiving domain like /.*user.org/i and any capitalization of username)
The simple search: tag::host="es1" OR source="/data/elog.txt" (foo OR bar OR baz) does the trick (although you get hits on other fields as well)
Now expand that list of users to 40 or 50, and I'm starting to look for a better way. inputlookup seemed promising, but it fails because of the myriad ways the e-mail agents stuff address data into Splunk. Lookups appear to be exact match only. I could create the various permutations in the lookup CSV, but that would be brittle and tedious.
So masters of splunk-fu, are there other approaches you would recommend? Something obvious that I've overlooked?
If you don't care about the domain (as in, it's always going to be *user.org and you're just looking for jsmith), I would probably go the route of pulling out just the user names. Depending on the variance in your logs, you could either go generically:
YourSearch | rex field=_raw "(?<Username>\S*)@\S*"
or more specifically:
YourSearch | rex field=MAIL_TO "(?<Username>\S*)@" | rex field=SMTP_RCPT_TO "(?<Username>\S*)@"
You can also convert the username to lowercase:
YourSearch | rex field=MAIL_TO "(?<Username>\S*)@"
| rex field=SMTP_RCPT_TO "(?<Username>\S*)@"
| eval Username=lower(Username)
If you are concerned about grabbing other domains, and really only care about a particular domain, you could alter the regex:
YourSearch | rex field=MAIL_TO "(?<Username>\S*)@\S*user\.org"
| rex field=SMTP_RCPT_TO "(?<Username>\S*)@\S*user\.org"
| eval Username=lower(Username)
That doesn't get you 100% of the way there, as you'll still need a
| search Username=foo OR Username=bar
at the end, but it should certainly get you closer.
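Putting those pieces together, here is a rough sketch of the whole search against the two sources from the question. It assumes each event carries a single address in either SMTP_RCPT_TO or MAIL_TO and takes whichever one is present with coalesce. The addr field name is just a placeholder, and the character class is a slightly tighter stand-in for \S* so that a leading "<" is not captured as part of the name. The IN operator needs Splunk 6.6 or later; on older versions, spell out the OR list instead.

```spl
(tag::host="es1" OR source="/data/elog.txt")
| eval addr=coalesce(SMTP_RCPT_TO, MAIL_TO)
| rex field=addr "(?<Username>[^\s<@]+)@"
| eval Username=lower(Username)
| search Username IN (foo, bar, baz)
```

Because the filter now runs on the extracted Username field, hits on foo, bar, or baz in unrelated fields drop out.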
Thanks, David. I will give this a try.
You could certainly combine this method with a lookup table where you "| lookup" after the manipulation of the user fields.
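As a sketch of that combination: assume a hypothetical lookup file watched_users.csv with a single Username column holding the lowercase names (foo, bar, baz, and so on). A subsearch over inputlookup can then generate the OR list for you, so the 40 or 50 names live in the CSV rather than in the search string:

```spl
YourSearch
| rex field=MAIL_TO "(?<Username>\S*)@"
| rex field=SMTP_RCPT_TO "(?<Username>\S*)@"
| eval Username=lower(Username)
| search [| inputlookup watched_users.csv | fields Username]
```

The subsearch expands to something like ( Username="foo" ) OR ( Username="bar" ) OR ..., so the column name in the CSV has to match the extracted field name. Since the names are normalized before the comparison, the exact-match limitation of the lookup no longer gets in the way.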
Todd,
Ultimately, a lookup table would be the best mechanism for doing something like this. Unfortunately, partial result matching is not possible with out-of-the-box CSV files. There are alternatives, however...including custom Python.