I want to anonymize user data (for example email addresses) at search time and have tried a couple of approaches.
First I tried the rex command:
rex mode=sed "s/(\w+\.?\w+?)@mydomain\.net/xxxxx@mydomain.net/g"
This works, but it does not modify the raw event at search time. As a result, a user who selects "show source" can see the original mail address again, and a defined field will also show the original mail address.
The other problem is that all the reports are boring, because every internal mail address is replaced with the same xxxxx.
I'm looking for a way to replace the username part of the mail address with a hash of the username, but could not find anything like this. I also saw a solution on Splunkbase that does DES or 3DES encryption of a specific field (http://splunk-base.splunk.com/apps/22393/encrypt-and-decrypt-data-within-events), but this will not work in my environment: all events come in from forwarders or via syslog, and I'm not allowed to install such functions on the forwarders because of performance concerns.
In version 4.2 I found a new command, mappy, which allows running short Python scripts, but it looks like it does not support all Python modules and options. I tried to use mappy with the Python function re.sub, but could not find a working "one line" command that replaces the string extracted by rex with its hash code.
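To make clear what I am after, here is a minimal sketch in plain Python of the replacement I have in mind. It is not tied to mappy, and I don't know whether mappy can run it. The names EMAIL_RE, hash_username and anonymize are made up for illustration, and the choice of SHA-256 truncated to 12 hex characters is just an assumption; any stable hash would do.

```python
import hashlib
import re

# Hypothetical pattern: captures the username part of addresses in the
# mydomain.net domain used in the rex example above.
EMAIL_RE = re.compile(r"([\w.+-]+)@(mydomain\.net)")


def hash_username(match):
    """Return the matched address with the username replaced by a short hash."""
    username = match.group(1)
    # Assumption: SHA-256, truncated to 12 hex characters for readability.
    digest = hashlib.sha256(username.encode("utf-8")).hexdigest()
    return digest[:12] + "@" + match.group(2)


def anonymize(text):
    """Replace every matching mail address in text with its hashed form."""
    return EMAIL_RE.sub(hash_username, text)


if __name__ == "__main__":
    print(anonymize("mail from john.doe@mydomain.net delivered"))
```

The same username always maps to the same hash, so reports can still group and count by user without showing the real name. Note that a plain unsalted hash can be reversed by hashing a list of known usernames, so a secret salt would probably be needed for real use.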
Has anyone found a way to anonymize user data in Splunk with hash codes or something similar?