I have a JSON event with an id that I want to anonymize. However, I still need to run stats/count/grouping and other analytics on this id later. In short, I want to hide the id from users while Splunk can still use it internally. Is this possible?
My event looks something like this:
{"duration":0.33,"a":"login","i":"50050","d":"2055502349","c":"LIVE","@timestamp":"2020-05-22T01:59:59.601Z"}
I want to anonymize the "d" id.
UPDATED:
props.conf
[anony_json]
INDEXED_EXTRACTIONS = json
KV_MODE = none
TRANSFORMS-anony = anony, anony_raw
TRUNCATE = 0
TIME_PREFIX = timestamp\":\"
SHOULD_LINEMERGE = false
transforms.conf
[anony]
INGEST_EVAL = d:=md5(d)
WRITE_META = true
[anony_raw]
REGEX = (?m)(.*\"d\":\s*\"\d{4})\d+\"(.*)
FORMAT = $1XXXXXX"$2
DEST_KEY = _raw
https://docs.splunk.com/Documentation/Splunk/latest/Data/Anonymizedata
https://docs.splunk.com/Documentation/Splunk/latest/Data/IngestEval
In my Splunk (version 8), this setting works.
I made a few mistakes and have fixed them.
How about this?
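To make the intent of the two transforms easier to follow, here is a rough Python sketch of what they are meant to do to the sample event: the raw text keeps only the first four digits of d, while a separate value of d becomes its md5 hash. This only illustrates the idea outside Splunk; the function names mask_raw and hash_d are made up for the example, and the regex is the anony_raw pattern carried over to Python's re syntax.

```python
import hashlib
import json
import re

# Sample event from the question.
raw = ('{"duration":0.33,"a":"login","i":"50050","d":"2055502349",'
       '"c":"LIVE","@timestamp":"2020-05-22T01:59:59.601Z"}')

# The anony_raw REGEX, written for Python's re module.
MASK_RE = re.compile(r'(?m)(.*"d":\s*"\d{4})\d+"(.*)')


def mask_raw(event: str) -> str:
    """Keep the first four digits of d in the raw text and mask the rest."""
    return MASK_RE.sub(r'\1XXXXXX"\2', event)


def hash_d(event: str) -> str:
    """Return the md5 hex digest of the original d value (hypothetical helper)."""
    d = json.loads(event)["d"]
    return hashlib.md5(d.encode("utf-8")).hexdigest()


masked_raw = mask_raw(raw)   # what users would see in the event text
hashed_d = hash_d(raw)       # what the d field would hold for analytics

print(masked_raw)
print(hashed_d)
```

Note that the hash is computed from the original event, not from the masked text. If the masked text were hashed instead, every id sharing the same first four digits would collapse into one value.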
Thank you for your answer.
d is single valued.
However, I cannot use this solution: I would not be able to run commands like "|stats count by d" because the indexed value of d will be changed. I want d to be anonymized for all users, but Splunk should still be able to use it internally.
[anony]
INGEST_EVAL = d=md5(d)
WRITE_META = true
[anony_raw]
REGEX = (\"d\":\s*\")(\d{4})\d+\"
FORMAT = $1$2XXXXXX"
DEST_KEY = _raw
use hash
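As a quick illustration of why a hash still allows grouping, here is a small Python sketch (not Splunk code). It assumes a few made-up id values and a hypothetical helper called pseudonymize. Because md5 is deterministic, equal ids always map to the same digest, so counting by the hashed value gives the same group sizes as counting by the original value.

```python
import hashlib
from collections import Counter


def pseudonymize(value: str) -> str:
    """Map an id to its md5 hex digest (same input always gives same output)."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical ids; the first one is the sample value from the question.
ids = ["2055502349", "2055502349", "2055509999"]

counts_raw = Counter(ids)
counts_hashed = Counter(pseudonymize(i) for i in ids)

# The group sizes are identical; only the group labels differ.
print(sorted(counts_raw.values()))
print(sorted(counts_hashed.values()))
```

In other words, something like "|stats count by d" would still produce one row per distinct id, just labelled with the hash instead of the real value.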
This is exactly what I want. I changed anony_raw to include the data before and after the match. However, the hash is not applied. The configuration only adds XXX to d instead of calculating the hash.
props.conf
INDEXED_EXTRACTION = json
KV_MODE = none
TRANSFORMS-anony = anony, anony_raw
transforms.conf
[anony]
INGEST_EVAL = d=md5(d)
WRITE_META = true
[anony_raw]
REGEX = (?m)^(.*)(\"d\":\s*\")(\d{4})\d+\"(.*)
FORMAT = $1$2$3XXXXXX"$4
DEST_KEY = _raw
https://answers.splunk.com/answers/614339/transform-field-in-sha256-before-indexation.html suggests this cannot be done.
INGEST_EVAL = d=substr(d,5,10).substr(d,1,6).(d%2).(d%3)
How's this?
This does not work. anony_raw overrides anony, so the end result is d: 2055XXXXXX. I want to use md5 so that I can still correlate data.
In props.conf, even if I change the order of the two transforms, the end result stays the same. Removing anony_raw leaves the original information unchanged.
My answer is updated. Please confirm.
Hi,
I have the same issue, but a little more complex.
This is an example of a JSON event as returned in Splunk:
{ [-]
CodeSha256: 2+1ndsvhz23R2VD42
CodeSize: 1909
Description: None
Environment: { [-]
Variables: { [-]
CLUSTER_NAME: Cluster
ENVIRONMENT: dev
USER_NAME: tata
PASSWD: toto!
}
}
LastModified: 2019-12-05T10:58:05.308+0000
MemorySize: 128
RevisionId: f0d723sdf6-c000edfzf
Runtime: python3.6
Timeout: 180
TracingConfig: { [+]
}
Version: $LATEST
region: eu-east-1
}
The problem is that sensitive data appears in clear text, specifically in Environment > Variables.
The variables in this section change all the time, so we cannot write a regex against specific key names.
How can I mask all the values in Environment > Variables WITHOUT masking the keys?
Example of the result I want:
{ [-]
CodeSha256: 2+1ndsvhz23R2VD42
CodeSize: 1909
Description: None
Environment: { [-]
Variables: { [-]
CLUSTER_NAME:
ENVIRONMENT:
USER_NAME:
PASSWD:
}
}
LastModified: 2019-12-05T10:58:05.308+0000
MemorySize: 128
RevisionId: f0d723sdf6-c000edfzf
Runtime: python3.6
Timeout: 180
TracingConfig: { [+]
}
Version: $LATEST
region: eu-east-1
}
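To pin down the transformation being asked for, here is a minimal Python sketch of it, written outside Splunk. It assumes the event is available as parsed JSON and uses a hypothetical helper named mask_variables. Every value under Environment > Variables is blanked whatever its key is called, and everything else is left alone.

```python
import json


def mask_variables(event: dict, mask: str = "") -> dict:
    """Return a copy of the event with all Environment.Variables values masked."""
    result = dict(event)
    env = event.get("Environment")
    if isinstance(env, dict) and isinstance(env.get("Variables"), dict):
        # Keep every key, replace every value, without naming any key.
        masked = {key: mask for key in env["Variables"]}
        result["Environment"] = {**env, "Variables": masked}
    return result


# Trimmed-down version of the event shown above.
event = {
    "CodeSize": 1909,
    "Environment": {
        "Variables": {
            "CLUSTER_NAME": "Cluster",
            "ENVIRONMENT": "dev",
            "USER_NAME": "tata",
            "PASSWD": "toto!",
        }
    },
    "Runtime": "python3.6",
}

masked_event = mask_variables(event)
print(json.dumps(masked_event, indent=2))
```

The point of the sketch is that the masking depends on where a value sits in the structure, not on its key name, which is what a key-specific regex cannot express.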
Hello @AnujaJ
Though I haven't tried this yet, I think this can be achieved by forwarding the anonymized event at index time to the intended customer index and forwarding a separate non-anonymized event to an admin-only index.
The caveat is that it would double your license usage.
Please see the link below if my answer is what you're aiming for:
https://answers.splunk.com/answers/690291/one-source-to-two-indexes.html
EDIT:
You can actually achieve the "one data source (anonymized and non-anonymized) to two indexes" solution without doubling your license usage:
(check woodcock's answer at the link below)
https://answers.splunk.com/answers/567223/how-to-send-same-data-source-to-two-or-multiple-in-1.html
Hope it helps!
Since the actual data is only available to the admin, does that mean only the admin will create the dashboards while other users use the customer index?