I am indexing email data that Splunk reads from an inbox folder (via TA-mailclient). Those emails contain a csv file that comes as file attachment to the email.
Below is an example where the field name of the attachment is file_content and the field value is below:
Stopped by Reputation Filtering,Stopped as Invalid Recipients,Spam Detected,Virus Detected,Stopped by Content Filter,Total Threat Messages,Clean Messages,Total Attempted Messages
9.28068485506,0.0,45.1350500141,0.00114191624597,1.53311465023,55.9499914356,44.0500085644,--
251946,0,1225297,31,41620,1518894,1195841,2714735
I want to be able to manipulate the results to look like below:
Stopped by Reputation Filtering | Stopped as Invalid Recipients | Spam Detected | Virus Detected | Stopped by Content Filter | Total Threat Messages | Clean Messages | Total Attempted Messages |
9.280684855 | 0 | 45.13505001 | 0.001141916 | 1.53311465 | 55.94999144 | 44.05000856 | -- |
251946 | 0 | 1225297 | 31 | 41620 | 1518894 | 1195841 | 2714735 |
Can someone please advise how to achieve this ?
I could swear that I had seen the same question recently but cannot find it. I also do not know an elegant solution. (The ask is really to extract fields from a field of CSV.) So, here we go:
| eval file_content = split(file_content, "
") ``` chop field into lines ```
| eval header = mvindex(file_content, 0)
| eval file_content = mvindex(file_content, 1, -1) ``` get values ```
| mvexpand file_content ``` one value per line ```
| eval _raw = mvzip(split(header, ","), split(file_content, ","), "=") ``` pair headers with values ```
| kv pairdelim="
" kvdelim="="
| fields - _raw file_content header
The sample data will give
Clean Messages | Detected | Filter | Filtering | Messages | Recipients | Spam_Detected | Stopped as Invalid_Recipients | Stopped_by_Content_Filter | Stopped_by_Reputation_Filtering | Total_Attempted_Messages | Total_Threat_Messages | Virus_Detected |
44.0500085644 | 45.1350500141 | 1.53311465023 | 9.28068485506 | 55.9499914356 | 0.0 | 45.1350500141 | 0.0 | 1.53311465023 | 9.28068485506 | -- | 55.9499914356 | 0.00114191624597 |
1195841 | 1225297 | 41620 | 251946 | 1518894 | 0 | 1225297 | 0 | 41620 | 251946 | 2714735 | 1518894 | 31 |
@yuanliu Thanks for your prompt reply.
I am trying to understand your query.
When I fully ran your query, the file_content field and values entirely disappeared. Results were just empty lines.
Then I tried to execute query line by line.
| eval file_content = split(file_content, "") ``` chop field into lines ```
1st attempt with the above returned file_content values with each character as single value like,
s
t
o
p
p
e
d
After I added a "," between double quotes | eval file_content = split(file_content, ",") ``` chop field lines ```, it returned each cell value as single value
like
Stopped by Reputation Filtering
Stopped as Invalid Recipients
etc
still figuring out...
@dm1 As ITWhisperer explained, the split character is a literal newline, which you can enter in search window by holding down Ctrl (or Mac 'control') key while pressing "Enter" (or Mac 'return'). Similarly, pairdelim is also a literal newline.
The last line removes file_content just to show results more clearly. It is not part of the solution.
As @yuanliu showed, the split has to be by new line, you have removed this in your search
| eval file_content = split(file_content, "
") ``` chop field into lines ```
This assume that file_content is a single value with new lines delimiting the rows of the csv, not a multi-value field. If it is a multi-value field already, then the split is unnecessary.