Good morning, Splunk Community.
I'm working on a way to use Splunk to show the most popular words used in a series of emails listed in a CSV file. The file has three main columns: subject, description, and topic. I am classifying each email with a topic that I think fits the content of its subject and description. Now that I have several manual classifications, I want Splunk to tell me which words are most popular for each topic, excluding pronouns, prepositions, and any other word I consider unimportant for that topic.
I found this post in the community: https://answers.splunk.com/answers/62413/how-to-extract-most-popular-words-from-the-source-data.html?sort=newest. It gets as far as listing the words and counting them, but the problem remains that it also counts words like "the", "work", and "call" that I do not need. I started excluding them manually by right-clicking each one and selecting "Exclude from search". This helps to some extent, but we are talking about 9000 words, so it would take forever. I then tried another approach, using the "*" wildcard so that a word and similar-looking words are excluded together, but the list is not shrinking as quickly as I expected.
My idea to resolve this is:
Use the search from the post linked above to list and count the words.
Create a table or list, or load an additional CSV file (whichever is best), in Splunk containing the words I don't want.
Subtract one from the other: (search result) - (table OR list OR file) = (result). For example, (A, B, C) - (C) = (A, B).
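To make the operation concrete, here is a rough sketch in Python (not Splunk) of the result I am after. The column names match my CSV, but the sample rows, the stop-word set, and the function name `word_counts_by_topic` are made up for illustration only.

```python
import re
from collections import Counter


def word_counts_by_topic(rows, stopwords):
    """Count words per topic, skipping any word found in `stopwords`."""
    counts = {}
    for row in rows:
        # Combine the two free-text columns and normalise the case.
        text = f"{row['subject']} {row['description']}".lower()
        # Split on anything that is not a letter, digit, or apostrophe.
        words = re.findall(r"[a-z0-9']+", text)
        # The "subtraction" step: (A, B, C) - (C) = (A, B).
        kept = [w for w in words if w not in stopwords]
        counts.setdefault(row["topic"], Counter()).update(kept)
    return counts


# Hypothetical sample data standing in for the real CSV rows.
rows = [
    {"subject": "Invoice missing",
     "description": "The invoice was not sent",
     "topic": "billing"},
    {"subject": "Call dropped",
     "description": "The call to support dropped",
     "topic": "telephony"},
]
# Hypothetical exclusion list; in Splunk this would be the extra CSV file.
stopwords = {"the", "was", "not", "to", "call"}

result = word_counts_by_topic(rows, stopwords)
for topic, counter in result.items():
    print(topic, dict(counter))
```

What I am looking for is the Splunk equivalent of the `if w not in stopwords` filter, with the exclusion list kept in a separate table or file.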
My question is: how can I create a table or file, or use a loaded CSV file, to remove the words I don't want from the results returned by the search in that post?
Search query from that post:
source=mybook | sort -_time | rex mode=sed "s/(\.|,|;|=|\"|'|\(|\)|\[|\]| -|!|^-)/ /g" | eval word=_raw | makemv delim=" " word | mvexpand word | eval word=lower(word) | eval position=1 | streamstats sum(position) AS position | table position word | stats count min(position) max(position) by word
Best regards and I hope there is an answer to this question.
Thank you