I am using the Google Analytics Data Export API to pull some data into a log file so that Splunk can index it. The data is printed in the following format:
ga:eventCategory=ViewChange | ga:eventAction=Grid ga:totalEvents=6 ga:uniqueEvents=2
ga:eventCategory=SessionEvent | ga:eventAction=UserLogin ga:totalEvents=13 ga:uniqueEvents=9
My search for counting logins is:
source="googleanalytics.txt" "ga:eventCategory=SessionEvent | ga:eventAction=UserLogin" | extract kvdelim="=" | timechart span=1d sum(totalEvents) as "Total Logins", sum(uniqueEvents) as "Unique Logins"
The problem is that the search takes the first occurrence of ga:totalEvents, regardless of whether it belongs to a UserLogin event.
Edit: To be clearer, for the example above the timechart displays 6 total and 2 unique logins instead of the expected 13 total and 9 unique. The pipe inside the quotes is read as a literal search character, not a command pipe, but I removed it just to make sure and see the same result when searching only for "ga:eventAction=UserLogin".
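The 6/2 result is consistent with Splunk having merged both lines into one multi-line event (plausible, since the lines carry no timestamp of their own), so a key=value extraction that keeps the first value per key picks up the ViewChange numbers. The Python sketch below mimics that behavior under this assumption; it is not Splunk's actual extraction code, and `first_match` and `per_line` are hypothetical helpers.

```python
import re

# The two lines from the question, assumed to have been merged into ONE event.
EVENT = (
    "ga:eventCategory=ViewChange | ga:eventAction=Grid ga:totalEvents=6 ga:uniqueEvents=2\n"
    "ga:eventCategory=SessionEvent | ga:eventAction=UserLogin ga:totalEvents=13 ga:uniqueEvents=9"
)

# Matches key=value pairs such as "ga:totalEvents=13".
KV = re.compile(r"(ga:\w+)=(\S+)")


def first_match(event: str) -> dict:
    """Keep only the first value seen for each key (the symptom in the question)."""
    fields = {}
    for key, value in KV.findall(event):
        fields.setdefault(key, value)
    return fields


def per_line(event: str) -> list:
    """Extract key/value pairs separately for each physical line."""
    return [dict(KV.findall(line)) for line in event.splitlines()]


merged = first_match(EVENT)
print(merged["ga:totalEvents"], merged["ga:uniqueEvents"])  # 6 2

for fields in per_line(EVENT):
    if fields.get("ga:eventAction") == "UserLogin":
        print(fields["ga:totalEvents"], fields["ga:uniqueEvents"])  # 13 9
```

If that assumption holds, getting Splunk to break the file into one event per line would let the original extract-based search see only the UserLogin values.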
I ended up using a regex after spending far too much time experimenting with sourcetypes and props.conf. This is my final search:
source="googleanalytics.txt" "ga:eventCategory=SessionEvent | ga:eventAction=UserLogin" | rex field=_raw "ga:eventAction=UserLogin[\s]ga:totalEvents=(?
The eval _time statement is there because I haven't gotten Splunk to pick up the timestamp in the log file properly; instead, it timestamps each event with the date the script runs. GA data isn't guaranteed to be accurate until 48 hours later, so the script pulls the 24-hour period starting 3 days ago.
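The search above is cut off after "(?", so its exact capture groups are unknown. As a rough illustration of the idea, the Python sketch below anchors the pattern on "ga:eventAction=UserLogin" so that only the numbers following it are captured, even when both lines sit in the same event. The group names `total` and `unique` are hypothetical; Python writes named groups as (?P<name>...), whereas Splunk's rex uses the PCRE form (?<name>...). It also uses \s+ rather than the single [\s] in the original, to tolerate extra spaces.

```python
import re

# The two lines from the question, assumed merged into one event.
EVENT = (
    "ga:eventCategory=ViewChange | ga:eventAction=Grid ga:totalEvents=6 ga:uniqueEvents=2\n"
    "ga:eventCategory=SessionEvent | ga:eventAction=UserLogin ga:totalEvents=13 ga:uniqueEvents=9"
)

# Anchored on the UserLogin action so the ViewChange numbers can never match.
# Group names are illustrative, not taken from the original search.
LOGIN = re.compile(
    r"ga:eventAction=UserLogin\s+"
    r"ga:totalEvents=(?P<total>\d+)\s+"
    r"ga:uniqueEvents=(?P<unique>\d+)"
)

match = LOGIN.search(EVENT)
if match:
    total = int(match.group("total"))
    unique = int(match.group("unique"))
    print(total, unique)  # 13 9
```

In Splunk, the equivalent captured fields would presumably be summed by timechart in place of totalEvents and uniqueEvents.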
I've run this in the search window and it does work. Because the pipe is inside quotes, Splunk must treat it as part of a literal string rather than as a command pipe.
Why don't you try this instead:
source="googleanalytics.txt" (ga:eventCategory="SessionEvent" OR ga:eventAction="UserLogin") | extract kvdelim="=" | timechart span=1d sum(totalEvents) as "Total Logins", sum(uniqueEvents) as "Unique Logins"
I may switch to this syntax since it is clearer and doesn't use the pipe, but it doesn't fix my issue.
You must have mispasted your search: the "|ga:eventAction" would be a syntax error, because Splunk would try to interpret it as a search command. Please check.