I'm using the Python SDK to query Splunk.
When I run the following query in Splunk Web, _raw is displayed correctly:
index=vgw "Session 25907" source="20130315.log" "end reason"|table _raw
result:
2013-03-15 08:42:41 : Session 25907 VGWSession:: end reason: ep disconnect
However, when I run the same query (without the table command) from the Python SDK, following the "oneshot search" and "normal search" examples at http://dev.splunk.com/view/SP-CAAAEE5#oneshotjob, it returns:
OrderedDict([('_bkt', 'vgw~490~1EF8E9B1-5238-48F9-8B5A-2B768B4DB0E8'), ('_cd', '490:29401397'), ('_indextime', '1363362162'), ('_raw', '2013-03-15 08:42:41 : '), ('_serial', '0'), ('_si', ['splunk4', 'vgw']), ('_sourcetype', 'vgw'), ('_time', '2013-03-15T08:42:41.000-07:00'), ('host', 'vgw5'), ('index', 'vgw'), ('linecount', '1'), ('source', '20130315.log'), ('sourcetype', 'vgw'), ('splunk_server', 'splunk4')])
The _raw field is incomplete; it contains only the first part of the event.
Does anyone know why, or has anyone experienced the same thing? How can I fix it?
This is caused by the insertion of special tags in the event raw data to highlight matched search terms in Splunkweb. This is not an appropriate default behavior for the SDK result-fetching method, and there is currently an open bug to fix this (internal reference DVPL-1519).
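To illustrate how inline highlight tags can cut a field short, here is a minimal sketch using only the standard library. The XML fragment, including the `sg` tag name and its `h` attribute, is an assumption made up for illustration and is not taken from an actual Splunk response. It shows that reading only the text before the first child element yields the same truncated value as in the question, while collecting all nested text recovers the full event.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <sg h="1"> highlight tags are assumed for
# illustration; a real response may look different.
SAMPLE = (
    '<v>2013-03-15 08:42:41 : <sg h="1">Session</sg> <sg h="1">25907</sg>'
    ' VGWSession:: <sg h="1">end</sg> <sg h="1">reason</sg>: ep disconnect</v>'
)

elem = ET.fromstring(SAMPLE)

# .text only covers the characters before the first child element,
# so the value stops right where the first matched term begins.
text_only = elem.text

# itertext() walks the element's own text, each child's text, and each
# child's tail, which reassembles the complete event.
full_text = "".join(elem.itertext())

print(repr(text_only))
print(repr(full_text))
```

A reader that takes only the leading text behaves like `text_only`; asking the server not to insert the tags at all avoids the issue without changing the parser.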
Fortunately, avoiding this problem is fairly trivial: one simply needs to pass segmentation='none' as an argument to the job.results() method. Here's a code example that will fetch the 5 most recent events from the _internal index matching the term "queue" but won't truncate the _raw field right where the matched term appears:
#!/usr/bin/python
import splunklib.client as client
import splunklib.results as results

service = client.connect(username='admin',password='b33rm3')
# Run the search in blocking mode so the job is finished when create() returns.
kwargs_blocking = { "field_list": "_raw", "earliest_time": "-461", "exec_mode": "blocking", "max_count": "80" }
query = "search index=_internal source=*/metrics.log group=queue queue | head 5"
job = service.jobs.create(query, **kwargs_blocking)
# segmentation='none' asks for results without the highlight tags.
rr = results.ResultsReader(job.results(segmentation='none'))
for result in rr:
    print result
The important part here is job.results(segmentation='none').
Which version of Splunk are you running this code against? The 'segmentation' argument for the /services/search/jobs/{sid}/events endpoint is only available on Splunk 5.0 and onwards.
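A quick way to check the version requirement from a script is to compare the server's version string against 5.0. The helper below is a minimal sketch; the name `supports_segmentation` is hypothetical. With the Python SDK, the version string can probably be read from `service.info['version']` after connecting, but treat that as an assumption and verify it against your SDK version.

```python
def supports_segmentation(version):
    """Return True if a Splunk version string such as '5.0.2' is 5.0 or later."""
    parts = version.split(".")
    major = int(parts[0])
    # A bare major version such as "6" is treated as "6.0".
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor) >= (5, 0)


# Example version strings (illustrative only, not from the thread).
print(supports_segmentation("5.0.2"))
print(supports_segmentation("4.3.5"))
```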
Adding segmentation='none' still gives the same result.
Below is the code I used to test it:
#!/opt/python/bin/python
import sys, ConfigParser
from splunklib.binding import HTTPError
import splunklib.client as client
import splunklib.results as results

def getconf():
    # Read every section of splunk.cfg into a nested dict and return the [splunk] section.
    configuration = {}
    config = ConfigParser.RawConfigParser()
    config.read('splunk.cfg')
    for session in config.sections():
        if session not in configuration:
            configuration[session]={}
        for options in config.options(session):
            if options not in configuration[session]:
                configuration[session][options]=config.get(session, options)
    return configuration['splunk']

def format_time(timestamp):
    """
    splunk format: 2013-03-18T12:00:00.000-00:00
    """
    # Input is month/day/year, e.g. 3/15/2013.
    datetime=timestamp.split('/')
    timestamp={'earliest_time' : '%s-%s-%sT00:00:00.000-00:00' %(datetime[2], datetime[0], datetime[1]),
               'latest_time' : '%s-%s-%sT23:59:59.999-00:00' %(datetime[2], datetime[0], int(datetime[1])+1)}
    return timestamp

def search(configuration, query, timestamp):
    configuration.update(timestamp)
    service = client.connect(**configuration)
    try:
        job=service.jobs.create(query, **configuration)
    except HTTPError as e:
        # Report the query string that failed.
        print ("query '%s' is invalid:\n\t%s" %(query, e.message))
        return
    rr = results.ResultsReader(job.results(segmentation='none'))
    return rr

if __name__ == "__main__":
    sys.argv.append('42248278')
    sys.argv.append('3/15/2013')
    if len(sys.argv)<3:
        print "%s callid, datetime (1/1/2013)" %sys.argv[0]
        sys.exit()
    # First query: print _raw as returned with segmentation='none'.
    query="""search index=vgw "Session 25907" source="/opt/ec/vgw/logs/vgw_g2m_live_vgw5.sjc.expertcity.com_2-20130315.log" "end reason" """
    result = search(getconf(), query, format_time(sys.argv[2]))
    for item in result:
        print item['_raw']
    # Second query: extract the text after "reason" into its own field on the server.
    query="""search index=vgw "Session 25907" source="/opt/ec/vgw/logs/vgw_g2m_live_vgw5.sjc.expertcity.com_2-20130315.log" "end reason" | rex field=_raw "reason(?<reason>.*)"|table * """
    result = search(getconf(), query, format_time(sys.argv[2]))
    print '*'*80
    for item in result:
        print item['reason']
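As a side note on the date handling in format_time above, the same kind of window can be built with the standard datetime module, which zero-pads the month and day and rolls over correctly at the end of a month. This is a minimal sketch, not the code used in the test: the name `day_window` is hypothetical, and it assumes the intent is a single calendar day in UTC with the following midnight as the upper bound.

```python
from datetime import datetime, timedelta

def day_window(date_text):
    """Build earliest/latest search times for one month/day/year date."""
    # strptime accepts unpadded input such as "3/15/2013".
    day = datetime.strptime(date_text, "%m/%d/%Y")
    # Assumption: the window ends at the start of the next day.
    next_day = day + timedelta(days=1)
    fmt = "%Y-%m-%dT%H:%M:%S.000+00:00"
    return {
        "earliest_time": day.strftime(fmt),
        "latest_time": next_day.strftime(fmt),
    }


window = day_window("3/15/2013")
print(window["earliest_time"])
print(window["latest_time"])
```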
I replaced the original service.jobs.oneshot call with service.jobs.create and added segmentation='none'.
The result is still the same. The second query is the workaround I use now to get the data out: I use a regex to extract the data into a new field.
Here is the output:
[root@asg1-mpostgres splunk]# ./xx.py
2013-03-15 08:42:41 :
********************************************************************************
: ep disconnect
[root@asg1-mpostgres splunk]#
The first result is the _raw data (with segmentation='none').
The second result, after the row of asterisks, comes from the regex workaround.
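For reference, here is what the rex pattern from the workaround captures, sketched with Python's re module. The sample string is the full event shown in the Splunk Web result above, and the helper name `extract_reason` is hypothetical. Note that Python spells named groups as `(?P<name>...)`, and that this client-side version only helps when _raw arrives complete, which is why the workaround runs the extraction on the server instead.

```python
import re

# Full event text as displayed in Splunk Web in the question.
RAW = "2013-03-15 08:42:41 : Session 25907 VGWSession:: end reason: ep disconnect"

# Same pattern as the rex command, using Python's named-group syntax.
REASON_PATTERN = re.compile(r"reason(?P<reason>.*)")

def extract_reason(raw):
    """Return everything after the word 'reason', or None if it is absent."""
    match = REASON_PATTERN.search(raw)
    return match.group("reason") if match else None


reason = extract_reason(RAW)
truncated = extract_reason("2013-03-15 08:42:41 : ")
print(repr(reason))
print(repr(truncated))
```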