Splunk Enterprise

Splunk DB Connect timeout - Unable to read next record

edoardo_vicendo
Contributor

Hello,

We had this error on an output query set-up on Splunk DB Connect.

Basically the Splunk query is inserting data into an external database.

 

 

2023-11-08 01:58:32.712 +0100  [QuartzScheduler_Worker-9] ERROR org.easybatch.core.job.BatchJob - Unable to read next record
java.lang.RuntimeException: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[836463,5]
Message: Premature EOF
                at com.splunk.ResultsReaderXml.getNextEventInCurrentSet(ResultsReaderXml.java:128)
                at com.splunk.ResultsReader.getNextElement(ResultsReader.java:87)
                at com.splunk.ResultsReader.getNextEvent(ResultsReader.java:64)
                at com.splunk.dbx.server.dboutput.recordreader.DbOutputRecordReader.readRecord(DbOutputRecordReader.java:82)
                at org.easybatch.core.job.BatchJob.readRecord(BatchJob.java:189)
                at org.easybatch.core.job.BatchJob.readAndProcessBatch(BatchJob.java:171)
                at org.easybatch.core.job.BatchJob.call(BatchJob.java:101)
                at org.easybatch.extensions.quartz.Job.execute(Job.java:59)
                at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
                at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[836463,5]
Message: Premature EOF
                at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:599)
                at com.sun.xml.internal.stream.XMLEventReaderImpl.nextEvent(XMLEventReaderImpl.java:83)
                at com.splunk.ResultsReaderXml.getResultKVPairs(ResultsReaderXml.java:306)
                at com.splunk.ResultsReaderXml.getNextEventInCurrentSet(ResultsReaderXml.java:124)
                ... 9 common frames omitted

 

 

The issue was related to a query timeout. We have set-up the upsert_id in the Splunk DB Connect output configuration so that Splunk can go in insert_update.

Looking into _internal log we understood that Splunk, when using the upsert_id, performs a select query for each record it has to insert and then commits every 1000 records (by default):

 

2023-11-10 01:22:28.215 +0100  [QuartzScheduler_Worker-12] INFO  com.splunk.dbx.connector.logger.AuditLogger - operation=dboutput connection_name=SPLUNK_CONN stanza_name=SPLUNK_OUTPUT state=success sql='SELECT FIELD01,FIELD02,FIELD03 FROM MYSCHEMA.MYTABLE WHERE UPSERT_ID=?'

 

 

 

2023-11-10 01:22:28.258 +0100  [QuartzScheduler_Worker-12] INFO  com.splunk.dbx.connector.logger.AuditLogger - operation=dboutput connection_name=SPLUNK_CONN stanza_name=SPLUNK_OUTPUT state=success sql='INSERT INTO MYSCHEMA.MYTABLE (FIELD01,FIELD02,FIELD03) values (?,?,?)'

 

 

Upsert_id is very useful to avoid an sql duplicate key error, and whenever you want to recover data in case the output is failing for some reason. You basically re-run the output query and if the record already exists it is replaced in the sql table.

But the side effect is that the WHERE condition of the SELECT statement can be very inefficient if the Database table start to be huge.

The solution is to create in the output Database table an SQL index on the upsert_id field.

 

The output run passed from 11 minutes to 11 seconds, avoiding to hit the timeout of the Splunk DB Connect (30 seconds by default, calculated for every commit).

 

Best Regards,

Edoardo

_JP
Contributor

Thanks for the tip!  Non-streaming  type pushes like this are often a challenge, and this is one way to manage the coupling of something that likes to be working in the real-time space (Splunk) versus  more of a batch space (the DB).

Get Updates on the Splunk Community!

Introducing the Splunk Community Dashboard Challenge!

Welcome to Splunk Community Dashboard Challenge! This is your chance to showcase your skills in creating ...

Built-in Service Level Objectives Management to Bridge the Gap Between Service & ...

Wednesday, May 29, 2024  |  11AM PST / 2PM ESTRegister now and join us to learn more about how you can ...

Get Your Exclusive Splunk Certified Cybersecurity Defense Engineer Certification at ...

We’re excited to announce a new Splunk certification exam being released at .conf24! If you’re headed to Vegas ...