Getting Data In

How to fix CSV ingestion random cutoffs?

yuanliu
SplunkTrust

I notice that CSV ingestion (via Splunk Web file upload) sometimes cuts off an event, possibly because one field is unusually long.  In one example, Splunk captures only roughly 8,160 characters of a column that contains 8,615 characters.  That field, and every column after it, is not extracted. (The ~8,100 characters that were read do remain in Splunk's raw event.)

When I took the same CSV file to a similarly configured instance, however, this event was ingested successfully, with no missing fields.  Particularly surprising is that I had already increased [kv] maxchars on the instance that had the trouble.  So I suspect that if I ingest the file again on the same instance, it may succeed as well; in other words, this seems rather random. (Even without increasing maxchars, the length of this column is still below the default of 10,240.)

                  Instance 1 (dropped part of event)   Instance 2 (event ingestion complete)
limits.conf [kv]  from local/limits.conf:              from default/limits.conf:
                  indexed_kv_limit = 1000              indexed_kv_limit = 200
                  maxchars = 40960                     maxchars = 10240
RAM               16 GB                                8 GB

What else should I check?  Both instances run Splunk Enterprise 9.1.1.  
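To rule out a problem with the file itself, one sanity check I can run outside Splunk is to parse the CSV with a quote-aware parser and measure each field's length, confirming the source file really holds the full 8,615 characters.  A minimal sketch (the sample data and field names here are illustrative, not the real file):

```python
import csv
import io

# Illustrative stand-in for the real CSV: one quoted field with embedded newlines.
sample = '"id","stack"\r\n"1","goroutine profile: total 16\nline two\nline three"\r\n'

# csv.reader is quote-aware (RFC 4180 style), so embedded newlines inside
# quoted fields do not split the record.
rows = list(csv.reader(io.StringIO(sample)))
header, data = rows[0], rows[1:]

# Report the length of every field in every record; compare against what
# Splunk extracted to see whether truncation happened before or during ingestion.
for row in data:
    for name, value in zip(header, row):
        print(f"{name}: {len(value)} chars")
```

If the lengths printed here match the original data but not what Splunk extracted, the loss is happening during ingestion rather than in the file.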


richgalloway
SplunkTrust

Please share the props.conf stanza for both instances.

Is it possible the dropped field has embedded newlines?

---
If this reply helps you, Karma would be appreciated.

yuanliu
SplunkTrust

The two instances use the same props.conf for this sourcetype.  Relevant entries would be

 

BREAK_ONLY_BEFORE_DATE = 
DATETIME_CONFIG = 
INDEXED_EXTRACTIONS = csv
KV_MODE = none
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = true
SHOULD_LINEMERGE = false
category = Structured
disabled = false
pulldown_type = true

 

Yes, the field in question nearly always contains line breaks, as do many other fields.  I have never observed a problem caused by embedded newlines in these ingestions. (The CSV is properly formatted, with proper quoting, etc.)

Specifically, I noticed a number of events that are nearly identical except for this field.  In the other events of this group, the field's length is under 2,000 characters, and those events are all ingested completely.

The troubled value looks like

goroutine profile: total 16
1 @ 0x4115c2 0x444fb6 0x4e13f2 0x45e5a1
# 0x444fb5 os/signal.signal_recv+0xa5 /usr/local/go/src/runtime/sigqueue.go:131
# 0x4e13f1 os/signal.loop+0x21  /usr/local/go/src/os/signal/signal_unix.go:22

1 @ 0x42f25c 0x42a36a 0x429967 0x494d6e 0x494ded 0x495b8a 0x5d3222 0x5e575d 0x6c1255 0x4ef23a 0x4f00ec 0x4f0354 0x65a810 0x65a61b 0x6bb479 0x6c255c 0x6c635e 0x45e5a1
# 0x429966 internal/poll.runtime_pollWait+0x56  /usr/local/go/src/runtime/netpoll.go:173
# 0x494d6d internal/poll.(*pollDesc).wait+0xad  /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x494dec internal/poll.(*pollDesc).waitRead+0x3c  /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x495b89 internal/poll.(*FD).Read+0x189   /usr/local/go/src/internal/poll/fd_unix.go:126
# 0x5d3221 net.(*netFD).Read+0x51    /usr/local/go/src/net/fd_unix.go:202
# 0x5e575c net.(*conn).Read+0x6c    /usr/local/go/src/net/net.go:176
# 0x6c1254 net/http.(*connReader).Read+0x104  /usr/local/go/src/net/http/server.go:753
# 0x4ef239 bufio.(*Reader).fill+0x119   /usr/local/go/src/bufio/bufio.go:97
# 0x4f00eb bufio.(*Reader).ReadSlice+0x2b   /usr/local/go/src/bufio/bufio.go:338
# 0x4f0353 bufio.(*Reader).ReadLine+0x33   /usr/local/go/src/bufio/bufio.go:367
# 0x65a80f net/textproto.(*Reader).readLineSlice+0x6f /usr/local/go/src/net/textproto/reader.go:55
# 0x65a61a net/textproto.(*Reader).ReadLine+0x2a  /usr/local/go/src/net/textproto/reader.go:36
# 0x6bb478 net/http.readRequest+0x98   /usr/local/go/src/net/http/request.go:925
# 0x6c255b net/http.(*conn).readRequest+0x17b  /usr/local/go/src/net/http/server.go:933
# 0x6c635d net/http.(*conn).serve+0x50d   /usr/local/go/src/net/http/server.go:1739

1 @ 0x42f25c 0x42a36a 0x429967 0x494d6e 0x494ded 0x497242 0x5d3c42 0x5ef9ee 0x5ee009 0x6ca702 0x6c95e3 0xaa2e92 0xaa2dcc 0x45e5a1
# 0x429966 internal/poll.runtime_pollWait+0x56     /usr/local/go/src/runtime/netpoll.go:173
# 0x494d6d internal/poll.(*pollDesc).wait+0xad     /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x494dec internal/poll.(*pollDesc).waitRead+0x3c     /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x497241 internal/poll.(*FD).Accept+0x1e1     /usr/local/go/src/internal/poll/fd_unix.go:335
# 0x5d3c41 net.(*netFD).accept+0x41      /usr/local/go/src/net/fd_unix.go:238
# 0x5ef9ed net.(*TCPListener).accept+0x2d      /usr/local/go/src/net/tcpsock_posix.go:136
# 0x5ee008 net.(*TCPListener).Accept+0x48      /usr/local/go/src/net/tcpsock.go:247
# 0x6ca701 net/http.(*Server).Serve+0x1b1      /usr/local/go/src/net/http/server.go:2695
# 0x6c95e2 net/http.Serve+0x72       /usr/local/go/src/net/http/server.go:2323
# 0xaa2e91 github.com/influxdata/influxdb/services/httpd.(*Service).serve+0x61 /go/src/github.com/influxdata/influxdb/services/httpd/service.go:226
# 0xaa2dcb github.com/influxdata/influxdb/services/httpd.(*Service).serveTCP+0x3b /go/src/github.com/influxdata/influxdb/services/httpd/service.go:214

1 @ 0x42f25c 0x42a36a 0x429967 0x494d6e 0x494ded 0x497242 0x5d3c42 0x5ef9ee 0x5ee009 0x8b6ec7 0x45e5a1
# 0x429966 internal/poll.runtime_pollWait+0x56   /usr/local/go/src/runtime/netpoll.go:173
# 0x494d6d internal/poll.(*pollDesc).wait+0xad   /usr/local/go/src/internal/poll/fd_poll_runtime.go:85
# 0x494dec internal/poll.(*pollDesc).waitRead+0x3c   /usr/local/go/src/internal/poll/fd_poll_runtime.go:90
# 0x497241 internal/poll.(*FD).Accept+0x1e1   /usr/local/go/src/internal/poll/fd_unix.go:335
# 0x5d3c41 net.(*netFD).accept+0x41    /usr/local/go/src/net/fd_unix.go:238
# 0x5ef9ed net.(*TCPListener).accept+0x2d    /usr/local/go/src/net/tcpsock_posix.go:136
# 0x5ee008 net.(*TCPListener).Accept+0x48    /usr/local/go/src/net/tcpsock.go:247
# 0x8b6ec6 github.com/influxdata/influxdb/tcp.(*Mux).Serve+0x96 /go/src/github.com/influxdata/influxdb/tcp/mux.go:75

1 @ 0x42f25c 0x42f34e 0x4064e4 0x40618b 0xc0f3be 0xc0ef7a 0x42eda6 0x45e5a1
# 0xc0f3bd main.(*Main).Run+0x38d /go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:90
# 0xc0ef79 main.main+0x169  /go/src/github.com/influxdata/influxdb/cmd/influxd/main.go:45
# 0x42eda5 runtime.main+0x225 /usr/local/go/src/runtime/proc.go:195

1 @ 0x42f25c 0x43f4e9 0x45b1f0 0x45e5a1
# 0x42f25b runtime.gopark+0x12b  /usr/local/go/src/runtime/proc.go:287
# 0x43f4e8 runtime.selectgo+0x1148  /usr/local/go/src/runtime/select.go:395
# 0x45b1ef runtime.ensureSigM.func1+0x21f /usr/local/go/src/runtime/signal_unix.go:511

1 @ 0x42f25c 0x43f4e9 0x8b7f9e 0x961b5b 0x45e5a1
# 0x8b7f9d github.com/influxdata/influxdb/tcp.(*listener).Accept+0x16d   /go/src/github.com/influxdata/influxdb/tcp/mux.go:236
# 0x961b5a github.com/influxdata/influxdb/services/snapshotter.(*Service).serve+0x7a /go/src/github.com/influxdata/influxdb/services/snapshotter/service.go:94

1 @ 0x42f25c 0x43f4e9 0x952cc4 0x45e5a1
# 0x952cc3 github.com/influxdata/influxdb/tsdb.(*Store).monitorShards+0x213 /go/src/github.com/influxdata/influxdb/tsdb/store.go:1690

1 @ 0x42f25c 0x43f4e9 0x9bf962 0x45e5a1
# 0x9bf961 github.com/influxdata/influxdb/services/continuous_querier.(*Service).backgroundLoop+0x1e1 /go/src/github.com/influxdata/influxdb/services/continuous_querier/service.go:215

1 @ 0x42f25c 0x43f4e9 0xab240c 0x45e5a1
# 0xab240b github.com/influxdata/influxdb/services/precreator.(*Service).runPrecreation+0x14b /go/src/github.com/influxdata/influxdb/services/precreator/service.go:76

1 @ 0x42f25c 0x43f4e9 0xab5266 0xab5567 0x45e5a1
# 0xab5265 github.com/influxdata/influxdb/services/retention.(*Service).run+0x1c85  /go/src/github.com/influxdata/influxdb/services/retention/service.go:78
# 0xab5566 github.com/influxdata/influxdb/services/retention.(*Service).Open.func1+0x56 /go/src/github.com/influxdata/influxdb/services/retention/service.go:51

1 @ 0x42f25c 0x43f4e9 0xbf0911 0xbf3f07 0x45e5a1
# 0xbf0910 github.com/influxdata/influxdb/services/subscriber.(*Service).waitForMetaUpdates+0x110 /go/src/github.com/influxdata/influxdb/services/subscriber/service.go:165
# 0xbf3f06 github.com/influxdata/influxdb/services/subscriber.(*Service).Open.func2+0x56  /go/src/github.com/influxdata/influxdb/services/subscriber/service.go:102

1 @ 0x42f25c 0x43f4e9 0xbf1ac4 0xbf3e87 0x45e5a1
# 0xbf1ac3 github.com/influxdata/influxdb/services/subscriber.(*Service).run+0x2f3  /go/src/github.com/influxdata/influxdb/services/subscriber/service.go:239
# 0xbf3e86 github.com/influxdata/influxdb/services/subscriber.(*Service).Open.func1+0x56 /go/src/github.com/influxdata/influxdb/services/subscriber/service.go:98

1 @ 0x42f25c 0x43f4e9 0xc0264f 0x45e5a1
# 0xc0264e github.com/influxdata/influxdb/cmd/influxd/run.(*Command).monitorServerErrors+0x1ce /go/src/github.com/influxdata/influxdb/cmd/influxd/run/command.go:165

1 @ 0x47f665 0x47d3b5 0x47b809 0x495b25 0x5d3222 0x5e575d 0x6c0d52 0x45e5a1
# 0x47f664 syscall.Syscall+0x4    /usr/local/go/src/syscall/asm_linux_amd64.s:18
# 0x47d3b4 syscall.read+0x54    /usr/local/go/src/syscall/zsyscall_linux_amd64.go:756
# 0x47b808 syscall.Read+0x48    /usr/local/go/src/syscall/syscall_unix.go:162
# 0x495b24 internal/poll.(*FD).Read+0x124   /usr/local/go/src/internal/poll/fd_unix.go:122
# 0x5d3221 net.(*netFD).Read+0x51    /usr/local/go/src/net/fd_unix.go:202
# 0x5e575c net.(*conn).Read+0x6c    /usr/local/go/src/net/net.go:176
# 0x6c0d51 net/http.(*connReader).backgroundRead+0x61 /usr/local/go/src/net/http/server.go:660

1 @ 0xa81362 0xa81160 0xa7dcbb 0xa89e08 0xa8a10b 0xa99d16 0xa8df7e 0x6ca374 0x6c656d 0x45e5a1
# 0xa81361 runtime/pprof.writeRuntimeProfile+0xa1      /usr/local/go/src/runtime/pprof/pprof.go:637
# 0xa8115f runtime/pprof.writeGoroutine+0x9f      /usr/local/go/src/runtime/pprof/pprof.go:599
# 0xa7dcba runtime/pprof.(*Profile).WriteTo+0x3aa      /usr/local/go/src/runtime/pprof/pprof.go:310
# 0xa89e07 net/http/pprof.handler.ServeHTTP+0x1b7      /usr/local/go/src/net/http/pprof/pprof.go:237
# 0xa8a10a net/http/pprof.Index+0x1da       /usr/local/go/src/net/http/pprof/pprof.go:248
# 0xa99d15 github.com/influxdata/influxdb/services/httpd.(*Handler).handleProfiles+0x95 /go/src/github.com/influxdata/influxdb/services/httpd/pprof.go:32
# 0xa8df7d github.com/influxdata/influxdb/services/httpd.(*Handler).ServeHTTP+0x30d /go/src/github.com/influxdata/influxdb/services/httpd/handler.go:309
# 0x6ca373 net/http.serverHandler.ServeHTTP+0xb3      /usr/local/go/src/net/http/server.go:2619
# 0x6c656c net/http.(*conn).serve+0x71c       /usr/local/go/src/net/http/server.go:1801

-CR-GET /debug/pprof/ HTTP/1.1

If I use "Show Source", line breaks are invisible.

In that nearly identical event group, this field would have values like

<html>
<head>
<title>/debug/pprof/</title>
<style>
.profile-name{
display:inline-block;
width:6rem;
}
</style>
</head>
<body>
/debug/pprof/ 
 
Types of profiles available:
<table>
<thead><td>Count</td><td>Profile</td></thead>

<tr>
<td>46547</td><td>full goroutine stack dump (goroutine?debug=2)
 
 
Profile Descriptions:
 

 <div class=profile-name>allocs:</div> A sampling of all past memory allocations 

 <div class=profile-name>block:</div> Stack traces that led to blocking on synchronization primitives 

 <div class=profile-name>cmdline:</div> The command line invocation of the current program 

 <div class=profile-name>goroutine:</div> Stack traces of all current goroutines 

 <div class=profile-name>heap:</div> A sampling of memory allocations of live objects. You can specify the gc GET parameter to run GC before taking the heap sample. 

 <div class=profile-name>mutex:</div> Stack traces of holders of contended mutexes 

 <div class=profile-name>profile:</div> CPU profile. You can specify the duration in the seconds GET parameter. After you get the profile file, use the go tool pprof command to investigate the profile. 

 <div class=profile-name>threadcreate:</div> Stack traces that led to the creation of new OS threads 

 <div class=profile-name>trace:</div> A trace of execution of the current program. You can specify the duration in the seconds GET parameter. After you get the trace file, use the go tool trace command to investigate the trace. 

 
</p>
</body>
</html>
-CR-GET /debug/pprof/ HTTP/1.1
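To illustrate why properly quoted embedded newlines should be harmless: a quote-aware CSV parser keeps such a field intact, whereas naive line splitting cuts it at the first newline.  A minimal sketch with illustrative data (not the real file):

```python
import csv
import io

# Illustrative record: one quoted field spanning two physical lines.
raw = '"a","first line\nsecond line","c"\n'

# Naive line-breaking splits inside the quoted field, cutting it in two.
naive = raw.splitlines()
print(naive)   # two physical lines; the quoted field is broken

# A quote-aware CSV parser keeps the field whole, newline and all.
row = next(csv.reader(io.StringIO(raw)))
print(row)     # ['a', 'first line\nsecond line', 'c']
```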

 
