
sourcetype "too_small" and log rotation on rsyslog server

grundsch
Communicator

I'm collecting all syslog messages from my datacenter on a central rsyslog server.
rsyslog splits the messages into the following directory structure:
/log/yyyy/mm/dd/host/service.log

The service name is extracted from the syslog message, so messages from the same daemon are grouped in one log file.
I have one monitor input watching the whole tree. The host is extracted from the path. The sourcetype is left as "automatic". The idea is that Splunk analyses every log file and works out whether it is a postfix/apache/snmp/cron/... log file.
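To make the layout concrete, here is a minimal Python sketch of pulling the day, host and service out of a path such as /log/yyyy/mm/dd/host/service.log. It only illustrates the path structure described above; the function and type names are hypothetical. In Splunk itself the host part would presumably be configured with the inputs.conf `host_segment` setting (segment 5 for this layout, counting `log` as segment 1), but check that against your version's documentation.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class LogFileInfo(NamedTuple):
    date: str     # yyyy-mm-dd assembled from the three date directories
    host: str     # the host directory (5th segment after the root)
    service: str  # file name without the .log suffix


def parse_log_path(path: str) -> LogFileInfo:
    """Split /log/yyyy/mm/dd/host/service.log into its parts (hypothetical helper)."""
    parts = PurePosixPath(path).parts
    # For the expected layout: ('/', 'log', yyyy, mm, dd, host, 'service.log')
    if len(parts) != 7 or parts[1] != "log" or not parts[6].endswith(".log"):
        raise ValueError(f"unexpected layout: {path}")
    _, _, yyyy, mm, dd, host, filename = parts
    return LogFileInfo(f"{yyyy}-{mm}-{dd}", host, filename[: -len(".log")])
```

For example, `parse_log_path("/log/2010/03/01/web01/postfix.log")` gives date `2010-03-01`, host `web01` and service `postfix`.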

It works quite well, but all sourcetypes end up as xxx-too_small

(e.g. postfix-too_small, snmpd-too_small, ...)

I suspect that because we start a new log file for every host, service and day, a new file at midnight contains only one or two events. Splunk sees the new file, tries to work out what it is, gets it roughly right, but tags the sourcetype with "too_small" because there are fewer than 100 events.
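The suspected naming rule can be modelled in a few lines of Python. This is only a sketch of the behaviour described above, not Splunk's code: the 100-event threshold is taken from the post, and the function name is hypothetical. It shows why the same service can end up with two different sourcetypes depending on how full its file is when Splunk first classifies it.

```python
def learned_sourcetype(guess: str, line_count: int, threshold: int = 100) -> str:
    """Append '-too_small' to the guessed sourcetype when the file is below the threshold."""
    if line_count < threshold:
        return f"{guess}-too_small"
    return guess


if __name__ == "__main__":
    # A postfix file created just after midnight versus one seen at the end of the day.
    print(learned_sourcetype("postfix", 2))     # postfix-too_small
    print(learned_sourcetype("postfix", 5000))  # postfix
```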

My questions:

  • how can I suppress this "too_small"?
  • how are you guys with central syslog servers handling such a setup? (I suppose I'm not alone in indexing a central syslog server.) In particular, how do you handle the creation of new log files (i.e. new sources from Splunk's point of view) with only a few events in them?

Many thanks in advance for any tips & tricks!

grundsch
Communicator

A couple of months later, I've learned some more.

  • the above file split for the central syslog server proved to be a disaster for Splunk. It generated thousands of sourcetypes (because syslog produced thousands of different service names). This left the Splunk indexes completely fubar (every single search consumed all the CPU).
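As a rough illustration of why the sourcetype count exploded, here is a toy Python model. It assumes, as a simplification, that every distinct service file name became its own learned sourcetype regardless of host or day; the function name is hypothetical. With a fixed sourcetype the same set would have a single member.

```python
from pathlib import PurePosixPath
from typing import Iterable, Set


def sourcetypes_per_service(paths: Iterable[str]) -> Set[str]:
    # Toy model: one sourcetype per distinct "service" in service.log,
    # so the count grows with the number of service names syslog produces.
    return {PurePosixPath(p).stem for p in paths}
```

Feeding it the monitored paths gives one entry per service name, so thousands of service names mean thousands of sourcetypes.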

  • Fresh start: we now keep standard syslog messages in a separate tree (for archiving purposes) and dump everything else into one syslog file per host. These files are rotated regularly and discarded after two rotations (the data is in Splunk and in the separate archive).
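The "discard after two rotations" scheme is what a logrotate-style `rotate 2` setting does. The Python sketch below shows the mechanics on a single per-host file; the function name and file naming (`web01.log.1`, `web01.log.2`) are assumptions for illustration. A real setup also has to make rsyslog reopen its output files after the rename (typically by signalling it from a post-rotate hook), which this sketch does not do.

```python
import os
from pathlib import Path


def rotate(log_path: Path, keep: int = 2) -> None:
    """Rotate log_path -> .1 -> ... -> .<keep>, dropping the oldest generation."""
    oldest = log_path.with_name(f"{log_path.name}.{keep}")
    if oldest.exists():
        oldest.unlink()  # the generation that falls off the end is discarded
    # Shift older generations first so nothing is overwritten: .1 -> .2, and so on.
    for n in range(keep - 1, 0, -1):
        src = log_path.with_name(f"{log_path.name}.{n}")
        if src.exists():
            os.replace(src, log_path.with_name(f"{log_path.name}.{n + 1}"))
    if log_path.exists():
        os.replace(log_path, log_path.with_name(f"{log_path.name}.1"))
    log_path.touch()  # fresh, empty file for the syslog daemon to write to
```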

This now looks much better. The sourcetype is fixed to syslog. Not as fun as automatic sourcetype detection, but hey, these really are syslog messages...

I've also just read the following blog entry: http://blogs.splunk.com/2010/02/11/sourcetypes-gone-wild/ which explains how I could now extract a different sourcetype per event from this single syslog stream, and probably route the events to different indexes...
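The per-event idea boils down to matching each event against a regular expression and picking a sourcetype from what it captures. The Python sketch below models only that logic; in Splunk it would be configured in props.conf/transforms.conf (as far as I know, via a transform with `DEST_KEY = MetaData:Sourcetype`), not written as code. The regex assumes classic BSD-style syslog lines, and the program-to-sourcetype mapping and its names are hypothetical.

```python
import re
from typing import Dict, Pattern

# Hypothetical mapping from the syslog program tag to a sourcetype name.
PROGRAM_TO_SOURCETYPE: Dict[str, str] = {
    "postfix": "postfix_syslog",
    "sshd": "sshd_syslog",
    "CRON": "cron_syslog",
}

# Assumed BSD-style line: "Mar  1 00:00:01 web01 postfix/smtpd[1234]: message"
# The program name stops at whitespace, '[', '/' or ':'.
SYSLOG_RE: Pattern[str] = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s(?P<host>\S+)\s(?P<program>[^\s\[/:]+)"
)


def sourcetype_for(event: str, default: str = "syslog") -> str:
    """Return a per-event sourcetype, falling back to plain syslog."""
    m = SYSLOG_RE.match(event)
    if m is None:
        return default
    return PROGRAM_TO_SOURCETYPE.get(m.group("program"), default)
```

Anchoring the pattern at the start of the line and matching only the header keeps the per-event work small, which is relevant to the cost question below, though Python timings say little about Splunk's own regex engine.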

Question: how expensive is it to run a regexp on every event at index time?
