Why do variations in sourcetype appear?

lguinn2
Legend

I have a Splunk indexer that receives IIS input from several sources. Why is the sourcetype set to "iis.1" instead of "iis"?

How can I ensure that all the IIS input is labeled with the same sourcetype ("iis")?

I do not want my users to have to specify sourcetype="iis*" in their searches.

jrodman
Splunk Employee

It's certainly true that if Splunk encounters 'someapp.log' without configuration, it is likely to create a new sourcetype called 'someapp'. Later, as the file rolls, Splunk may not be able to correctly guess that the newly rolled files are the same, and may create a new sourcetype 'someapp-1', then 'someapp-2', and so on, as you say.

However, IIS gets these sourcetypes for another reason. IIS logs list their positional field names in a header at the top of the file. Because each file lists the fields present, Splunk assumes that not all files of this type will necessarily have the same list of fields. Therefore a new sourcetype is generated whenever a list of fields must be stored, and the list is inserted into a field extraction configuration for each sourcetype in turn.

This works fine in a simple Splunk environment, although it does look a bit confusing. However, because it creates configuration at index time that is intended to be used at search time, it can break in distributed search environments, or in situations where data is forwarded after it is parsed.

Incidentally, we have a proposal to make a search for 'sourcetype=iis' match all of this data: add the configuration 'rename=iis' to each of the autogenerated sourcetypes. This can be done manually for now, but I hope it starts happening automatically in a release in the near future.
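
The manual workaround could look roughly like the props.conf fragment below. This is only a sketch, not a configuration confirmed in this thread: the stanza names iis-2 and iis-3 are hypothetical placeholders for whatever autogenerated sourcetype names appear in your own data, and putting the file on the search head is an assumption based on rename being a search-time setting.

```ini
# props.conf (assumed location: the search head's local directory)
# Stanza names are hypothetical; substitute the autogenerated
# sourcetype names that actually appear in your data.
[iis-2]
rename = iis

[iis-3]
rename = iis
```

With stanzas like these in place, a search for sourcetype=iis should also match events from the renamed sourcetypes. As far as I know, the original name stays available in the _sourcetype field if you need to tell them apart.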

SplunkSE
Splunk Employee

I think that, given there are multiple versions of IIS running in the data center, it isn't as easy as picking "IIS" from the pull-down auto-sourcetyping that Splunk provides. In cases where W3C exists, there is normally W3SVC1 as well. So, first define your problem: do you have IIS6, IIS7, and/or IIS7.5, as you would in 2008 environments? Each of these looks a little different when auto-sourcetyped: IIS for 6.0, and IIS-n or IIS-n+1 for IIS 7.0 or IIS 7.5.

For example, here is an IIS 7.0 logging example:

2010-01-08 03:28:31 W3SVC1 WS1 GET /favicon.ico - 80 - 10.3.200.2 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+6.1;+en-US;+rv:1.9.1.6)+Gecko/20091201+Firefox/3.5.6 - - 10.20.100.10 404 0 2 1405 356 15
2010-01-08 03:28:31 W3SVC1 WS1 GET /favicon.ico - 80 - 10.3.200.2 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+6.1;+en-US;+rv:1.9.1.6)+Gecko/20091201+Firefox/3.5.6 - - 10.20.100.10 404 0 2 1405 356 31
2010-01-08 03:28:31 W3SVC1 WS1 GET /favicon.ico - 80 - 10.3.200.2 HTTP/1.1 Mozilla/5.0+(Windows;+U;+Windows+NT+6.1;+en-US;+rv:1.9.1.6)+Gecko/20091201+Firefox/3.5.6 - - 10.20.100.10 404 0 2 1405 356 31

Note the replacement of whitespace with a '+'. What you don't see here is that additional values can be attached to the HTTP status codes (see http://support.microsoft.com/kb/943891). So, don't get frustrated if you find your auto-sourcetyping isn't working in your ~/local/props.conf. Make sure you are taking into account the new delimiter for each sourcetype.

lguinn2
Legend

Excellent advice. Given the environment that evoked the original post, the answer to the question "Do you have IIS6..." is probably "all of the above".
Thanks very much for explaining the issue thoroughly.

jrodman
Splunk Employee

The behavior for CSV is pretty much identical to IIS, with the same cause: it's how our AutoHeader / CHECK_FOR_HEADER logic works. Again, you can mitigate with rename=original_sourcetype in the autogenerated sourcetypes.

Chris_R_
Splunk Employee

What's the fix for this behaviour in non-IIS sourcetypes?
I've seen this happen with an external script pushing a CSV file monitored by Splunk. I think the headers might have been getting cut off on every push, so the sourcetype read after each new script push was assigned somefile-1, somefile-2, and so on. I'm not sure of the root cause.

lguinn2
Legend

Will try using rename=iis
Thanks

hulahoop
Splunk Employee

If the sourcetype is not explicitly set when a data input is created, Splunk tries to assign a sourcetype automatically. In my observation, it sometimes uses the file name followed by a sequence number, which may explain why you see iis.1 as the sourcetype.

To ensure the data is sourcetyped as you want it, always set the sourcetype. This can be done in a number of ways.

If the data input is being added via the Manager, then choose a sourcetype from the list of pre-configured sourcetypes or set it manually using a custom name.

If adding data via inputs.conf, ensure the sourcetype parameter is defined for the input stanza. For example:

[monitor:///.../iis.log]
sourcetype = iis
disabled = false

For flexible sourcetyping, use props.conf and wildcards. For example:

[source::.../iis.log]
sourcetype = iis

IMHO, getting the sourcetyping right is one of the essential tasks in any Splunk implementation so that you don't need to have users search with sourcetype=iis*.

HeavyHats
Explorer

14 years later, I'm coming here to say THANK YOU! One of my backburner projects has been trying to figure out why we've been getting "cron-2" and "error.log-too_small" type sourcetypes for over a year now. Simply defining the sourcetype for each file, as you suggested, has fixed the issue. 

You, @hulahoop, are a lifesaver.

lguinn2
Legend

I agree: getting the sourcetype right is key. But there is apparently a particular issue with IIS that persists even when sourcetype=iis is specified!
