Splunk Search

How can I resolve my SPL returning more data points for indexed data than for event data?

Taruchit
Contributor

Hello All,

I have the SPL below to compare hourly event data and indexed data, to find whether they follow a similar pattern and whether there is a big gap between them.

|tstats count where index=xxx sourcetype=yyy BY _indextime _time span=1h
|bin span=1h _indextime AS itime
|bin span=1h _time AS etime
|eventstats sum(count) AS indexCount BY itime
|eventstats sum(count) AS eventCount BY etime
|timechart span=1h max(eventCount) AS event_count max(indexCount) AS index_count

However, when I compare the hourly results, I get more data points in the indexed data than in the event data.

Can you please guide me on how to resolve this problem?

Thank you

Taruchit


Taruchit
Contributor

@ITWhisperer: Just to get your last word on this: in your view, is the SPL I shared in the description correct for getting event data points and index data points for each hour, given that we will see different patterns due to delayed ingestion?

Thank you


ITWhisperer
SplunkTrust

The SPL will show something, although I am not sure of the value of it on its own, nor what it demonstrates without correlating the results back with the original source events.

That being said, by comparing the pattern for different times of the day with the same time on different days, you might be able to discern a change which is significant. Again, you would need to investigate the reason for the change to determine whether it is useful to detect such a change.


ITWhisperer
SplunkTrust

Are you saying that for every hour there are more index data points than event data points, or that it happens sometimes?

Even then, let's say there is a lag between the event time and the index time, and that indexing happens at 5 minutes past the hour, but the events picked up are timestamped from 5 minutes before to 5 minutes past. The count for that index time will then include events which are not in that hour.

Index time and event time are two different scales running independently of each other. Depending on your source data, events may be indexed before or after their event time.
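To see this lag directly rather than inferring it from the two counts, a minimal sketch (reusing the index and sourcetype placeholders from the question) could chart the per-event difference between index time and event time:

```
index=xxx sourcetype=yyy
| eval lag_seconds=_indextime-_time
| timechart span=1h avg(lag_seconds) AS avg_lag max(lag_seconds) AS max_lag
```

Negative values here would indicate events indexed with a timestamp later than the wall-clock time at indexing, i.e. events arriving "from the future" relative to index time.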

What problem is it that you are trying to solve?

Taruchit
Contributor

Hello @ITWhisperer,

Thank you for your response.

If I take the last 10 days, for some hours I get more index data points than event data points, and over time this changes to more event data points than index data points. Additionally, there is no clear cyclic pattern during the day for when this change happens: for some periods the former is observed and for others the latter.

The problem I am trying to solve is to identify potential data ingestion issue that may exist with respect to a given data source.

That is, the anticipated pattern is that the event data points and indexed data points closely follow the same pattern, with the volume of event data points slightly greater than or equal to that of indexed data points.

But when the event data point pattern and the indexed data point pattern differ over the same period of time, or when there are more indexed data points than event data points, that is when the issue may be occurring. This is the problem I am trying to identify at run time (or close to run time) and later address for each data source.
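To flag such hours automatically, one hedged sketch is to extend the search from the question with a per-hour comparison (the final filter is illustrative and could be replaced with a tolerance threshold):

```
|tstats count where index=xxx sourcetype=yyy BY _indextime _time span=1h
|bin span=1h _indextime AS itime
|bin span=1h _time AS etime
|eventstats sum(count) AS indexCount BY itime
|eventstats sum(count) AS eventCount BY etime
|timechart span=1h max(eventCount) AS event_count max(indexCount) AS index_count
|eval diff=index_count-event_count
|where diff>0
```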

Please let me know if the above answers your questions; I am seeking more guidance on the topic.

Thank you


ITWhisperer
SplunkTrust

As I said, it depends on your data. For example, Apache HTTPD (and other HTTPD servers) log transactions using the timestamp for when the request was received, but the entry is added to the log when the response is sent back. This means that the event time could be minutes out from the index time even if the log were indexed instantaneously (which it isn't, as there will always be a lag between when the log is written and when it reaches the indexers). However, in this instance, the time the response was sent could be inferred from the request time and the duration, so this could be compared against the index time to give you a better idea of the lag.
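As a rough sketch of that idea (the `duration` field name and its microsecond unit, as produced by Apache's %D log format directive, are assumptions; adjust for your sourcetype):

```
sourcetype=access_combined
| eval response_time=_time+(duration/1000000)
| eval index_lag=_indextime-response_time
| timechart span=1h avg(index_lag) AS avg_index_lag max(index_lag) AS max_index_lag
```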

Perhaps what might be more useful to you is the difference between successive index times? This might show you when either there was a pause in logging or when there was a breakdown in transmission of the log events to the indexers. However, this would need to be compared with the actual rate at which the events were written to the log, so, again, it depends on your data.
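A sketch of the successive-index-times idea, using streamstats to compute the gap between one indexed event and the next (the 300-second threshold is illustrative, and the index/sourcetype placeholders come from the question):

```
index=xxx sourcetype=yyy
| eval itime=_indextime
| sort 0 itime
| streamstats current=f last(itime) AS prev_itime
| eval gap_seconds=itime-prev_itime
| where gap_seconds>300
```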

Taruchit
Contributor

Thank you @ITWhisperer for your detailed inputs.

I have different types of data sources to monitor. 

The one I observed and shared is from AWS Key Management Service, where logs are written when the Key Management Service calls an AWS service on behalf of an application (or user).

Thus, please help confirm whether the approach you shared, computing and comparing index data point counts at successive hours, will enable a better understanding of data ingestion issues in Splunk. And in that approach, do the event data point counts have any application or usage?

Thank you


ITWhisperer
SplunkTrust

I do not know AWS Key Management Service data, so can't comment on that.

Given that you have different data sources, you might need a different approach for each data source.

I think you need to find an example of the issue you are trying to detect in your data, then determine the best way of finding that issue in the future.

By using different approaches (as I have hinted at), you might find one which matches your requirement. There is unlikely to be one answer which fits all, although theoretically there could be, so I would suggest you experiment.

Taruchit
Contributor

Thank you @ITWhisperer for sharing your inputs.
