Getting Data In

Data onboarding, splunk-server field messing up the whole data

yohhpark
Path Finder
Can someone help me with this issue? Splunk is reading the file but 'adding' information that is NOT in the original file.
 

If you run the search below

index="acob_controls_summary" sourcetype=acob:json source="/var/log/acobjson/*100223*rtm*"

|search system=CHE control_id=AU2_A_2_1 compliance_status=100%

you will get two results, distinguished mainly by "last_test_date":


one showing
"2023-10-02 15:42:30.784049"
and other showing

"2023-10-02 14:56:45.047265"


Ironically, the attached file is the SAME file that Splunk is reading (I only changed the file name after copying it onto my machine), yet it contains only ONE entry, the second one: "2023-10-02 14:56:45.047265".

Where did "2023-10-02 15:42:30.784049" come from?

 

We have a clustered environment, so many splunk_server values are created automatically. But why is Splunk making a new 'test date' that splits one entry into two, one returning correct information and the other returning wrong information?

 

1 Solution

_JP
Contributor

I am not seeing attachments or other screenshots.  

Usually when I see duplicate events like this it has been because a file was replicated somehow "underneath" Splunk within a directory where Splunk thinks it is a new file and starts indexing it again. Or, I've seen this happen if you have the log files going to a shared mount point and two different Forwarders are pointing at the same files. 

A few questions to help you troubleshoot:
- You mention splunk-server.  What does the splunk_server field, along with values of things like host, sourcetype, and source look like for these events? 

- Your timestamps are wildly off, and not necessarily in a predictable way (e.g. just by 1 hour).  Does your log data have timestamps within it, or are you relying on the timestamp being derived from when Splunk "sees" your log?
- Have you poked around in the _internal index to see where Splunk "saw" any files matching the following:

/var/log/acobjson/*100223*rtm*

 NOTE:  Don't look for source=/var/log/acobjson/*100223*rtm* in index=_internal, because the source= in this context refers to the Splunk log files that were indexed.  You can start without specifying a field, but you can also try something like index=_internal series=/var/log/acobjson/*100223*rtm* since that is one field Splunk will log this info in as it is monitoring files.
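
As a rough sketch of the checks above (not run against your data; the index, sourcetype and source values are copied from your search, and the remaining fields are Splunk's standard default fields), something like this would put both events side by side with where and when each one was indexed:

```
index="acob_controls_summary" sourcetype=acob:json source="/var/log/acobjson/*100223*rtm*" system=CHE control_id=AU2_A_2_1
| eval indexed_at=strftime(_indextime, "%Y-%m-%d %H:%M:%S")
| table _time indexed_at host source splunk_server last_test_date
```

If the two rows differ in source or host, you are most likely looking at two files (or two forwarders) rather than one. The _internal side could start like this, assuming your forwarders send their internal logs to the indexers and the file shows up in the per-source throughput metrics:

```
index=_internal series="/var/log/acobjson/*100223*rtm*"
| stats count by host, group, series
```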


PickleRick
SplunkTrust

1. There are no samples of either the original data or the search results, so we can't tell what you mean.

2. Splunk does not manipulate data on its own unless it's configured to do so. We don't know your configuration so we can't tell you what's going on during the onboarding process.

Did you check the configuration for the sourcetype, source and host in question? And are you referring to raw data, search-time extracted fields or indexed fields?

We have no idea what's going on because you haven't shown anything apart from a simple search (and we can't know what to expect from it without knowing the events) and a couple of timestamps.
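
One way to answer the raw-versus-extracted question is to test whether the unexpected timestamp string is literally present in _raw. A minimal sketch, reusing the search from the original post (the field name in_raw is made up for illustration):

```
index="acob_controls_summary" sourcetype=acob:json source="/var/log/acobjson/*100223*rtm*" system=CHE control_id=AU2_A_2_1
| eval in_raw=if(like(_raw, "%2023-10-02 15:42:30.784049%"), "yes", "no")
| table _time source in_raw last_test_date
```

If in_raw comes back "yes", the value was in the data that got indexed, so some file contained it. If it comes back "no", the value is more likely produced by field extraction or other configuration applied to the event.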

 



yohhpark
Path Finder

I think the issue is more complicated than that. I understand not to search on source= in _internal.

That is not the issue. The issue is that Splunk generates data that differs from the original source, with a different test date that is NOT in the file.

It has to do with the cluster environment. Is anyone an expert in this?


PickleRick
SplunkTrust

No. Unless explicitly configured to do so (which may be the case, but we can't know how your environment is configured), Splunk doesn't "generate" data. It ingests the data it is given (and possibly modifies it, if configured that way).

And clustering doesn't change the data. It merely replicates (if needed) data to other nodes.
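
To check this in practice, a search along these lines (a sketch only, assuming last_test_date is available as a field as in the original post; the output field names are made up) groups the matching events by test date and lists which files, hosts and indexers each value came from:

```
index="acob_controls_summary" sourcetype=acob:json source="/var/log/acobjson/*100223*rtm*" system=CHE control_id=AU2_A_2_1
| stats count dc(source) as source_count values(source) as sources values(host) as hosts values(splunk_server) as indexers by last_test_date
```

Because the source filter is a wildcard, two different files matching *100223*rtm* would each show up here with their own last_test_date.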


_JP
Contributor

Can you provide a screenshot of the event data within Splunk, and what it looks like within the file?  If necessary, redact anything private. It would also help if you could select the Splunk default fields so they appear in-line with your event data (host, index, linecount, punct, source, sourcetype, splunk_server, timestamp).

I'm having a difficult time visualizing only the timestamp portion being different between two events and one log file.  
