Hi all,
I am at an impasse and could use some ideas on how to overcome it.
So here is the challenge.
1) I have CSV data coming into Splunk via universal forwarders (UF) from 50+ sources;
2) The data is slightly restructured (though not always), and later sent to a database via Splunk DB Connect (DB output);
3) The data is output either with upsert (for the several DBs that need it) or without upsert (for the rest);
4) The automated DB output search runs periodically (about once an hour, looking back X hours), which covers our needs for newly arriving data from previously added sources.
HOWEVER, and here is the main challenge:
5) When a new source is added, its CSV files are already full of old data (older than X hours, sometimes months). That data gets indexed, but it never gets DB-outputted (great word), and I have to send the old data to the database as well.
6) Running the automated search over "All time" every time is not feasible (50+ sources, often with 10+ source files each): the number of SELECTs sent to the target DB makes the Splunk search take far too long, and the target DB gets overwhelmed and stops responding.
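For context, the hourly output search is conceptually something like the sketch below. All names here (index, sourcetype, field names, output name) are hypothetical placeholders, not my actual config; it assumes the dbxoutput command that DB Connect provides for sending search results to a configured DB output:

```
index=csv_feeds sourcetype=vendor_csv earliest=-2h@h latest=now
| table event_time, key_field, value_field
| dbxoutput output=target_db_output
```

The -2h@h window is what keeps the per-run SELECT/INSERT volume manageable, and it is exactly what a newly added source's months-old events fall outside of.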
At the moment I have a workaround: I modify the automated output search to run over "All time" for that particular source, for one iteration, and then change it back to normal. But the number of sources keeps growing, and this is becoming more and more inconvenient. I have a feeling there is a better (easier) solution, but it eludes me.
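In savedsearches.conf terms, the workaround amounts to temporarily widening the scheduled search's dispatch window and then reverting it (stanza and values are illustrative, not my real config):

```
[db_output_hourly]
cron_schedule = 0 * * * *
dispatch.earliest_time = -2h@h
dispatch.latest_time = now
# For one iteration after onboarding a new source, I change the
# earliest time to epoch 0 ("All time"), then revert it:
# dispatch.earliest_time = 0
```

Doing this by hand per new source is the part that does not scale.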
If anyone could share ideas, that would be helpful.
Have a great day!