Ossec 2.6, Splunk splunk-4.2.3-105575 and Splunk app ossec-1.1.88
I recently upgraded Ossec from 2.0 to 2.6 and added Splunk. Both reside on the same server, which has over 900 active agents. In the Splunk web interface only a few agents (36) show up, and when I run ossec_agent_status.py as part of troubleshooting, it reports an error.
Also, in the web interface under "Agent Status", the status column drops letters for some of the agents: it doesn't finish the word, showing "disco" instead of "disconnected" or "Never con" instead of "Never connected". I don't know whether that is part of the same problem or something different.
I hope someone can help with this as I would really like to show off Splunk using an existing Ossec installation base.
Thanks,
Here is the output from ossec_agent_status.py:
splunk@n1pvir006 > ./ossec_agent_status.py -v
Server config:
{'n1pvir006': {'AGENT_CONTROL': 'sudo /opt/ossec/bin/agent_control -l', 'MANAGE_AGENTS': 'sudo /opt/ossec/bin/manage_agents'}}
Querying n1pvir006
OSSEC interface initialized.
Server: n1pvir006, Error: Unable to run data collection. Timeout exceeded in expect_any().
version: 2.3 ($Revision: 399 $)
command: /usr/bin/sudo
args: ['/usr/bin/sudo', '/opt/ossec/bin/agent_control', '-l']
searcher: searcher_re:
0: re.compile("ID:(.*)List of agentless devices:")
1: re.compile("(?i)password")
buffer (last 100 chars): pvap020, IP: 10.180.5.151, Active
ID: 1036, Name: w1pvap003, IP: 10.180.5.152, Active
ID: 10
before (last 100 chars): pvap020, IP: 10.180.5.151, Active
ID: 1036, Name: w1pvap003, IP: 10.180.5.152, Active
ID: 10
after:
match: None
match_index: None
exitstatus: None
flag_eof: False
pid: 11189
child_fd: 3
closed: False
timeout: 5
delimiter:
logfile: None
logfile_read: None
logfile_send: None
maxread: 2000
ignorecase: False
searchwindowsize: None
delaybeforesend: 0.05
delayafterclose: 0.1
delayafterterminate: 0.1
Timeout issue
The most likely reason for the timeout is just the relatively large number of agents - the default timeout is 5 seconds. I've made a note to increase that and/or make it configurable.
In the meantime, try editing bin/pyOSSEC.py. At line 331, change:
p = pexpect.spawn(cmd, timeout=5)
to:
p = pexpect.spawn(cmd, timeout=30)
and see if that solves the timeout concern.
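Until the timeout is configurable in the app itself, one way to avoid hard-coding the value is to read it from the environment. The following is only a sketch, not the app's actual code: the variable name OSSEC_AGENT_TIMEOUT and both helper functions are made up for illustration, and the 30-second default is simply the value suggested above.

```python
import os

DEFAULT_TIMEOUT = 30  # seconds; the value suggested above


def resolve_timeout(env=None, var="OSSEC_AGENT_TIMEOUT", default=DEFAULT_TIMEOUT):
    """Return the timeout in seconds (hypothetical helper, not part of pyOSSEC.py).

    Falls back to the default when the variable is unset, not an integer,
    or not positive.
    """
    env = os.environ if env is None else env
    raw = env.get(var, "")
    try:
        value = int(raw)
    except ValueError:
        return default
    return value if value > 0 else default


def spawn_with_timeout(cmd):
    """Spawn cmd under pexpect using the resolved timeout (hypothetical helper)."""
    # pexpect is third-party; importing it here keeps resolve_timeout usable without it.
    import pexpect
    return pexpect.spawn(cmd, timeout=resolve_timeout())
```

With something like this in place, the timeout could be raised for a single run by setting the variable before starting the script, without editing the file again.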
Truncation issue
There's a good chance that the truncation is caused by the timeouts. When a timeout occurs, pexpect prints the last 100 characters of the partial output it has collected, as context for diagnostic purposes.
In the example you posted above, it's the section that looks like this:
before (last 100 chars): pvap020, IP: 10.180.5.151, Active
ID: 1036, Name: w1pvap003, IP: 10.180.5.152, Active
ID: 10
Here, it truncated what would have been the ID value. If the truncation had occurred toward the end of a line instead of toward the beginning, you would get something ending with a partial word that would be extracted into the Status column.
It might still be something else, but the best approach will be to fix the timeout issue first and look more closely at the truncation if the problem persists.
If you search on sourcetype=ossec_agent_control, I would expect to see lines where the raw data shows the same truncation. (If so, that supports the conjecture that it's a data collection problem and not a field extraction problem.)
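To illustrate the reasoning, here is a small sketch using only Python's re module. The agent names and IP addresses are made up, the list of known statuses contains only the values mentioned in this thread (the real list may be longer), and re.DOTALL is assumed so that "." can span newlines. It shows that output cut off mid-line never matches the pattern the script is waiting for, and that the cut-off line yields a partial status word.

```python
import re

# Same pattern the script waits for; re.DOTALL is an assumption here.
PATTERN = re.compile(r"ID:(.*)List of agentless devices:", re.DOTALL)

# Status values mentioned in this thread; the real list may be longer.
KNOWN_STATUSES = {"Active", "Disconnected", "Never connected"}

# Made-up sample output, shaped like the lines in the diagnostic dump.
complete = (
    "ID: 001, Name: host-a, IP: 192.0.2.1, Active\n"
    "ID: 002, Name: host-b, IP: 192.0.2.2, Disconnected\n"
    "\nList of agentless devices:\n"
)

# Simulate a timeout that cuts the output in the middle of a status word.
truncated = complete[: complete.index("Disconnected") + 5]


def suspicious_statuses(text):
    """Return the trailing field of each agent line that is not a known status."""
    found = []
    for line in text.splitlines():
        if line.startswith("ID:"):
            status = line.rsplit(",", 1)[-1].strip()
            if status not in KNOWN_STATUSES:
                found.append(status)
    return found


print(PATTERN.search(complete) is not None)   # complete output matches
print(PATTERN.search(truncated) is not None)  # truncated output does not
print(suspicious_statuses(truncated))         # the partial status word
```

Running the same kind of check against the raw events in Splunk would show whether the partial words are already present in the collected data.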
The timeout was increased to 30 seconds in version 1.1.89.