Hello,
I've been trying to parse logs from Docker and used this Splunk answer (https://answers.splunk.com/answers/611715/docker-logs-produced-in-raw.html) to extract the underlying logs from the Docker JSON.
The underlying logs are also in JSON, so I'm trying to get Splunk to recognize the opening "{" as the start of the event. However, I'm finding that some sources are still dividing each line of the log into a separate event, while some sources are creating a single event with multiple JSON blobs.
Here is my props.conf:
[source::/var/log/containers/*]
SHOULD_LINEMERGE = false
NO_BINARY_CHECK = true
LINE_BREAKER = ([\n\r]+){"log":"{\n # setting line break as opening "{" in underlying JSON
CHARSET = UTF-8
disabled = false
[container_json]
CHARSET=UTF-8
SHOULD_LINEMERGE=false
NO_BINARY_CHECK = true
SEDCMD-1_unjsonify = s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2_unescapequotes = s/\\"/"/g
category = Custom
disabled = false
pulldown_type = true
TRUNCATE=150000
TZ=UTC
KV_MODE = json
This is the log sent from Docker:
{"log":"{\n","stream":"stdout","time":"2018-03-06T18:56:08.648972915Z"}
{"log":" \"time\": \"2018-03-06 18:56:08.648636Z\",\n","stream":"stdout","time":"2018-03-06T18:56:08.649029831Z"}
{"log":" \"nothing_to_update\": true,\n","stream":"stdout","time":"2018-03-06T18:56:08.64903929Z"}
{"log":" \"events\": [\n","stream":"stdout","time":"2018-03-06T18:56:08.649045009Z"}
{"log":"\n","stream":"stdout","time":"2018-03-06T18:56:08.649050131Z"}
{"log":" ]\n","stream":"stdout","time":"2018-03-06T18:56:08.649054914Z"}
{"log":"}\n","stream":"stdout","time":"2018-03-06T18:56:08.649059571Z"}
This is the extracted source in Splunk, but each line is showing up as an individual event:
{
"time": "2018-03-06 18:56:08.648636Z",
"nothing_to_update": true,
"events": [
]
}
I have other source files that seem to be working, but they are concatenating several JSON logs together. This source file shows up as one single event:
{
"time": "2018-03-06 18:56:18.507756Z",
"events": [
"No emails to send"
]
}
{
"time": "2018-03-06 18:56:18.514313Z",
"events": [
"No emails to send"
]
}
I've tried many different props.conf configurations, and this is the closest I've gotten to parsing the JSON properly. The extracted source for both examples is valid JSON, so I'm not sure why some source files are divided into line-by-line events but others are combining multiple JSON events into one.
Any help would be greatly appreciated!
Looks like you were going in the right direction with your props settings. Try changing your LINE_BREAKER to this:
LINE_BREAKER = ([\n\r]+)\s*{"log":"{\\n
I'm not sure if you literally have a comment on the same line as LINE_BREAKER, but if so, move it to the line before or after. Besides that, I (1) added escaping for the literal \n
in your data, and (2) allowed spaces before the starting {
; the latter could simply be a copy-and-paste artifact from posting here, but it shouldn't hurt either way.
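If it helps, the corrected breaker's behavior can be checked outside Splunk with a quick Python sketch. This is an approximation of LINE_BREAKER semantics (split wherever a newline run is followed by the literal text {"log":"{\n), not Splunk itself:

```python
import re

# Two wrapped events as they appear on disk; "\\n" in these Python strings
# produces the literal two-character sequence \n that Docker writes.
raw = '\n'.join([
    '{"log":"{\\n","stream":"stdout","time":"2018-03-06T18:56:08.648972915Z"}',
    '{"log":" \\"a\\": \\"b\\"\\n","stream":"stdout","time":"2018-03-06T18:56:08.649029831Z"}',
    '{"log":"}\\n","stream":"stdout","time":"2018-03-06T18:56:08.649059571Z"}',
    '{"log":"{\\n","stream":"stdout","time":"2018-03-07T05:49:45.33335549Z"}',
    '{"log":" \\"c\\": 1\\n","stream":"stdout","time":"2018-03-07T05:49:45.333434251Z"}',
    '{"log":"}\\n","stream":"stdout","time":"2018-03-07T05:49:45.333448344Z"}',
])

# Break only where a newline run is followed by {"log":"{\n (literal
# backslash-n), mirroring LINE_BREAKER = ([\n\r]+)\s*{"log":"{\\n
events = re.split(r'[\n\r]+(?=\s*\{"log":"\{\\n)', raw)
print(len(events))  # 2 -- each multi-line JSON log becomes one event
```

With the unescaped `\n` from the original config, the split would never match, which is consistent with the line-per-event behavior described in the question.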
If you want to use the original timestamps instead of the ones from the wrapper process, you could use something like this (I used this combo on the test data provided):
[your_sourcetype]
SHOULD_LINEMERGE=false
CHARSET=UTF-8
LINE_BREAKER=([\n\r]+)\s*{"log":"{\\n
SEDCMD-1=s/{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*/\1/g
SEDCMD-2=s/\\"/"/g
TIME_PREFIX=\\"time\\": \\"
TIME_FORMAT=%Y-%m-%d %H:%M:%S.%6N%Z
KV_MODE=json
TRUNCATE=150000
TZ=UTC
Note that TIME_PREFIX
expects to see a literal \"
, because timestamp processing happens before the SEDCMD*
rules take effect.
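For anyone wanting to sanity-check the two SEDCMD rules outside Splunk, here is a rough Python equivalent run against a simplified version of the sample data (Python's re is close enough to sed for these patterns):

```python
import json
import re

# Three wrapped lines Docker writes for one multi-line JSON log
# (simplified from the question's sample; "\\n" below is the literal
# two-character sequence backslash-n).
wrapped = '\n'.join([
    '{"log":"{\\n","stream":"stdout","time":"2018-03-06T18:56:08.648972915Z"}',
    '{"log":" \\"a\\": \\"b\\"\\n","stream":"stdout","time":"2018-03-06T18:56:08.649029831Z"}',
    '{"log":"}\\n","stream":"stdout","time":"2018-03-06T18:56:08.649059571Z"}',
])

# SEDCMD-1: keep only the payload of each "log" field
step1 = re.sub(r'\{"log":"(?:\\u[0-9]+)?(.*?)\\n","stream.*', r'\1', wrapped)
# SEDCMD-2: unescape the quotes
step2 = step1.replace('\\"', '"')

parsed = json.loads(step2)
print(parsed)  # {'a': 'b'}
```

After both substitutions, the event body is plain JSON, which is why KV_MODE = json can then extract the fields.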
I am pulling the logs from a Universal Forwarder. Does this need to be changed on the Universal Forwarder?
It would need to be on any Heavy Forwarders/Indexers, depending on your environment.
If you are willing to try alternative ways to send the logs to Splunk, you can take a look at our solution for monitoring Docker ( https://www.outcoldsolutions.com/ ). Our solution provides advanced configuration for joining log lines into one event, based on a pattern describing how you expect your lines to look.
If you take a look at our example of how to run our collector ( https://www.outcoldsolutions.com/docs/monitoring-docker/ ), the only two lines you need to add are
--env "COLLECTOR__SPLUNK_JSON1=pipe.join::json__patternRegex=^{" \
--env "COLLECTOR__SPLUNK_JSON2=pipe.join::json__matchRegex.docker_container_image=^ubuntu:14\.04$" \
which matches our configuration ( https://www.outcoldsolutions.com/docs/monitoring-docker/configuration/ ) if you want to configure it with a file instead of environment variables:
[pipe.join::json]
#Set match pattern for the fields
matchRegex.docker_container_image = ^ubuntu:14\.04$
# All events start with '{'
patternRegex = ^{
That tells our collector that, for containers running from an image matching the regex ^ubuntu:14\.04$
, you expect all messages to start with {
. The full command to run:
docker run -d \
--name collectorfordocker \
--volume /sys/fs/cgroup:/rootfs/sys/fs/cgroup:ro \
--volume /proc:/rootfs/proc:ro \
--volume /var/log:/rootfs/var/log:ro \
--volume /var/lib/docker/containers/:/var/lib/docker/containers/:ro \
--volume /var/run/docker.sock:/var/run/docker.sock:ro \
--volume collector_data:/data/ \
--cpus=1 \
--cpu-shares=102 \
--memory=256M \
--restart=always \
--env "COLLECTOR__SPLUNK_URL=output.splunk__url=https://input.splunk.outcold.net/services/collector/event/1.0" \
--env "COLLECTOR__SPLUNK_TOKEN=output.splunk__token=670DD88D-AFB5-4DCE-B0C5-F7AD0A7A2FB8" \
--env "COLLECTOR__EULA=general__acceptEULA=true" \
--env "COLLECTOR__SPLUNK_JSON1=pipe.join::json__patternRegex=^{" \
--env "COLLECTOR__SPLUNK_JSON2=pipe.join::json__matchRegex.docker_container_image=^ubuntu:14\.04$" \
--privileged \
outcoldsolutions/collectorfordocker:3.0.86.180207
After that, if I run
docker run ubuntu:14.04 bash -c "sleep 5 && echo '{
\"a\": \"b\"
}
'"
That results in JSON log lines like:
{"log":"{\n","stream":"stdout","time":"2018-03-07T05:49:45.33335549Z"}
{"log":" \"a\": \"b\"\n","stream":"stdout","time":"2018-03-07T05:49:45.333434251Z"}
{"log":" }\n","stream":"stdout","time":"2018-03-07T05:49:45.333448344Z"}
{"log":"\n","stream":"stdout","time":"2018-03-07T05:49:45.333459838Z"}
And with our solution they will be delivered as one event.
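For reference, the joining behavior that patternRegex configures can be sketched roughly like this. This is only an illustration of the idea, not our collector's actual implementation, and join_events is just a name for the sketch:

```python
import re

# patternRegex: a line matching this starts a new event; every other
# line is appended to the current event.
pattern = re.compile(r'^\{')

def join_events(lines):
    """Group raw log lines into events based on the start-of-event pattern."""
    events, current = [], []
    for line in lines:
        if pattern.match(line) and current:
            events.append('\n'.join(current))
            current = []
        current.append(line)
    if current:
        events.append('\n'.join(current))
    return events

lines = ['{', ' "a": "b"', '}', '{', ' "c": 1', '}']
print(join_events(lines))  # two events, each one complete JSON object
```

With the sample from the docker run above, the three wrapped lines would be grouped into a single event before being sent to Splunk.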