I am trying to index an XML file which looks like this:

 <?xml version="1.0" encoding="utf-8" ?> 
 <Posts2Votes>
  <row>
   <Id>1</Id> 
   <PostId>7</PostId> 
   <UserId>2</UserId> 
   <VoteTypeId>2</VoteTypeId> 
   <CreationDate>2009-11-06T02:22:37.063</CreationDate> 
   <TargetUserId>7</TargetUserId> 
   <TargetRepChange>10</TargetRepChange> 
   <IPAddress>64.127.105.60</IPAddress> 
  </row>
  <row>
   <Id>2</Id> 
   <PostId>6</PostId> 
   <UserId>2</UserId> 
   <VoteTypeId>2</VoteTypeId> 
   <CreationDate>2009-11-06T02:22:38.25</CreationDate> 
   <TargetUserId>31</TargetUserId> 
   <TargetRepChange>10</TargetRepChange> 
   <IPAddress>64.127.105.60</IPAddress> 
  </row>
  <!-- more "row" elements go here -->
 </Posts2Votes>

Splunk's default parsing recognizes the timestamps correctly, but it does not break events at each <row> element, and no fields are extracted. How do I extract these fields and break the events correctly? Any ideas?

asked 13 Mar '10, 22:45 by Justin Grant
edited 07 Sep '11, 15:30 by jlaw


5 Answers:

props.conf

TIME_PREFIX = \<CreationDate\>
TIME_FORMAT = %Y-%m-%dT%H:%M:%S.%3N
SHOULD_LINEMERGE = false
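# LINE_BREAKER needs a capturing group; its contents (here the whitespace) are discarded at the event boundary
LINE_BREAKER = \>(\s*)(?=\<row\>)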
REPORT-xmlext = xml-extr
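As I understand props.conf.spec, LINE_BREAKER must contain a capturing group: Splunk ends the previous event where that group starts, starts the next event where it ends, and throws away the group's contents. The sketch below mimics that rule with Python's re module (a stand-in for Splunk's PCRE) on a trimmed, hypothetical copy of the sample file, using a pattern whose whitespace is wrapped in `(\s*)`. The helper name `break_events` is made up for illustration.

```python
import re

# Hypothetical trimmed copy of the question's XML.
RAW = """<?xml version="1.0" encoding="utf-8" ?>
<Posts2Votes>
  <row>
   <Id>1</Id>
   <CreationDate>2009-11-06T02:22:37.063</CreationDate>
  </row>
  <row>
   <Id>2</Id>
   <CreationDate>2009-11-06T02:22:38.25</CreationDate>
  </row>
</Posts2Votes>
"""

# A '>' followed by optional whitespace (captured) and then a <row> tag.
BREAKER = re.compile(r">(\s*)(?=<row>)")


def break_events(raw: str) -> list[str]:
    """Split raw text the way LINE_BREAKER is documented to: the previous
    event ends where group 1 starts, the next begins where group 1 ends,
    and the captured whitespace is dropped."""
    events = []
    start = 0
    for m in BREAKER.finditer(raw):
        events.append(raw[start:m.start(1)])
        start = m.end(1)
    events.append(raw[start:])
    return events


for event in break_events(RAW):
    print(repr(event))
```

Note that the first "event" is just the XML declaration plus `<Posts2Votes>`, and the last row still carries the closing `</Posts2Votes>`, because the pattern only breaks in front of `<row>`.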

transforms.conf

[xml-extr]
REGEX = \<(\w+)\>([^\>]*)\<\1\>
FORMAT = $1::$2
MV_ADD = true
REPEAT_MATCH = true

should do it.
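On the timestamp side, TIME_PREFIX tells Splunk where the timestamp starts and TIME_FORMAT how to read it (%3N being three subsecond digits). The Python sketch below only illustrates that idea: `event_time` is a hypothetical helper, it reads up to the next `<` instead of Splunk's own lookahead, and Python's `%f` stands in for `%3N`. It is mainly handy for checking the sample data, where the second row has only two fractional digits (`.25`).

```python
import re
from datetime import datetime
from typing import Optional

# Plays the role of TIME_PREFIX: the timestamp follows the <CreationDate> tag.
TIME_PREFIX = re.compile(r"<CreationDate>")
# Stand-in for TIME_FORMAT; Python's %f accepts 1 to 6 fractional digits.
PY_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def event_time(event: str) -> Optional[datetime]:
    """Return the timestamp after <CreationDate>, or None if the event has none."""
    m = TIME_PREFIX.search(event)
    if m is None:
        return None  # e.g. an XML-header event without <CreationDate>
    # Simplification: take everything up to the next '<' as the timestamp text.
    text = event[m.end():].split("<", 1)[0].strip()
    return datetime.strptime(text, PY_TIME_FORMAT)


print(event_time("<row><CreationDate>2009-11-06T02:22:37.063</CreationDate></row>"))
print(event_time("<row><CreationDate>2009-11-06T02:22:38.25</CreationDate></row>"))
```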

answered 14 Mar '10, 07:56 by gkanapathy

Were you able to get this to work? I tried it, but it does not break the events apart cleanly.

My data does have nested elements: after each row group I have a subrow that contains data for that row group, so that might be what's throwing it off.

(16 Mar '10, 12:31) BunnyHop

There is a small error in the regex above; the correct one is:
REGEX = <(\w+)>([^<]*)</\1>

(18 Oct '10, 09:18) gljiva
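To see the difference, the sketch below runs the original answer's REGEX and this correction over the first row from the question, using Python's re module as a stand-in for Splunk's PCRE (in both, `\<` and `\>` are just literal angle brackets). The helper `extract` is hypothetical; it returns (name, value) pairs in the spirit of `FORMAT = $1::$2`.

```python
import re

ROW = """<row>
   <Id>1</Id>
   <PostId>7</PostId>
   <UserId>2</UserId>
   <VoteTypeId>2</VoteTypeId>
   <CreationDate>2009-11-06T02:22:37.063</CreationDate>
   <TargetUserId>7</TargetUserId>
   <TargetRepChange>10</TargetRepChange>
   <IPAddress>64.127.105.60</IPAddress>
  </row>"""

# Original answer: closing tag has no '/', so it would need <Id>1<Id>.
ORIGINAL = re.compile(r"\<(\w+)\>([^\>]*)\<\1\>")
# Correction: the value stops at the next '<' and the closing tag is </name>.
CORRECTED = re.compile(r"<(\w+)>([^<]*)</\1>")


def extract(pattern, event):
    """Group 1 names the field, group 2 is its value, for every match."""
    return [(m.group(1), m.group(2)) for m in pattern.finditer(event)]


print(extract(ORIGINAL, ROW))
print(extract(CORRECTED, ROW))
```

Because the value is `[^<]*`, only leaf elements match; an element that wraps other tags (such as the `<row>` itself, or a nested subrow) is not extracted as a whole, although its leaf children still are.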

The post-your-answer text pre-processor will NOT ALLOW the correct line to be entered, even with verbatim/pre/code block markers!
As originally posted, the correction before this one was wrong: it needs a backslash before the 'w' and a backslash before the '1'. Maybe some Splunk wiz can figure out a way to post a follow-up to my answer with the correct text shown.

answered 06 Dec '11, 13:54 by woodcock

The problem with submitting the correct text was the less-than '<' character, which got treated as the start of an HTML tag and could not be escaped directly. The way around it is to enter every less-than character as the HTML entity &lt; (ampersand, 'l', 't', semicolon).

REGEX = <([\w-]+)>([^<]*)</\1>
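If you need to post such a line, Python's standard `html.escape` is one way to do the entity conversion described above. This is only a sketch: with `quote=False` it also turns `&` and `>` into entities, which is harmless in HTML, and it does nothing about the backslash problem discussed in the next posts.

```python
import html

line = r"REGEX = <([\w-]+)>([^<]*)</\1>"
# Replace &, < and > with HTML entities so '<' is not read as a tag.
escaped = html.escape(line, quote=False)
print(escaped)
```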

answered 06 Dec '11, 16:35 by woodcock

Wow, crazy: my last follow-up fixed one problem, but as posted it was also wrong!

The corrected version of the original answer's line is this:
REGEX = <(\w+)>([^<]*)</\1>

However, I wanted to allow hyphens in my tag names, so I changed it to this:
REGEX = <([\w-]+)>([^<]*)</\1>
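The effect of the `[\w-]+` variant can be checked with a small Python sketch (Python's re standing in for PCRE). The event and its hyphenated tag name are made up, since the question's data has none, and `fields` is a hypothetical helper; this only shows which tags each pattern matches, not how Splunk then names the resulting fields.

```python
import re

PLAIN = re.compile(r"<(\w+)>([^<]*)</\1>")
HYPHENATED = re.compile(r"<([\w-]+)>([^<]*)</\1>")

# Hypothetical event with one plain and one hyphenated tag name.
EVENT = "<row><Id>1</Id><target-user-id>7</target-user-id></row>"


def fields(pattern, event):
    """Map each matched tag name to its text value."""
    return {m.group(1): m.group(2) for m in pattern.finditer(event)}


print(fields(PLAIN, EVENT))       # \w+ stops at the first '-', so that tag is skipped
print(fields(HYPHENATED, EVENT))  # [\w-]+ also accepts hyphens in tag names
```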

answered 07 Dec '11, 07:50 by woodcock

OK, I give up. Even though my text looked correct in the preview window, it still ate two of my backslashes on each line. For the lines to be correct, there should be a backslash before every 'w' and before every '1'.

answered 07 Dec '11, 07:52 by woodcock


Copyright © 2005-2012 Splunk, Inc. All rights reserved.