I have several Linux boxes that are monitored by Splunk, and a few of them are having problems running the "cpu.sh" script.
The problem is that on some of the boxes there is no "steal" column in the sar output:
$ uname -a
Linux hostname 2.6.9-89.35.1.ELsmp #1 SMP Tue Jan 4 22:30:58 EST 2011 i686 i686 i386 GNU/Linux
$ sar -P ALL 1 1
Linux 2.6.9-89.35.1.ELsmp (fralnxnmsapp5) 11/10/2011
02:21:05 PM CPU %user %nice %system %iowait %idle
02:21:06 PM all 0.00 0.00 0.00 0.00 100.00
02:21:06 PM 0 0.00 0.00 0.00 0.00 100.00
02:21:06 PM 1 0.00 0.00 0.00 0.00 100.00
02:21:06 PM 2 0.00 0.00 0.00 0.00 100.00
02:21:06 PM 3 0.00 0.00 0.00 0.00 100.00
Average: CPU %user %nice %system %iowait %idle
Average: all 0.00 0.00 0.00 0.00 100.00
Average: 0 0.00 0.00 0.00 0.00 100.00
Average: 1 0.00 0.00 0.00 0.00 100.00
Average: 2 0.00 0.00 0.00 0.00 100.00
Average: 3 0.00 0.00 0.00 0.00 100.00
One more piece of info: the version of sar on these boxes is pretty old:
$ sar -V
sysstat version 5.0.5
(C) Sebastien Godard
Usage: sar [ options... ] [ <interval> [ <count> ] ]
Splunk's cpu script assumes the "steal" column is there. As a result, all the values end up in the wrong column.
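To illustrate the shift, here is a small sketch that applies a steal-aware field mapping (cpu at NF-6 through pctIowait at NF-2) to one of the 8-field lines above. The print statement and the variable names line and shifted are only for illustration and are not part of cpu.sh.

```shell
# One data line from the old sar output above: 8 fields, no %steal column.
line='02:21:06 PM all 0.00 0.00 0.00 0.00 100.00'

# Mapping that assumes a %steal column sits between %iowait and %idle.
shifted=$(echo "$line" | awk '
    {cpu=$(NF-6); pctUser=$(NF-5); pctNice=$(NF-4); pctSystem=$(NF-3); pctIowait=$(NF-2); pctIdle=$NF}
    {print "cpu=" cpu, "pctUser=" pctUser, "pctIowait=" pctIowait, "pctIdle=" pctIdle}
')
echo "$shifted"
```

With only 8 fields, the CPU name lands in pctUser and the AM/PM marker lands in cpu, which matches the symptom described.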
Is there a fix for this, or should I change the cpu script myself?
Thanks
-Kevin
Wow, that is a really old version of sar! I don't think in our testing we encountered a version quite that old.
Assuming you are using the latest version of the Unix/Linux app, on line 30 of cpu.sh, it seems like you should change this:
FORMAT='{cpu=$(NF-6); pctUser=$(NF-5); pctNice=$(NF-4); pctSystem=$(NF-3); pctIowait=$(NF-2); pctIdle=$NF}'
To this:
FORMAT='{cpu=$(NF-5); pctUser=$(NF-4); pctNice=$(NF-3); pctSystem=$(NF-2); pctIowait=$(NF-1); pctIdle=$NF}'
This is purely from eyeballing your output, so your mileage may vary. I will make a note to triage this issue for the next maintenance release of the app, though I can't guarantee it will make the cut.
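As a quick sanity check, here is a sketch that runs the adjusted mapping against one of the sample lines from the question. The real cpu.sh wraps FORMAT in more awk logic that is not reproduced here; the PRINT string and the result variable are my own additions for illustration.

```shell
# Sample line copied from the sar output in the question (no %steal column).
line='02:21:06 PM all 0.00 0.00 0.00 0.00 100.00'

# Adjusted mapping for the steal-less layout.
FORMAT='{cpu=$(NF-5); pctUser=$(NF-4); pctNice=$(NF-3); pctSystem=$(NF-2); pctIowait=$(NF-1); pctIdle=$NF}'
# Illustration only: dump the parsed fields in order.
PRINT='{print cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}'

result=$(echo "$line" | awk "$FORMAT $PRINT")
echo "$result"
```

If the mapping is right, cpu comes out as "all" and pctIdle as 100.00.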
The second option seems viable, but the first option seems super easy to implement. The app is open sourced under the Apache license, and it should be up on Splunk's github account pretty soon, so feel free to contribute your fix when that day comes.
Thanks,
That's close to what I was thinking, but since we deploy the same app to several forwarders, I was thinking about something like:
if (NF == 8)
FORMAT='{cpu=$(NF-5); pctUser=$(NF-4); pctNice=$(NF-3); pctSystem=$(NF-2); pctIowait=$(NF-1); pctIdle=$NF}'
else
(what is currently there)
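Since NF only exists inside awk, the switch has to live in the awk program rather than around the FORMAT assignment in the shell. A rough sketch of that idea follows; parse_cpu is a hypothetical helper name, the second sample line is made-up data for a box that does report steal, and the print is only for illustration.

```shell
# Hypothetical helper: choose the field mapping per line from the field count.
parse_cpu() {
    awk '
        # 8 fields: time, AM/PM, CPU, then 5 percentages (no %steal)
        NF == 8 {cpu=$(NF-5); pctUser=$(NF-4); pctNice=$(NF-3); pctSystem=$(NF-2); pctIowait=$(NF-1); pctIdle=$NF}
        # 9 fields: same layout with %steal between %iowait and %idle
        NF == 9 {cpu=$(NF-6); pctUser=$(NF-5); pctNice=$(NF-4); pctSystem=$(NF-3); pctIowait=$(NF-2); pctIdle=$NF}
        NF == 8 || NF == 9 {print cpu, pctUser, pctNice, pctSystem, pctIowait, pctIdle}
    '
}

# Line from the old box, and an invented line with a steal value of 5.00.
old=$(echo '02:21:06 PM all 0.00 0.00 0.00 0.00 100.00' | parse_cpu)
new=$(echo '02:21:06 PM all 1.00 2.00 3.00 4.00 5.00 85.00' | parse_cpu)
echo "$old"
echo "$new"
```

One caveat, based on counting fields rather than on testing other sysstat versions: a box that prints 24-hour timestamps has no AM/PM field, so a line that does include steal would also have 8 fields there. Checking whether the header contains %steal may be a safer switch than the raw field count.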
The other thing I thought about was to just grab the header from the sar output (changing % to pct) and use that as the header.
For now I'll probably go with the first option, but do you see a problem with the second?
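For what it's worth, the header-driven idea could look roughly like the sketch below. It is not how cpu.sh works today; parse_by_header is a hypothetical name, and the output format (name=value pairs) is just a convenient way to show the mapping.

```shell
# Hypothetical helper: take column names from the sar header row itself.
parse_by_header() {
    awk '
        # Header row: remember each column name by position, rewriting % to pct.
        /%idle/ {
            for (i = 1; i <= NF; i++) { name[i] = $i; sub(/^%/, "pct", name[i]) }
            ncols = NF
            next
        }
        # Data rows: same field count as the most recent header.
        ncols && NF == ncols {
            out = ""
            for (i = 1; i <= NF; i++)
                if (name[i] == "CPU" || name[i] ~ /^pct/)
                    out = out (out == "" ? "" : " ") name[i] "=" $i
            print out
        }
    '
}

# Header and first data line from the sar output in the question.
parsed=$(printf '%s\n' \
    '02:21:05 PM CPU %user %nice %system %iowait %idle' \
    '02:21:06 PM all 0.00 0.00 0.00 0.00 100.00' | parse_by_header)
echo "$parsed"
```

Note that the names come out in lowercase (pctuser, pctidle), unlike the pctUser style the script uses now, so anything downstream that relies on the existing field names would need a mapping step.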