Solved: Re: pingstatus command: Is it possible to run more than one ping and why are most pingdelay values in my network showing 0.0?

arkadyz1 · ‎03-30-2015

I started using pingstatus command in our application. It works great, but I do have a couple of problems:

My first problem is that it runs just one ping. Our networks are somewhat shaky, so running several pings and getting a packet loss percentage would be great. The ideal would be to add a parameter to the command such as count=... (so that I can run it like this: pingstatus url as IP count=5), and get back both pingdelay and pingloss or similar. I understand that it will create delays, but I can live with that.
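As an illustration of the summary requested above, here is a minimal sketch of turning several ping attempts into an average delay and a loss percentage. The helper name summarize_pings and the convention that a lost ping is recorded as None are assumptions made for this example; pingdelay and pingloss are the field names suggested in the question, not fields the stock command produces.

```python
def summarize_pings(delays):
    """Summarize a list of ping attempts.

    delays holds one entry per attempt: a delay in seconds,
    or None when the ping was lost (an assumed convention).
    """
    received = [d for d in delays if d is not None]
    sent = len(delays)
    lost = sent - len(received)
    # Loss as a percentage of all attempts; 0.0 if nothing was sent
    pingloss = 100.0 * lost / sent if sent else 0.0
    # Average only over the pings that came back; None if all were lost
    pingdelay = sum(received) / len(received) if received else None
    return {"pingdelay": pingdelay, "pingloss": pingloss}
```

For example, four attempts of which two are lost give a pingloss of 50.0 and a pingdelay equal to the mean of the two replies that arrived.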

My other problem is the pingdelay values that I'm getting: when I tested it on my Windows machine, I got all kinds of delays, usually quite small, similar to 0.000015 or even lower. However, when I moved it to the main network, which consists of many (dozens to hundreds) Linux machines, most of the pingdelay values I'm getting are 0.0. Yes, that's right - a plain zero! Those that are not zero look like 0.00136 or so, but only a couple of percent of the records are non-zero. What's worse, I know that I'm dealing with quite a wide network there, with multiple locations and routers, so I would expect values in the 20-100 ms range, not 1 ms or below.

Now, for the first problem, I should probably look into pingstatus.py - but I'm far from being a Python expert, let alone Splunk Python. A push in the right direction is all I would like.

For the second one: is there a big difference between Windows and Linux ping.py behavior?

ndoshi · ‎03-30-2015

Since you don't know Python, I'm going to give you some sample code to change the pingstatus.py

count=1
if len(sys.argv)>1 and len(sys.argv) != 4 and len(sys.argv)!=5:
    print "Usage |pingstatus url as <local-field> (or have url field name in data) <optional-count>"
    sys.exit()
elif len(sys.argv) == 4:
    urlfield=sys.argv[3]
elif len(sys.argv) == 5:
    urlfield=sys.argv[3]
    # sys.argv entries are strings; convert so range(1, count+1) works later
    count=int(sys.argv[4])

That will get you your count argument as a number added to your pingstatus command. Don't use count=5 as input as you'll have to parse that. Just put in 5. For example: |pingstatus url as ip 5|table ip pingstatus*
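If the count=5 form is still wanted, the parsing is short. The following is only a sketch: parse_count is a hypothetical helper, and it assumes the argument reaches the script as a single sys.argv entry such as "5" or "count=5".

```python
def parse_count(token):
    """Accept either '5' or 'count=5' and return the count as an int.

    Raises ValueError for a non-numeric value, an unknown key,
    or a count below 1.
    """
    key, sep, value = token.partition("=")
    if sep:
        # Only the key 'count' is understood in the key=value form
        if key.strip().lower() != "count":
            raise ValueError("unexpected argument: %r" % token)
        token = value
    count = int(token)  # raises ValueError on non-numeric input
    if count < 1:
        raise ValueError("count must be at least 1")
    return count
```

With this helper, count = parse_count(sys.argv[4]) would accept both |pingstatus url as ip 5 and |pingstatus url as ip count=5.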

Next, for the pingdelay field, you can use this approach.

if urlfield in r:
    for i in range(1, count+1):
        try:
            delay = ping.do_one(r[urlfield], timeout=2)
            if count == 1:
                # Single ping: keep the original field name
                r["pingdelay"] = delay
            else:
                # Several pings: pingdelay1, pingdelay2, ...
                pingdelay = "pingdelay" + str(i)
                r[pingdelay] = delay
        except socket.error, e:
            if count == 1:
                r["pingdelay"] = 10000000
            else:
                pingdelay = "pingdelay" + str(i)
                r[pingdelay] = 10000000

This will create fields pingdelay1, pingdelay2, etc. if your count is greater than 1. This has not been tested, so you'll have to play with it. Also, don't just copy and paste from this Answers post, as the formatting may be wrong. In Python, proper indentation matters. To print your results in Splunk, do:

|table pingdelay*

As for Windows vs. Linux, I'm not sure why the results differ, as I used a public domain ping.py program to get my results. For Windows you may have to find a version that is better suited for it. Keep in mind this is a reference implementation to give you an idea of how to do this. It is provided as is.

arkadyz1 · ‎03-31-2015

To summarize our discussion and provide back a modified pingstatus command:

Attached is a pingstatus.py which adds an optional count parameter to ping more than once, outputs the number of unsuccessful and successful pings, and computes the average time. It's very crude (you can see I'm no Python programmer), but it gets the job done. pingsuccess and pingfail are the counts of successful and unsuccessful pings.

My version of pingstatus has four invocation formats:
pingstatus (the expected field containing IP/hostname is url, count defaults to 1)
pingstatus count (uses url input field and the provided count; example: pingstatus 5)
pingstatus url as ipfield (uses provided field for IP/hostname, count is 1; example: pingstatus url as IP)
pingstatus url as ipfield count (uses ipfield as IP/hostname, pings count times; example: pingstatus url as IP 10)

See whether you like it and make any changes you feel are needed. I guess a def usage() would be in order: it could be called from try blocks surrounding count = int(sys.argv[?]) when they catch ValueError.
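A sketch of what that could look like, folding the four invocation formats and the ValueError handling into one function. parse_args and USAGE are hypothetical names for this example, not part of the attached file, and like the original the sketch does not check that the literal words "url as" are present.

```python
USAGE = "Usage: | pingstatus [url as <local-field>] [count]"

def parse_args(argv):
    """Return (urlfield, count) for the four invocation formats.

    argv is expected to look like sys.argv: script name first.
    Raises ValueError carrying the usage text on any bad input.
    """
    args = argv[1:]
    urlfield, count = "url", 1
    if len(args) not in (0, 1, 3, 4):
        raise ValueError(USAGE)
    if len(args) in (3, 4):
        # "url as <local-field>": the field name is the third token
        urlfield = args[2]
    if len(args) in (1, 4):
        # The count, when given, is always the last token
        try:
            count = int(args[-1])
        except ValueError:
            raise ValueError(USAGE)
        if count < 1:
            raise ValueError(USAGE)
    return urlfield, count
```

The caller can then wrap a single parse_args(sys.argv) call in try/except ValueError, print the message, and exit, instead of repeating the check at every int() conversion.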

Edit: I attached the file, but I don't see how it can be downloaded. Let me know if you want the file and I'll paste the source somewhere.

ndoshi · ‎03-31-2015

Here's what we'll do. Paste the source somewhere, with comments in the file that credit you by name as the author of the changes. I'll add it as experimental_pingstatus.py to the bin directory of the distribution and let those who want to explore the extra options do so. This way, as you said, they can use the plain vanilla pingstatus as provided and look into the more advanced stuff as needed. Let me know where you pasted it. If there is enough character space, you can even paste it here on Answers.

arkadyz1 · ‎03-31-2015

Here it goes (part one - I hope it formats correctly):

# Copyright (C) 2005-2011 Splunk Inc.  All Rights Reserved.  Version 4.x
# Author: Nimish Doshi
# Modified by: Arkady Zilberberg, 2015-03-31
# Change history:
#   Added an optional count of pings (see Usage: comment below)
#   Adds fields:
#       pingdelay (just as the original, though now averaged between successful pings)
#       pingsuccess - the count of successful pings
#       pingfail - the count of failed pings
#       pingdelay1 through pingdelay<n> - actual pingdelays for each ping
import sys,splunk.Intersplunk
import string
import ping
import socket


urlfield="url"
count = 1

# Usage:
# pingstatus (ping once, generate pingdelay)
# pingstatus count (ping count times, generate pingdelay - average, pingsuccess/pingfail - tallies)
# pingstatus url as local-field (ping once, getting url from local-field)
# pingstatus url as local-field count (ping count times, generate pingdelay - average, pingsuccess/pingfail, get url from local-field)
if len(sys.argv) == 1:
    pass
elif len(sys.argv) == 2:
    count = int(sys.argv[1])
elif len(sys.argv) == 4:
    urlfield=sys.argv[3]
elif len(sys.argv) == 5:
    urlfield = sys.argv[3]
    count = int(sys.argv[4])
else:
    print "Usage | pingstatus [url as <local-field>] [count] (or have field named 'url' in data)"
    sys.exit()

results = []

try:

    results,dummyresults,settings = splunk.Intersplunk.getOrganizedResults()

arkadyz1 · ‎03-31-2015

...and part two:

    for r in results:
        if urlfield in r:
            total_delay = 0
            pingsuccess = 0
            pingfail = 0
            for i in range(1, count + 1):
                try:
                    delay = ping.do_one(r[urlfield], timeout=2)
                    total_delay += delay
                    pingsuccess += 1
                    pingdelay = "pingdelay" + str(i)
                    r[pingdelay] = delay
                    del delay
                except NameError:
                    pingfail += 1
                except TypeError:
                    pingfail += 1
                except socket.error, e:
                    pingfail += 1
                r["pingdelay"] = 10000000 if pingsuccess == 0 else total_delay / pingsuccess
                r["pingsuccess"] = pingsuccess
                r["pingfail"] = pingfail
except:
    import traceback
    stack =  traceback.format_exc()
    results = splunk.Intersplunk.generateErrorResults("Error : Traceback: " + str(stack))

splunk.Intersplunk.outputResults( results )

ndoshi · ‎03-31-2015

Loaded version 1.2.1 with your code and release notes.

arkadyz1 · ‎03-31-2015

If you feel like improving the code, please do so and send back the result.

arkadyz1 · ‎03-31-2015

Looking closer at your pingstatus.py:
I wonder why the if "_raw" in r check is in the code. Does it mean that any filtered search without _raw (say, someone has | fields - _raw somewhere in the chain before piping it to pingstatus) will not ping at all? It's not an idle question: when one is dealing with summary searches, it is often necessary to remove the original _raw (and sometimes remove or replace _time as well) and just use some fields (one of which might be pingdelay generated by pingstatus). If that _raw removal happens earlier in the chain, then pingstatus will not work (I guess).

Also, as a potential future improvement: ping.py does not return anything from do_one in a few cases, definitely when the pinged host is unreachable. I think it would help to catch NameError separately (as delay might not be defined at all after the do_one call) and to del delay between do_one invocations.
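A possible alternative to catching NameError is sketched below. A Python function that returns nothing hands back None, so the "no reply" case can be tested directly. This assumes do_one returns None when the host does not answer; ping_host is a hypothetical helper, and the ping function is passed in as do_one so that the sketch does not depend on any particular ping.py.

```python
import socket

def ping_host(host, count, do_one, timeout=2):
    """Ping host count times and return the fields to add to a result row.

    do_one is the ping function to use (for example ping.do_one); it is
    assumed to return a delay in seconds, or None when no reply arrives.
    """
    fields = {}
    total_delay = 0.0
    pingsuccess = 0
    pingfail = 0
    for i in range(1, count + 1):
        try:
            delay = do_one(host, timeout=timeout)
        except socket.error:
            delay = None
        if delay is None:
            # No reply or a socket failure: count it and move on
            pingfail += 1
            continue
        pingsuccess += 1
        total_delay += delay
        fields["pingdelay" + str(i)] = delay
    # Same sentinel as the thread's code when every ping failed
    fields["pingdelay"] = total_delay / pingsuccess if pingsuccess else 10000000
    fields["pingsuccess"] = pingsuccess
    fields["pingfail"] = pingfail
    return fields
```

Inside the loop over results this could be used as r.update(ping_host(r[urlfield], count, ping.do_one)).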

ndoshi · ‎03-31-2015

Again, this was a reference implementation. It was meant to test against _raw to see if a machine responds to a ping given an address to check. Nothing more. You are free to remove that if statement and simply look for the presence of the field that represents the host you are pinging (URL, hostname, IP, etc). Since I didn't want this to be tied to ping.py as users are free to add their own ping module, I didn't catch any specific exceptions from it, other than simply timing out.
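A minimal sketch of that suggestion, selecting rows by the presence of the host field rather than by _raw. It assumes each result row behaves like a dictionary of field names to values; rows_to_ping is a hypothetical helper.

```python
def rows_to_ping(results, urlfield="url"):
    """Yield the rows that carry a non-empty host field.

    _raw is ignored, so rows from summary searches or after
    '| fields - _raw' are still pinged.
    """
    for r in results:
        if r.get(urlfield):
            yield r
```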

arkadyz1 · ‎03-31-2015

First of all let me tell you that it was a very useful "reference" implementation - thank you! I will try my changes and let you know how they work.

What I like about your pingstatus command is that it is absolutely minimalistic and gives the user full freedom to put it anywhere in the chain and modify the event as necessary. There are no forms or dashboards to manage, no permissions to give or take - just the functionality, pure and simple :).
