Small Python logwatch-watching script

Logwatch is a little program, available on Debian, that does some basic server log analysis and emails root a report every day.

I check these every day and I love them, but one section lists the IPs that "probed" the HTTP server. It looks like this:

 A total of 44 sites probed the server
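That header line carries the number of probing IPs. Here is a minimal sketch, assuming the line always has the wording shown above, of pulling that number out with a regular expression. The helper name `probed_count` is hypothetical and not part of the script below.

```python
import re

# Matches the header shown above, e.g. "A total of 44 sites probed the server".
# Assumes this exact wording; "sites?" also accepts a singular "site" defensively.
PROBED_RE = re.compile(r"A total of (\d+) sites? probed the server")

def probed_count(line):
    """Return the number of probing sites in a header line, or None if it doesn't match."""
    m = PROBED_RE.search(line)
    return int(m.group(1)) if m else None

print(probed_count(" A total of 44 sites probed the server"))  # 44
```

Comparing that number with how many IPs you actually read from the section is a cheap check that the whole list was parsed.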

On one of our servers this list can run to about a thousand IPs. It would be nice to know whether any of these IPs come back every day or hit every one of our servers, so I wrote a little Python script that checks.

It's mainly here so you Python hackers can see what a bad, sparsely commented script looks like, and maybe pick up some ideas.

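If you do want to run it, save each of your logwatch emails as a text file named after the server plus the date, like ren22 for server ren's report from February 22nd. To use this script verbatim, keep every name between 5 and 8 characters long: the script decides that an IP showed up in more than one report when the concatenated names of the files it was seen in come to more than 8 characters.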
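The naming scheme packs the server and the day into the file name. A minimal sketch of splitting such a name back apart, assuming server names are letters only and the day is the trailing digits (an optional underscore is accepted to cover the servername_date form). `split_name` is a hypothetical helper, not something the script uses.

```python
import re

# Assumed scheme: letters for the server, optional "_", then digits for the day.
NAME_RE = re.compile(r"^([A-Za-z]+)_?(\d+)$")

def split_name(filename):
    """Split a report name like 'ren22' into ('ren', 22); None if it doesn't fit the scheme."""
    m = NAME_RE.match(filename)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

for name in ["ren22", "hyla20", "notes.txt"]:
    # The script below additionally expects names of 5 to 8 characters.
    print(name, split_name(name), 5 <= len(name) <= 8)
```

Server names that themselves end in digits (say, web1) would make this split ambiguous, so the sketch assumes they don't.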

When you have a bunch, change the ppath line in the script to point at the directory holding your saved reports, and away you go!
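If you'd rather not edit the script each time, one alternative (not in the original) is to take the directory from the command line and fall back to the hard-coded path. A minimal sketch; `data_dir` and `DEFAULT_PATH` are hypothetical names.

```python
import os
import sys

# Hypothetical default, mirroring the placeholder path in the script.
DEFAULT_PATH = "/change/this/to/your/path/to/data/"

def data_dir(argv):
    """Use the first command-line argument as the report directory, else the default."""
    return argv[1] if len(argv) > 1 else DEFAULT_PATH

if __name__ == "__main__":
    ppath = data_dir(sys.argv)
    print("reading reports from", ppath, "exists:", os.path.isdir(ppath))
```

You would then run it as `python yourscript.py /path/to/reports/`, where the script name is whatever you saved it as.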

You will get a listing with lines that look like this:

 is in hyla20hyla22hyla23hyla21

This means the listed IP probed hyla on the 20th through the 23rd. If you look these IPs up, you'll find that some are search engine spiders, which seems fine. Others aren't, and you may want to go through the logs more carefully and maybe block them.
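Because the value is just the file names run together, reading it takes a little squinting. A minimal sketch of decoding it into sorted (server, day) pairs, assuming every name is letters followed by digits; `decode_listing` is a hypothetical helper.

```python
import re

# Each file name in the listing is assumed to be letters (server) then digits (day).
PART_RE = re.compile(r"([A-Za-z]+)(\d+)")

def decode_listing(value):
    """Turn 'hyla20hyla22hyla23hyla21' into sorted (server, day) pairs."""
    return sorted((server, int(day)) for server, day in PART_RE.findall(value))

print(decode_listing("hyla20hyla22hyla23hyla21"))
# [('hyla', 20), ('hyla', 21), ('hyla', 22), ('hyla', 23)]
```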

You can download the script here, or copy and paste it from below. It's really here as a simple example that Python beginners might get some hints from, and for me to find later and remember what it does.


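```python
import os

# Change this to the directory holding your saved logwatch emails.
ppath = "/change/this/to/your/path/to/data/"

# IP -> names of every file it was seen in, concatenated (e.g. "hyla20hyla22")
ips = dict()

def ingest(ffile):
    with open(os.path.join(ppath, ffile), 'r') as f:
        line = f.readline()
        # throw away lines until you find "probed" (readline returns "" at EOF)
        while line and "probed" not in line:
            line = f.readline()

        # get one more line: the first IP after the header
        line = f.readline()

        # keep reading while the line looks like an IP (contains a dot)
        while "." in line:
            ip = line.strip()
            if ip in ips:
                # seen before: append this file's name
                ips[ip] = ips[ip] + ffile
            else:
                # first sighting
                ips[ip] = ffile
            line = f.readline()

for filename in os.listdir(ppath):
    ingest(filename)
    print(filename + " processed.")

# With names of 5-8 characters, a value longer than 8 means more than one file
for kkey in sorted(ips):
    if len(ips[kkey]) > 8:
        print(kkey.rjust(16) + " is in " + ips[kkey])
```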

Most of the program is a function called ingest, which opens a log file, gets all the IPs in the "probed" section, and stuffs them into a dictionary with the IP as the key and the name of the log file(s) it was seen in as the value. Enjoy!
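The string-concatenation trick is why the file names have to stay within a narrow length range. Here is a sketch of a variant, not the original script, that keeps a set of (server, day) pairs per IP with `collections.defaultdict`, so "seen in more than one report" becomes a plain count and you can tell whether an IP came back on several days or hit several servers. It assumes the letters-then-digits naming scheme; `probed_ips`, `collect` and `report` are hypothetical names.

```python
import os
import re
from collections import defaultdict

NAME_RE = re.compile(r"^([A-Za-z]+)_?(\d+)$")  # assumed "server + day" naming

def probed_ips(lines):
    """Yield the IPs after the 'probed' header, stopping at the first line without a dot."""
    it = iter(lines)
    for line in it:
        if "probed" in line:
            break
    for line in it:
        if "." not in line:
            break
        yield line.strip()

def collect(ppath):
    """Map each IP to the set of (server, day) pairs whose report listed it."""
    seen = defaultdict(set)
    for filename in os.listdir(ppath):
        m = NAME_RE.match(filename)
        if m is None:
            continue  # skip files that don't follow the naming scheme
        server, day = m.group(1), int(m.group(2))
        with open(os.path.join(ppath, filename)) as f:
            for ip in probed_ips(f):
                seen[ip].add((server, day))
    return seen

def report(seen):
    """Print every IP seen in more than one report, with its servers and days."""
    for ip in sorted(seen):
        if len(seen[ip]) > 1:
            servers = sorted({s for s, _ in seen[ip]})
            days = sorted({d for _, d in seen[ip]})
            print(ip.rjust(16), "servers:", servers, "days:", days)
```

Calling `report(collect(ppath))` gives one line per repeat visitor; an IP whose server list covers all your machines, or whose day list has no gaps, is the kind worth a closer look.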
