SEARCHING FOR STRINGS IN TEXT FILES

The aim here is to write a tool that searches text files for a given word. I recently had to write a script that searches every text file in a directory for instances of a username. We have 10 text files, each one a dump of usernames from our licenced accounts, and from these we can return useful data.

DIRECTORY & FILE SETUP

The first thing to do is create a new directory and add some text files. Tempting as it is to dump our whole user base online in plaintext, it's probably better to use some dummy files. Click here to download a pre-filled directory.

The script below defines a function find_word() that prompts the user for a search string. It opens the file with with open("file.txt", 'r'), which is the safer way of opening files because the file is closed automatically when the block ends, even if an error occurs. The 'r' flag specifies that the file is opened in read mode. Once the user has entered a search string, the script goes through the file line by line to see whether each line contains the target word. If it does, the script prints that line.

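def find_word():
    # Prompt for the word to search for (Python 2: raw_input returns a string)
    choice = raw_input("Enter word : ")
    # 'with' closes the file automatically when the block ends
    with open("file.txt", 'r') as fp:
        for line in fp:
            # Substring test: also matches the word inside longer words
            if choice in line:
                print line
    # Pause so the output stays visible until Enter is pressed
    raw_input('Press Enter')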

find_word()
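
For reuse elsewhere, the same loop can be wrapped in a function that takes the word and the path as arguments and returns the matches instead of printing them. This is only a minimal sketch, written in Python 3 syntax rather than the Python 2 used in the scripts here. The name find_lines and its parameters are hypothetical and not part of the original script.

```python
def find_lines(word, path):
    """Return every line in the file at `path` that contains `word`."""
    matches = []
    with open(path, "r") as fp:
        for line in fp:
            # Plain substring test, the same check the script uses.
            if word in line:
                # Drop the trailing newline so callers get clean strings.
                matches.append(line.rstrip("\n"))
    return matches
```

Because the in operator does a substring test, searching for "alice" would also return a line containing "malice". If nothing matches, the function returns an empty list.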

This can be easily modified to search every file within a certain directory. For this the os module needs to be imported, and the only real change to the script is an extra for loop.

import os

directory = '/path/to/directory'

def find_word():
    choice = raw_input("Enter word : ")
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        with open(filepath, 'r') as fp:
            for line in fp:
                if choice in line:
                    print filename, line
    raw_input('Press Enter')

find_word()

The directory name is specified and then passed to os.listdir(), which returns the name of every entry in that directory. os.path.join() then joins the directory and each filename into a full path.

for filename in os.listdir(directory):
    filepath = os.path.join(directory, filename)

In the first script the file to read was specified directly as "file.txt", but this time the script opens the file at each path it has created, one after the other, and runs the same line-by-line check on each. It prints both the matching line and the name of the file containing it.
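
One thing to watch for is that os.listdir() returns subdirectories as well as files, and open() fails on a directory. The following is a hedged sketch of the same search in Python 3 syntax that skips anything that is not a regular file and returns its results rather than printing them. The function name find_in_directory and its parameters are made up for illustration.

```python
import os


def find_in_directory(word, directory):
    """Return (filename, line) pairs for every line containing `word`."""
    results = []
    # sorted() gives a predictable order; os.listdir() makes no ordering promise.
    for filename in sorted(os.listdir(directory)):
        filepath = os.path.join(directory, filename)
        # Skip subdirectories and other non-file entries.
        if not os.path.isfile(filepath):
            continue
        with open(filepath, "r") as fp:
            for line in fp:
                if word in line:
                    results.append((filename, line.rstrip("\n")))
    return results
```

Returning a list of pairs keeps the search separate from the output, so the caller can print the results, count them or write them to another file.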

The maximum number of available licences is 300 and each file holds one username per line, so counting the lines in a file gives its number of users, and subtracting that from 300 gives the available licence count. This is done inside a function available(), which reports every file that still has licences to spare.

def available():
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        with open(filepath, 'r') as fp:
            num_lines = sum(1 for line in fp)
            if num_lines < 300:
                print filename, num_lines
                print 300-num_lines, 'licences available','(', num_lines, '/ 300',')'
                print
    raw_input('Press Enter')  
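
If the counts are needed by another script, the arithmetic can be returned rather than printed. This is a rough sketch in Python 3 syntax, assuming one username per line as described above. The names licences_available and MAX_LICENCES are hypothetical, and unlike available() it reports every file, including full ones.

```python
import os

MAX_LICENCES = 300  # licence limit quoted in the text


def licences_available(directory, maximum=MAX_LICENCES):
    """Map each filename in `directory` to its number of unused licences."""
    remaining = {}
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        # Ignore subdirectories; only regular files hold usernames.
        if not os.path.isfile(filepath):
            continue
        with open(filepath, "r") as fp:
            # One username per line, so the line count is the user count.
            num_lines = sum(1 for line in fp)
        remaining[filename] = maximum - num_lines
    return remaining
```

A file with 3 usernames would map to 297. Note that blank lines are counted as users here, exactly as in the original loop.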

STRIPPING WHITESPACE

The scripts above only need to find a word and/or return the file in which it is located. The archive crack script, however, uses text from a file as passwords, so each entry needs its trailing newline stripped with entry.strip('\n'). Note that strip('\n') removes only newline characters. Calling entry.strip() with no argument also knocks off leading and trailing spaces, so a password entry of "Password   " or "  Password" would be stripped to "Password".

with open("words.txt", "r") as the_text:
    for entry in the_text.readlines():
        password = entry.strip('\n')

strip() only touches the ends of a string, so interior spaces are preserved and "this is a password" stays exactly as it is. The disadvantage is that any deliberate leading or trailing spaces would be removed as well. If the find_word() scripts were interacting with something else, or returning a value, then it would be a good idea to strip the trailing newline with line.strip('\n'), or all surrounding whitespace with line.strip().
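
A quick check of how the two calls behave, as a small sketch in Python 3 syntax. The sample string is made up for illustration.

```python
entry = "  this is a password  \n"

# strip('\n') removes only leading/trailing newline characters.
newline_only = entry.strip("\n")

# strip() with no argument removes leading/trailing spaces, tabs and newlines.
all_whitespace = entry.strip()

print(repr(newline_only))    # '  this is a password  '
print(repr(all_whitespace))  # 'this is a password'
```

In both cases the spaces between the words are left alone. Only characters at the start and end of the string are removed.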