Python module for processing a file line-by-line

Note: Since writing this post, I've learned about the fileinput module, which turns most of the following into a one-liner:

import fileinput
for line in fileinput.input():
    process(line)

It works both with input you pipe into the program and with filenames passed as arguments.
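
Here is a slightly fuller sketch of the same idea that can be run as-is. The names process and process_all are my own placeholders, not part of fileinput, and the demo writes a small temporary file so there is something to read:

```python
import fileinput
import os
import tempfile

def process(line):
    # Hypothetical stand-in for the real per-line work.
    return line.rstrip("\n").upper()

def process_all(files=None):
    # With files=None, fileinput reads the files named in sys.argv[1:],
    # or stdin if there are none.
    with fileinput.input(files=files) as lines:
        return [process(line) for line in lines]

# Demo: write a small temporary file and process it by name.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("bla bla bla\nsecond line\n")
try:
    results = process_all([tmp.name])
finally:
    os.remove(tmp.name)
print(results)
```

In a real script you would probably call process_all() with no arguments, so that the same code handles both piped input and filenames.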

Read on for my old post.

In this post I'll show how to process a file line by line with some "pluggable" Python code (see code listings below):

$ cat some_file | ./line_by_line.py <line_proc_module> <line_proc_module> ...

In the example above, a "line proc module" is any Python module that contains a function called proc_line, which takes a string and returns another string, such as the following:

# Example of proc_line
def proc_line(line):
	return line # Hint: Do something to the line before returning
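
As a slightly less trivial illustration, here is a sketch of a module that reverses each line. The file name reverse.py is hypothetical. Each line arrives with its trailing newline still attached, so the sketch splits it off and puts it back at the end:

```python
# reverse.py (hypothetical module name)

def proc_line(line):
    # Split off the trailing newline, if any, so it stays at the end.
    body = line.rstrip("\n")
    newline = line[len(body):]
    return body[::-1] + newline

print(repr(proc_line("bla bla bla\n")))
```

Whether a module should keep the newline is a design choice; keeping it means the output has the same line structure as the input.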

Example: Uppercasing

In this example the user has written a "line proc module" called "uppercase" (see code listings below):

$ echo "bla bla bla" | ./line_by_line.py uppercase
BLA BLA BLA

Example: Chaining

Here the user has written two "line proc modules", and chained them together (see code listings below):

$ echo "bla bla bla" | ./line_by_line.py uppercase leet
BL4 BL4 BL4

For each line, "uppercase" is applied first, then "leet" is applied to its result.
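
Chaining is just function composition: the output of one proc_line becomes the input of the next. A minimal sketch, with the two functions inlined rather than imported from modules, and with chain as a helper name of my own:

```python
from functools import reduce

def uppercase(line):
    return line.upper()

def leet(line):
    return line.replace("A", "4").replace("a", "4")

def chain(procs, line):
    # Feed the line through each function in order, left to right.
    return reduce(lambda acc, proc: proc(acc), procs, line)

print(chain([uppercase, leet], "bla bla bla"))  # BL4 BL4 BL4
```

Order can matter in general. With these two modules it happens not to, because leet replaces both "A" and "a".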

Code listings

uppercase.py:

def proc_line(line):
	return line.upper()

leet.py:

def proc_line(line):
	return line.replace("A", "4").replace("a", "4")

line_by_line.py:

#!/usr/bin/python

import sys

def main(argv):
	if len(argv) < 2:
		print("Usage: line_by_line.py <name-of-module> ...")
		print("Each module must contain a function proc_line(line)")
		return 1

	# import every module named on the command line, in order
	usermodules = []
	for name in argv[1:]:
		try:
			usermodules.append(__import__(name))
		except ImportError:
			# report the module that actually failed, not just the first one
			print('Failed to import module "%s"' % name)
			return 1

	for line in sys.stdin:
		# do something to the line and print the result
		for mod in usermodules:
			line = mod.proc_line(line)

		# the line still has its trailing newline, so don't add another
		sys.stdout.write(line)
	return 0

if __name__ == "__main__":
	# pass main's return value on as the exit status
	sys.exit(main(sys.argv))
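
One possible refinement, sketched here rather than taken from the script above, is to load the modules with importlib.import_module and check up front that each one really has a proc_line function. The helper name load_procs and the in-memory demo module are assumptions for illustration only:

```python
import importlib
import sys
import types

def load_procs(names):
    """Import each named module and return its proc_line function."""
    procs = []
    for name in names:
        module = importlib.import_module(name)  # raises ImportError if missing
        proc = getattr(module, "proc_line", None)
        if not callable(proc):
            raise AttributeError(
                'module "%s" has no proc_line(line) function' % name)
        procs.append(proc)
    return procs

# Demo only: register an in-memory stand-in for uppercase.py so the
# sketch runs without any files on disk.
demo = types.ModuleType("demo_uppercase")
demo.proc_line = lambda line: line.upper()
sys.modules["demo_uppercase"] = demo

procs = load_procs(["demo_uppercase"])
print(procs[0]("bla bla bla"))
```

Checking early means a misspelled function name is reported before any input is read, instead of failing on the first line.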
